Using Text Analytics to extract metadata from SharePoint

By Robert Young, VP Australasia of Pingar

Most organisations do not realise that they can extract information and value from text automatically. Typically, they employ people to do this manually or, worse, ignore text altogether. Have you considered what it costs your business to create and then store data that simply gets lost in your systems?

Since 2007, New Zealand company Pingar has invested in academic and commercial research into text analytics and its application in the global business market. Pingar's singular goal is to extract value and sense from unstructured data using text analytics.

Unstructured data is commonly estimated to make up around 80 percent of the data available today, and it is the main data type in social media, file shares, underutilised document management implementations, and research data.

Pingar's text analytics are a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources so that it can be used for business intelligence, exploratory data analysis, research, or investigation.

Pingar is now poised to leverage its extensive research to bring disruptive solutions for unstructured data that are affordable, easy to implement, and effective.

A diverse range of large companies already use Pingar's solution, including leading Australian retailer Coles Supermarkets; Commonwealth Bank of Australia, the 9th largest banking group in the world; USS POSCO, the world's largest steel producer; Ecolab; MGM Casinos Macau; and more.

Today Pingar delivers its unique capabilities for unstructured text through an Application Programming Interface (API) Server that allows developers to build solutions to organise, collate, and identify what is important and relevant in their organisation's text. Given a search query, the Pingar API analyses its component concepts and suggests related search terms, and you can refine the results by supplying an optional context document. Among other things, the API can:

  1. Automatically categorise documents against a predefined taxonomy or index
  2. Automatically categorise documents into content types such as "employment contract" or "financial statement"
  3. Identify key concepts and phrases that describe documents
  4. Identify people's names, organisations, and locations
  5. Identify the most relevant paragraphs to produce a summary

Pingar is also taking the technology a step further by developing applications that complement its core API capabilities. One popular example is the Pingar Metadata Extractor for SharePoint, which uses Pingar's natural language processing to transform unstructured data into usable structured data.

It reads SharePoint content, identifies phrases that describe the main topics, and classifies (categorises) the content against taxonomies. It can detect organisations or companies, people, locations, addresses, account numbers, dates, and many custom-defined entities, such as a retailer's unique SKU numbers. It can also match similar terms, misspellings, and equivalent spellings across variants of English (for example, British "organisation" and American "organization"), along with other language issues of this kind.
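
To make the idea of matching misspellings and spelling variants against a taxonomy more concrete, here is a minimal Python sketch of one simple approach. It is not Pingar's implementation: a small, hypothetical `SPELLING_VARIANTS` map normalises British spellings to American ones, and the standard library's `difflib.get_close_matches` then catches remaining misspellings. The taxonomy entries, the variant map, and the 0.8 similarity cutoff are all illustrative assumptions.

```python
import difflib
from typing import List, Optional

# Hypothetical British -> American spelling map; a real system would use a
# much larger lexicon. Illustrative only.
SPELLING_VARIANTS = {
    "organisation": "organization",
    "categorise": "categorize",
    "colour": "color",
}


def normalise(term: str) -> str:
    """Lower-case a term and map known spelling variants to one canonical form."""
    return " ".join(SPELLING_VARIANTS.get(w, w) for w in term.lower().split())


def match_to_taxonomy(term: str, taxonomy: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the taxonomy entry closest to `term`, or None if none is similar enough.

    Spelling variants are normalised first; remaining misspellings are caught by
    difflib's similarity ratio (0.0 = nothing in common, 1.0 = identical).
    """
    canonical = {normalise(entry): entry for entry in taxonomy}
    best = difflib.get_close_matches(normalise(term), list(canonical), n=1, cutoff=cutoff)
    return canonical[best[0]] if best else None


# Hypothetical taxonomy entries.
taxonomy = ["Mineral Exploration", "Organization Structure", "Ore Processing"]
print(match_to_taxonomy("organisation structure", taxonomy))  # Organization Structure
print(match_to_taxonomy("mineral exploraton", taxonomy))      # Mineral Exploration
print(match_to_taxonomy("banking", taxonomy))                 # None
```

Normalising known variants before fuzzy matching keeps the cutoff strict: "organisation" and "organization" become identical, so the similarity threshold only has to absorb genuine typos such as the missing "i" in "exploraton".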

In turn, this metadata powers SharePoint's search refiners, which let users quickly eliminate irrelevant search results by filtering on categories, topics, and other metadata. Each time a search is refined, the refiners SharePoint offers are recalculated from the reduced result set, so a user who has only a general idea of what they are looking for is guided step by step towards it.
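
The refinement loop described above can be illustrated with a minimal Python sketch of faceted refinement over an in-memory list of results. It does not use SharePoint's API; the metadata fields (`topic`, `organisation`) and the sample results are hypothetical. Refiner options are simply counts of each metadata value among the current results, so after a filter is applied the counts are recomputed from the smaller set.

```python
from collections import Counter
from typing import Dict, List

# A search result is modelled as a mapping from metadata field to its values.
# Field names and values here are hypothetical examples, not SharePoint's schema.
Document = Dict[str, List[str]]


def refiners(results: List[Document], field: str) -> Counter:
    """Count how many results carry each value of `field` (the refiner options)."""
    counts: Counter = Counter()
    for doc in results:
        # set() so a value repeated within one document is counted once
        counts.update(set(doc.get(field, [])))
    return counts


def refine(results: List[Document], field: str, value: str) -> List[Document]:
    """Keep only the results tagged with `value` in `field`."""
    return [doc for doc in results if value in doc.get(field, [])]


# Hypothetical search results, each already tagged with extracted metadata.
results: List[Document] = [
    {"topic": ["tailings", "water treatment"], "organisation": ["CIM"]},
    {"topic": ["tailings"], "organisation": ["Example Mining Co"]},
    {"topic": ["ventilation"], "organisation": ["CIM"]},
]

print(refiners(results, "topic"))         # refiner options for the full result set
narrowed = refine(results, "organisation", "CIM")
print(refiners(narrowed, "topic"))        # options recomputed from the reduced set
```

On the full set, the `tailings` refiner covers two results; after refining to `CIM`, only two results remain and each of the three topics appears once. This is why good metadata matters: refiners can only offer values that were tagged on the documents in the first place.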

Pingar Metadata Extractor for SharePoint

Canadian Institute of Mining (CIM) is the leading technical society of professionals in the Canadian mining and energy industries. CIM has over 14,600 members from industry, academia, and government, served by its 10 technical societies and 35 branches.

CIM’s challenge was finding information in SharePoint

The CIM library serves its members, and to make it easy to use, search results need to be filterable by keyword or topic. To achieve this, CIM uses Pingar to tag every document with metadata, every single time.

Papers are written by professionals outside the organisation and submitted to CIM. Previously, CIM staff had to read and tag every document going into SharePoint, and a typical document runs to at least 20 pages of scientific and technical information. Each librarian could properly tag only 180 documents per year, while CIM needed to upload thousands. CIM therefore searched globally for a solution, which led to the implementation of Pingar Metadata Extractor.

Gerard Hamel, CIM Director Information Systems and Technology, said, “It was the exact tool we needed to index documents. Now we are able to upload thousands of documents, and we don't need to read every single one, because Pingar does it automatically.”

A short video shows SharePoint users what can be achieved in just a few minutes.

In addition, Pingar is developing other cutting-edge applications that connect to multiple file shares, repositories, and other platforms to deliver a single view of their content. These applications categorise and classify documents, identify the themes and subjects they cover, and even highlight possible duplicates or different versions of the same information, all from unstructured data.
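
Spotting that two files are copies or versions of the same information is usually framed as near-duplicate detection. Pingar's approach is not described here, so the following Python sketch uses a common, swapped-in technique instead: each text is split into overlapping three-word "shingles", and document pairs are scored by Jaccard similarity (shared shingles divided by all distinct shingles). The document names, texts, shingle size, and 0.5 threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in `text`."""
    words = text.lower().split()
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_duplicates(docs: Dict[str, str], threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (name, name, score) for every document pair scoring at or above `threshold`."""
    sigs = {name: shingles(text) for name, text in docs.items()}
    pairs = []
    for x, y in combinations(sorted(sigs), 2):
        score = jaccard(sigs[x], sigs[y])
        if score >= threshold:
            pairs.append((x, y, score))
    return pairs


# Hypothetical documents gathered from different repositories.
docs = {
    "report_v1": "the tailings dam inspection found minor seepage at the north wall",
    "report_v2": "the tailings dam inspection found minor seepage at the north wall and east wall",
    "memo": "quarterly budget figures for the ventilation upgrade project",
}
print(likely_duplicates(docs))  # [('report_v1', 'report_v2', 0.75)]
```

Here `report_v1` and `report_v2` share 9 of their 12 distinct shingles, giving a score of 0.75, so they are flagged as probable versions of one report; the unrelated memo shares none. Comparing every pair is quadratic, so large collections would normally add an indexing step such as MinHash before scoring.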

Another existing application monitors text media, tracking where and when your company is mentioned, as well as specific subjects, legislation, or competitive information. This is becoming increasingly important as companies need to understand quickly what is being said about their brand. For further information, please contact Robert Young on +64 21 791 170 or by email.