Pingar takes the sting out of SharePoint metadata
New Zealand’s Pingar has demonstrated a future without the need for laborious manual entry of metadata, showing attendees at the Australia SharePoint Conference 2011 in Sydney how its advanced data analysis and search technology can be used to automate the tiresome process.
The SharePoint Conference was the venue for Pingar’s release of an Application Programming Interface (API), which has been used by New Zealand SharePoint developer Provoke to build an innovative SharePoint 2010 Web Part. This automatically extracts metadata from any text-based document to populate List fields in SharePoint.
Traditionally, this information must be entered manually by users as they upload a document to a SharePoint library, by applying relevant keywords from dropdown lists.
However, Pingar’s advanced discovery platform can analyse a document, extract addresses, names and other category information, then automatically use this to populate metadata fields. Pingar can also perform other functions such as “sanitising” text by replacing sensitive information with dummy names, addresses or other data to render a document harmless.
Pingar has developed its own algorithms that are used to extract information such as names, addresses and dates from a text document. Other text analytics strategies are based on dictionaries and have varying results in recognising the wide variety of content that can appear in documents.
Pingar Chief Research Officer, Dr Alyona Medelyan, said the approach developed by NZ’s Provoke had the potential to save knowledge workers from the metadata burden.
Originally from Ukraine, Medelyan has a Master's degree in Natural Language Processing from Freiburg University in Germany and completed a PhD in Computer Science at the University of Waikato in 2009.
Medelyan speaks Russian, Ukrainian, and German, and is learning Chinese. Her grasp of languages gives her a distinct advantage in the field of computational linguistics.
During her PhD Medelyan developed an open-source tool, Maui (Multi-purpose automatic topic indexing), which automatically identifies the main topics in documents. Maui is now used by companies and organisations around the world.
For the past 18 months, Medelyan has been the lead software engineer at Pingar. She has previously interned at Google New York.
Pingar's co-founder and CEO Peter Wren-Hilton says Pingar can not only solve the headache of managing large amounts of information but also help organisations use stored content to their advantage.
"With data in companies growing at 40 per cent per year, searching for the right information is becoming more difficult, time consuming and expensive," says Wren-Hilton.
"We can all relate to the frustration involved in spending hours searching for information. Pingar helps you find the most relevant and useful results twice as fast as was previously possible."
Pingar’s new Application Programming Interface (API) will allow a company's existing software to work with Pingar's data analysis and search technology to manage unstructured electronic data such as documents, web pages, emails, news or any other kind of text.
The API can be used to make products that extract useful information from masses of documents: for example, quickly and accurately compiling a list of phone numbers from a whole database of unsorted documents, or quickly identifying and removing private personal details from documents that need to be publicly released (for example, under the Official Information Act).
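To make those two use cases concrete, here is a minimal Python sketch of compiling phone numbers from a set of documents and redacting them before release. It is not Pingar's API or algorithm: the regular expression, the function names `extract_phone_numbers` and `sanitise`, and the `[REDACTED]` placeholder are all assumptions made for illustration, and a simple pattern like this will also catch other digit groups (dates written with spaces, for instance) that a production tool would need to rule out.

```python
import re

# Hypothetical, deliberately simple pattern (an assumption, not Pingar's method).
# It matches digit groups separated by spaces or hyphens, with an optional
# leading "+" and an optional bracketed area code, e.g. "+64 9 555 0100"
# or "(09) 555-0199".
PHONE_RE = re.compile(r"\+?\(?\d{1,4}\)?(?:[ -]\d{1,4}){2,4}")


def extract_phone_numbers(documents):
    """Return the unique phone-like strings found, in order of first appearance."""
    found = []
    seen = set()
    for text in documents:
        for match in PHONE_RE.finditer(text):
            number = match.group(0)
            if number not in seen:
                seen.add(number)
                found.append(number)
    return found


def sanitise(text, placeholder="[REDACTED]"):
    """Replace every phone-like string in the text with a placeholder."""
    return PHONE_RE.sub(placeholder, text)


documents = [
    "Call Jane on +64 9 555 0100 or (09) 555-0199.",
    "Contact: 021 555 0123. Repeat: +64 9 555 0100",
]
phone_list = extract_phone_numbers(documents)
public_copy = sanitise(documents[0])
```

Here `phone_list` holds each number once even though one appears in both documents, and `public_copy` is the first document with both numbers replaced by the placeholder. Pattern matching of this kind suits fixed formats such as phone numbers; names and organisations vary too much for a single pattern.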
Pingar's API will be available free to software developers for a limited "development" period, to enable developers to work with Pingar's tools to build specific solutions for their clients.
Alyona Medelyan says developers will be able to pick and choose between the different software components, so clients will only pay for the features they need and those that offer the greatest benefit to the company. Pingar has a small R&D team of five in New Zealand but is looking to hire C# developers.
The API has three aspects:
Rapid Discovery, which can be added to existing search engines for rapid query refinement and results assessment;
Entity Extraction, a suite of tools that turn documents into useful lists of entities including people's names, telephone numbers, credit card numbers and organisations; and
Content Analysis, which provides precision keyword extraction and one-click document summarisation and sanitisation.