Pingar, the New Zealand provider of unstructured data management solutions, has announced it has begun trials of its automated Taxonomy Generator Service with selected customers.
The Service enables enterprises to generate instant taxonomies from an analysis of the content of large internal document sets. The Pingar research team, headed by Dr. Alyona Medelyan and Professor Ian Witten, has combined natural language processing and machine learning technologies with Linked Data to create a unique approach to the challenge of automatic taxonomy generation.
Peter Wren-Hilton, CEO of Pingar, said: “Building enterprise taxonomies ‘on the fly’ is one of the true holy grails of providing real structure to the 80% of unstructured data held by enterprises today. Custom-built taxonomies have traditionally been both expensive to build and expensive to maintain. The Pingar Taxonomy Generator Service allows enterprises to build multiple taxonomies ‘on the fly’ in order to manage departmental, project or enterprise-wide document sets.”
“The Pingar Taxonomy Generator Service supports a wide range of unstructured data types including Office documents, PDFs, emails and their attachments,” Wren-Hilton added. “The launch of the beta Pingar Taxonomy Generator Service is the culmination of two years’ dedicated research work by some of the leading experts in this field.”
Dr. Alyona Medelyan, Pingar Chief Research Officer, writes on the company blog, “We argue that a useful taxonomy is one that contains terms relevant to the documents it is meant to organize. These terms can be sourced from existing taxonomies, Wikipedia, using entity and terminology extraction algorithms. Then, it’s the matter of grouping these terms into a meaningful hierarchy.
“The image below explains how the Pingar Taxonomy Generator works. It receives as an input, documents in various formats, which may be stored on a file-share, in a document management system such as SharePoint, or on an Exchange Mail server. These documents are then processed and analyzed using a variety of tools and datasets, in order to extract taxonomy terms and relations between them. The output is a taxonomy, which combines these terms and relations into a single hierarchical structure useful for document organization.”
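For readers who want a concrete picture of the flow described above (documents in, terms and relations extracted, a hierarchy out), the following is a minimal illustrative sketch in Python. It is not Pingar's implementation: the `BROADER` lookup table, the function names, and the sample documents are all hypothetical. A small hand-written table stands in for the existing taxonomies and Linked Data sources the announcement mentions, and simple whole-word matching stands in for entity and terminology extraction.

```python
import re
from collections import Counter, defaultdict

# Hypothetical "narrower term -> broader term" relations. A real system would
# draw these from existing taxonomies or Linked Data; this table is made up.
BROADER = {
    "sharepoint": "document management",
    "exchange": "email",
    "pdf": "file formats",
    "email": "communication",
}


def extract_terms(documents, vocabulary):
    """Count how many documents mention each vocabulary term (whole-word, case-insensitive)."""
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        for term in vocabulary:
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                counts[term] += 1
    return counts


def build_taxonomy(documents, broader):
    """Group the terms found in the documents under their broader terms."""
    vocabulary = set(broader) | set(broader.values())
    found = extract_terms(documents, vocabulary)
    children = defaultdict(set)
    for term in found:
        node, seen = term, set()
        # Walk up the broader-term chain so every found term is connected
        # to a top-level concept; `seen` guards against accidental cycles.
        while node in broader and node not in seen:
            seen.add(node)
            parent = broader[node]
            children[parent].add(node)
            node = parent
    return {parent: sorted(kids) for parent, kids in children.items()}


if __name__ == "__main__":
    docs = [
        "The PDF was stored in SharePoint.",
        "Forward the email attachment from Exchange.",
    ]
    for parent, kids in sorted(build_taxonomy(docs, BROADER).items()):
        print(parent, "->", ", ".join(kids))
```

The sketch keeps only terms that actually occur in the document set, which mirrors the point quoted above that a useful taxonomy contains terms relevant to the documents it is meant to organize.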