Ontotext Metadata Studio 3.7 AI Model links to Wikidata
Ontotext has launched a new version of Ontotext Metadata Studio (OMDS). It now enables you to tag your content with CEEL, a text analytics service that performs Common English Entity Linking.
CEEL is trained to link mentions of People, Organizations and Locations to their representations in Wikidata, the biggest global public knowledge graph, which includes close to 100 million entity instances.
Wikidata entities have precise mappings to Wikipedia articles, where those exist – Wikipedia has about 7 million articles. Wikidata is also used continuously as a source for the enrichment of Google’s knowledge graph, which makes Wikidata popular for semantic SEO purposes.
Focusing on the entity types of interest, CEEL is trained to recognize about 40 million of the Wikidata concepts.
The purpose of models like CEEL is to streamline information extraction from text and enrichment of databases and knowledge graphs.
For instance, large language models (LLMs) are good for extracting specific types of company-related events from the news. They can properly recognize and classify places in the text where events are reported and extract the names of the organizations involved. What LLMs cannot do is disambiguate the names to specific concepts in a graph or records in a database.
An LLM can extract a relationship (for example, an acquisition, which results in a parent-subsidiary link). But this new fact is not ready to be added to a database until a service like CEEL has selected the identifier of the correct record from among multiple possible records for similarly named companies.
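To make the disambiguation step concrete, here is a minimal sketch in Python. It is not how CEEL works internally; it only illustrates the general idea of choosing one record among similarly named candidates by comparing each candidate's description with the surrounding text. The `Candidate` class, the `disambiguate` function, the company name and the `ex:` identifiers are all hypothetical and invented for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Candidate:
    record_id: str    # hypothetical identifier, not a real Wikidata ID
    name: str
    description: str


def tokenize(text: str) -> Set[str]:
    """Lowercase words longer than two characters, with punctuation stripped."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if len(w) > 2}


def disambiguate(mention: str, context: str,
                 candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the same-named candidate whose description shares most words with the context."""
    context_tokens = tokenize(context)
    matching = [c for c in candidates if c.name.lower() == mention.lower()]
    if not matching:
        return None
    return max(matching,
               key=lambda c: len(tokenize(c.description) & context_tokens))


# Two invented records that share a name.
CANDIDATES = [
    Candidate("ex:org-1", "Acme", "American insurance company based in Springfield"),
    Candidate("ex:org-2", "Acme", "British record label for music releases"),
]

# Text from which an LLM is assumed to have extracted the mention "Acme".
CONTEXT = "Acme, the insurance company, was acquired by a larger insurance group."

chosen = disambiguate("Acme", CONTEXT, CANDIDATES)
print(chosen.record_id if chosen else "no match")
```

Only once an identifier such as `ex:org-1` has been selected can the extracted acquisition fact be attached to the right record. A production entity linker relies on far richer signals than word overlap, so treat this purely as an illustration of the problem being solved.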
In keeping with the cross-domain nature of the product, the addition of CEEL enables the following capabilities within OMDS:
- Enhancing content discoverability by linking entity mentions in text to their corresponding Wikidata entries. This provides readers with instant access to additional global knowledge context.
- Aiding in the automated tagging and categorization of content. This facilitates more efficient discovery, reviews and knowledge synthesis.
- Enabling more precise search, better SEO, and better performance of retrieval-augmented generation (RAG) with LLMs and of downstream analytics, thanks to content enriched with such semantic metadata.
- Streamlining information extraction from large volumes of unstructured content. This enables organizations to quickly analyze and comprehend market trends or signals.
Evaluation of CEEL’s accuracy on the most popular public benchmarks for this task shows that it performs on par with or better than state-of-the-art AI models. More details about CEEL’s architecture, evaluation, and general availability can be found in the dedicated blog post.
This latest offering supplements the existing core feature of OMDS that enables users to perform entity linking against their own taxonomies and reference data. Now they can easily combine and interlink their organizational and domain knowledge with the global reference knowledge of Wikidata in a single cohesive knowledge graph.
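As a rough illustration of what such interlinking can look like at the data level, the sketch below merges a few internal statements with a few public ones and connects them through an equivalence link. It is an assumption-laden toy, not the OMDS data model: triples are plain Python tuples, `skos:exactMatch` is used as one common way to express equivalence, and every identifier (`ex:acme`, `wd:Q_ACME`, and so on) is a placeholder rather than a real Wikidata ID or property.

```python
SKOS_EXACT_MATCH = "skos:exactMatch"


def interlink(internal_triples, public_triples, links):
    """Merge both sets of triples and add one equivalence triple per link."""
    graph = set(internal_triples) | set(public_triples)
    for internal_id, public_id in links.items():
        graph.add((internal_id, SKOS_EXACT_MATCH, public_id))
    return graph


def facts_about(graph, subject):
    """Return triples about the subject and about any entity it is linked to."""
    linked = {o for s, p, o in graph if s == subject and p == SKOS_EXACT_MATCH}
    return sorted(t for t in graph if t[0] == subject or t[0] in linked)


# Placeholder data: one internal fact, one public fact, one link between them.
internal = [("ex:acme", "ex:accountManager", "ex:jane")]
public = [("wd:Q_ACME", "ex:country", "wd:Q_COUNTRY")]
links = {"ex:acme": "wd:Q_ACME"}

graph = interlink(internal, public, links)
for triple in facts_about(graph, "ex:acme"):
    print(triple)
```

Once the link exists, a query about the internal concept can also reach the public facts attached to its Wikidata counterpart, which is the practical benefit of keeping both in one graph.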
Another highlight of this release is a set of UX improvements to the Form workflow. The UI has been refined and now makes it more evident exactly how much information a given annotation or Form section contains. The workflow has also been slightly modified so that users can perform important actions, such as saving and cancelling changes to annotations, with fewer clicks.
In addition, OMDS 3.7 streamlines the way the quick search works, especially when transitioning to the detailed concept search, which visualizes the comprehensive information for a specific concept in the graph.
The new release also expands the concept Highlight feature. It now allows users to easily see and “scroll” through each mention of a concept in the document. In this way, they can quickly grasp the impact and importance of that concept for the whole document. Other smaller improvements include general stability and vulnerability updates, making this the best release of OMDS to date.