Semantic Enrichment & IM: Turning Content Into Insight

By Michael Iarrobino, Lee Harland, SciBite

Life sciences and pharmaceutical companies are increasingly turning to informatics teams to lead initiatives in artificial intelligence, cognitive search, and machine learning aimed at exploiting vast amounts of internal and external information.

Although these initiatives show great promise, the reality often falls short. Organisations struggle with interoperability across different data streams. The US healthcare industry has captured just 10–20% of the potential value from big data analytics, according to McKinsey Analytics.

At the same time, information management is evolving. Information managers must balance the needs of multiple internal constituencies to support information discovery and manage subscription usage and budgets. Enhanced search, personalisation, and collaborative workflows should all be top priorities.

The challenge is that more than 90% of organisational data is unstructured, as is nearly all the peer-reviewed scientific research. Although it’s true that articles from journal subscriptions have a semi-structure of metadata and discrete article sections, actionable intelligence lies trapped in the unstructured research narratives of the full text.

As information management professionals consider approaches to make these insights easily discoverable and accessible, and to extract increased value from journal subscriptions and other unstructured content, their purposes may converge with those of their informatics colleagues. Although informatics teams may focus more on applying machine analysis methods, such as text mining, to content, the shared goal is to deliver actionable intelligence across the organisation.

To advance such efforts, informatics teams contribute expertise in data transformation and interoperability. Information management brings a strong awareness of how content consumers in the organisation prefer to discover and access content, depth of knowledge of the sources of externally published scientific research, and expertise regarding the efficiency, effectiveness, and comprehensiveness of information discovery.

How Semantic Enrichment Can Help

Semantic enrichment is the enhancement of content with information about its meaning, thereby adding structure to unstructured information. While unstructured content must be synthesised from scratch each time it is consumed, semantically enriched content has been annotated with its meaning, enabling users to move quickly to more intelligence-rich information activities.

Semantic enrichment can annotate unstructured text with information that links related concepts together and makes clear where these concepts sit in a hierarchical family. Think of the many ways to refer to cancers or neoplasms, and the types that are subordinate to the overall cancer concept. This enables a user looking for information about cancer as a concept to gather the many different references to it as a class, achieving a recall impossible with information in the unstructured text alone.
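The idea of gathering references to a concept "as a class" can be sketched in a few lines of Python. This is a minimal illustration only: the vocabulary, the concept IDs (C001 and so on), the sample documents, and the function names are all hypothetical, and the matching handles single-word synonyms only. A production system would rely on a curated ontology and far more robust text matching.

```python
import re

# Hypothetical toy vocabulary: concept ID -> label, synonyms, parent concept.
VOCAB = {
    "C001": {"label": "neoplasm", "synonyms": {"cancer", "tumour", "tumor", "neoplasm"}, "parent": None},
    "C002": {"label": "carcinoma", "synonyms": {"carcinoma"}, "parent": "C001"},
    "C003": {"label": "melanoma", "synonyms": {"melanoma"}, "parent": "C001"},
}

# Hypothetical sample documents.
DOCS = [
    "The tumour was resected in March.",
    "Melanoma incidence is rising.",
    "Patients reported improved sleep.",
]


def annotate(text):
    """Return the IDs of all concepts with a synonym appearing in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {cid for cid, concept in VOCAB.items() if words & concept["synonyms"]}


def descendants(concept_id):
    """Return the concept itself plus every concept beneath it in the hierarchy."""
    found = {concept_id}
    changed = True
    while changed:
        changed = False
        for cid, concept in VOCAB.items():
            if concept["parent"] in found and cid not in found:
                found.add(cid)
                changed = True
    return found


def concept_search(concept_id, documents):
    """Return indexes of documents annotated with the concept or any descendant."""
    wanted = descendants(concept_id)
    return [i for i, doc in enumerate(documents) if annotate(doc) & wanted]


def keyword_search(keyword, documents):
    """Return indexes of documents containing the literal keyword."""
    return [i for i, doc in enumerate(documents) if keyword in doc.lower()]


print(concept_search("C001", DOCS))   # documents mentioning any kind of neoplasm
print(keyword_search("cancer", DOCS))  # documents containing the literal word
```

In this toy data, the concept search retrieves the documents that mention "tumour" and "melanoma", while the literal keyword "cancer" appears in none of them.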

Semantic enrichment can disambiguate unstructured text. For example, consider the importance of being able to differentiate AIDS (Acquired Immunodeficiency Syndrome) from hearing aids. A user searching for "aids" using keywords will likely retrieve examples of both, while a user who can distinguish between the concepts achieves greater precision.
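One simple way to picture disambiguation is to score each possible sense of an ambiguous term by the words that surround it. The sketch below assumes a hand-written set of context clues for each sense; the clue lists, sense labels, and function name are hypothetical, and real enrichment tools use much richer contextual evidence than word overlap.

```python
import re

# Hypothetical context clues for each sense of the ambiguous term "aids".
SENSES = {
    "AIDS (disease)": {"hiv", "immunodeficiency", "infection", "antiretroviral", "virus"},
    "hearing aid (device)": {"hearing", "audiologist", "device", "ear", "deafness"},
}


def disambiguate(sentence):
    """Return the sense whose clue words best match the sentence.

    Returns None if the term is absent, no clue matches, or the senses tie.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if not words & {"aid", "aids"}:
        return None
    scores = {sense: len(words & clues) for sense, clues in SENSES.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None  # not enough context to decide
    return winners[0]


print(disambiguate("Antiretroviral therapy slows progression from HIV infection to AIDS."))
print(disambiguate("The audiologist fitted new hearing aids."))
print(disambiguate("The aids were delivered on Monday."))
```

Returning None when the context is inconclusive, rather than guessing, keeps uncertain mentions out of the annotated results. That favours precision, at some cost to recall.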

Semantic Enrichment Across the Organisation: Who Can Benefit?

Semantic enrichment opens possibilities to informatics and information management that couldn’t have been considered before. Keyword search and manual curation could never have approached the task of examining the more than 25 million articles indexed in MEDLINE to extract candidate relationships between whole classes of concepts like genes and diseases, irrespective of phrasing, location, source, and format. But semantic enrichment can exploit content in this way.
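To make the idea of extracting candidate relationships between classes of concepts more concrete, here is a minimal sketch that counts how often gene and disease annotations co-occur in the same sentence. Co-occurrence counting is only one simple technique, chosen here for illustration. The annotated sentences are invented sample data, and the function name and min_count threshold are assumptions rather than part of any particular product.

```python
from collections import Counter
from itertools import product

# Invented sample data: one entry per sentence, listing the concept
# annotations that an enrichment step is assumed to have produced.
ANNOTATED_SENTENCES = [
    {"gene": {"BRCA1"}, "disease": {"breast cancer"}},
    {"gene": {"BRCA1", "TP53"}, "disease": {"breast cancer"}},
    {"gene": {"TP53"}, "disease": set()},
    {"gene": {"CFTR"}, "disease": {"cystic fibrosis"}},
]


def candidate_pairs(annotated, min_count=1):
    """Count gene-disease pairs annotated in the same sentence.

    Only pairs seen at least min_count times are returned.
    """
    counts = Counter()
    for sentence in annotated:
        for pair in product(sorted(sentence["gene"]), sorted(sentence["disease"])):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


for (gene, disease), n in sorted(candidate_pairs(ANNOTATED_SENTENCES).items()):
    print(f"{gene} <-> {disease}: {n} sentence(s)")
```

Because the counting works on concept annotations rather than raw strings, it is indifferent to how each gene or disease was phrased in the source text. The output is a list of candidates for expert review, not a set of confirmed relationships.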

Semantic enrichment also helps solve many long-standing, intractable problems across different functional areas and teams, including:

Early phase research. Semantically enriched content annotated with relevant biological, disease, protein, and gene concepts can be analysed to determine potential relationships among them. The resulting relationship graph can suggest potential biomarkers and drug targets, and the findings can be linked to supporting source content for validation prior to wet lab work.

Competitive intelligence. Competitor patent filings are often written in ways intended to hinder discovery. Semantic enrichment that annotates chemical substances in the text can improve recall when searching these filings. Non-patent literature (NPL) enriched using the same vocabularies can be explored alongside the patent literature to provide a full picture for patent landscaping or other competitive purposes.

Pharmacovigilance. Scientific research semantically enriched to identify adverse events and pharmacological substances can suggest links between the two, increasing the efficiency and comprehensiveness of these vital monitoring workflows.

IDMP (Identification of Medicinal Products) compliance. IDMP initiatives directed by the Food and Drug Administration (FDA) and European Medicines Agency (EMA) aim to standardise how information about pharmacological products is expressed. When semantically enriched, disparate internal and external content sources can be exploited to give a more robust view of product attributes.

Semantic Enrichment And Scientific Literature: Three Takeaways For Information Managers

Although semantic enrichment is a complex process, it produces powerfully simple business results. For information managers, it can reduce friction in the discovery, access, consumption, and synthesis of published scientific literature. Here are three takeaways:

  • Driving Discovery - Published journal content is typically layered with metadata (abstract, publication type, date, keyword and topic categories, and many more) to improve discoverability. Semantic enrichment goes deeper than these traditional methods, exposing the relevant concepts present in the full text of an article.
  • Measuring The Value of Content - Information managers continually analyse subscription and document delivery usage, adjusting their content sourcing as needed to meet evolving needs. Externally published literature that has been semantically enriched can be used and consumed in new ways, sometimes even without direct human involvement, and thus may not be properly represented in traditional models of article views, downloads, or purchases. Information managers may need new approaches and metrics to understand the value that published content delivers to their organisation.
  • Enhancing The Role of Information Management - Information managers have a unique perspective and expertise to contribute to semantic enrichment projects. Look for opportunities to partner with other groups, such as informatics, to contribute awareness of the ways the organisation currently acquires and uses published content.