AI-Powered Monitoring of Unstructured Text

Anomalo has extended its data quality platform, which monitors structured data in data warehouses and data lakes, to cover unstructured text as well.

The unstructured capability lets enterprises discover, curate, ingest and leverage high volumes of text data without the risk of using low-quality data, which is especially critical for Generative AI applications. The new feature is currently in private beta.

Ninety percent of enterprise data is unstructured. Unstructured data does not conform to traditional standard formats, which makes it extremely challenging to organize, store, search, retrieve and analyze.

Unstructured data is also problematic in itself, as it often contains inconsistencies, errors and duplicated content. Even more problematic, it can contain abusive language as well as sensitive or confidential information, including company intellectual property and personally identifiable information (PII).

These combined challenges can lead to privacy, security and performance risks, especially as this data gets incorporated into Generative AI models and applications.

Organizations are implementing Generative AI and ingesting unstructured text for model training, fine-tuning and Retrieval-Augmented Generation (RAG) at a volume and velocity previously unseen. As a result, they need to be able to identify and resolve quality issues in such data before it is incorporated into Generative AI models and degrades their performance.

With Anomalo’s new unstructured capability, unstructured text documents can be curated and evaluated for data quality across a range of document-level and collection-level characteristics, including document length, duplicates, topics, tone, language, abusive language, PII and sentiment. Users can quickly evaluate the quality of a document collection and identify issues in individual documents, dramatically reducing the time needed to curate, profile and leverage high-value unstructured text data.
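
To make the idea of document-level checks concrete, here is a minimal Python sketch covering three of the characteristics listed above: document length, duplicates and PII. It is an illustration only and does not represent Anomalo's implementation or API. The function `profile_collection`, the `min_words` threshold, the issue labels and the email-only PII pattern are all assumptions made for this example. Real PII detection and near-duplicate detection need far broader coverage.

```python
import hashlib
import re
from collections import defaultdict

# Assumption: a single email regex stands in for PII detection.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def profile_collection(docs: dict[str, str], min_words: int = 5) -> dict[str, list[str]]:
    """Return a mapping of document id to the list of issues found in it."""
    issues = defaultdict(list)
    seen = {}  # content hash -> id of the first document with that content
    for doc_id, text in docs.items():
        # Length check: flag documents with fewer than min_words words.
        if len(text.split()) < min_words:
            issues[doc_id].append("too_short")
        # Duplicate check: exact match after normalization.
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in seen:
            issues[doc_id].append(f"duplicate_of:{seen[digest]}")
        else:
            seen[digest] = doc_id
        # PII check: email addresses only, in this sketch.
        if EMAIL_RE.search(text):
            issues[doc_id].append("possible_pii:email")
    return dict(issues)


docs = {
    "a": "The quarterly report shows steady growth in revenue.",
    "b": "the quarterly  report shows steady growth in REVENUE.",
    "c": "Too short.",
    "d": "Please contact jane.doe@example.com about the renewal of the contract.",
}
report = profile_collection(docs)
```

In this sketch, documents with no issues are simply absent from the report, so a reviewer only sees the documents that need attention.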

Elliot Shmukler, co-founder and CEO of Anomalo, said: “It’s been well known that higher quality data leads to better data products, including traditional dashboards and machine learning models. The same is true in the world of Generative AI, where the quality of the text used to fine-tune or prompt the model via RAG could be the difference between a high-performing application and one that is at best underwhelming and at worst a privacy and compliance risk.

“We’re supporting data teams in using high quality data for all of their critical initiatives and, with our new unstructured text monitoring capability, their Generative AI efforts as well.”

https://www.anomalo.com