August 1, 2024 - Iterative has announced the upcoming release of DataChain, a new open-source tool for processing and evaluating unstructured data.

The announcement comes as businesses are struggling to harness the full potential of generative AI (GenAI). According to a recent McKinsey Global Survey, only 15 percent of surveyed companies have seen a meaningful impact from GenAI on their operations. This limited impact is largely attributed to the difficulty of processing and evaluating unstructured data at scale.

“The biggest challenge in adopting artificial intelligence in the enterprise today is the lack of practices and tools for data curation and generative AI evaluation that can ensure the quality of results,” said Dmitry Petrov, CEO of Iterative.

Petrov emphasized the need for AI models capable of evaluating and improving other AI models, a concept previously limited to industry frontrunners like DeepMind and OpenAI.

DataChain aims to bridge the gap between traditional structured data technologies and modern Python-based AI workflows. It democratizes advanced AI-based analytical capabilities, such as using large language models (LLMs) to evaluate other LLMs and performing multimodal GenAI evaluations. This approach levels the playing field for data curation and pre-processing, making these sophisticated techniques accessible to a broader range of AI engineers and data scientists.

The proliferation of sophisticated AI foundation models opens the door to intelligent curation and data processing. However, the lack of simple solutions for wrangling unstructured data with AI models, in easy-to-manage formats, keeps the technology barrier high.

In practice, most AI engineers are still writing custom code to convert their JSON model responses, adapt them to databases, and run models in parallel over data that does not fit in memory.
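
To illustrate the kind of glue code described above, here is a minimal Python sketch that uses only the standard library. It is not DataChain code and does not reflect any vendor's API: the `fake_model` function, the `Evaluation` record, the JSON layout, and the `evals` table are all hypothetical stand-ins chosen for the example.

```python
import json
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from dataclasses import astuple, dataclass
from itertools import islice


@dataclass
class Evaluation:
    """Hypothetical typed record for one model response."""
    doc_id: str
    label: str
    score: float


def fake_model(doc_id: str) -> str:
    """Stand-in for a real model call (assumption); returns a JSON string."""
    return json.dumps({"id": doc_id, "result": {"label": "ok", "score": 0.9}})


def parse(raw: str) -> Evaluation:
    """Convert a raw JSON response into a typed record."""
    body = json.loads(raw)
    result = body["result"]
    return Evaluation(body["id"], result["label"], float(result["score"]))


def batches(items, size):
    """Yield fixed-size lists so the full input never has to sit in memory."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def run(doc_ids, conn, batch_size=100, workers=4):
    """Call the model in parallel, batch by batch, and store rows in SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS evals (doc_id TEXT, label TEXT, score REAL)"
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in batches(doc_ids, batch_size):
            # Only the model calls run in worker threads; parsing and
            # database writes stay on the calling thread.
            rows = [astuple(parse(raw)) for raw in pool.map(fake_model, chunk)]
            conn.executemany("INSERT INTO evals VALUES (?, ?, ?)", rows)
            conn.commit()


conn = sqlite3.connect(":memory:")
run((f"doc-{i}" for i in range(250)), conn)
count = conn.execute("SELECT COUNT(*) FROM evals").fetchone()[0]
```

Even this toy version has to handle parsing, schema mapping, batching, and concurrency separately, which is the burden the paragraph above describes.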

DataChain can also store and structure Python object responses using the latest data model schemas, such as those used by leading LLM and AI foundation model providers.
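
As a rough illustration of what structuring a Python object response can involve, the sketch below flattens a nested response object into a single row of named columns. It uses standard-library dataclasses rather than DataChain itself, and the `ChatResponse` and `Usage` classes, their fields, and the `flatten` helper are hypothetical, not taken from any provider's schema.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Usage:
    """Hypothetical token-usage block nested inside a response."""
    prompt_tokens: int
    completion_tokens: int


@dataclass
class ChatResponse:
    """Hypothetical model response schema."""
    model: str
    text: str
    usage: Usage


def flatten(obj, prefix=""):
    """Flatten a (possibly nested) dataclass into a dict of column -> value."""
    out = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        name = f"{prefix}{f.name}"
        if is_dataclass(value):
            # Nested objects become prefixed columns, e.g. usage_prompt_tokens.
            out.update(flatten(value, prefix=f"{name}_"))
        else:
            out[name] = value
    return out


row = flatten(ChatResponse("example-model", "hello", Usage(12, 3)))
```

Here `row` maps column names such as `usage_prompt_tokens` to plain values, a shape that can be written to a table or queried like any other structured record.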

