ABBYY Adds AI-Native Output to FineReader

ABBYY has released FineReader Engine 12.8.0 with support for DocLang, a new document format built for artificial intelligence rather than human readers.

DocLang gives developers a single, AI-readable structure for feeding documents to language models and agentic AI systems. ABBYY says the format cuts token usage and latency while improving structural accuracy.

The standard was founded by ABBYY with IBM, HumanSignal, Nvidia and Red Hat, and sits under the Linux AI & Data Foundation. The working group argues that formats such as PDF, HTML and Markdown were designed for people.

That forces developers to build custom parsers at every integration point. The group says the patchwork raises compliance risk and increases the chance of model hallucinations.

DocLang aims to create a reliable layer between unstructured content and AI systems. ABBYY ran a controlled benchmark comparing the same document, model and task in PDF and DocLang form.

The company claims the DocLang version improved output quality, lifted accuracy and reduced compute cost. An interactive version of the benchmark is published at https://doclang-benchmark.abbyy.tech.

The controlled experiment processes an annual report, a clinical study and a vendor contract, chosen to represent the wide variety of enterprise documents that are rich in information written for human readers but hard for machines to parse.

“ABBYY FineReader Engine is already used by thousands of organisations processing billions of documents every year,” said Max Vermeir, vice-president of AI strategy at ABBYY. 

“Now, with DocLang as an AI-native format, more companies will be able to accelerate innovation and have faster access to their business data to make smarter, more impactful decisions.”

ABBYY says DocLang standardizes the jumble of digital document formats that enterprises work with and gives AI systems the deterministic structure they need to perform reliably at enterprise scale.

Vermeir continued: “DocLang is specifically engineered to address industry challenges with a minimal, standardized, and AI-native method for representing document structure, meaning, layout, and governance.

“FineReader Engine with DocLang support was designed for efficient machine processing and a predictable structure optimized for modern AI tokenization and modelling techniques. Organizations will see a significant difference with more reliable interpretation, increased accuracy, and lower computational costs.”  

https://www.abbyy.com

 
