iText launches iText pdfOCR

iText Group NV has announced the launch of iText pdfOCR, built on the Tesseract OCR engine. Tesseract supports over 100 languages; it was originally developed by Hewlett-Packard beginning in 1985 and was released under the Apache open source license in 2005. Since 2006, its development has been sponsored by Google.

iText pdfOCR is part of the renowned iText 7 PDF SDK and offers Optical Character Recognition (OCR) functionality that converts printed text in scanned documents and images into a fully searchable, PDF/A-3u compliant format (PDF version 1.7), making that text easier and faster to access. Without machine-readable text, printed or scanned documents cannot be searched, indexed or interpreted.

Logical follow-up actions could be data extraction with iText pdf2Data, secure content redaction with iText pdfSweep, or multilingual document recreation with iText pdfCalligraph. Repurposing data can be done with the low-code document generator iText DITO.

"With COVID-19 urging companies to accelerate their digital transformation projects, organizations are forced to explore new ways of accessing and managing their data -- existing and new. Thanks to the OCR capabilities of iText pdfOCR many new opportunities will open up for users and enterprises that want to maximize their data potential." Yeonsu Kim, CEO at iText Group NV stated.

"Staying true to our open-source roots, we've decided to build iText pdfOCR upon the open-source Tesseract OCR Engine. With this, we wish to reconfirm our positioning as an open-source company - a value which is appreciated by our millions of users and clients."

"With this new addition to our PDF library, developers will now be able to leverage data locked away in documents which until now weren't accessible. Our latest product enables them to enlarge their digital workflow capabilities by accessing the data buried in scanned files and deploy it for any action or purpose they or their end-user would like." Tony Van den Zegel, VP of Products & Marketing at iText Group NV and General Manager at iText Software Belgium, said.

iText pdfOCR has a wide range of applications: for instance, archiving historical documents, translating legal documents, automating data entry when processing all sorts of physical applications or claims, and sorting otherwise non-editable printed or scanned documents.