AWS launches Amazon Textract OCR

Amazon Web Services has announced the general availability of Amazon Textract, a fully managed service that uses machine learning to automatically extract text and data, including from tables and forms, in virtually any document without the need for manual review, custom code, or machine learning experience.

Amazon Textract is currently available in the US and Ireland and will expand to additional regions in the coming year. Open source alternatives include Tesseract and GOCR.

Amazon says Textract goes beyond simple optical character recognition (OCR) to identify the contents of fields in forms, information stored in tables, and the context in which the information is presented, such as a name or Tax File Number from a tax form or the product SKU or quantity in a warehouse from an inventory report.

The extracted text and data can be used to build smart searches on large archives of documents or can be loaded into a database for use by applications, such as accounting, auditing, and compliance software.

Textract’s API accepts multiple kinds of input, such as scans, PDFs, and photos. It can be used with database and analytics services like Amazon Elasticsearch Service, Amazon DynamoDB, and Amazon Athena, and with other machine learning services like Amazon Comprehend, Amazon Comprehend Medical, Amazon Translate, and Amazon SageMaker, to derive deeper meaning from the extracted text and data.

Amazon says Textract analyzes virtually any type of document, automatically generating highly accurate text, form, and table data. Amazon Textract identifies text and data from tables and forms in documents – such as line items and totals from a photographed receipt, tax information, or values from a table in a scanned inventory report – and recognizes a range of document formats, including those specific to financial services, insurance, and healthcare, without requiring any customization or human intervention.

Results are delivered via an API that can be easily accessed and used without requiring any machine learning experience.

Amazon Textract takes scanned files stored in an Amazon S3 bucket, reads them, and returns data in the form of JSON text annotated with the page number, section, form labels, and data types. This data can then be used for a range of applications, such as generating smart search indexes, redacting text in a massive collection of forms, creating automated loan approval workflows, using the data for regulatory compliance, and flagging fraud risk for insurance claims. Customers can load the data into business software, such as spreadsheets, databases, and payroll systems, or they can analyze and query the data using Amazon Elasticsearch Service, Amazon DynamoDB, Amazon Redshift, or Amazon Athena.
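As a rough illustration of that flow, the Python sketch below requests form and table analysis for a file in S3 and then pulls text lines and form fields out of the JSON response. It is a minimal sketch, not Amazon's reference code. The helper names, the default region, and the hand-written SAMPLE response are assumptions made for illustration. The boto3 call needs AWS credentials and is not exercised by the sample.

```python
from collections import defaultdict


def analyze_s3_document(bucket, key, region="us-east-1"):
    """Ask Textract to analyze a file in S3 (needs boto3 and AWS credentials).

    The default region is an assumption; use one where Textract is offered.
    """
    import boto3  # imported lazily so the parsing helpers work without it

    client = boto3.client("textract", region_name=region)
    return client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["FORMS", "TABLES"],
    )


def lines_by_page(response):
    """Group the text of LINE blocks by the page they were found on."""
    pages = defaultdict(list)
    for block in response.get("Blocks", []):
        if block.get("BlockType") == "LINE":
            pages[block.get("Page", 1)].append(block.get("Text", ""))
    return dict(pages)


def _child_text(block, by_id):
    """Join the WORD children of a block into a single string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child.get("BlockType") == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def form_fields(response):
    """Map each form label (KEY block) to the text of its linked VALUE block."""
    by_id = {b["Id"]: b for b in response.get("Blocks", [])}
    fields = {}
    for block in by_id.values():
        is_key = (block.get("BlockType") == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        values = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                values.extend(_child_text(by_id[v], by_id) for v in rel["Ids"])
        fields[_child_text(block, by_id)] = " ".join(values)
    return fields


# Hand-written stand-in for a response, trimmed to the fields used above.
SAMPLE = {"Blocks": [
    {"Id": "l1", "BlockType": "LINE", "Page": 1, "Text": "Name: Jane Doe"},
    {"Id": "w1", "BlockType": "WORD", "Page": 1, "Text": "Name:"},
    {"Id": "w2", "BlockType": "WORD", "Page": 1, "Text": "Jane"},
    {"Id": "w3", "BlockType": "WORD", "Page": 1, "Text": "Doe"},
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
]}
```

With real credentials, the dictionary returned by analyze_s3_document could be passed to the same two helpers, and the resulting fields loaded into a database or search index.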

PwC helps organizations and individuals create value by delivering quality in assurance, tax, and advisory services.

“At PwC, we work to provide our customers with intelligent automation tools that help transform previously manual processes. We've integrated Amazon Textract into our solution for the pharmaceutical industry to automate document processing for various FDA forms like MedWatch and CIOMS,” said Siddhartha Bhattacharya of PwC.

“Previously, people would manually review, edit, and process these forms, each one taking hours. Amazon Textract has proven to be the most efficient and accurate OCR solution available for these forms, extracting all of the relevant information for review and processing, and reducing time spent from hours down to minutes.”

UiPath is a leading Robotic Process Automation vendor providing a complete software platform to help organizations efficiently automate business processes.

“Amazon Textract will further differentiate UiPath’s robotic process automation platform by enhancing UiPath’s document understanding capabilities, enabling our customers to unlock critical business data from documents, transform that data into actionable business insights, and deliver those insights into line-of-business and operational systems,” said Param Kahlon, Chief Product Officer of UiPath.


