Guide to Handprint Extraction and Hyperautomation

By Walter Lee, Ephesoft

Capturing handprint data has been a source of hardship for most organizations, which is why we’re covering the top five factors to consider when working on any hyperautomation initiative that touches handwritten data and documents. Ephesoft also has some exciting new capabilities, along with accuracy rates for cursive handwriting that are the highest in the industry.

Anyone who thinks handwritten data only exists on fixed, standardized forms has never really dug into a government process or worked on a non-fillable, online form for an application in the commercial industry. Form variations and versions, poor scan quality, fax conversions and more can all impact the layout of information that exists on a form. When those factors come into play, templatized data extraction just doesn’t cut it.

Extrapolating from an AIIM survey of about 300 respondents, more than a quarter of all organizations use forms and documents that contain handprint as part of their key business processes. This includes processes that organizations have automated, or attempted to automate, with tools like RPA systems and BPM applications.

- 26% of organizations rely on forms and documents with handprinted data for key business processes

- 42% of organizations that use forms for information and application submissions receive at least half of those forms with handprint data

For public and private sector organizations alike, handprint-submitted information is the norm rather than the exception.

Bottlenecks

The public sector is one of the biggest sources of process bottlenecks caused by handwritten data. Where constituents are concerned, most government functions involve form submissions. Whether we’re looking at federal departments or state and local agencies, budget and technology adoption are definitely factors.

Operational budgets are always in jeopardy, and even more so after the fallout of the COVID-19 pandemic. Then think about processes like permit applications, information requests, benefits applications, tax deferrals and more, which need to serve constituents across all socio-economic classes. With limited staff, furloughs and an overall diminished work capacity, government agencies are going to be inundated with a manual data entry backlog for all these constituent-driven requests.

In addition, new privacy laws add complexity to the process by requiring the ability to redact Personally Identifiable Information (PII) on documents, including handwritten forms.

However, the commercial world is not immune to the informational black hole that is handprint data. For example, insurance companies accept handwritten claim forms from policyholders. Or think of every time you’ve been to your primary physician and had to fill out the “updated insurance and contact information form” in the waiting room with all the other coughing, miserable-looking patients. All of that medical information is crucial and must be maintained in a digital, searchable and actionable format.

Intelligent Document Processing Platform

Within Ephesoft Transact, content moves from the point of actual document capture or ingestion – either as a part of an external system’s workflow or a standalone action – through to the point at which processed documents and their associated metadata and index values are routed to their final destination or the next step in the business process.

What’s new and exciting for Transact is that we’ve added more data extraction capabilities. First, we have a new advanced hOCR Plugin, which provides handprint, cursive and machine print extraction.

Second, Transact’s handwriting recognition/Intelligent Character Recognition (ICR) feature builds on the bounded-block printing (OCR) and checkbox/Optical Mark Recognition (OMR) extraction tools launched in a previous release. Together, they offer a comprehensive, reliable, fast and easy way for users to extract handprint values from documents and detect signatures or filled checkboxes.

The technology reads both printed and cursive writing with an accuracy of up to 88% out of the box, even for handwriting that is not easily legible to a human reviewer. Users can use a scanner, tablet, phone or other means to scan the document or form.

ICR converts images of handprinted text to an editable and/or searchable file format. Traditional ICR engines vary in accuracy and server requirements, and new cloud-based, machine learning-trained ICR engines are hitting the market on a fairly regular basis. OMR captures data from form elements such as checkboxes and multiple-choice bubbles.
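
To make the OMR idea concrete, here is a minimal sketch of one simple way a checkbox can be read: count how much of the box region is dark and compare that to a threshold. This is an illustration only, not how Transact or any particular OMR engine works. The `fill_ratio` and `is_checked` names, the 0.2 threshold and the tiny 5x5 pixel grids are all assumptions made for the example.

```python
def fill_ratio(region):
    """Fraction of dark pixels (1s) in a binarized checkbox region."""
    dark = sum(sum(row) for row in region)
    total = sum(len(row) for row in region)
    return dark / total


def is_checked(region, threshold=0.2):
    """Treat the box as checked when enough of its interior is dark.

    The 0.2 threshold is an assumed value for illustration; a real
    project would tune it against sample scans.
    """
    return fill_ratio(region) >= threshold


# Hypothetical 5x5 interior of a checkbox marked with an "X" (1 = dark pixel).
checked_box = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]

# Hypothetical empty checkbox with a single speck of scanner noise.
empty_box = [[0] * 5 for _ in range(5)]
empty_box[2][3] = 1

print(is_checked(checked_box), is_checked(empty_box))
```

The threshold is what keeps a stray speck of scanner or fax noise from being read as a mark, which is why poor scan quality matters for OMR as much as it does for handwriting.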

5 Factors for a Successful Digital Transformation Project

According to a PTC report on digital transformation, executives report that the key benefits of these initiatives are improved operational efficiency (key for shrinking IT budgets), faster time to market with new products and services, and a narrower gap between customer expectations and company delivery.

And for most organizations, handprint data is inextricably linked to a holistic approach to hyperautomation. So, let’s look at the top five elements of a successful digital transformation project where handprint data is concerned.

1. Document Source: First, consider the source of your content. How are you receiving documents with handprint data? This will vary by industry and process, but it’s important to the overall workflow and to the success or accuracy rate of your automated data extraction project. Are documents coming in as faxes, as emails or as snail mail to a centralized location, or are pictures of handprinted forms and letters being snapped on a mobile device and uploaded to a company portal? These methods of getting formerly paper documents into your organization’s key information systems should be aggregated and routed through a centralized process to ensure uniformity and optimize your document-specific business processes. Fortunately, an application like Transact supports ingestion of documents from all these sources.

2. Security: The second factor to consider for a successful digital transformation that includes documents with handprint is the security requirements of your organization or industry. Are there security requirements specific to accessing or integrating with public cloud applications?

If so, keep in mind that Transact can be installed on-premises or leveraged in a private cloud. There’s no need to send sensitive information or customer data outside of your organization’s network and firewalls to take advantage of the rich capabilities of ICR.

3. Complexity and Variation: Transact utilizes key-value (or KV) pairs to identify and extract information from unstructured documents. This means the application looks for a particular pattern of text representing the key, and then finds the corresponding value based on the desired extraction field value’s relationship to the key. This approach eliminates the need to configure coordinate-based templates to extract data and can be applied to handprint data.
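
The key-value idea described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Transact’s implementation or API: it assumes the OCR output is already available as a list of text lines, and the `extract_value` function, the sample forms and the date pattern are all hypothetical.

```python
import re


def extract_value(lines, key_pattern, value_pattern):
    """Return the first value found to the right of, or directly below, a key."""
    key_re = re.compile(key_pattern, re.IGNORECASE)
    value_re = re.compile(value_pattern)
    for i, line in enumerate(lines):
        key_match = key_re.search(line)
        if not key_match:
            continue
        # Relationship 1: the value sits on the same line, right of the key.
        right = value_re.search(line, key_match.end())
        if right:
            return right.group(0)
        # Relationship 2: the value sits on the line directly below the key.
        if i + 1 < len(lines):
            below = value_re.search(lines[i + 1])
            if below:
                return below.group(0)
    return None


# Two layouts of the same hypothetical form, as lines of OCR text.
form_v1 = ["Applicant Name: Jane Doe", "Date of Birth: 04/12/1985"]
form_v2 = ["APPLICANT NAME", "Jane Doe", "DATE OF BIRTH", "04/12/1985"]

date = r"\d{2}/\d{2}/\d{4}"
print(extract_value(form_v1, r"date of birth", date))
print(extract_value(form_v2, r"date of birth", date))
```

Both layouts yield the same date even though the value sits in a different place on each form, because the lookup is anchored to the key text rather than to a position on the page.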

Some handwriting solutions impose character limits or a per-field cost that can be inflexible or expensive. Transact, however, includes the OpenText Capture Recognition Engine (formerly Recostar) OCR engine as well as the advanced hOCR Plugin (powered by Google Vision, with Azure coming in late 2022). Because the product utilizes a whole-page OCR approach, there’s a minimal cost to you or your clients for using handprint extraction, whether on a per-field or per-page basis.

Many capture products will require that you create fixed forms or coordinate-based templates in order to extract handprint data from scanned documents. If you have a project where you only have a single form and a single variation of that form, this might not be a big deal from a project configuration and management perspective. But what if there are historical variations of that form accessible to the public? Maybe the form gets updated each year or expanded. What if, in the government space, the form varies by state?

Now, you have to consider the document source and the impact that could have on the actual layout of the documents. If a form is captured on a mobile device, that could impact the aspect ratio of the digital image. When physical documents are scanned, they could be skewed or laid incorrectly on a scanner bed, again impacting the actual zonal coordinates of a document. The same holds true for distortions that take place when a document is faxed. If you’re relying on a fixed area of a page to find handprinted information, your ability to scale in any true digital transformation project is going to be limited by the time and cost that must be dedicated to designing each coordinate-based template.
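
A quick back-of-the-envelope calculation shows how little skew it takes to break a fixed zone. The sketch below is illustrative only: the zone coordinates, the field position and the 2-degree skew are hypothetical numbers chosen for a 300 dpi scan, and the rotation is taken about the top-left corner of the page for simplicity.

```python
import math


def rotate(point, degrees):
    """Rotate a pixel coordinate about the page origin (top-left corner)."""
    x, y = point
    t = math.radians(degrees)
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))


def in_zone(point, zone):
    """True if the point lies inside the (left, top, right, bottom) box."""
    x, y = point
    left, top, right, bottom = zone
    return left <= x <= right and top <= y <= bottom


# Hypothetical template zone and field position on a 300 dpi scan, in pixels.
zone = (1750, 2370, 2050, 2430)
field = (1800, 2400)

skewed = rotate(field, 2.0)        # page laid 2 degrees off on the scanner bed
drift = math.dist(field, skewed)   # how far the field moved, in pixels

print(in_zone(field, zone), in_zone(skewed, zone), round(drift))
```

With these assumed numbers, a 2-degree skew moves the field roughly 105 pixels, which is about a third of an inch at 300 dpi and enough to push it out of the template zone entirely.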

That’s why it is important to find a solution that can balance the level of predictability and extraction accuracy you require with the costs to accommodate the variations of forms you need to process, while still meeting your timelines. The good news is that Transact provides a comprehensive and flexible array of tools for your use case to find that balance. Everything from traditional fixed form extraction to more advanced ruleless AI entity extraction is included and works seamlessly with our handwriting solution.

4. Signature Detection: The fourth factor for a successful digital transformation project where handprint and handwriting are involved is interacting with and managing documents with signatures. Signature detection can be a manually intensive, time-consuming step that slows application processing across all industries.

Take the mortgage industry, for example. There is typically a large packet of loan-related documents that vary in page length and complexity. Validating that the borrower and any co-borrower have signed is crucial. But when a human being enters the mix, they spend time flipping through pages to find that one field before giving the packet the green light. Automating this step saves time, and therefore money, by expediting the loan process.
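
As a rough sketch of what such an automated check might look like once signature detection has run, the snippet below flags any required signer with no detected signature in the packet. It assumes the detection step has already produced one record per signature field. The record layout, the role names and the `missing_signatures` function are hypothetical and are not taken from Transact.

```python
def missing_signatures(detections, required_roles):
    """Return the required roles with no detected signature in the packet."""
    signed = {d["role"] for d in detections if d["signed"]}
    return sorted(set(required_roles) - signed)


# Hypothetical detection results for a loan packet, one record per signature field.
packet = [
    {"page": 3, "role": "borrower", "signed": True},
    {"page": 3, "role": "co-borrower", "signed": False},
    {"page": 17, "role": "borrower", "signed": True},
]

print(missing_signatures(packet, ["borrower", "co-borrower"]))
```

A packet that returns an empty list can get the green light automatically, while anything else can be routed to a person with the missing signer already identified.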

Other examples are background investigation processes and human resources employee onboarding. Every industry and every company has some type of application process where people are required to sign documents, forms or letters of explanation. And putting some automated validation checks in place for these ubiquitous tasks has enormous time-saving potential. I believe it should be a key element of digital transformation initiatives.

5. OCR, ICR or OMR – Extract what you need: Lastly, just make sure you understand whether relevant data is machine printed, handwritten, presented with optical marks like checkboxes or radio toggles, or in some combination thereof. Sometimes there are project requests where a company or agency wants to have every field on a form or piece of information on a document extracted. But when I ask why… when I ask about the purpose of the information downstream, there is no answer. Limit the scope of your project to include relevant data only, whether that data is a signature on a contract, a checkbox on a form, or a machine-printed or handprinted piece of information, and your project timeline and cost of implementation will benefit.

In summary, the items to add to your digital transformation initiative playlist are: consider your document sources, look into security requirements for data transmission out of your organization’s network or firewall, assess the complexity and variety of your forms and documents, single out workflows that require signature validation, and get a solid understanding of the OCR, ICR and OMR data extraction requirements as they relate to the business process at hand.
