Five common misconceptions about intelligent capture

by Gordon Sacuta

1. The goal is pass-through automation. In 90% of situations the goal is improved human productivity, NOT pass-through automation. Except for very structured forms where fewer than ten relatively simple data fields (numeric or machine-printed, for example) need to be extracted, typically 80% or more of extracted forms will need some human validation or correction.

In many cases 100% of extracted page images will require some validation or correction by an operator. What businesses need to realize is that a 90% confident extraction rate can still mean 100% of forms need to be reviewed. If a single character in one field is low confidence and/or cannot be validated against external data or business rules, the page must go for verification/correction.
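A rough way to see why a high per-field rate still sends most pages to an operator is sketched below. It assumes, purely for illustration, that "90%" is the chance that any one field comes through fully confident and validated, and that fields succeed or fail independently; the function name and the field counts are hypothetical.

```python
def straight_through_rate(field_pass_rate: float, fields_per_page: int) -> float:
    """Share of pages on which every field passes, so no operator review is needed.

    Assumes each field passes independently with the same probability,
    which is a simplification made only for this illustration.
    """
    return field_pass_rate ** fields_per_page


if __name__ == "__main__":
    # Illustrative field counts, not figures from any real project.
    for fields in (5, 10, 30, 50):
        rate = straight_through_rate(0.90, fields)
        print(f"{fields:>2} fields: {rate:.1%} of pages need no review")
```

Under these assumptions the share of untouched pages shrinks quickly as the number of fields per page grows, which is consistent with the point that review rates approach 100% on complex forms.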

The goal is to reduce 100% Key from Image (KFI) to a much lower number. As an extreme example, cutting a complex form from 5000 KFI keystrokes to 50 correction keystrokes has the potential to reduce the manual keying effort from 100 operators to 1.
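The arithmetic behind that extreme example can be sketched as follows. It assumes staffing scales linearly with keystrokes per form and ignores the time an operator spends reading and verifying; the function name is hypothetical.

```python
import math


def operators_needed(baseline_operators: int,
                     baseline_keystrokes: int,
                     remaining_keystrokes: int) -> int:
    """Scale headcount by the fraction of keystrokes that still must be typed.

    Assumes effort is proportional to keystrokes, a simplification that
    leaves out review time and per-page handling overhead.
    """
    scaled = baseline_operators * remaining_keystrokes / baseline_keystrokes
    return max(1, math.ceil(scaled))


if __name__ == "__main__":
    # Figures from the example above: 5000 KFI keystrokes cut to 50 corrections.
    print(operators_needed(100, 5000, 50))
```

With less dramatic reductions the same calculation gives a more typical picture: cutting 5000 keystrokes to 500 would, under the same assumption, still leave work for about 10 operators.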

2. Everything must be classified. The most common mistake on content management projects is to over-classify documents. Categorizing every page of a package of related documents can have a huge cost impact, when operations may actually prefer to review the entire package in 99% of queries and reviews. Many times, the query and retrieval rate on the images and metadata drops to nil within weeks.

Given how rarely a retrieval is ever needed, finding the package and scanning its 30 pages in a viewer’s thumbnail view to locate a specific document is relatively quick.

In one extreme example at a financial institution, the JAD sessions came up with a classification structure for mortgage documents that had 10 document categories. Further downstream in the project, a senior manager decided that sub-categories within the 10 categories were absolutely necessary. The end result was the need to identify over 150 specific document types at scan time. The cost impact completely killed the project ROI.

Also, if documents are full-text OCR’d and stored as PDF Image with Hidden Text in a repository with full-text indexing, over-classifying and capturing many attributes truly become wasted effort.

3. A huge amount of time and effort is required. If requirements are managed to what really needs to be done, so that document classification and data extraction are performed at the level actually needed to support the business process and the downstream audit trail and source records management, most projects can be reasonably configured for intelligent capture within 50 to 500 hours of experienced capture specialist effort.

4. Unstructured documents are a huge problem. Some of the biggest advances in intelligent capture in recent years have been around classifying documents that are unstructured, and finding and extracting key data elements from unstructured content.

While significant extra CPU horsepower may be needed if the volume of unstructured content is high, understanding what the document is and finding key data elements is definitely doable with reasonable effort from a capture specialist.

5. A fax machine is the same as a scanner. This misconception is a bigger issue than most people realize. Fax machines are garbage as scanners. Standard fax resolution is 204 x 98 DPI (dots per inch). Fine resolution fax (which very few people think to enable) is 204 x 196 DPI. Good OCR results typically require 300 DPI.
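To put those resolutions side by side, the short sketch below compares the number of pixels per square inch each fax mode delivers against a 300 x 300 DPI scan. The names are hypothetical, and pixel count is only a proxy: it says nothing about noise or compression.

```python
# Resolutions quoted above, as (horizontal DPI, vertical DPI).
FAX_MODES = {"standard": (204, 98), "fine": (204, 196)}
OCR_TARGET_DPI = (300, 300)


def pixel_density_ratio(mode_dpi, target_dpi=OCR_TARGET_DPI):
    """Pixels per square inch of a fax mode relative to the OCR target."""
    return (mode_dpi[0] * mode_dpi[1]) / (target_dpi[0] * target_dpi[1])


if __name__ == "__main__":
    for name, dpi in FAX_MODES.items():
        print(f"{name}: {pixel_density_ratio(dpi):.0%} of the pixels in a 300 DPI scan")
```

By this measure a standard fax carries under a quarter of the pixels of a 300 DPI scan, and a fine resolution fax carries under half.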

Also, most fax machines reduce the image to 96-98% of the original page size. On top of that, no one ever cleans a fax machine, leading to very noisy images. The low resolution and crappy compressed image, however, are only a small part of the problem. Often, when fax machines are part of the business process, a single document passes through them multiple times.

In mortgage processing, for example, a broker may fax a form to their customer, who fills it out and faxes it back to the broker, who then faxes it to the financial institution. Any attempt by the financial institution to automate the capture of that final form has a very low likelihood of achieving useful results.
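As a rough illustration of how repeated faxing compounds, the sketch below applies the 96-98% size reduction once per transmission for the three faxes in the mortgage example. It assumes every pass shrinks the image by the same factor and that the reductions multiply, which is a simplification; the function name is hypothetical.

```python
def cumulative_scale(per_pass_scale: float, passes: int) -> float:
    """Image size relative to the original after repeated fax transmissions.

    Assumes each pass applies the same reduction and that the
    reductions compound multiplicatively.
    """
    return per_pass_scale ** passes


if __name__ == "__main__":
    # Broker to customer, customer to broker, broker to institution.
    passes = 3
    worst = cumulative_scale(0.96, passes)
    best = cumulative_scale(0.98, passes)
    print(f"After {passes} faxes the form is {worst:.1%} to {best:.1%} of its original size")
```

Even under these simple assumptions the form drifts well away from its original size, so field positions defined on a clean template no longer line up, and that is before noise and resolution loss accumulate on each pass.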

Gordon Sacuta is a US-based consultant with Tauren Consulting Inc.