Improving forms processing and OCR/ICR accuracy

Companies often buy OCR (Optical Character Recognition) and forms processing software with the expectation of obtaining an almost perfect recognition rate. However, in real-world applications the actual results may fall short of this expectation.

More than likely, disappointing recognition rates are the result of a forms processing or recognition software program being used for the wrong application, or of a full-featured program that is underutilised.

According to Tim Dubes, director of Corporate Communications with US-based Cardiff Software, effective automated data collection depends on accuracy and on thoughtful design of the entire system, from the format of the form through to post-processing operations.

Mr Dubes provides the following tips as general guidelines for improving accuracy and achieving effective forms processing.

1. Design forms for automated recognition

All too often users employ poorly designed forms that do not give respondents enough room to write, or forms that are convoluted in flow and format. These confuse the person completing the form and create incorrect responses before software recognition even starts.

Begin with a plan for the form that is attractive, well spaced to avoid overlapping responses, and succinct, collecting only the data that is useful and necessary.

2. Select the best recognition format for each field

Each field on a form should use the recognition methodology that strikes the best balance between user and computer friendliness.

In a warehouse application, for example, it may be easier to have an open field where a worker can place a bar code sticker associated with a given shipment, rather than rewriting the order number.

3. Prompt respondents to complete the form accurately

No matter what the audience, a well-conceived form should include directions for proper completion.

4. Use multiple recognition engines

No recognition engine, no matter how advanced, can be all things to all applications. Some are stronger at alphabetic than numeric characters; others work well on clean text but fail quickly on degraded text. Forms processing systems that employ voting algorithms (multiple OCR/ICR engines with a decision management layer) are more accurate than single-engine programs.
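
As a rough illustration of the voting idea, the Python sketch below takes a simple per-character majority vote across several engines. The function name, the input format (one equal-length string per engine) and the sample readings are assumptions made for this example; it is not how Teleform or any other product implements its decision layer.

```python
from collections import Counter

def vote_on_field(engine_readings):
    """Combine readings of one field from several engines by majority vote.

    engine_readings: list of equal-length strings, one per engine
    (a hypothetical input format chosen for this sketch).
    Returns the voted text and the positions where no character
    won more than half of the votes.
    """
    if len({len(reading) for reading in engine_readings}) != 1:
        raise ValueError("readings must be non-empty and of equal length")
    chars = []
    doubtful = []
    for pos, column in enumerate(zip(*engine_readings)):
        # Most frequent character at this position across all engines.
        (winner, votes), = Counter(column).most_common(1)
        chars.append(winner)
        if votes <= len(column) / 2:
            doubtful.append(pos)  # no clear majority: leave for a human
    return "".join(chars), doubtful

# Three hypothetical engines disagree on two characters of an order number.
voted_text, doubtful_positions = vote_on_field(["10457", "1O457", "10451"])
print(voted_text, doubtful_positions)
```

Positions without a clear majority are returned rather than silently resolved, so they can be passed to a human corrector.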

5. Employ character specific definition

If you know that only numeric characters are going to be used for a specific field, program the software accordingly.
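
A minimal Python sketch of the idea, assuming the recognition engine hands back a raw string for the field: letters that are commonly confused with digits are mapped to the digit, and anything still non-numeric is rejected. The look-alike table and the function name are illustrative assumptions, not settings from any particular product.

```python
# Letters often misread in place of digits (an illustrative, incomplete table).
DIGIT_LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)

def read_numeric_field(raw_text):
    """Return the field as a digit string, or None if it cannot be numeric."""
    candidate = raw_text.strip().translate(DIGIT_LOOKALIKES)
    if candidate.isascii() and candidate.isdigit():
        return candidate
    return None  # send the field for manual correction

print(read_numeric_field("1O4S7"))
print(read_numeric_field("12A4"))
```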

6. Use field specific dictionary look ups

Users can associate a dictionary list of acceptable responses on a field by field basis. The software is then able to go beyond character recognition to perform contextual matching.
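
One way to picture contextual matching is a fuzzy lookup against the field's list of acceptable responses. The sketch below uses `difflib.get_close_matches` from the Python standard library; the city list, the 0.8 similarity cutoff and the function name are assumptions chosen for the example.

```python
import difflib

# Hypothetical dictionary of acceptable responses for a "city" field.
ACCEPTED_CITIES = ["Sydney", "Melbourne", "Brisbane", "Perth", "Adelaide"]

def match_to_dictionary(raw_text, accepted, cutoff=0.8):
    """Return the closest accepted response, or None if nothing is close enough."""
    matches = difflib.get_close_matches(raw_text, accepted, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(match_to_dictionary("Sydnev", ACCEPTED_CITIES))
print(match_to_dictionary("Me1bourne", ACCEPTED_CITIES))
print(match_to_dictionary("Zzzz", ACCEPTED_CITIES))
```

The cutoff is a trade-off: set it too low and wrong entries are "corrected" to the nearest dictionary word; set it too high and minor misreads are not repaired.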

7. Perform calculations automatically

Among the most common and preventable errors are those in mathematical calculations. Instead of just reading digits independently, program the software to calculate totals, subtract discounts and match prices with inventory items.
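
The cross-check can be as simple as recomputing the total from the recognised line items and comparing it with the recognised total. The Python sketch below assumes a hypothetical order form with quantities, unit prices, one discount and one total; the field layout and names are invented for illustration.

```python
from decimal import Decimal

def check_order_total(line_items, discount, recognised_total):
    """Recompute an order total and compare it with the value read from the form.

    line_items: list of (quantity, unit_price) pairs, prices given as strings.
    Returns the computed total and whether it matches the recognised total.
    """
    subtotal = sum(Decimal(qty) * Decimal(price) for qty, price in line_items)
    computed = subtotal - Decimal(discount)
    return computed, computed == Decimal(recognised_total)

items = [(2, "19.95"), (1, "5.00")]
print(check_order_total(items, "4.90", "40.00"))  # total agrees with the items
print(check_order_total(items, "4.90", "48.00"))  # a misread digit is caught
```

`Decimal` is used rather than floating point so that currency amounts compare exactly.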

8. Reading addresses

Perhaps the most important data collected from forms is address information. There are forms processing programs available that can compare a printed address against a database to determine if it is a valid address.

9. Electronic form fill - validate data as it is entered

If your company has several remotely located offices, have staff complete forms on-screen rather than faxing or mailing them, so that data is validated as it is entered.

10. If you do not get 100 per cent recognition - make sure it is easy to correct

There will always be some forms that require validation and human intervention. Make sure there is a logical method for correcting forms as they are processed either ad hoc or in batch mode. Advanced software programs can flag all the questionable characters in batch forms and present them to a corrector in a single on-screen stream.
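
To make the batch correction idea concrete, the Python sketch below collects every low-confidence character from a batch of forms into one list for a corrector to work through. It assumes the engine reports a confidence score between 0 and 1 for each character; the record layout, the 0.85 threshold and the sample data are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RecognisedChar:
    form_id: str
    field: str
    position: int
    char: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical engine

def build_correction_queue(batch, threshold=0.85):
    """Return the questionable characters of a batch, in their original order."""
    return [c for c in batch if c.confidence < threshold]

batch = [
    RecognisedChar("form-001", "postcode", 0, "2", 0.99),
    RecognisedChar("form-001", "postcode", 1, "O", 0.41),
    RecognisedChar("form-002", "surname", 3, "n", 0.62),
    RecognisedChar("form-002", "surname", 4, "e", 0.97),
]
for item in build_correction_queue(batch):
    print(item.form_id, item.field, item.position, repr(item.char))
```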

TELEFORM USERS

One of the key products released by Cardiff Software is Teleform, a forms processing application. Among the local users of the software is the NSW Fisheries Research Institute, which uses Teleform Standard to collect the high-volume and detailed information it requires for monitoring commercial fisheries.

More recently, it has started using Teleform Elite to automate the entry of catch and effort data for recreational fishing trips made by the charter boat industry. In each instance, a form has been optimised for ICR (intelligent character recognition, ie handwriting) which the fisheries fill out and mail or fax to the Institute. The data is automatically scanned, verified and entered into a database.

Another local user, according to Telesystems, the Australian distributor of Teleform, is the Royal North Shore Hospital Centre for Anaesthesia & Pain Management Research, a unit of the University of Sydney.

Teleform is used for Patient Assessments, Admission Checklists, Patient Questionnaires, and Demographic History. Teleform Standard is used to automate the collection of the data and its entry into the patient database.

All forms use a mix of ICR for handwritten data, OMR for filled-in circle responses, and OCR where fields on the form are machine printed, and they are scanned using an industry-standard scanner.

Although the volume of forms is not high, the amount of data on each one means that Teleform is the most cost-effective method of data collection and data entry, and the forms for each week are processed in approximately half an hour.
