ABBYY FineReader 11 in the wild

REVIEW. By Allon Maxwell

I have used OCR software since soon after I got my first scanner, sometime around 1995. To get a useful result, you had to "fiddle" with the contrast and density settings of the scanner for nearly every page, and then manually correct an embarrassing number of read errors (around 20 to 30 per page).

You could only get a plain text file from that first software I used, but that was still a long way better than asking my wife to retype the whole document. It meant the difference between getting a workable plain text file copy, and giving up without even trying.

About a year later I graduated to my first commercial OCR package (TypeReader Pro 4). That improved the "error count" to about 8-10 per page. Eventually, however, upgrading to a newer version of Windows rendered both the scanner and the OCR software obsolete.

The new scanner (a Mustek 1200UB) came bundled with ABBYY FineReader Sprint version 4. Wow!! The error count now averaged only about 2-4 per page. Sprint 4 was also able to reproduce formatting of the original, and save the results in .rtf format.

A need for a better scanner with a photo slide copying attachment led to the purchase of an Epson V350. The choice was heavily influenced by the fact that it gave me both the slide copy feature and ABBYY FineReader Sprint 6. The "error count" was about the same as for Sprint 4. However, now I could send the results directly to a Word file, which opened automatically after recognition was completed.

However, the very long "warm up" time of the Epson made scanning a frustrating exercise for multi-page text documents, and it was quickly replaced by a CanoScan 8800F with virtually no warm up time. The downside was that, for my purposes, the bundled OmniPage SE 4.0 was a lot "clumsier" than FineReader. It was a "no contest" decision to store the Epson scanner in the shed, but continue to use Sprint 6.0 with the Canon 8800F!! Now I had the best of both worlds.

Not long after that I became aware that ABBYY were selling their new FineReader PRO 9.0 locally. I was impressed with the features of the evaluation version, and quickly arranged to purchase a copy. (Competitors' sales managers please note: ABBYY seems to be about the only OCR software supplier offering an evaluation version. I find it hard to understand why, in today's competitive world, the "others" haven't tumbled to the obvious benefits of that as a marketing ploy. A couple of generations ago, if we couldn't sample it, we called it "buying a pig in a poke"! I still try, very hard, not to do that.)

Since then I have routinely upgraded to FineReader PRO versions 10 and 11, as they became available.
I have only had version 11 for about a week, so my experience is limited. However, in the interest of giving it a workout, I have now scanned 10 pages or so of Times New Roman and Arial text, in which there hasn't been a single "read error"!!

The really big personal bonus lies in ABBYY FineReader's superior ability to recognise 19th century typefaces. This has always been hard. (Not surprising when you remember that many of these old documents were printed in small town print shops, using homemade hand cast lead type, with ink quality depending on the "skills" of an apprentice who had to "paint" the ink onto the plates for each page.)

With older OCR software, error rates on these 19th century typefaces have been so high (sometimes upwards of 75-80%) that it simply wasn't worth the effort of trying.

ABBYY FineReader PRO 11 offers a feature which can OCR a PDF file made from scanned images, adding searchable text hidden behind the images. I haven't had time to fully evaluate this feature, but my early impression is that results are at least as good as those from Acrobat Standard X and Nitro PDF PRO 6.2 (both of which I have copies of, and have used occasionally for this purpose).

However, for my purposes, the really great feature is the ability to OCR that same PDF file made from scanned images, and send the result to a Microsoft Word document, complete with original formatting!

I have only tested this feature on one document so far, but the results are truly amazing.
The original file was a PDF made from scanned images of a 532-page book published in 1812.
(The Racovian Catechism, English translation from the Latin, by Thomas Rees -- 44 MB download, available online from: http://www.archive.org/download/racoviancatechis00reesuoft/racoviancatec... )

The original 44MB PDF became a 15MB searchable PDF file. And as a Word document it became a fully editable 3.6MB file.

A screen grab of a page from the 1812 Racovian Catechism, downloaded as a PDF file. Below is the same page from the Word file generated with FineReader 11. The Word file is "as is" (including the 7.5pt font size!) without any editing.

While less than 100% perfect, accuracy on such an ancient document is remarkable, to say the least. For now I don't want to read the whole 532 pages to guess at a number, but read errors are going to be very low, especially compared with my previous experience of some of that older OCR software. With a little reformatting of page size and font size, and a determined spell check, it will be quite useful as a reference document which is much easier to use, and much smaller to handle, than the original PDF.

There were a very small number of pages (about 2%) where ABBYY FineReader said it had difficulty recognising the page, but I haven't had time to investigate the causes.
However, when it did have problems it threw up a warning message noting the page number and the problem. These included things like "The font David does not contain some of the characters", or "Make sure the recognition language is turned on", or "The resolution of the source image is too small". That will make it easy to follow up later, if I ever need to.

IN SUMMARY
ABBYY FineReader 11 Professional Edition
PRICE: $A199
AVAILABILITY: http://finereader.abbyy.com/
VERDICT: Would I recommend it? Well, of course you need to evaluate it for your own local purposes, which are bound to be quite different from mine. However, there is an evaluation version available, and that is something you don't seem to be able to get from most of the opposition. But for me? This new version 11 is so good I may never have to upgrade again!!