Scanning the past

By Nathan Statz

May/June Edition, 2008: Genealogy has fascinated historians for centuries. Much as scientists are driven by a passion to understand things, genealogists are driven to learn where people come from and what their family tree entails.

For most people, the joy is in the exploration: delving into the unknown depths of family origins can turn up distant relatives of royal blood or the most dastardly of evildoers. If your ancestry led back to despicable war criminals, would you really want to know?

Thanks to the inquisitive nature of human beings, the general answer is a resounding yes. Much like the morbid curiosity that stirs people to slow down and look out the window at the scene of an accident, there’s a genuine interest in where your family lineage comes from.

Such curiosity is the driving force behind www.ancestorsonboard.com, a new Web site that allows users to trawl through passenger lists of people who departed the UK on long overseas voyages between 1890 and 1960. This encompasses a staggering 24 million passengers across 164,000 lists and shows everything from Olivia Newton-John’s parents emigrating to Australia to the passengers who departed on the Titanic’s fateful voyage.

The passenger lists come from records which until now were only accessible from a public viewing room in the UK National Archives. To find anything, you had to know the port from which the person had travelled and in what time period. This has been replaced by a fully indexed and searchable Web site, which has been set up as a co-operative project between the National Archives and www.findmypast.com.

Making the data publicly available and searchable meant physically pulling the files off the archive’s shelves and scanning them by hand. This feat of digitising was made possible by a shift rotation, which kept the scanners going for 15 hours every day.

“The documents required flat-bed scanning technology, as they were not suitable for automated capture. We utilised a combination of A2 Scan2Net document scanners and A1 Proline document scanners for the larger documents,” said Debra Chatfield, spokesperson for Findmypast.

“We developed a bespoke scanning application software to manage the workflow and reconcile documents post scanning. Our developed software also enabled the scanning operators to perform image quality reviews on every image created on screen, before accepting the scan as meeting the stringent quality standards. This approach and solution created excellent images as can be viewed on the Web site.”

Chatfield explains that rather than using automated recognition technology such as ICR or OCR, all the transcription was done manually. This was because the documents lacked generic markings applied in a consistent manner.

“Whilst automatic capture has progressed over many years, the copperplate handwritten nature of the majority of the documents would not have been suitable for any automatic capture extraction. The printed documents towards the end of the series were again so inconsistent in their fonts and formats, the quality of the printed matter was so poor it did not lend itself to accurate OCR extraction,” said Chatfield.

“Stringent quality audits were incorporated to ensure the data was to the highest standard, where possible, standardisation was utilised to increase accuracy further. For example, occupations were standardised where clearly evident to provide a consistent quality output.”

This isn’t to say the entire project was smooth sailing. Chatfield explains that the mix of A1 and A2 documents did not match what had been predicted for each phase of scanning.

“All parties had to work together to accommodate the changes in the production patterns to ensure the overall project timescales was achieved,” said Chatfield.

There were other difficulties to overcome, such as the National Archives’ need to take a conservation approach to some documents due to their age.

According to Chatfield, this could only be managed by the National Archives and therefore again had an impact on the production schedules. “It was necessary to implement a further A1 scanner to accommodate the additional documents and timescales involved.”

The mammoth scanning project has been a considerable success and has given birth to a unique business model where imaging and web-based subscription meet. Findmypast.com is looking to expand on this with plans to digitise the British Census documents from 1911 over the course of the next year.
