NSW Parliament puts history online with ABBYY

A petition to the Governor of NSW was read out to the NSW Legislative Council on July 14, 1840. The petition, on behalf of “certain Inhabitants of New South Wales, and signed by 356 persons”, argued that the cost of disciplining the convict population should be shared equally by the British Treasury and the colonists who were lucky enough to employ free convict labour, not borne by the remaining population, who had to make do on their own.

The petitioners even suggested a fee scheme: for “each Convict now in Private Service, the sum of Five pounds [if] he be a Mechanic, and Two pounds annually, if he be a Shepherd, or Labourer, in the Country, or Five pounds if he be employed in the Towns; to be paid by the Recipients of Convict Labour, towards the maintenance of the Colonial Police”.

This historical snippet, and many more besides, are now directly accessible on the website of the NSW Parliament, shedding light on the state’s evolution from a dedicated penal colony to democratic self-government.

A major project undertaken by ABBYY and the NSW Parliamentary Library has made available online a collection of over 60,000 documents previously held on microfiche. The documents have been optimised for online access, so amateur and professional historians alike are just a simple copy and paste away from incorporating them in their research.

The records cover the period of the First Legislative Council from 1824 to 1855 and include tabled papers, Bills, minutes of proceedings, reports of debates, correspondence and more.

The First Legislative Council was the precursor to the current NSW Parliament and the records cover local matters such as the Census Bill of 1828 and international affairs such as the Declaration of the Crimean War.

The previously inaccessible documents uncover the real history of Australian states: how they made laws, negotiated and corresponded on private, business and political matters. They also reveal the complicated ties between Australia and Europe and between Australia and New Zealand, as well as relations with the indigenous people.

The vast majority of the records have never before been available even to local historians.

The NSW Parliament completed the project in collaboration with ABBYY, a global provider of intelligent capture technologies and solutions.

ABBYY has long experience in developing OCR software, particularly for libraries and universities in Europe, which often face the added challenge of dealing with old texts and historic fonts.

It had previously helped digitise UK Parliamentary papers for the period 1700-1834, which were preserved and published online by the University of Southampton Library.

ABBYY has also undertaken large-scale digitisation projects for libraries as diverse as the National Assembly Library of Korea, with its 160 million page collection, and the Royal Library in Copenhagen, Denmark, which holds nearly all known Danish printed works back to the first Danish book, printed in 1482.

To digitally conserve the NSW Parliament archive and make it accessible, it was necessary to apply document recognition technology that could handle low-quality documents in various formats.

Some of the 60,000 authentic pieces had dark backgrounds, some were microfilmed in low resolution, and others came with a mix of printed and handwritten text. However, the main challenge was dividing the documents into separate chapters and creating bookmarks for easy navigation.

The full-text optical character recognition (OCR) of documents was performed by ABBYY Recognition Server.

Then ABBYY FlexiCapture was deployed to handle automatic separation of headers, sub-headers and annotations.

An additional text layer was added to the images of all OCR’d documents to enable a full text search across the collection, while keeping original documents intact. The use of document classification and Hansard data extraction makes it easy to find the right chapter using standard PDF viewer or web page navigation tools.

Many of the oldest documents are handwritten in elegant but hard-to-decipher cursive.

While these cannot be OCR’d and therefore cannot be indexed, the ABBYY solution was able to greatly reduce the image size of previously scanned PDFs with no visible quality loss using advanced Mixed Raster Content (MRC) technology. Smaller PDFs are much more accessible via a web browser.

The job of manually keying in old handwritten documents may be accomplished by crowdsourcing in the future. While governments have only limited budgets for tasks such as this, ABBYY has been involved in previous historical archiving projects that have exploited the power of online volunteers.

The Museum of the famed Bolshoi Theatre has completed a major project to digitise a range of historical documents, with the aim of making the information publicly accessible and searchable via its website.

Four thousand volunteers helped digitise 48,000 historic posters, 120,000 programs and 100,000 rare photographs from the 240-year archives of the Bolshoi Theatre Museum.

During stage one of the Bolshoi project, ABBYY FineReader was used to convert document files into a digital format. The captured text was scrupulously verified by volunteers to find and correct mistakes that can occur during digitisation. The project united 4000 programmers, teachers, photographers, journalists, historians, artists, and students from 60 countries.

Throughout the project, ABBYY’s AI technologies were employed for intelligent capture, from digitisation to the extraction of data. During stage two, ABBYY text analytics software will process and categorise the unstructured data, connect names and roles and put the information into the correct database fields. The results will be checked by volunteers across many locations using ABBYY’s web-based interface.

This information will then be returned to the museum experts for further analysis, and then made available to the public in the database on the Bolshoi’s website.

The new online collection of NSW Parliament materials was announced at a private event attended by Members of Parliament, academics and Historical Society members, as well as Parliament’s senior administrative staff.

“I am proud to say that the New South Wales Parliament is now able to provide unprecedented public access to its historical materials – the documents that exemplify the early transition of Australia from a penal colony governed by autocratic external officers to a responsible government,” said Deborah Bennett, NSW Parliamentary Librarian.

“Some things here will change the way we write about Australian history, because they've been in a manuscript format and, in a sense, they've not been easily accessible except in a library with access to the few printed volumes,” said Carol Liston, Associate Professor of Australian history at the School of Humanities and Communication Arts, Western Sydney University (WSU) and current President of the Australian Historical Society.

“People would not have known about them. What you've done here is provide us, as historians, with a chance to rewrite the history of New South Wales and this must be a really exciting contribution.”

The invaluable collection is now freely accessible online on the NSW Parliament website: https://www.parliament.nsw.gov.au/hansard/Pages/home.aspx?s=1