A leap ahead to auto-classification at Auckland Transport

By Gerard Rooijakkers

Early results of a project to automate classification of more than 8 million documents at Auckland Transport are amazing and give us confidence that the objective of automating the addition of retention tags will be achieved.

The auto-classification system is being built by New Zealand’s Pingar using its text analytics and machine learning technology. Australia’s Synercon is developing the Ontology/Taxonomy and Retention rules.

Auckland Transport (AT) is a very dynamic council-controlled organisation in New Zealand. The main functions and activities are focused on keeping Aucklanders moving in a city that is growing fast.

Since its inception in 2010, as part of the amalgamation of councils in what was then the Auckland Region, the organisation has grown too. With approximately 2100 employees and some 500 contracted staff, the administration of its daily affairs (almost 100% digital) has been ballooning as well.

As a council-controlled organisation (CCO) in New Zealand, Auckland Transport is required to meet statutory obligations as set out in:

  • the Local Government Act 2002,
  • the Public Records Act 2005,
  • and the Local Government Official Information and Meetings Act 1987.

As a CCO, AT needs to be transparent and accountable, which requires adequate and accurate records management. That means managing our information (both structured and unstructured data) that is created, received, aggregated or rendered in the conduct of AT’s daily operations.

AT’s unstructured data is captured in multiple MS SharePoint environments, whilst we use several other main business systems to capture other data, e.g. financial records in SAP, infringement and licence data in Pathway, and customer data in Customer Relations Management.

AT collaborates with many parties across the transport sector: customers, contractors, commuters and communities. To give you an impression of the scale of our collaboration platform and the information that is being exchanged in the process:

  • We deal with some 8000 contractors
  • We run 100+ programmes and 3500+ projects
  • We use 800+ business applications
  • We use 100+ document types
  • We have 18 functions and 80+ business activities
  • We process millions of AT Hop transactions a week, generating terabytes of data
  • We capture and manage CCTV footage across Auckland for the purpose of managing traffic flow, public safety and protecting AT’s properties

Managing information is an essential activity at Auckland Transport. Corporate Information Management is embedded in Business Technology.

SharePoint is AT’s appointed document management system. Team sites are being configured and managed in O365. Project Sites are configured in Fulcrum/Connect which is externally managed by LeapThought.

The volume of information is ballooning, and in the various MS SharePoint (SP) environments we have approximately 8 million documents and counting: approximately 4 million in SP and 4 million in OneDrive. This has created some challenges in the findability of information.

The SP team sites and project sites were configured in 2010 on the understanding that staff would complete some mandatory metadata fields when saving documents to them.

Every AT staff member receives SharePoint training, which includes the importance of good recordkeeping.

However, an audit in 2017 showed that this was not quite happening as anticipated. It confirmed earlier experiences of search issues, or rather information retrieval issues. We knew the information was somewhere in our document management system, but the individual documents carried little metadata beyond the standard Microsoft-generated fields, so nothing described their content. Hence search results were far from optimal. Simply said, staff were not always adding meaningful metadata in the mandatory fields, or were circumventing them by saving documents in OneDrive.

Leaving metadata and file naming to humans proved to be the weakest link in an otherwise well-appointed document management system.

The way forward to improve records management and achieve much better compliance is to use auto-classification and include retention and disposal tagging to assist and support AT’s approved life cycle management schedules.

An additional benefit is a far more user-friendly experience when saving files, searching for documents and being assured that the correct up-to-date information is retrievable from the document management system.   

So, with the idea in mind that artificial intelligence could help us out, we started on a journey to compile a business case for auto-classification, a road paved with exploring opportunities to implement it successfully and to auto-tag 8 million documents retrospectively. That is the first phase of an ambitious plan to use auto-classification, including retention and disposal tagging, to achieve life cycle management as part of our compliance programme.

AT’s information management philosophy is to use artificial intelligence and machine learning. That means developing an AT-wide Ontology with related business-specific taxonomies. This work forms an essential part of the project, aptly named Haystack.

We have embarked on an innovative road trip to manage our information in an automated way, different from the traditional EDRMS option. We understand that the human in our digital information world is the weakest link when it comes to adhering to recordkeeping practices. People are inclined to see the admin part of their job as not being an integral part of their work.

To alleviate this pain point we have opted for auto-tagging documents in a consistent and organised manner. It will not only improve the retrievability of information, it will also allow us to apply proper records management, including retention and disposal rules, and so achieve much better life cycle management of information.

We have introduced the auto-tagging in an agile manner with limited disturbance of an already fully engaged organisation. The interim results are – without being classified as a Trumpian achievement – great. The auto-tagging is not only assisting and supporting our records management compliance but has also given us far better insight into how AT does business with its customers, the ratepayers and its contractors. We envisage far more opportunities to enable AT’s business units to do their business better.

We are looking at other uses for this technology and its artificial intelligence beyond unstructured data: the aggregation of information that is currently hidden or obscured in silos, Building Information Modelling, the management of images, CCTV footage and data, and navigating to relevant information from GIS.

The analysis and reporting we can derive from this Pandora’s box will assist management to make well-informed decisions, as the information provided is assured to be up to date, complete and the correct version. In the end, auto-classification will account for big efficiency gains.

Search results have improved now that all 8 million documents are tagged, and we are running the retention tagging process based on AT's approved retention schedule.
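
To illustrate the idea only, a retention tag could be derived from a document's classification tags roughly as sketched below. This is a minimal sketch and not AT's or Pingar's implementation: the function and activity names, the retention periods, the schedule structure and the helper names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    years: Optional[int]  # None means "retain in perpetuity"
    action: str           # what happens when the period ends


# Hypothetical schedule keyed on (function, activity) classification tags.
# The entries are invented for illustration, not taken from AT's schedule.
SCHEDULE = {
    ("Finance", "Invoicing"): RetentionRule(7, "destroy"),
    ("Governance", "Board Meetings"): RetentionRule(None, "transfer to archive"),
}


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def retention_tag(function: str, activity: str, closed_on: date) -> dict:
    """Look up the rule for a classified document and work out its disposal date."""
    rule = SCHEDULE.get((function, activity))
    if rule is None:
        # No matching rule: flag for a records manager rather than guess.
        return {"status": "needs review", "action": None, "due": None}
    due = None if rule.years is None else add_years(closed_on, rule.years)
    return {"status": "tagged", "action": rule.action, "due": due}


if __name__ == "__main__":
    print(retention_tag("Finance", "Invoicing", date(2018, 3, 1)))
    print(retention_tag("Governance", "Board Meetings", date(2018, 3, 1)))
    print(retention_tag("Parking", "Unknown", date(2018, 3, 1)))
```

In this sketch a document whose tags match no rule is flagged for review rather than given a default period, which keeps disposal decisions with the records manager.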

We are also continuously improving the tagging and enhancing the Ontology and taxonomies (as part of quality assurance).

As the custodian of council records, Auckland Council will receive archival records (records that will be retained in perpetuity) from CCOs, so we are currently developing a process for the transfer of these archival records in PDF/A format. We anticipate that this will take some time to master, and that council will need time to have its digital archive sorted.

Gerard Rooijakkers is Corporate Information Manager at Auckland Transport.