Records Management & Big Data

By Rolf Green

Having worked across a number of industries over the last few years, I have always been surprised that Big Data projects either struggle to meet records management requirements or ignore them altogether.  My surprise comes from the fact that Records Management is not only a compliance practice, but one that can literally pay for itself.

The term Big Data is often credited to NASA researchers, who used it to describe data sets too large to analyse with the computing resources available.  It has since become a way of describing the problem of cost-effectively managing the large amounts of data regularly found in today’s financial and mining companies.  And here the volume of data is growing enormously, not only through the natural increase that comes with business growth, but also through the adoption of social media and the need to support Big Data analytics itself.

So where does Records Management come in?  Too often Records Managers are seen as company librarians hidden away, or as a compliance roadblock telling you what you can’t do.  What is not appreciated is that Records Managers have been fighting the storage battle for a lot longer than IT.  Be it physical or digital storage, the goal has always been the same: store what is needed, store only what is needed, keep it available and keep it cheap.

Then came the storage revolution.  Seemingly overnight the amount of storage in a common PC matched that of the RAID server you were running as a document cache, and we moved from talking about gigabytes to petabytes.  Storage became cheap, microfilm and magneto-optical disks became museum pieces, and the old techniques were forgotten as IT’s responsibilities grew and its budgets shrank.

Today, however, data usage has grown to such extreme levels that companies are again being forced to focus on storage.  Even with the move to Flash storage and the lower costs it promises, there is a growing realisation that data production is increasing far faster than storage costs are falling.

Enter the Big Data problem.  In trying to gain competitive advantage by enticing new customers or reducing operational costs, companies have been building Data Marts, Data Warehouses and Data Lakes.  To achieve this, Big Data systems are built and data is sourced from client systems, and herein lies the heart of the problem.  Copies of data are expensive and quickly become unmanaged, and in some cases they stop being copies at all: they are changed, become new records in their own right, and those new records go unmanaged too.

The solution is by no means easy, especially if you are already well down the Big Data path.  The first step is realising that you do not have to reinvent records management to solve this new issue.  Borrowing from the techniques of the past, we find that a better appreciation of the concept of archiving, as opposed to off-loading or backing up, can be a saviour.  When matched with a clear meta-model that tracks the source of truth for data across its lifecycle, archiving allows synchronisation and drives discovery across client systems, archives and analytics platforms.

Sure, there will be some new patterns to apply, as is always the case with new and more capable technology, but Records Managers have been doing this for a while and may just surprise you with how quickly they can adapt.

Rolf Green is a 20-year veteran of the storage war, having driven Records, Information, Document and Content Management solutions for companies across more than thirty countries.  He has consulted to industries including Mining, Oil and Gas, Utilities, Aviation, Finance and Government, and currently holds the role of Head of Records Governance and Data Compliance for the ANZ Bank in Australia.  He will be presenting on “Recordkeeping in a Big Data world - Reinvention or re-adoption?” at the 7th Annual National Records and Information Officers’ Forum 2016, Melbourne Convention & Exhibition Centre, February 22-25, 2016.  (The comments here are those of the author and not of ANZ Bank.)