The death of recordkeeping
Recordkeeping is the process of making and maintaining complete, accurate and reliable evidence of business transactions, and government records are crucial to individuals seeking to establish their identities or ensure their entitlements to basic human rights. Computers can store enormous amounts of information quickly and cheaply over a timescale of years, so the computerisation of the workplace should have made the task of recordkeeping within organisations easier. In fact, standards of recordkeeping in many organisations have declined over the 25 or so years since computers became common in the workplace.
David Fricker, Director-General of the National Archives of Australia, comments: “When computers came in, all the processes for records management went out”. A report by the Australian National Audit Office (ANAO) on recordkeeping within a number of Australian Federal Government Agencies conducted in 2012 found that “The large majority of the agencies’ records were created, captured and/or managed in the agencies’ records management and other systems” but noted that “The [non-records management] systems ... did not generally meet legal requirements relating to the management, and destruction or transfer of records”. Responding to the report, the National Archives of Australia observed that “...keeping records in multiple systems, particular where digital records are duplicated in paper format... presents a multiplicity of problems and increased risk, including loss of context, increased costs and reduced efficiencies because of difficulty in locating and retrieving records when needed, and inability to identify the authoritative record.”
The key to the paradoxical effect of computerisation lowering recordkeeping standards lies in the decentralisation of information storage. In the pre-computer workplace, corporate information was stored in paper files which were kept in a central registry and administered by a hierarchy of clerks, who had responsibility for the creation and naming of new files, specifying file keywords (metadata) and deciding how long the files were to be retained.
The head of the hierarchy defined the taxonomy of corporate information. Files were delivered to and collected from people manually, with a check-in/check-out system to track responsibility. People using files would add new documents (or folios) to the files, or annotate existing ones. The integrity of files in government organisations is frequently enshrined in law: removal or defacement of a folio in an Australian Commonwealth Government file is a criminal offence. Non-file documents certainly existed, but the primacy of files as the definitive information repository meant that non-file documents were regarded as ephemeral and were not generally retained.
The centralisation of typing resources in the typing pool meant that the demarcation between file and non-file documents was very clear: any document relating to a corporate decision had to be typed, document drafts were clearly distinguishable from final copies and all documents had to be placed on a file.
As computers have become almost universally used for writing documents and electronic mail widely used for circulating and refining them, the responsibility for deciding which documents are corporate records and storing them appropriately has been devolved to document authors. Significant cost savings accrued from this as the central repository and its hierarchy of clerks could be eliminated or drastically reduced in size.
However, document creators are not necessarily aware which documents constitute records, and may not have the training, tools or time to perform records management. The 2012 ANAO Audit noted: “Staff often stored information in a variety of places, but did not have consistent rules about the records that needed to be created and where they would be captured.”
New forms of communication further tax the abilities of individuals to perform recordkeeping. The humble email, which has been used in many organisations for over 25 years, is frequently used to communicate organisational decisions and thus may constitute an organisational record, but its structure can be complex, with nested messages and attachments frequently present. The email subject rarely constitutes an adequate record title, but is often used as such.
Emails are difficult to transfer to other applications for long-term storage. SMS messages received on mobile devices may also constitute records but transferring these to any other storage device requires third-party software to be installed on the mobile device. Tweets and social media postings may also constitute records which require specialised skills to transfer to long-term storage.
The use of modern computing devices provides access to enormously powerful applications for the creation, exchange and manipulation of electronic documents. Documents created by these applications are stored either in file system folder trees or in document libraries within electronic document and records management systems (EDRMSs).
Folder trees have been used for information management and storage for many decades as they provide a means of grouping together files and other folders similar to that provided by the paper files and folios. The major difference is that electronic folders may contain sub-folders to create a hierarchical tree structure which is frequently 20 or 30 levels deep. Access controls can be applied to give users personal and group storage areas and data can be easily backed up if the computer hosting the folder tree is always connected to a network.
The major limitation of file system folder trees for recordkeeping is the lack of version control. Documents can be changed without any record of who made the changes or when they were made and there is no distinction between modifying an existing document and creating a new one.
Local conventions for version control are frequently used, such as appending a sequence number, date or author to the file name, but these cannot be enforced over the large number of applications used in organisations. User identity is linked to ownership of an active account, so that when users leave an organisation, any files or folders which they own lose their ownership information. Users are also able to modify the folder structure in which they store their documents, so the location in which documents may be stored is difficult to control. Search technology is helpful in dealing with these problems but is rarely deployed over shared storage due to expense and performance issues which arise in maintaining access control over search results.
Search results on document repositories are frequently much poorer than on the Internet as there is no hyperlink information to assist in ranking search results, leading to lack of use of search facilities even if they are available. A consequence of these limitations is that many different versions of the same document are found within file system folder trees.
Studies in widely different organisations have indicated that up to 40% of electronic documents created by desktop applications on file systems are different versions of the same document.
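The effect of the ad hoc naming conventions described above can be illustrated with a short Python sketch. It strips some assumed version markers (a sequence number such as "_v2", a date, or words like "draft" and "final") from file names and counts how many files are additional versions of a document already present. The suffix patterns and the function names document_key, group_versions and redundant_share are hypothetical, chosen for illustration only; real conventions vary between organisations, and this is not the method used in the studies cited.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Assumed (hypothetical) version markers appended to a file name after a
# space, underscore or hyphen. Real organisational conventions will differ.
_VERSION_SUFFIX = re.compile(
    r"""(?:[\s_\-]+(?:
        v\d+                |   # sequence number, e.g. v2
        \(\d+\)             |   # copy counter, e.g. (3)
        \d{4}-?\d{2}-?\d{2} |   # date, e.g. 2012-06-30 or 20120630
        draft | final | copy
    ))+$""",
    re.IGNORECASE | re.VERBOSE,
)


def document_key(filename: str) -> str:
    """Return a normalised name with trailing version markers removed."""
    path = PurePath(filename)
    stem = _VERSION_SUFFIX.sub("", path.stem)
    return f"{stem.lower()}{path.suffix.lower()}"


def group_versions(filenames):
    """Group file names that appear to be versions of one document."""
    groups = defaultdict(list)
    for name in filenames:
        groups[document_key(name)].append(name)
    return dict(groups)


def redundant_share(filenames) -> float:
    """Fraction of files that are extra versions beyond one per document."""
    names = list(filenames)
    if not names:
        return 0.0
    return (len(names) - len(group_versions(names))) / len(names)


if __name__ == "__main__":
    sample = [
        "Budget_v1.xlsx",
        "Budget_v2.xlsx",
        "budget final.xlsx",
        "Minutes 2012-06-30.docx",
        "Minutes.docx",
        "Policy.pdf",
    ]
    for key, members in group_versions(sample).items():
        print(key, members)
    print(redundant_share(sample))
```

A name-based heuristic of this kind only finds versions that follow a recognisable convention, which is exactly the weakness of unenforced local conventions: a version saved under an unrelated name would go undetected.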
By providing access to documents only through a database, EDRMSs can address these issues and have become much more widely deployed in recent years. They offer access to documents via a web browser rather than a file browser, making them much more suited to use from mobile devices. Microsoft Office applications such as Word and Excel can be configured to save and open documents from such systems by default and prompt users for any additional metadata. Interfaces are available for many EDRMSs to use file browsers such as Windows Explorer so that users can continue to use a familiar interface for storage and browsing.
However, although EDRMSs can provide all of the functionality required for effective records management, organisations are reluctant to remove all access to shared file system storage, as some applications require their data to be stored on a file system and users are familiar with its operation. The performance of EDRMSs tends to be poorer than that of shared file systems. Where EDRMSs and shared file systems are both available for document storage, the EDRMS tends to become used for storing the organisational ‘good china’, containing clean, well-organised, but seldom used documents, with the shared file system being used for temporary storage before filing in the official recordkeeping system.
The 2012 ANAO report noted extensive use of shared file systems in the reviewed agencies and observed that, “Significant delays in filing information to the official records management system expose records to alteration and deletion, ultimately impacting on the integrity and authenticity of the record.”
Governments have always recognised the significance of recordkeeping as a means of controlling their citizens, as well as delivering services to them. The filing cabinets and Hollerith punched card machines of Nazi-occupied Europe were tools for the subjugation of local populations and for the implementation of the Holocaust.
In Cambodia, the Khmer Rouge destroyed all government records in 1975 as part of their “Year Zero” program, on the basis that everything now belonged to the State. The operation of any legal system requires recordkeeping to record events and transactions, and in societies which use writing, this involves the creation and storage of physical records. In Tsarist Russia, one of the harshest punishments an individual could receive was 'legal death'. All the records documenting the victim's existence in law were destroyed. Such 'non-persons' could not travel, work, marry or own property. With no protection or recourse under law, they were vulnerable to robbery, assault, slavery, even murder, because such acts against non-persons were not crimes.
In more benign conditions, the exercise of government responsibilities requires recordkeeping over very long periods of time, sometimes in perpetuity. For example, the health records of Australian military personnel have to be retained for 75 years after their creation. The design of electronic systems to function over this period of time is a huge challenge. Whilst there has been some progress in making documents self-describing using Extensible Markup Language (XML), so that they can be decoded by future electronic systems, the lifetime of modern storage devices is measured in years rather than decades. The current approach to long-term preservation of digital documents is to keep them in an isolated digital repository and translate documents into newer formats as support for older ones disappears, whilst keeping the original digital files for reference. Files are copied to new storage platforms as old ones become obsolete. This approach becomes more attractive as the cost of keeping paper-based archives increases and the cost of digital storage decreases, especially for documents originally created in digital form (born digital), but for very long term storage, the reliability of paper-based archiving is still attractive.
With the high penetration of computers into the domestic environment, electronic storage of personal documents such as correspondence and financial data has become commonplace, and failures of domestic computers can cause considerable problems if data has not been adequately backed up. Whilst domestic recordkeeping does not present the same difficulties as organisational recordkeeping, the infrequent failures of modern home computers lead most domestic users to ignore the risk of data loss.
A 2014 survey by Kroll Ontrack, a provider of data recovery and ediscovery tools, found that 36 percent of its Ontrack Data Recovery customers across North America, Europe and Asia Pacific had experienced a personal data loss. Of these, 35 percent did not have a backup solution at the time of loss.
Cloud storage of data for domestic users relieves users of the need to back up data on home computers, but adds other vulnerabilities, such as reduced privacy, reliance on a network connection to access any data and the possibility of their cloud provider going out of business. The rapid evolution of storage devices means that it may be difficult to read data from older devices, which were once commonplace, such as floppy disks or Zip drives. Changes in file formats used by common applications and in the applications themselves also cause problems in reading older data.
The grandchildren of today’s 70-year-olds will have far more trouble looking at digital photographs of their grandparents in 70 years’ time than people now have in looking at paper photographs from the 1940s. The plethora of digital media files stored in most homes now is likely to be difficult or impossible to access in the future without application of the kind of systematic procedures used by archive organisations.
The advent of digital formats for books for delivery platforms such as the Amazon Kindle is likely to have similar consequences. If the experience with domestic backups is any guide, difficulty in accessing old data will be the norm rather than the exception in the future, as suppliers of the applications to read the data files will only maintain backwards compatibility as long as it is commercially viable. The use of public formats such as Adobe Portable Document Format (PDF) for text documents does not solve the problem, as extensions and variations are included in the many applications which read and write this format, resulting in difficulties in accessing many PDF documents. As a page description language, it is poorly suited to many information retrieval tasks.
The issue of management of electronic documents by governments has been highlighted by the ongoing saga arising from the publication of 250,000 confidential US diplomatic cables by Wikileaks, an event widely known as Cablegate. In the pre-electronic era, the 250,000 cables would have existed only as a pallet-load of paper files, presenting a massive obstacle to their copying and distribution around the world. In addition, access to these documents would have been available to far fewer people than those who could access the leaked cables. The US Military classified intranet SIPRNet, on which the leaked cables were stored, has an estimated 4.2 million users, according to Wikipedia. Not all of these users would have had access to the cables, but potentially, access could have been granted to all of them. The ease of copying, distributing and searching these cables, together with the difficulty of managing access to electronic documents, makes Cablegate emblematic of the transformed environment created by movement of documents from paper to electronic form, where massive numbers of documents can be copied, distributed and searched with widely available computer systems.
The significance and moral status of Cablegate is energetically debated, but it is indisputable that it is the transformation of the information environment from paper to digital media which made it possible. Wikileaks could not operate in a non-electronic environment. The computerisation of modern society, whilst offering a level of access to information and ease of communication that was in the realm of science fiction 50 years ago, has created problems for recordkeeping in organisations through decentralisation of specialised functions. For individuals, problems arise through rapid change in storage hardware, data formats and applications to read data. Failure to recognise and deal with these could result in the present becoming unexpectedly inaccessible in the future.
Simon Kravis grew up and studied Physics in England. He remembers records marked Electrically Recorded and the British semiconductor industry. A keen reader of science fiction in the 1960s, he has seen some of its elements become reality but can recall none that anticipated the impact of information technology on modern life. After coming to Australia, he worked in the academic and public sectors in laser physics and geophysics before computers and seismic data processing came to dominate his working life in the 1980s, after which he worked on scientific visualization, parallel processing and developed software for drilling engineers. He discovered the anarchy of information storage and management after joining Intology in 2004. Since then he has worked on tools to deal with it with minimal user disruption at KAZ and Fujitsu before starting his own company, Aleka Consulting in 2013.