Age-defying Storage

By Laurie Varendorff

The constantly evolving technology industry poses problems for those public and private sector organisations that are required to store their data for the long haul, primarily the need to transfer data onto new media and replace hardware every few years to avoid technology obsolescence. But those problems may be about to be consigned to the past. Laurie Varendorff first examines the problem and then reports on a new evolutionary digital data storage process that could potentially preserve data for up to 500 years.

Historically, and even today, there is little in the way of digital recording media which will stand the test of time for archiving.

The term archiving has different meanings in the different sectors involved with data storage. For the IT industry, archiving is generally understood as the intelligent back-up of selected objects (datasets, items etc.) that no longer need to be accessed on a regular basis within the computing system. These files are moved from online disk storage to lower cost media such as tape or optical disk.

To the archiving profession, archiving could mean maintaining data FOREVER or for Long Term (LE-Life Expectancy) preservation, be that 100, 200, 300, 400, 500 or 1,000 or more years.

The IT industry has had a major issue in coming to terms with this definition of archiving and the technology, and to date the IT industry and profession has not had a good track record for addressing these long term preservation requirements.

How is it possible to combine these two competing archival processes to accommodate the genuine requirement to maintain data created electronically for the ongoing business process, and then for the long term legal, moral and historical requirements of society, government and history?

Migration and emulation have been the processes most commonly identified and used by the IT profession to date to address long term preservation requirements.

Why? Because most magnetic and other types of data storage devices have come and gone (not to mention the software obsolescence issues), leaving us unable even to try to retrieve our data stores. These may be held on 8, 5.25 or 3.5 inch floppy media; hard drives; optical discs in 5.25, 12 and 14 inch formats; removable or non-removable rigid disk packs from a multitude of data storage suppliers; and, last but not least, magnetic tape, be it the half-inch 9 track of early computing, today's Digital Linear Tape (DLT), or the various versions capable of holding petabytes of data in automated devices.

CD and then DVD were touted as the media of choice for long term storage of lower volume digital data but they too have fallen from grace due to their identified inherent short term deterioration characteristics.

Where are we today?

Digital information has a records retention requirement as defined by government, regulatory agencies and specific regulatory organisations. We keep digital information, including digitally born files, for the same reasons we keep paper files.

Legal requirements: These have expanded greatly in the past few years with the addition of new laws and regulations, especially in the USA (e.g. the Sarbanes-Oxley Act of 2002, the Health Insurance Portability and Accountability Act of 1996 (HIPAA), U.S. Securities and Exchange Commission (SEC) requirements, etc.), together with relevant legislation in the Australian region and the enforcement of numerous others.

Litigation defence: The need to defend the company or to protect its intellectual property is increasing each year. Trustworthy and complete documentation is needed or the defence is vulnerable and losses could be in the millions of dollars.

Corporate Governance: Additional requirements beyond the above may be dictated by your company, and mostly driven by self-protection.

Accountability: Certainly the Federal, State and Local Governments require the long-term retention of numerous document types: personnel records, land titles, and Occupational Safety & Health (OS&H) records, to name a few.

Societal and Historical: This ranges widely, from historical files and religious records to today's digitally born art and music.

Personal Values: The family album is undergoing its greatest transition today. Preservation is important: otherwise, how will your great grandchildren be able to view the pictures of you and your children taken today?

Backup in today's technical environment

Backup is used for the quick recovery of fixed content from spinning media or near-line media. Retention is defined by the laws, regulations, common practices, or by the owner of the information or IT policies and practices.

Preservation, the keeping of files over an extended period of time, requires the highest level of certainty and accuracy, as the files may not be used or viewed frequently enough for inaccuracies to be noticed. These files need the added assurance of being unalterable. The longer the period between the creation of an error and its discovery, the lower the probability of being able to detect and correct it, or numerous errors in the applicable files, while still retaining their trustworthiness.

The risks

The risks for preservation are singularly influenced by the long time horizon associated with the retention of the files. Preservation over short timeframes provides many options for retention to overcome issues. It is not so easy when preservation requirements are in the 30+ year horizon. If we extend the time period for the retention of digital data out to 100 or 500 or more years, digital data has a permanence issue of major proportions.

Technology obsolescence is the most widely discussed risk and still has nearly all the same attributes it has had for the past several decades. The major gains have been the reduced cost of the storage media and the increased storage capacity per unit. However, the Life Expectancy (LE) of magnetic tape tops out at 30 years. How many organisations are really planning on that horizon? The practical limit for the maintenance of digital data is still three to five years. As a technological obsolescence challenge, the same major issues remain that have existed since the introduction of computing to the masses in the 1980s and 1990s, and before that for large mainframe installations: frequent advancements, new elements introduced to the technology chain, and product discontinuances. Hardware is still highly proprietary and, for economic reasons, it will remain so in the future. Vendors also come and go.
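
To put the three to five year figure in perspective, the arithmetic can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes, purely for illustration, that every generation of media is replaced after exactly one refresh cycle.

```python
def media_generations(horizon_years: int, cycle_years: int) -> int:
    """Number of successive media generations needed to cover the horizon.

    Assumption (illustrative only): each generation of media is replaced
    after exactly cycle_years.
    """
    if horizon_years <= 0 or cycle_years <= 0:
        raise ValueError("both arguments must be positive")
    # Ceiling division using integers only.
    return -(-horizon_years // cycle_years)


# Compare a 5-year and a 3-year refresh cycle over three retention horizons.
for horizon in (30, 100, 500):
    print(horizon, media_generations(horizon, 5), media_generations(horizon, 3))
```

On these assumptions a 500 year retention period means at least 100 generations of media, each one a migration that must not be missed.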

Software has a similar problem, and many new programs enter the user market each day. Formats for files in applications continue to advance, often with little regard for their predecessors.

Media stability is the riskiest factor of all. Look at the warranty of the media: if it fails, the manufacturer will give you a new one, absent your precious information of course.

Managerial risk

Managerial obsolescence has been with us all along, yet only recently has it been elevated for scrutiny and is now being discussed at the executive level.

Essentially, management changes about as often as the technology. In hindsight, many digital losses could have been prevented had management been better. In today's business world, there are frequent changes in the management ranks, and continuous pressure on budgets. Will the company make the financial commitment to keep the files viable as they age?

As management changes, do some things fall between the cracks on policy for refresh, emulation or migration? Think about the files you left behind the last time you upgraded your PC. Are resources consumed by business needs more pressing than preservation?

If I as a manager do the things I should to preserve this year, I will miss my budget and my compensation is reduced. And maybe I will have a new job next year, or who would know if I wait until next year? A missed cycle for the retention of digital information could be unrecoverable later. Thus, management risk is significant.

The solution?

There may be a solution, at least to the technological issues identified; we will leave the managerial risk and obsolescence for others to address.

A recent paper presented at the Society for Imaging Science and Technology (IS&T) Archiving Conference 2005 in Washington DC titled 'Ending Digital Obsolescence', plus a second paper delivered at the Association for Information and Image Management International (AIIM) 2005 Conference and Exposition, titled 'Datasurance-Patent Pending-Preservation Archive for Digital Files', may provide the answer to this long term process and media issue.

The papers were presented by Ken Quick and Mike Maxwell of US technology firm Affiliated Computer Services (ACS), creators of the Datasurance product.

What is Datasurance?

The Datasurance product offers to maintain digital data in any format, be it music or voicemail, X-rays, MRIs, emails, databases, applications, operating systems, charts and Excel spreadsheets, Word documents, PowerPoint presentations, .TIF and other image files, plus black and white and colour digital images and videos, on a long term, fail-safe, copy-of-last-resort medium: microfilm.

How does the process work? Well, ACS gave us a glimpse at how they create this miracle by the application of 2-D barcode technology and the Datasurance product media (black & white) microfilm. ACS refers to the output media as analogue/digital tape.

There are thousands of different file types and formats, with more coming each year. Yet there is one thing they have in common. At the base level, they are a sequence of 0's and 1's that the program transforms into a colour spot, a program, a sound, a character, and a command. Files are written to media and transmitted over networks as binary information (0's and 1's). Whether stored on disk, tape or optical media, the files are pulses or spots of 0's or 1's. This attribute becomes the key to the Datasurance preservation concept.
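
The point that every file is ultimately the same kind of thing can be shown with a minimal Python sketch. The helper names are hypothetical and are not part of the Datasurance product; the sketch only demonstrates that text and image data alike reduce to 0's and 1's, and that the original bytes come back unchanged.

```python
def to_bits(data: bytes) -> str:
    """Return the bytes as a string of '0' and '1' characters, 8 per byte."""
    return "".join(f"{byte:08b}" for byte in data)


def from_bits(bits: str) -> bytes:
    """Rebuild the original bytes from a string of '0' and '1' characters."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Two very different kinds of content, treated identically.
text_file = "Age-defying".encode("utf-8")
image_file = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file

for content in (text_file, image_file):
    bits = to_bits(content)
    print(bits[:16], "...", len(bits), "bits")
    assert from_bits(bits) == content  # the round trip is lossless
```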

The 2-D barcode has a built-in error correction code and two different cyclic redundancy codes, so that the information can be extracted and read as written even if there is significant damage to the barcode. The error correction code assures accuracy even if 25-40 percent of the 2-D barcode is unreadable; the correct data can still be rendered.
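
As a rough illustration of the checking side of this, the sketch below computes two different cyclic redundancy codes over the same data using Python's standard library. The article does not say which two codes Datasurance uses, so CRC-32 and the 16-bit CRC-CCITT are stand-ins chosen for illustration. Note also that a cyclic redundancy code only detects damage; repairing it is the job of the error correction code (Reed-Solomon in the standard Data Matrix symbology), which is not shown here.

```python
import binascii
import zlib
from typing import Tuple


def checksums(payload: bytes) -> Tuple[int, int]:
    """Return two independent CRCs of the payload.

    Assumption: CRC-32 and CRC-CCITT are illustrative choices only.
    """
    return zlib.crc32(payload), binascii.crc_hqx(payload, 0)


original = b"archival record"
stored = checksums(original)  # would be written alongside the data

# Simulate damage: flip a single bit in one byte.
damaged = bytearray(original)
damaged[3] ^= 0x01

print(checksums(original) == stored)        # undamaged data still matches
print(checksums(bytes(damaged)) == stored)  # damaged data no longer matches
```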

Datasurance uses this format to store the 0's and 1's as a 2-D barcode. The process creates as many 2-D barcodes as needed to represent a file. Each is sequentially encoded for proper decode and re-assembly. Now any file can be represented as a series of 2-D barcode 'pictures.'

Very simply, the Datasurance process takes the sequence of 0's and 1's in the file and converts them into a sequence of 2-D Data Matrix barcodes, as many as needed based on the size of the file. For example, a PowerPoint presentation that includes colour, text, sound, video, spreadsheet and animation is still, at the base level, a sequence of 0's and 1's.

The process assembles the 2-D barcodes into groups, and prints them to film. Each 2-D barcode is sequentially numbered to assure its correct place in the writing. There are several writers available today. The 2-D barcodes are printed on 16 or 35 mm silver halide microfilm and processed to AIIM/ANSI standards for archival storage with a Long Term LE of more than 100 years.
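
The splitting and numbering step can be sketched as follows. This is not the Datasurance format, which the papers do not describe at this level; the chunk size, the header layout and the function name are all assumptions made for illustration. Each piece here stands for the data that one barcode would carry, and rendering it as an actual Data Matrix symbol is left out.

```python
import struct
import zlib
from typing import List

CHUNK_SIZE = 1024  # hypothetical payload per barcode, in bytes
# Hypothetical header: sequence number, total count, CRC-32 of the payload.
HEADER = struct.Struct(">III")


def split_into_frames(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut the file into numbered pieces, one per barcode."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]  # an empty file still yields one (empty) frame
    total = len(chunks)
    return [
        HEADER.pack(seq, total, zlib.crc32(chunk)) + chunk
        for seq, chunk in enumerate(chunks)
    ]


frames = split_into_frames(b"x" * 2500)
for frame in frames:
    seq, total, crc = HEADER.unpack(frame[:HEADER.size])
    print(f"barcode {seq + 1} of {total}: {len(frame) - HEADER.size} data bytes")
```

Carrying the total count in every piece is a design choice of this sketch: it lets a reader tell that a barcode is missing, not merely that the ones it has are in order.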

How is data interpreted or retrieved?

The process for creating a file from the 2-D barcode is accomplished by scanning the 2-D barcode and decoding it to get the 0's and 1's sequence. The process then converts the 0's and 1's into the appropriate file. The resultant file will be an exact copy of the original file that was used for input. This is similar to what happens when a file comes over the Internet to your computer: it arrives as a series of 0's and 1's, and a program on your computer converts that series into the file, be it a picture, a message or a webpage.
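
The retrieval side can be sketched in the same spirit. The frame layout below (sequence number, total count and a CRC-32 ahead of each piece of data) is a hypothetical stand-in for whatever a scanned and decoded barcode actually yields, and the function names are illustrative only. The sketch shows the pieces being checked, put back in order and joined into an exact copy, even when they arrive out of sequence.

```python
import random
import struct
import zlib
from typing import Dict, Iterable, List, Optional

# Hypothetical header: sequence number, total count, CRC-32 of the payload.
HEADER = struct.Struct(">III")


def make_frames(data: bytes, chunk_size: int = 64) -> List[bytes]:
    """Stand-in for the encoding step: numbered, checksummed pieces."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    return [HEADER.pack(seq, len(chunks), zlib.crc32(c)) + c
            for seq, c in enumerate(chunks)]


def reassemble(frames: Iterable[bytes]) -> bytes:
    """Verify each decoded piece and join them in sequence order."""
    payloads: Dict[int, bytes] = {}
    total: Optional[int] = None
    for frame in frames:
        seq, count, crc = HEADER.unpack(frame[:HEADER.size])
        payload = frame[HEADER.size:]
        if zlib.crc32(payload) != crc:
            raise ValueError(f"frame {seq} failed its CRC check")
        if total is None:
            total = count
        elif count != total:
            raise ValueError("frames disagree about the total count")
        payloads[seq] = payload
    if total is None:
        raise ValueError("no frames supplied")
    missing = [s for s in range(total) if s not in payloads]
    if missing:
        raise ValueError(f"missing frames: {missing}")
    return b"".join(payloads[s] for s in range(total))


original = bytes(range(256)) * 3
frames = make_frames(original)
random.shuffle(frames)            # scan order should not matter
restored = reassemble(frames)
print(restored == original)
```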

Because of the error correction code included in the 2-D barcode, the copy file is identical to the original.

To summarise, this process can be used to encode any digital file to be stored in this form. One process is universal for the thousands of file formats, programs and operating systems. Now everything digital can be preserved with this one approach.

Does this mean that microfilm has found a new life after numerous years of decline in volume, due to increasing digital storage capacities in storage devices of ever decreasing physical size? Only time will tell if the Datasurance product cuts the mustard and becomes a winning and commercially viable product for the trustworthy retention of all things digital, to the delight of archivists, records and information managers, preservationists and historians.

We have seen the holy grail of long term digital data storage trumpeted on a number of occasions, so we will wait to see if this becomes the solution that helps records managers sleep easy each night, secure in the knowledge that their data is safe now and into the future, long after they retire.
