A Backup is not an Archive …

By: Bill Tolson

I first wrote about the differences between backups and archives almost 20 years ago. At the time, organizations were struggling to get to grips with the requirements to retain electronic records and the new-fangled world of e-discovery. So why are we still talking about this? And, more importantly, what’s changed? One word – cloud! And the right to be forgotten – more about that later.

With the continuing flood of data piling up across organizations around the world, many are turning to the cloud as part of a long-term storage strategy to keep pace with the vast amounts of data they must store, manage, and share.

Most of the organizations that I work with are archiving inactive or rarely accessed data to cloud archives for regulatory and legal compliance reasons, while keeping active data on-premises for daily use. No surprises there. However, they are also using cloud storage for system backups, instead of storing and managing months' or years’ worth of backup tapes in third-party warehouses.

The right to be forgotten

Backup strategies have become much more complicated thanks to the proliferation of new privacy regulations: both the GDPR and the CCPA include a “right to be forgotten” (right to erasure) provision.

The right to be forgotten is a theory that was first put into practice in the European Union (EU) in May 2018 with the General Data Protection Regulation (GDPR). The State of California then included the right to be forgotten in its California Consumer Privacy Act (CCPA), which took effect on January 1, 2020.

The right to be forgotten specifies that companies collecting, selling, and holding personal information (PI) on EU or California citizens must find, report on, and delete (when asked) all PI that can be used to identify the data-subject - if deleting the PI is not prohibited by regulatory or legal responsibilities.

These laws target PI in email systems, in marketing and sales CRM systems, on SharePoint servers, on employee desktops/laptops, corporate social media accounts, cloud repositories…anywhere.

Companies face two main issues dealing with the right to be forgotten:

  1. First, while data-subjects have the right to demand their personal information (PI) be completely deleted, can the company actually find all target personal information for a given data-subject in the allotted time?
  2. Second, if the data subject's PI is also stored on backup tapes, is there a legal expectation to search all backup tapes that could contain the PI, find it and erase it – again in the allotted time?

The GDPR authority has not directly addressed this question and a legal precedent has not yet been established in the EU courts. Interestingly, in one of the last amendments to the CCPA, the State’s Attorney General’s office included the following addition to the regulation to address the backup question:

"[i]f a business stores any personal information on archived or backup systems, it may delay compliance with the consumer’s request to delete, with respect to data stored on the archive or backup system, until the archived or backup system is next accessed or used."

The phrase “… until the archived or backup system is next accessed or used” does nothing to clarify the issue, and the regulation provides no further guidance on how to determine when an archive or backup system is accessed or used.

This raises the possibility that any data newly stored in an archiving or backup system could create a duty to erase PI immediately upon the next use. This is problematic when you consider that archiving systems are accessed many times a day while backups are accessed at least nightly. In practice, then, the amendment gives companies little to work with on this question.

One prevailing opinion is the following: archiving systems are designed to allow file-level search and actions quickly so the deletion of PI in an archive should be almost immediate, while the CCPA amendment addressing PI deletion on backup tapes could be read in two different ways.

Does the amendment assume PI deletion the next time the backup system is used, which will be daily for most companies? Or does it mean the next time a specific backup tape is used for a restoration?

Gartner Research looks at the question of deleting PI from archives and backups by dividing it into proactive and retroactive remediation stages. Active and archived records with PI should be deleted immediately, while backups should be handled retroactively: only when a backup tape containing specific PI is used for data restoration is the requested PI removed from that tape.

This retroactive PI deletion process does not seem to meet the true intent of the right to be forgotten. However, it does take into account the actual technical issues with deleting PI from backup tapes. Again, the various regulatory authorities will need to address this soon.
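The retroactive approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` and `ErasureLedger` types and the `restore_with_remediation` function are hypothetical names invented here, and a real backup tape is a vendor-specific container rather than a simple list of records.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Record:
    subject_id: str   # identifies the data subject the record belongs to
    payload: str      # the stored personal information


@dataclass
class ErasureLedger:
    """Hypothetical ledger of erasure requests not yet applied to backups."""
    pending: Set[str] = field(default_factory=set)

    def request_erasure(self, subject_id: str) -> None:
        self.pending.add(subject_id)


def restore_with_remediation(tape: List[Record], ledger: ErasureLedger) -> List[Record]:
    """Restore a tape and drop records of subjects with a pending request.

    The tape contents are rewritten in place, standing in for the step of
    creating a new, cleaned backup tape after the restore.
    """
    kept = [r for r in tape if r.subject_id not in ledger.pending]
    tape[:] = kept
    return list(kept)


# Hypothetical tape holding PI for two data subjects.
tape = [Record("subject-a", "a@example.com"), Record("subject-b", "b@example.com")]
ledger = ErasureLedger()
ledger.request_erasure("subject-a")   # request arrives while the tape is in storage

restored = restore_with_remediation(tape, ledger)
print([r.subject_id for r in restored])  # ['subject-b']
```

Note that nothing happens to the tape until it is actually restored, which is exactly why this approach sits uneasily with the intent of the right to be forgotten.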

A second issue revolves around how to delete a specific piece of PI from all backup tapes on which it is potentially stored. To address this ongoing question, let’s go back to basics and review what a backup and archive actually are.

What is data backup vs an archive?

Historically, organizations have treated backup and archiving as separate processes. The backup process was originally created for disaster recovery.

Backing up is the process of making a copy of operating systems and data resident on servers and storage repositories for the purpose of restoring the entire system (OS and data) to the affected server in the event of system issues. For example, an email server becomes corrupted, and the server OS, email application, and message store need to be restored as soon as possible.

The biggest problem with backups is the data that can be lost between backup cycles (usually 24 hours). In the email server example, the email sent and received since the last backup is permanently lost when the email server is restored from the last backup data set. The maximum amount of such loss an organization is willing to accept is referred to as the recovery point objective (RPO).
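As a rough illustration of the data loss window, the Python sketch below computes how much recent data disappears when a server is restored from its last backup and checks that against an RPO. The function names, timestamps, and RPO values are hypothetical and chosen only for the example.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return the span of data lost when restoring from the last backup."""
    if failure < last_backup:
        raise ValueError("failure time precedes the last backup")
    return failure - last_backup


def meets_rpo(last_backup: datetime, failure: datetime, rpo: timedelta) -> bool:
    """True when the lost span is within the recovery point objective."""
    return data_loss_window(last_backup, failure) <= rpo


# Hypothetical nightly backup at 01:00 and a server failure at 17:30 the same day.
last_backup = datetime(2021, 3, 1, 1, 0)
failure = datetime(2021, 3, 1, 17, 30)

lost = data_loss_window(last_backup, failure)
print(lost)                                                   # 16:30:00
print(meets_rpo(last_backup, failure, timedelta(hours=24)))   # True
print(meets_rpo(last_backup, failure, timedelta(hours=4)))    # False
```

With a 24-hour backup cycle, up to a full day of email can be lost, so an organization with a tighter RPO would need more frequent backups or a different protection method.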

The backup is usually performed utilizing a backup application that creates its own custom-formatted data container – meaning it is very difficult to search for and act on specific files in a backup file. In reality, the backup must be fully restored to the server to search and act on specific files.

On the other hand, the archiving process stores a single copy of individual files for long-term storage and management for legal, regulatory, and business reasons. A key distinction here is that individually archived data, if stored in its native format, is easier to search for and act on.

Even today, some organizations continue to rely on backups as a substitute for low-cost archives. While the cost of backup storage has continued to fall, finding and restoring individual files from backups can be extremely slow and expensive.

For example, the estimated cost to restore, search, delete PI, and create a new backup tape can range between US$1,000 and US$3,000 per tape. Imagine how many of your organization’s backup tapes contain a particular data subject's PI…
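To put the per-tape estimate in perspective, here is a small worked calculation in Python. The per-tape range comes from the figures above; the tape count of 200 is a made-up number for illustration, and the function name is hypothetical.

```python
from typing import Tuple


def tape_remediation_cost(
    tape_count: int,
    low_per_tape: int = 1_000,   # lower estimate per tape, in US dollars
    high_per_tape: int = 3_000,  # upper estimate per tape, in US dollars
) -> Tuple[int, int]:
    """Return the (low, high) cost of remediating PI across tape_count tapes."""
    if tape_count < 0:
        raise ValueError("tape_count must not be negative")
    return tape_count * low_per_tape, tape_count * high_per_tape


# Hypothetical case: one data subject's PI appears on 200 backup tapes.
low, high = tape_remediation_cost(200)
print(f"US${low:,} to US${high:,}")  # US$200,000 to US$600,000
```

That range applies to a single erasure request, so the total grows with every additional data subject whose PI has to be removed from a different set of tapes.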

Backup in the Age of Data Privacy Regulations

Every organization backs up its servers, data repositories, and other enterprise systems. All of these systems will include personal information that could be subject to the right to be forgotten. The main question you should ask yourself is this: does the right to be forgotten include PI on enterprise backups, and if yes, does your company have a tested and documented process for finding and deleting it?

There continues to be a debate about the practicality of establishing a right to be forgotten (which amounts to an international human right) due in part to the breadth of the regulations, the potential costs to implement, and the many other issues left unaddressed.

However, both the GDPR and CCPA are now law, and specific questions will need to be addressed in the courts. Until these issues are clarified, companies will need to decide if they are willing to “roll the dice” and leave requested PI to remain on backups indefinitely.

So is the common practice of backing up servers and storage devices practical and still needed for disaster recovery? In limited circumstances, maybe.

The traditional practice of backing up everything and shipping tapes offsite is error-prone, cost-prohibitive, inefficient, and leads to over-retention and increasing eDiscovery risk. Nevertheless, companies still do it and, in many cases, keep backup tapes for 10+ years.

Now with the introduction of privacy laws and the right to be forgotten, many are looking for new ways to protect their systems and data while also meeting the new privacy regulations.

Three potential strategies for data backup solutions

Expert opinion is all over the place, but eventually the GDPR and CCPA regulatory authorities will have to address this issue of PI on backup tapes directly. In the meantime, companies should consider three potential strategies:

  1. Ignore the issue until the regulatory agencies issue guidance while hoping your company does not receive a right to erasure request.
  2. Encrypt all PI with individual encryption keys so that when a right to erasure request is received, the encryption key for that specific data subject's PI can be deleted, making all of their PI unavailable forever, including PI on backup tapes – cryptographic erasure.
  3. Or, instead of backing up data that contains PI, archive it so that it can be easily managed, searched, and deleted when needed while creating a backup of the various server operating systems and system files so they can be restored when needed.
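The cryptographic erasure idea in the second strategy can be sketched as follows. This is a minimal illustration, not a production design: `SubjectKeyStore`, `encrypt_pi`, and `decrypt_pi` are hypothetical names, and the toy XOR keystream built from SHA-256 is only a stand-in so the example runs with the Python standard library. A real system would use a vetted authenticated cipher such as AES-GCM and a managed key service.

```python
import hashlib
import secrets
from typing import Dict, Optional, Tuple


def _xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with SHA-256(key + nonce + counter) blocks.

    Illustration only; replace with a vetted cipher in any real system.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = offset.to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class SubjectKeyStore:
    """Hypothetical store holding one encryption key per data subject."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def key_for(self, subject_id: str) -> bytes:
        # Create the subject's key on first use.
        if subject_id not in self._keys:
            self._keys[subject_id] = secrets.token_bytes(32)
        return self._keys[subject_id]

    def get(self, subject_id: str) -> Optional[bytes]:
        return self._keys.get(subject_id)

    def erase(self, subject_id: str) -> None:
        # Deleting the key is the erasure: ciphertext everywhere becomes unreadable.
        self._keys.pop(subject_id, None)


def encrypt_pi(store: SubjectKeyStore, subject_id: str, text: str) -> Tuple[bytes, bytes]:
    nonce = secrets.token_bytes(16)
    key = store.key_for(subject_id)
    return nonce, _xor_keystream(key, nonce, text.encode("utf-8"))


def decrypt_pi(store: SubjectKeyStore, subject_id: str,
               record: Tuple[bytes, bytes]) -> Optional[str]:
    key = store.get(subject_id)
    if key is None:
        return None  # key erased: the PI can no longer be recovered
    nonce, ciphertext = record
    return _xor_keystream(key, nonce, ciphertext).decode("utf-8")


store = SubjectKeyStore()
backup_copy = encrypt_pi(store, "subject-42", "jane@example.com")  # lands on a tape

print(decrypt_pi(store, "subject-42", backup_copy))  # jane@example.com
store.erase("subject-42")                            # right to erasure request
print(decrypt_pi(store, "subject-42", backup_copy))  # None
```

The approach only works if the keys themselves are kept out of the backups being protected; a key that survives on an old tape would make the "erased" PI recoverable again.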

What’s really needed? A Cloud Archiving Solution

These days most organizations experience near-zero data loss because of natural disasters, equipment failures, etc. Due to the massive migration from on-premises systems to the Microsoft Cloud and Office 365, including Exchange Online and OneDrive, much of the unstructured corporate data is already synchronized to the cloud, bypassing the need for backup.

For those enterprise systems that are not currently cloud-based and are still being backed up, such as departmental file shares, the obvious strategy is to move towards a cloud-based archiving solution for all work data so that most backups can be discontinued and, more importantly, so that PI can be found and deleted quickly when requested.

Archive2Azure Cloud Archiving

Archive2Azure, a managed cloud archiving solution from Archive360, provides organizations with long term archiving and information management of active, low-touch, and inactive files—all managed to granular retention/disposition policies and available at a moment’s notice.

Archive360’s Archive2Azure intelligent information management and archiving platform is designed specifically to effortlessly meet GDPR and CCPA data management and privacy requirements in a cost-effective manner. Archive2Azure takes full advantage of Azure Cloud security, geo-replication, DR, ML/AI, and Azure’s three storage tiers: Hot, Cool, and Archive.

The Archive2Azure platform provides companies more control of their information management and compliance responsibilities, including responding quickly to GDPR and CCPA data deletion requests. It also enables companies to move away from expensive and risky on-premises data management and backup solutions and instead utilize their own Azure Cloud tenancy. By utilizing Archive2Azure, companies retain direct ownership of their data - something the proprietary “one size fits all” third-party SaaS cloud archives cannot do.

Bill Tolson is the Vice President of Global Compliance for Archive360. This article originally appeared at https://www.archive360.com/blog/a-backup-is-not-an-archive-but-a-cloud-archive-can-be-an-effective-backup-revisited