Touching The Void

The sole reason for performing backups is to keep an insurance copy in case the primary version of data is somehow compromised. NetApp’s Mark Heers wonders why recovery is given less emphasis than backup.

Immortalised in a film, “Touching the Void” describes Joe Simpson’s epic mountain-climbing adventure, in which he descends a previously unclimbed mountain in the Andes with a broken leg. One noteworthy point is that the climb to the peak via a sheer ice wall went pretty much to plan; all the drama occurred on the way back down the mountain (despite the use of an easier route). Simpson and his climbing partner planned for their ascent but did almost no planning for their descent.

The same attitude pervades backup and recovery. Sites plan carefully for backup, regularly improving the time window and the processes involved. It is a process carried out virtually every day of the year. By contrast, only limited recovery planning and testing is performed. Yet the sole reason for performing backups is to keep an insurance copy in case the primary version of data is somehow compromised.

Backup is a controlled process which can be performed at a selected time, typically outside prime hours. For most sites, the fundamental approach is unchanged: writing either full copies of the data or the incrementally changed data to tape. To guard against tape failure, many sites create two tape backup copies. Studies have shown that the single largest cost element of backup is the price of tape media, though for many organisations backup is also one of the most time-consuming tasks for system or storage administrators.

Again by contrast, recovery is typically time-critical. Until the primary copy is restored, a user, a team, an application or the entire business will wait for the recovery process to complete. As with the climbing analogy, it is apparent that planning for the return journey is at least as important.

Tape is no longer a suitable recovery mechanism for much of a company’s information. In many cases, with the cost of downtime being so high and the cost of disk having fallen dramatically (and approached the cost of tape), it makes eminent sense to make highly available disk the primary point of recovery. Disks also have built-in protection mechanisms (RAID) which ensure that a single disk failure does not affect the stored data.

According to a number of studies, the vast majority of restore requests occur within the first three days. A practical approach starts to appear: maintain a week or two of backups on disk to cover 90 to 99 percent of recovery requests, and keep longer-term data (which is far less likely to be restored) on tape. It is important to note that truly long-term retention of information is really an archive and not a backup. A small number of organisations have moved to the point where they no longer use tape for backup and recovery at all.
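The disk-then-tape policy described above can be sketched in a few lines of Python. This is a minimal illustration only: the 14-day retention value is an assumption taken from the "week or two" suggestion, and the names `recovery_tier` and `disk_hit_rate` are hypothetical, not part of any backup product.

```python
from datetime import date

# Assumed policy value: the article suggests keeping "a week or two" on disk.
DISK_RETENTION_DAYS = 14


def recovery_tier(backup_date: date, today: date,
                  disk_retention_days: int = DISK_RETENTION_DAYS) -> str:
    """Return 'disk' if the backup is recent enough to still be on disk, else 'tape'."""
    age_days = (today - backup_date).days
    if age_days < 0:
        raise ValueError("backup_date is in the future")
    return "disk" if age_days <= disk_retention_days else "tape"


def disk_hit_rate(restore_ages_days,
                  disk_retention_days: int = DISK_RETENTION_DAYS) -> float:
    """Fraction of restore requests (given as backup ages in days) servable from disk."""
    ages = list(restore_ages_days)
    if not ages:
        return 0.0
    hits = sum(1 for age in ages if age <= disk_retention_days)
    return hits / len(ages)
```

Feeding a site's own restore log (the age in days of each backup that was actually restored) into `disk_hit_rate` would give a rough check on whether a chosen disk retention period covers the 90 to 99 percent of requests mentioned above.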

There are three main approaches to disk-based backup and restore. Where available on specific disk arrays, all three techniques can take advantage of the emergence of a second tier of disk drive. A number of disk arrays on the market support both standard Fibre Channel drives and slower but more economical ATA drives. On average, on a per-megabyte basis, ATA drives are around 30 to 40 percent of the cost of Fibre Channel drives.

1. Modern disk arrays such as NetApp FAS systems support local replication functions where a copy of a data volume can be taken at a specific point in time without affecting the availability of that data to an application. There are two primary forms of replication: snapshots, which take logical copies with only a low space overhead (storing only the differences), and mirrors, which take full physical copies of the data. Given the limited overhead, snapshots can be taken every few hours, ensuring that recent backup copies of the data are available. Note that this technique does not protect against failures of the entire disk subsystem.

2. Supported by most enterprise backup software products including Veritas NetBackup, Tivoli Storage Manager (TSM) and CommVault QiNetix, Disk-to-Disk Backup allows the backup administrator to direct the backup image to disk rather than to tape. Feedback from various sites shows a 30 to 60 percent performance improvement using disk over an automated tape system for the backup process and an impressive 70 to 90 percent performance improvement when restoring.

3. Virtual Tape Libraries are designed to emulate tape libraries and tape drives. They provide the backup and recovery times associated with disk, while the backup server and software believe that they are still writing to tape. Typical tape benefits such as compression are also available on disk libraries.
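To make the "storing only the differences" idea behind snapshots (the first technique above) concrete, here is a toy copy-on-write sketch in Python. It is an illustration under simplifying assumptions, not a description of how NetApp or any other array implements snapshots; the `Volume` class and its methods are hypothetical.

```python
class Volume:
    """Toy block volume with copy-on-write snapshots (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data
        self.snapshots = []         # each snapshot maps block index -> preserved contents

    def snapshot(self) -> int:
        """Take a point-in-time snapshot; it starts empty and costs no space."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        """Overwrite a block, first preserving its old contents for snapshots that lack it."""
        old = self.blocks[index]
        for snap in self.snapshots:
            snap.setdefault(index, old)  # keep only the first (oldest) preserved copy
        self.blocks[index] = data

    def read_snapshot(self, snap_id):
        """Reconstruct the volume as it looked when the snapshot was taken."""
        saved = self.snapshots[snap_id]
        return [saved.get(i, block) for i, block in enumerate(self.blocks)]
```

Each snapshot holds only the blocks that changed after it was taken, which is why snapshots can be taken every few hours at low cost. The sketch also shows the limitation noted above: unchanged blocks are read from the live volume, so losing the whole disk subsystem loses the snapshots too.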

Whichever technique appeals, making disk the primary recovery point for your data brings a number of immediate benefits:
• Significantly faster recovery;
• Improved data availability by reducing the backup time windows;
• Reduced requirements (both in number and capability) for tape media and tape drives;
• Reduced people costs, as disk tends to be more automated.

Moving to disk-based backup and recovery has the potential to save 15 to 30 percent of your current backup costs, improve application availability and reduce the risk of data loss from tape issues.

In this 24x7, time-critical world, it is time to learn from Joe Simpson’s descent: breach the void, plan for recovery (rather than plan for backup) and move to the point where nearly all recovery is performed from disk.
