The difference between truth and ediscovery

By Tim Williams

Startups are addictive. When you set a milestone and then meet it, it's like a rush of adrenaline. I'm on my third startup, a company focused on enterprise-class content indexing, and I still remember clearly the excitement I felt years ago when we first started receiving an influx of calls from law firms after announcing our ability to directly index backup tapes.

As any IT professional knows instinctively, there is no better place to find responsive data to past events than backups. They are read-only copies of data made at the time of the event and stored on backup tapes: complete, timely, and unspoiled. Up until our announcement, we thought the only remaining problem was the cost. Pulling that data off of legacy tapes was so complicated, expensive and service intensive, that it was easy to exclude it with a burden argument.

Our software fully automated indexing the data on tapes so they could be searched directly, and only the responsive data restored. And since it was all done with software, we could afford to sell it for orders of magnitude less than the then-current service-intensive alternatives.

So the phone would ring, someone from a law firm would ask the price, we would proudly announce it, and far more often than not ... we would never hear from them again. We were puzzled. We thought price mattered. Why weren't we winning? How could we have gotten it so wrong?

It turns out the prospects were calling us and our competitors to find out who had the highest price, not the lowest. They didn't want to find the data; they wanted to build a burden argument to prove it was too expensive to find. So they threw out our number and went with the highest they could find.

We learned the hard way the difference between truth and ediscovery ... it's the difference between the value of the former and the price of the latter. In this case the value was zero, so no price could justify ediscovery.

Value of backup data

I was reminded of this bit of personal history when I noticed that Craig Ball recently released an updated version of his backup format magnum opus, now titled Luddite Lawyer’s Guide to Computer Backup Systems. Craig is both a lawyer and a technologist. In that post, he elegantly summarizes the value of backup data as the best source for accurately identifying legally responsive data:

"Ideally, the contents of a backup system would be entirely cumulative of the active online data on the servers, workstations and laptops that make up a network.  But because businesses entrust the power to alter and destroy data to every computer user - including those motivated to make evidence disappear - and because companies configure systems to purge electronically stored information as part of records retention programs, backup tapes may prove to be the only source of evidence beyond the reach of those who've failed to preserve evidence and who have an incentive to destroy or fabricate it."

Ball also recognizes that when the price of ediscovery is high, backup tapes can be used to cover the truth rather than reveal it:

"Backup tapes can also be fodder for pointless fishing expeditions mounted without regard for the cost and burden of turning to backup media, or targeted prematurely in discovery, before more accessible data sources have been exhausted."

It's a compelling piece, a deep dive into everything a lawyer should know about backup. I recommend you read it all.

I'd like to think that the one fundamental flaw with his update is that it doesn't account for recent market changes, changes that my company has helped accelerate, that have dramatically lowered the price of backup-tape-based discovery. But the truth is, even if the cost to process data on backup tapes is equal to the cost to process data on the network, there are other factors that can lend credence to backup tape burden arguments:

  • If your backup tape catalogue isn't up to date, or if your search terms are broadly defined, it may be hard to find which backup tapes contain the responsive data.
  • It's likely that your backup tapes are stored off site, and there is cost and effort in bringing them back.
  • Indexing backup tapes is a more arcane process than indexing network data, so you have to search for the right experts to do that.
  • Backup tapes are far more redundant than network data, so the amount of source data that must be processed to arrive at the same amount of responsive data can be far greater than for network data.

Let me summarize: backup tapes are a pain in the neck. Eliminate the processing cost disadvantage, and the burden of handling the media still remains. Ball agrees:

"But, as theory and practice are rarely on speaking terms, companies may keep backup tapes long past (sometimes years past) their usefulness for disaster recovery and often beyond the IT department’s ability to access tapes created with obsolete software or hardware.  These legacy tapes are business records—sometimes the last surviving copy—but are afforded little in the way of records management.  Even businesses that overwrite tapes every two weeks replace their tape sets from time to time as faster, bigger options hit the market.  The old tapes are frequently set aside and forgotten in off-site storage or a box in the corner of the computer room."

So what's the solution? Ignore the truth...that backup data is more accurate, more complete, and completely unspoiled when compared to network data...and continue arguing burden? That's not working anymore. More and more, courts are recognising the value of backup data and the plunging cost of processing it, and are ordering that backup data be discoverable.

The solution is to take the backup tape out of backup data. Preserve backup data in an on-disk, compact and easy to search archival format, and the cost/value equation tips dramatically in favour of backup vs. network data.

As Ball notes in the quote above, after about 30 days, backup data loses its value for disaster recovery and becomes archival in nature. It's possible to avoid the burden of backup tape completely by backing up to disk, and by migrating that data to an archival format at the end of the disaster recovery window.
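For readers who want to see the policy in concrete terms, here is a minimal Python sketch of the idea: backup sets still inside the disaster recovery window stay put, and older ones are flagged for migration to an archive. The 30-day window comes from the paragraph above; the `BackupSet` type, the set names, the dates, and the `split_by_window` function are all hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: a 30-day disaster recovery window, per "about 30 days" above.
DR_WINDOW = timedelta(days=30)


@dataclass
class BackupSet:
    name: str      # hypothetical label for a disk-based backup set
    created: date  # date the backup was taken


def split_by_window(backups, today, window=DR_WINDOW):
    """Return (keep_for_recovery, migrate_to_archive)."""
    keep, migrate = [], []
    for b in backups:
        # Anything older than the window no longer serves disaster recovery.
        if today - b.created > window:
            migrate.append(b)
        else:
            keep.append(b)
    return keep, migrate


# Illustrative data only.
backups = [
    BackupSet("weekly-full-01", date(2024, 1, 7)),
    BackupSet("weekly-full-02", date(2024, 2, 4)),
    BackupSet("weekly-full-03", date(2024, 2, 25)),
]
keep, migrate = split_by_window(backups, today=date(2024, 3, 1))
print([b.name for b in migrate])  # sets due for archival migration
```

In this example only the oldest set falls outside the window. A real migration would also index and deduplicate the data on the way into the archive, which this sketch leaves out.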

Products that enable that kind of migration easily pay for themselves, both in operational savings and in the near-total elimination of eDiscovery costs. The past becomes immediately searchable. Early case assessment is available proactively. Simply search, cull, and review. No need for outside eDiscovery consultants, no network searches, no backup tapes.

Tim Williams is CEO of Index Engines, a provider of enterprise information management and archiving solutions.