An Australian Perspective on Avoiding Hidden E-Discovery Costs

by Shengshi Zhao of NuLegal

We are in the era of cloud computing, where the terabyte (TB) is gradually replacing the gigabyte (GB) as the standard unit of data storage. Cloud technology is bringing big data into every part of our lives - from entertainment to corporate infrastructure, from evidence collection to discovery production. Naturally, these growing digital footprints are making the e-discovery process more difficult and more costly.

Unlike growth in hard-copy volume, growth in data volume attracts little visibility from the client, so the resulting costs are overlooked. Vendors and legal technology services (LTS) staff sometimes feel more comfortable sticking with traditional approaches that leave these hidden costs undiscovered, while litigators rarely understand the process well enough to implement a cost-saving strategy. Clients are left to pay the e-discovery bill in frustration.

Traditional approaches are being challenged. In this digital age, it is time to adapt to a new cost-saving workflow. We in LTS must look at the methods we have been relying on for years and ask ourselves a question: Is this the best way, or are we guilty of incurring hidden costs?

Hidden Costs in Early Case Assessment (ECA)

The classic E-Discovery Reference Model includes data identification, collection, processing, review, analysis and production. In the cloud era, data volumes are forcing us to reconsider this classic approach. Imagine that a client shows up with one TB of data for discovery. That means you will deal with approximately 10 million documents. In a case like this, extensive analysis needs to happen before review begins, not after, with some analysis happening even before processing so filters can be applied.

It might seem daunting to trade valuable (and billable!) review time for additional analysis. However, every minute spent on preliminary analysis could save hours or days of document review later, and those savings ultimately go to clients. Date and keyword searches, content extraction analysis, near-deduplication, email threading and social network analysis can all be combined to help us locate documents for priority review.
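The following minimal Python sketch makes the culling step concrete. It applies a date-and-keyword filter before review. The `Doc` record, its fields and the sample documents are hypothetical. Real ECA tools run such filters against indexed metadata and use their own search syntax, not the naive substring match shown here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    # Hypothetical, simplified metadata record for one processed document.
    doc_id: str
    sent: date
    text: str


def cull(docs, start, end, keywords):
    """Keep documents dated within [start, end] that mention any keyword."""
    terms = [k.lower() for k in keywords]
    kept = []
    for d in docs:
        if not (start <= d.sent <= end):
            continue  # outside the date range relevant to the matter
        body = d.text.lower()
        # Naive case-insensitive substring match; real platforms use indexed search.
        if any(t in body for t in terms):
            kept.append(d)
    return kept


docs = [
    Doc("DOC-001", date(2015, 3, 2), "Draft tender pricing for the project"),
    Doc("DOC-002", date(2013, 7, 9), "Tender pricing from an earlier bid"),
    Doc("DOC-003", date(2015, 4, 1), "Family holiday photos"),
]
priority = cull(docs, date(2015, 1, 1), date(2015, 12, 31), ["tender", "pricing"])
print([d.doc_id for d in priority])  # ['DOC-001']
```

Each filter removes documents before anyone pays to process or review them. For that reason, this kind of filtering belongs at the front of the workflow.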

I once worked on an investigatory matter with two terabytes of data (about 20 million documents) collected in the initial stages. Using a combined analysis of date, keyword, social network and email threading, we found fewer than 50,000 key documents to be reviewed. This first round of review built up the framework of legal arguments for the team without us having to look through all 20 million documents.

Legal teams frequently ask two questions. Might they spend time and resources on ECA and still fail to locate key documents? Might the LTS team miss key documents during analysis? This uneasiness can lead to hidden costs during the ECA process, but involving the lawyers can allay the fear. In fact, lawyers’ involvement is crucial to identifying relevant documents efficiently. The best approach can be designed only when the legal team works closely with the LTS team, so that the available tools can be tailored to the matter at hand.

Lawyers sit with clients and receive correspondence that LTS staff do not see. If the lawyers are not involved in the ECA process, LTS might spend days on searches and analysis without knowing that key information has already been received. If a person’s name or company number pops up, it might immediately ring a bell for the lawyers but not necessarily for the LTS team member.

In the traditional model, legal teams draft a search plan that LTS staff translate into search queries. Once the legal team gets the results, they modify the search plan and send it back to LTS for a second iteration, then a third, and so on. The first few rounds always need further refinement, because no one can draft the perfect search plan without knowing how the data will respond to the searches. This back and forth between the legal team and LTS can take days.

It would be much more efficient to have a litigator sit down with an LTS member at the ECA stage and run a few searches to see the results. Once everyone has a better understanding of the data, they will be closer to creating the perfect search plan, and hidden costs will be avoided.
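A search term hit report is one way to see how the data responds during such a session. The sketch below is a hypothetical helper, not any platform's feature. For each term, it counts how many documents the term hits and how many documents only that term hits. A term with many hits but no unique hits is a candidate for refinement. The sketch assumes document text is available as a plain dictionary mapping ID to text.

```python
import re
from collections import Counter


def hit_report(docs, terms):
    """Per-term document hits, per-term unique hits, and total documents hit."""
    # Whole-word, case-insensitive pattern for each term (hypothetical helper).
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms}
    hits, unique = Counter(), Counter()
    any_hit = 0
    for text in docs.values():
        matched = [t for t, p in patterns.items() if p.search(text)]
        hits.update(matched)  # each term counted at most once per document
        if len(matched) == 1:
            unique[matched[0]] += 1  # only this term brought the document in
        if matched:
            any_hit += 1
    return hits, unique, any_hit


docs = {
    "D1": "Meeting about the merger terms",
    "D2": "Merger timetable and board approval",
    "D3": "Board lunch menu",
    "D4": "Quarterly sales figures",
}
terms = ["merger", "board", "menu"]
hits, unique, total = hit_report(docs, terms)
for term in terms:
    print(f"{term}: {hits[term]} document(s), {unique[term]} unique")
print(f"documents hit by any term: {total}")
```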

Hidden Costs in Processing

Where the legal profession used to be obsessed with paper, now it is all about the PDF. Because PDF is everyone’s favorite file type, e-discovery costs have grown sharply. Millions of email messages, family holiday photos and duplicate documents are being rendered into PDF, with optical character recognition (OCR) and pagination applied.

Lawyers like the idea of rendering everything into PDF so document reviews will go smoothly, and no more costs will be incurred. But people forget two things:

  • Even when everything is fully processed, i.e., rendered into PDF files and paginated, searches and other analyses are still run in the review platform to narrow the set. Only a small subset of documents will be reviewed, and an even smaller subset will actually need a rendered format for production.
  • Most law firms today have a perfectly competent document review platform that provides in-browser display of common file types in native format without the need to open native applications or slow down the document review process.

While rendered PDFs are preferred by many because all documents are readily available to produce, such peace of mind is expensive.

We recently had a matter with an accumulated one TB of data, and the lawyers wanted to review everything in the review platform. We processed just under 11 million documents into fully rendered PDF form. We later learned that only 500,000 would be considered for review after keyword searches. That is still a large number, but only about five percent of the original data. Even fewer documents, possibly less than one percent of the original data, will be produced. All that rendering cost could have been saved for the client.

If clients knew of these options and made informed decisions, it would be another story. However, I doubt many lawyers or vendors go into the processing details to show their clients how much they are spending. Native processing and rendered PDF processing costs can differ by roughly $80 to $200 per GB. For one TB of data, the cost difference can therefore be between $80,000 and $200,000.
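The arithmetic behind that range is simple multiplication, sketched below in Python. It assumes 1 TB is treated as 1,000 GB, with a per-GB premium of $80 to $200 for rendering over native processing. The last calculation is an illustrative extra. It shows the premium if only the roughly one percent of data eventually produced were rendered, charged at the same per-GB rate.

```python
def rendering_premium(volume_gb: float, extra_per_gb: float, share_rendered: float = 1.0) -> float:
    """Extra cost of PDF rendering over native processing for the share of data rendered."""
    return volume_gb * share_rendered * extra_per_gb


VOLUME_GB = 1000  # assumption: 1 TB = 1,000 GB, matching the article's round figures

low = rendering_premium(VOLUME_GB, 80)
high = rendering_premium(VOLUME_GB, 200)
print(f"Render everything: ${low:,.0f} to ${high:,.0f}")

# Illustrative only: render just the ~1 percent eventually produced, at the high rate.
selective = rendering_premium(VOLUME_GB, 200, share_rendered=0.01)
print(f"Render only the production set: about ${selective:,.0f}")
```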

Lawyers no longer have to choose between PDF files and waiting for an Office application to open every single native file during document review. Modern review platforms such as Everlaw have invested heavily in native file support, especially for spreadsheets, with no need to install third-party software. In Everlaw, for instance, spreadsheet formatting is preserved and hidden rows and columns are displayed.

Reviewers can even see the formula used for each cell. When native file formats can be displayed neatly in the review platform, it is a waste not to use the function. Other features such as audio/video in-browser display and in-line translation are also provided by good document review platforms to streamline the native review process.

The hidden costs of the e-discovery process can grow with few people noticing, but a good review platform immediately drives down processing costs and time. We have to ask: Are we using the right tool for the job? If your document review platform requires extra rendering at the client’s expense, it could be time for an upgrade. You might find that the cost of switching platforms can be recouped on a single large matter through native review.

Hidden Costs in Document Review

Legal teams often want to know why they keep seeing the same document when the review sets have already been deduplicated. Deduplication by MD5, a message-digest algorithm, is an accurate way of capturing duplicates, but there is still plenty of room for them to crop up in review:

  • MD5 deduplication applies only to top-level documents. If the same file is attached to multiple emails, those attachments will not be deduplicated.
  • Sometimes a server or network delay can cause a millisecond or two of time difference between an email retained on the sender’s end and that on the receiver’s end. When the time stamp is different, MD5 values will be different, and the two email messages will not be deduplicated.
  • The last message in an email thread often contains the entire chain of previous emails. However, each individual message in the thread is a different document and will not be captured by deduplication.
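The timestamp problem in the second point is easy to reproduce. This minimal sketch hashes a simplified text representation of an email and keeps the first item for each MD5 value. The format is hypothetical; real tools hash selected metadata fields and content according to their own rules. The exact copy is removed, but a one-millisecond difference in the sent time produces a different hash, so the receiver's copy survives.

```python
import hashlib


def md5_hex(content: str) -> str:
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def dedupe(items):
    """Keep the first item ID seen for each distinct MD5 value."""
    seen = set()
    kept = []
    for item_id, content in items:
        digest = md5_hex(content)
        if digest not in seen:
            seen.add(digest)
            kept.append(item_id)
    return kept


def email(sent: str) -> str:
    # Hypothetical, simplified text representation of an email used for hashing.
    return f"From: a@example.com\nSent: {sent}\nSubject: Pricing\n\nSee attached."


items = [
    ("sender-copy", email("2016-03-01 09:15:02.113")),
    ("receiver-copy", email("2016-03-01 09:15:02.114")),  # 1 ms later
    ("backup-of-sender-copy", email("2016-03-01 09:15:02.113")),
]
print(dedupe(items))  # ['sender-copy', 'receiver-copy']
```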

Duplicates and Email Threading

The LTS team has traditionally relied on ECA tools for near-duplicate or email threading analysis, using fields and/or tags to make sure all the information is captured. Because the task is complex, many people would rather review documents in chronological order and deal with duplicates in the review platform than work out which field or tag means what. But when there is more than one person on a review team, this on-the-fly approach does not ensure consistent coding. That adds work for secondary reviewers, i.e., senior lawyers.
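Email threading tools commonly flag "inclusive" messages: those whose content is not wholly quoted in a later reply. Reviewers can then read a thread once rather than message by message. The sketch below is a simplified stand-in for that idea, not any product's algorithm. It strips quote markers, then treats a message as non-inclusive if its normalized text appears inside a longer message in the same thread. The thread grouping, quoting style and sample messages are assumptions.

```python
def normalize(body: str) -> str:
    # Strip quote markers ("> ", "> > ") and blank lines so quoted text can be compared.
    lines = (line.lstrip("> ").strip() for line in body.splitlines())
    return "\n".join(line for line in lines if line)


def inclusive_messages(thread: dict) -> list:
    """IDs of messages whose text is not wholly quoted inside a longer message."""
    norm = {msg_id: normalize(body) for msg_id, body in thread.items()}
    result = []
    for msg_id, text in norm.items():
        contained = any(
            other_id != msg_id and len(other) > len(text) and text in other
            for other_id, other in norm.items()
        )
        if not contained:
            result.append(msg_id)
    return result


thread = {
    "M1": "Can we move the meeting?",
    "M2": "Tuesday works.\n> Can we move the meeting?",
    "M3": "Confirmed.\n> Tuesday works.\n> > Can we move the meeting?",
    "M4": "Adding Sam.\n> Can we move the meeting?",  # a branch off M1
}
print(inclusive_messages(thread))  # ['M3', 'M4']
```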

The good news is that review platforms today often come with integrated near-duplicate detection and email threading functions that help identify documents with over 95 percent similar content. Colour codes show how similar the documents are. It immediately becomes obvious to the reviewer that near-duplicates identified by the platform should be coded consistently. The same logic applies to email threading analysis.
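Near-duplicate engines in review platforms typically use their own text-similarity measures. As a hedged stand-in, this sketch uses Python's standard-library `difflib.SequenceMatcher.ratio()` as the similarity score. It greedily places each document in the first group whose pivot it matches at or above a 95 percent threshold. The documents are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in similarity score between 0 and 1; not any platform's actual measure.
    return SequenceMatcher(None, a, b).ratio()


def near_duplicate_groups(docs: dict, threshold: float = 0.95) -> list:
    """Greedy grouping: each document joins the first group whose pivot it matches."""
    groups = []  # list of (pivot_text, [member doc IDs])
    for doc_id, text in docs.items():
        for pivot_text, members in groups:
            if similarity(pivot_text, text) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((text, [doc_id]))  # no match: start a new group
    return [members for _, members in groups]


docs = {
    "D1": "The supplier agrees to deliver 500 units by 30 June 2015 at the agreed price.",
    "D2": "The supplier agrees to deliver 600 units by 30 June 2015 at the agreed price.",
    "D3": "Family holiday photos from the beach trip.",
}
print(near_duplicate_groups(docs))  # [['D1', 'D2'], ['D3']]
```

Grouping D1 and D2 together is what prompts a reviewer to code them consistently, even though their MD5 values differ.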

Predictive Coding

Some people embrace the concept of predictive coding, while others remain skeptical. Predictive coding is a machine learning process. Artificial intelligence has challenged the best human chess master and the best human driver, and there is no reason it cannot outsmart the best human document reviewer. We use predictive coding for review prioritization. The model identifies the documents most likely to be relevant so they can be placed in the first review assignments.

In a recent matter, the legal team first reviewed 3,000 documents identified by keyword searching. The predictive model then identified 1,000 documents with a predicted relevance above 80 percent. Of those 1,000, we isolated 524 documents that the legal team had not yet reviewed, and the team was very interested in this finding. They reviewed the 524 documents and determined they were highly relevant to the matter; in fact, the team used some of them in the mediation process.

Reviewers can raise the relevance threshold to 90 percent if they want higher certainty, or lower it to 70 percent to include a wider range of documents. The predictive model can help you pick up documents that keyword searches missed and save hours on consistency checks.
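Prioritization by threshold can be sketched as a filter and sort over model scores. In this hypothetical example, the relevance scores (between 0 and 1) are assumed to come from an already trained predictive coding model. The function returns unreviewed documents at or above the chosen threshold, highest score first. Raising the threshold to 0.9 narrows the list, and lowering it to 0.7 widens it.

```python
def prioritize(scores, reviewed, threshold=0.80):
    """Unreviewed documents scoring at or above threshold, highest score first."""
    candidates = [
        (doc_id, score)
        for doc_id, score in scores.items()
        if score >= threshold and doc_id not in reviewed
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in candidates]


# Hypothetical model scores (0 to 1) and the set of documents already reviewed.
scores = {"D1": 0.93, "D2": 0.85, "D3": 0.72, "D4": 0.81, "D5": 0.97}
reviewed = {"D5"}

print(prioritize(scores, reviewed))                 # ['D1', 'D2', 'D4']
print(prioritize(scores, reviewed, threshold=0.9))  # ['D1']
print(prioritize(scores, reviewed, threshold=0.7))  # ['D1', 'D2', 'D4', 'D3']
```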

Integrated Outline/Statement and Chronology Function

There was a time when the LTS team would spend hours on a hyperlinked chronology set or witness statement bundle. Fortunately, some review platforms today recognize the need to integrate these commonly used functions into the platform and have designed separate modules for outlines and chronologies where lawyers can insert document IDs directly from searches or binders. Gone are the days when a legal team had to wait days for a hyperlinked set and LTS staff had to make dozens of inquiries about a typo in a document ID.
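Part of what an integrated chronology module saves is the manual check that every cited document ID actually exists. The following is a minimal sketch of that check. It assumes the chronology is a list of (event, document ID) pairs and that the review set's IDs can be exported as a set. The ID format shown is hypothetical.

```python
def check_chronology(entries, known_ids):
    """Return (row number, doc ID) for chronology entries citing an unknown document ID."""
    return [
        (row, doc_id)
        for row, (_event, doc_id) in enumerate(entries, start=1)
        if doc_id not in known_ids
    ]


# Hypothetical document IDs exported from a review platform.
known_ids = {"NUL.001.0001", "NUL.001.0002", "NUL.001.0003"}
chronology = [
    ("Board approves tender", "NUL.001.0001"),
    ("Tender lodged", "NUL.001.0020"),  # typo for NUL.001.0002
    ("Contract signed", "NUL.001.0003"),
]
print(check_chronology(chronology, known_ids))  # [(2, 'NUL.001.0020')]
```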

With all the tools available in the modern review platform, reviewers should not have to spend hours or days on consistency and quality checks. Three years ago, we engaged an LTS paralegal on a matter to do nothing but track down duplicates in one hyperlinking set. Costs of this kind can be completely avoided by using a good review platform.

Uncovering Costs

Today’s technology brings us amazing tools to save us days and weeks of time, and law firms will continue to generate hidden costs if we do not better utilize these technologies. Times have changed, and we must uncover costs that can be turned into client savings.

A Case Study on Unrecoverable Costs

In Auckland Waterfront Development Agency Limited (AWD) v Mobil Oil New Zealand Limited, Mobil sought to recover electronic discovery costs incurred with two external providers: $56,858.35 with Provider A and $93,557.92 with Provider B. The High Court of New Zealand found Provider B’s costs of $93,557.92 to be fully recoverable and Provider A’s costs to be 50 percent recoverable. It subsequently ordered AWD to pay Mobil $121,987.10 for electronic discovery processing costs.

Of the seven million documents collected in this matter, 158,000 were fully processed, rendered and imported into Relativity. Nineteen thousand were ultimately discovered, which is only about 0.27 percent of the initial data and 12 percent of the processed data. With better early case assessment, 88 percent of the costs for rendering those documents might have been saved for both parties.
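A few lines of Python can check the figures in this case. The sketch uses `decimal` to avoid floating-point rounding on currency. Half of Provider A's costs plus all of Provider B's reproduces the $121,987.10 award. The document counts put the discovered share at roughly 0.27 percent of the collected data and about 12 percent of the processed data. That leaves about 88 percent of processed documents that were never discovered.

```python
from decimal import Decimal, ROUND_HALF_UP

# Amounts as reported for the costs ruling discussed above.
provider_a = Decimal("56858.35")  # found 50 percent recoverable
provider_b = Decimal("93557.92")  # found fully recoverable
awarded = (provider_b + provider_a * Decimal("0.5")).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)
print(awarded)  # 121987.10

collected, processed, discovered = 7_000_000, 158_000, 19_000
pct_of_collected = discovered / collected * 100
pct_of_processed = discovered / processed * 100
print(f"{pct_of_collected:.2f}% of collected, {pct_of_processed:.1f}% of processed")
print(f"{100 - pct_of_processed:.0f}% of processed documents were never discovered")
```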

This case also reveals that courts have no effective way of assessing the reasonableness of e-discovery costs. The High Court determined costs incurred with Provider B to be reasonable because Provider B “appears to have been able to provide a ‘discovery database’ of the kind that has been made necessary by the new electronic discovery rules.” The Court took a broad view without drilling into the billing details to determine what costs could have been saved or what processing tasks were redundant.

Shengshi Zhao is a Senior Consultant at NuLegal, an LTS consulting company in Sydney. She has an actuarial science degree from Worcester Polytechnic Institute and obtained her J.D. from the University of New South Wales. Prior to beginning a career in the legal technology industry, Shengshi worked as an actuarial software engineer at Fidelity Investments. She also worked in legal project management at Allens, where she developed specialized knowledge in litigation processes, e-discovery, competition matters and government regulatory notices. Contact Shengshi at
© ILTA 2016