Darwinism comes to ediscovery

BY SCOTT GILLARD


At the beginning of this year, I made the usual pilgrimage to New York for LegalTech 2009. Whilst I was there, someone remarked to me that “Darwinism has come to ediscovery”. The comment intrigued me and prompted me to think about how it might relate to ediscovery in Australia.


I started by looking at a definition of Darwinism, which states that “Darwinism is a term used for various movements or concepts related to ideas of transmutation of species or evolution”.


In essence, only the fittest and smartest survive. So how does that translate to the world of ediscovery and, more importantly, is the statement valid?


Compare the Australian ediscovery landscape today with the way it looked 10 years ago and it is very different, if not unrecognisable. In 1999, whilst we were worried about the impending doom of Y2K and preparing for the Olympic Games in Sydney, ediscovery sneaked into the Australian legal market almost unnoticed.


The methodology used in those days was simple. Rather than printing a large volume of electronic documents to paper for review, and perhaps scanning that paper back into electronic format as TIFF files for review in a litigation support database, you simply printed the electronic documents directly to TIFF. This immediately reduced the cost, time and effort of achieving the same result as the paper-based process.


At that time, the concept of de-duplication was foreign and people thought an MD5 hash was something you had for breakfast with your eggs and bacon. The first search engine was only nine years old, so keyword searching was still a novelty; Google was just celebrating its first birthday; and if you mentioned “clustering” in the same sentence as litigation review, people probably thought you were an astronomer, not a litigator.


Fast forward to today. MD5 hash is now a regular part of a lawyer’s vocabulary, and most lawyers can tell the difference between a mashed-up, fried potato dish and an industry standard for de-duplication. Google now appears in the English dictionary and keyword searching is part of everyday life. Alternative search methods such as concept searching continue to seek acceptance in the market, and clustering is now associated with document review rather than just astronomy.
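For readers new to the term, MD5 de-duplication works by computing a fixed-length fingerprint of each document’s bytes; documents with the same fingerprint are treated as exact copies, so only one of them needs to be reviewed. The Python sketch below assumes the documents are already held in memory as raw bytes keyed by hypothetical document IDs (`DOC-001` and so on); production platforms may hash files or selected email fields instead.

```python
import hashlib


def md5_hex(content: bytes) -> str:
    """Return the MD5 digest of a document's raw bytes as a hex string."""
    return hashlib.md5(content).hexdigest()


def deduplicate(documents: dict[str, bytes]) -> tuple[list[str], dict[str, str]]:
    """Split documents into retained IDs and a map of duplicate ID -> retained ID.

    The first document seen with a given MD5 value is kept for review;
    later byte-identical copies are suppressed but remembered.
    """
    first_seen: dict[str, str] = {}   # MD5 hex -> ID of the retained document
    duplicates: dict[str, str] = {}   # duplicate ID -> retained ID
    for doc_id, content in documents.items():
        key = md5_hex(content)
        if key in first_seen:
            duplicates[doc_id] = first_seen[key]
        else:
            first_seen[key] = doc_id
    return list(first_seen.values()), duplicates


collection = {
    "DOC-001": b"Please find the draft contract attached.",
    "DOC-002": b"Meeting moved to 3pm.",
    "DOC-003": b"Please find the draft contract attached.",  # exact copy of DOC-001
}
unique, dupes = deduplicate(collection)
print(unique)  # ['DOC-001', 'DOC-002']
print(dupes)   # {'DOC-003': 'DOC-001'}
```

Mapping each duplicate to its retained copy, rather than simply discarding it, is a deliberate choice in this sketch: a coding decision made on one copy can later be applied to all of them.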


Technology and computers now play a part in every aspect of our daily routines: mobile phones, ATMs, PDAs, laptops, PCs, security systems, transportation, iPods, search engines. The list goes on. However, the biggest difference between 1999 and 2009 is education. People now understand more about technology and the perceived “black art” of ediscovery, and they are demanding more “bang for their buck”.


Gone are the days of simply printing “everything” to TIFF and then letting a team of lawyers loose on the results. Most people now understand that this is not a viable approach and that there are more cost-effective ways to manage electronic material. Not only are lawyers expecting more of service providers and in-house litigation support departments, but clients themselves are expecting more for less.


The most noticeable trend at ediscovery-focused industry trade shows over the past 12 months has been the growing number of corporate in-house counsel, CEOs and CFOs attending these events. Ediscovery is no longer something just for lawyers and internal litigation support departments. Customers and clients alike now recognise, more than ever, that ediscovery is something they need to understand.


Historically, because the market understood so little about ediscovery processes, reliability has been the most important requirement: as long as the results were consistent and reliable, that was sufficient. However, clients now demand more than reliability. They want it faster and cheaper, with more added value than ever before.


So how do you achieve this? When it comes to preparing documents for review, you can only go as fast as the available hardware and software allow.


Yes, you can throw more and more money at the problem, building server farms to rival some of the Silicon Valley giants and hiring masses of staff to run the equipment. But size brings enormous overheads, which would not be feasible for most providers even outside a global financial crisis (GFC). The key is to find the right balance between automation and innovation.


Pick the battles you can win, not those you cannot. Those who can develop innovative, cost-reducing workflows coupled with value-added services will ultimately survive and thrive.


Many moons ago, a 2006 Forrester report noted that whilst processing documents for review was the largest perceived cost in electronic litigation, legal document review was the largest addressable cost.


Historically, the market has concentrated on developing software tools and workflows that streamline the processing of documents, not their review. So if the largest addressable cost lies in reviewing documents rather than processing them, what can be done to reduce the cost of document review?


Equivio, who are well known in the marketplace for their near-duplicate detection and email threading software, have developed Equivio>Relevance, an application that enables automated prioritisation of documents and keywords. CCH Workflow Solutions has bundled this technology into a custom service we call “CCH Reveal”.
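Near-duplicate detection extends the idea of de-duplication to documents that are almost, but not exactly, identical, such as two drafts of the same email. Equivio’s own method is not described here; the sketch below instead uses a common textbook technique, comparing sets of overlapping three-word “shingles” with Jaccard similarity. The shingle size, the 0.6 threshold and the sample emails are illustrative assumptions.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(documents: dict[str, str], threshold: float = 0.6, k: int = 3):
    """Return (id, id, similarity) for pairs whose shingle sets overlap >= threshold.

    Every pair is compared, so this only suits small illustrative sets.
    """
    sigs = {doc_id: shingles(text, k) for doc_id, text in documents.items()}
    ids = list(sigs)
    pairs = []
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            score = jaccard(sigs[left], sigs[right])
            if score >= threshold:
                pairs.append((left, right, round(score, 2)))
    return pairs


emails = {
    "EMAIL-1": "please review the attached settlement agreement and confirm the payment terms by friday",
    "EMAIL-2": "please review the attached settlement agreement and confirm the payment terms by monday",
    "EMAIL-3": "lunch is on the third floor today",
}
print(near_duplicates(emails))  # [('EMAIL-1', 'EMAIL-2', 0.83)]
```

Comparing every pair grows quadratically with collection size; large collections typically rely on hashing schemes such as MinHash to avoid that.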


CCH Reveal is a new litigation support workflow, built on Equivio>Relevance, that streamlines the review process by introducing a 'Graduated Scale' for reviewing documents rather than simply sorting them into 'Relevant' or 'Non Relevant' groups.


This technology goes beyond traditional keyword searching and helps to reduce review times by allocating the most relevant documents to your most expert reviewers.
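A graduated scale can be pictured as a set of score bands rather than a single relevant/not relevant split. In this sketch, relevance scores are assumed to lie between 0 and 1, and the band names and boundaries in `BANDS` are hypothetical; the documents in the top band form the queue intended for the most expert reviewers.

```python
# Hypothetical band boundaries on a 0-1 relevance score, highest first;
# a real matter would set these from its own statistics and budget.
BANDS = [(0.75, "high"), (0.50, "medium"), (0.25, "low"), (0.0, "minimal")]


def band_for(score):
    """Return the name of the first band whose lower bound the score meets."""
    for lower, name in BANDS:
        if score >= lower:
            return name
    raise ValueError(f"score out of range: {score}")


def route(scores):
    """Group document IDs into review queues by band, highest scores first.

    The "high" queue is the one intended for the most expert reviewers.
    """
    queues = {name: [] for _, name in BANDS}
    for doc_id in sorted(scores, key=scores.get, reverse=True):
        queues[band_for(scores[doc_id])].append(doc_id)
    return queues


scores = {"DOC-1": 0.92, "DOC-2": 0.125, "DOC-3": 0.61, "DOC-4": 0.78}
print(route(scores))
# {'high': ['DOC-1', 'DOC-4'], 'medium': ['DOC-3'], 'low': [], 'minimal': ['DOC-2']}
```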


CCH Reveal is an expert-guided system and works like this: an expert reviews a sample of documents, marking each as relevant or not relevant; based on those decisions, CCH Reveal “learns” how to score documents for relevance.


In an iterative, self-correcting process, CCH Reveal feeds additional samples to the expert. These statistically generated samples allow CCH Reveal to progressively improve the accuracy of its relevance scoring. Once a threshold level of accuracy is achieved, CCH Reveal ranks the entire collection, calculating a graduated relevance score for each document. 
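The iterative process described above resembles what the machine learning literature calls active learning. Equivio>Relevance’s actual model and sampling strategy are not disclosed here, so this is only a sketch under stated assumptions: a multinomial naive Bayes classifier stands in for the scoring model, “uncertainty sampling” (asking the expert about the documents whose scores sit closest to 0.5) stands in for the statistically generated samples, and `expert` is a hypothetical stand-in for a human reviewer. The loop stops once the model’s predictions on a fresh batch agree with the expert at or above `target_accuracy`, then scores the whole collection.

```python
import math
import random
from collections import Counter


def tokens(text):
    """Lower-case whitespace tokenisation (a deliberate simplification)."""
    return text.lower().split()


class NaiveBayesScorer:
    """Multinomial naive Bayes scoring P(relevant | text), with add-one smoothing."""

    def fit(self, texts, labels):
        self.doc_totals = Counter(labels)  # documents per label (True/False)
        self.counts = {True: Counter(), False: Counter()}  # word counts per label
        for text, label in zip(texts, labels):
            self.counts[label].update(tokens(text))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def score(self, text):
        n_docs = sum(self.doc_totals.values())
        log_post = {}
        for label in (True, False):
            log_p = math.log((self.doc_totals[label] + 1) / (n_docs + 2))
            denom = sum(self.counts[label].values()) + len(self.vocab)
            for word in tokens(text):
                log_p += math.log((self.counts[label][word] + 1) / denom)
            log_post[label] = log_p
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = max(-700.0, min(700.0, log_post[False] - log_post[True]))
        return 1.0 / (1.0 + math.exp(diff))


def train(collection, labels):
    """Fit a fresh scorer on every document the expert has labelled so far."""
    return NaiveBayesScorer().fit([collection[i] for i in labels], list(labels.values()))


def expert_guided_ranking(collection, expert, seed_size=4, batch_size=2,
                          target_accuracy=1.0, max_rounds=5, rng_seed=0):
    """collection: doc ID -> text; expert: hypothetical callable, doc ID -> bool.

    Returns (score, doc ID) pairs sorted from most to least relevant.
    """
    rng = random.Random(rng_seed)
    ids = list(collection)
    # Seed round: the expert labels a random sample.
    labels = {i: expert(i) for i in rng.sample(ids, min(seed_size, len(ids)))}
    model = train(collection, labels)
    for _ in range(max_rounds):
        unlabelled = [i for i in ids if i not in labels]
        if not unlabelled:
            break
        # Uncertainty sampling: next batch = documents scored closest to 0.5.
        unlabelled.sort(key=lambda i: abs(model.score(collection[i]) - 0.5))
        batch = unlabelled[:batch_size]
        correct = 0
        for i in batch:
            labels[i] = expert(i)
            # Check the current model's prediction against the expert's call.
            correct += (model.score(collection[i]) >= 0.5) == labels[i]
        model = train(collection, labels)  # self-correct with the new decisions
        if correct / len(batch) >= target_accuracy:
            break  # accuracy threshold met on documents the model had not seen
    # Rank the whole collection by graduated relevance score, highest first.
    return sorted(((model.score(text), i) for i, text in collection.items()), reverse=True)


collection = {
    "DOC-1": "merger price negotiation with acme",
    "DOC-2": "acme merger due diligence report",
    "DOC-3": "board approves acme merger price",
    "DOC-4": "merger negotiation notes",
    "DOC-5": "lunch menu for friday",
    "DOC-6": "friday social club roster",
    "DOC-7": "car park roster update",
    "DOC-8": "lunch roster for the social club",
}
relevant_ids = {"DOC-1", "DOC-2", "DOC-3", "DOC-4"}  # simulated expert judgement
ranking = expert_guided_ranking(collection, expert=lambda i: i in relevant_ids)
for score, doc_id in ranking:
    print(f"{doc_id}: {score:.2f}")
```

Retraining from scratch each round keeps the sketch short; a production system would likely update incrementally and work with far larger, statistically designed samples.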


The benefits of such a workflow are:


Early case assessment



  •  Document priorities facilitate targeted, early review;

  •  Automatically generated keywords provide a ‘bird’s eye view’ of the collection;

  •  Estimates of collection richness enable more efficient budgeting of the review effort.
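Collection richness is the proportion of relevant documents in the collection. One simple way to estimate it, assumed here rather than taken from CCH Reveal, is to have an expert review a random sample and attach a confidence interval to the observed proportion. The sketch uses the Wilson score interval; the sample of 400 documents, the 60 judged relevant and the 200,000-document collection are hypothetical figures.

```python
import math


def estimate_richness(sample_labels, z=1.96):
    """Point estimate and Wilson score interval for the share of relevant documents.

    sample_labels: True/False expert decisions on a simple random sample.
    z = 1.96 corresponds to roughly 95% confidence.
    """
    n = len(sample_labels)
    if n == 0:
        raise ValueError("the sample is empty")
    p = sum(sample_labels) / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, centre - half_width, centre + half_width


def project_relevant(collection_size, sample_labels):
    """Scale the sample estimate up to counts of relevant documents (point, low, high)."""
    p, low, high = estimate_richness(sample_labels)
    return round(collection_size * p), round(collection_size * low), round(collection_size * high)


# Hypothetical sample: 60 of 400 randomly chosen documents judged relevant.
richness_sample = [True] * 60 + [False] * 340
p, low, high = estimate_richness(richness_sample)
print(f"richness {p:.1%} (95% CI {low:.1%} to {high:.1%})")
print(project_relevant(200_000, richness_sample))
```

A projected range, rather than a single number, is what makes the estimate useful for budgeting: a review plan can be costed at both ends of the interval.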

Smarter culling



  •  Document priorities allow the culling target to be aligned with budget constraints, and adjusted as budgets change, case issues evolve or the size of the ingested collection changes;

  •  Statistical tools manage the trade-off between over-inclusion (which lowers precision) and under-inclusion (which lowers recall);

  •  Prioritisation of documents supports multiple cut-off techniques, such as budget or precision/recall targets;

  •  Automatically generated keywords can be used to enrich manual keyword lists.
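The trade-off between precision and recall can be made concrete with an expert-labelled sample of scored documents. The sketch assumes such a sample exists as (score, is_relevant) pairs, with invented values, and shows two cut-off techniques: the highest score threshold that still meets a recall target, and the threshold implied by a fixed review budget of N documents.

```python
def precision_recall_at(scored_sample, threshold):
    """Precision and recall on the sample if every document scoring >= threshold is kept.

    scored_sample: list of (relevance score, expert says relevant) pairs.
    """
    kept = [relevant for score, relevant in scored_sample if score >= threshold]
    total_relevant = sum(relevant for _, relevant in scored_sample)
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_relevant if total_relevant else 1.0
    return precision, recall


def cutoff_for_recall(scored_sample, target_recall):
    """Highest score threshold whose recall on the sample meets the target."""
    if not scored_sample:
        raise ValueError("the sample is empty")
    for threshold in sorted({score for score, _ in scored_sample}, reverse=True):
        if precision_recall_at(scored_sample, threshold)[1] >= target_recall:
            return threshold
    # Reached only if target_recall > 1.0: the lowest threshold keeps everything.
    raise ValueError("target_recall must be at most 1.0")


def cutoff_for_budget(scores, budget_docs):
    """Score of the last document that fits within a budget of N reviewed documents."""
    if budget_docs < 1 or not scores:
        raise ValueError("need at least one document and a budget of at least one")
    ranked = sorted(scores, reverse=True)
    return ranked[min(budget_docs, len(ranked)) - 1]


# Invented expert-labelled sample of (score, relevant) pairs.
scored_sample = [(0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.60, True),
                 (0.40, False), (0.30, True), (0.20, False), (0.10, False), (0.05, False)]
t = cutoff_for_recall(scored_sample, 0.8)
print(t, precision_recall_at(scored_sample, t))             # 0.6 (0.8, 0.8)
print(cutoff_for_budget([s for s, _ in scored_sample], 3))  # 0.8
```

Lowering the threshold never lowers recall, but it usually admits more non-relevant documents, which is why precision tends to fall as recall rises.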

Enhanced review


In selecting the review set, CCH Reveal’s precision and recall rates enable:



  •  Review of fewer documents, lowering the review cost;

  •  Review of more relevant documents, reducing the risk of missing key data.

Within the review set, relevance rankings enable:



  •  Review by priority, focusing on the most relevant documents early in the review process;

  •  Assignment by priority, e.g. assigning priority documents to senior or expert reviewers;

  •  Matching manual review decisions against CCH Reveal’s relevance scores, enabling supervision of review quality and consistency.
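Supervision of this kind can be as simple as flagging documents where a reviewer’s call and the relevance score strongly disagree, and tracking the overall agreement rate. The thresholds (0.8 and 0.2), the 0.5 agreement cut-off and the sample data are hypothetical; this is a sketch of the idea, not CCH Reveal’s reporting.

```python
def flag_for_qc(decisions, scores, high=0.8, low=0.2):
    """List documents where a reviewer's call strongly conflicts with the score.

    decisions: doc ID -> True (relevant) or False (not relevant) from manual review.
    scores: doc ID -> relevance score between 0 and 1.
    """
    flagged = []
    for doc_id, relevant in decisions.items():
        score = scores[doc_id]
        if relevant and score <= low:
            flagged.append((doc_id, "marked relevant despite a low score"))
        elif not relevant and score >= high:
            flagged.append((doc_id, "marked not relevant despite a high score"))
    return flagged


def agreement_rate(decisions, scores, cutoff=0.5):
    """Share of manual decisions that match the score thresholded at `cutoff`."""
    matches = sum((scores[doc_id] >= cutoff) == relevant
                  for doc_id, relevant in decisions.items())
    return matches / len(decisions)


decisions = {"DOC-1": True, "DOC-2": False, "DOC-3": False, "DOC-4": True}
scores = {"DOC-1": 0.91, "DOC-2": 0.88, "DOC-3": 0.10, "DOC-4": 0.15}
for doc_id, reason in flag_for_qc(decisions, scores):
    print(doc_id, reason)
print(f"agreement: {agreement_rate(decisions, scores):.0%}")  # agreement: 50%
```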

Keyword searching, the traditional method of prioritising documents for review, is widely regarded as flawed: on average, it returns only 20% to 40% of the relevant documents. Late last year, Equivio took part in a trial run by a group of academics and legal practitioners at the Text REtrieval Conference (TREC), in which Equivio>Relevance was run alongside traditional search techniques and methodologies and the results were compared.


The TREC results showed that standard keyword search methodologies achieved, on average, 24% recall and 28% precision. Equivio>Relevance outperformed all comers, achieving 71% recall and 81.4% precision. The statistical model is the most impressive part of this technology. It reports on your progress, including a document-by-document rationale for why each document is deemed relevant. This gives you confidence and visibility throughout the entire review process.
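Precision and recall have standard definitions, and the two sets of TREC figures can be compared with the F1 score (the harmonic mean of precision P and recall R), a common summary measure that is not one of the figures quoted above. Using the averages as reported:

```latex
\text{precision} = \frac{\text{relevant documents retrieved}}{\text{all documents retrieved}},
\qquad
\text{recall} = \frac{\text{relevant documents retrieved}}{\text{all relevant documents}}

F_1 = \frac{2PR}{P + R}

F_1^{\text{keyword}} = \frac{2(0.28)(0.24)}{0.28 + 0.24} = \frac{0.1344}{0.52} \approx 0.258

F_1^{\text{Equivio}} = \frac{2(0.814)(0.71)}{0.814 + 0.71} = \frac{1.15588}{1.524} \approx 0.758
```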


Is this the watershed in ediscovery that we will look back on and say revolutionised the way ediscovery is conducted? The evolution of ediscovery? For all those aspiring young lawyers and equity partners who are cringing right now: it does not replace the need for old-fashioned legal document review, it just refines it. It makes document review fitter, smarter and stronger. It makes your offering fitter, smarter and stronger.


Oliver Wendell Holmes once wrote that “For the rational study of the law the black-letter man may be the man of the present, but the man of the future is the man of statistics and the master of economics.”


Is this the answer to why “Darwinism has come to ediscovery”? If not, it is, in my view, definitely a step in the right direction.


Scott Gillard is National Electronic Services Manager for the CCH Workflow Solutions Electronic Services Group.