Much ado about metadata

Metadata is critical to many areas of information management. David McGrath, a Sydney-based solicitor with extensive industry experience, looks at a significant international judgement relating to metadata and ediscovery.

Without careful management of metadata, searching a library of documents becomes the equivalent of looking for the proverbial needle in a haystack.

Imagine having to hunt through 3000 pages of documents and emails without any metadata to help you sort between them. This was the situation facing the plaintiffs in a recent US court case, after the defendants' legal team ignored the production protocol the plaintiffs had requested for documents released under a Freedom of Information (FOI) request.

Their actions earned a stinging rebuke from US Federal District Court Judge Shira Scheindlin, who then imposed her own ediscovery-style production protocol.

The importance of the “National Day Laborer decision”, as it is now known, extends beyond the borders of the US. It has many implications for the form in which ediscovery is given, as well as for the “discoverability” of metadata.

The case involved eight “day labourers” (casual workers) who alleged they were the victims of racial profiling and anti-immigrant sentiment after being picked up in 2006 by a police officer posing as a contractor offering work. After the men entered the officer's van, they were driven a few blocks and turned over to US Immigration and Customs Enforcement (ICE) agents.

So what was all the fuss about metadata? Following an FOI request to ICE, 3000 pages had been handed over, with countless more to follow.

The scanned documents were provided as five PDF files, but the production protocol requested by the plaintiffs' lawyers was completely ignored. As a result, the judge found that the format in which the documents were produced was both ‘unusable’ and ‘significantly degraded’ as far as efficient review by the plaintiffs was concerned.

For a hard copy comparison, imagine dealing with a large bundle of, say, 10 boxes of documents. These have been collated in a series of folders with an index, complete with volume, document and even page numbering.

However, when they are handed over to you, the index has been withheld and the documents have been indiscriminately merged together, so that you can’t tell where one document ends and the next one starts.

In short, you have a mess. Like any other mess, it will take time and money to clean up before you can even start reviewing.

Given that the documents were previously very well organised, you would probably expect them to be handed over to you in a way that was “reasonably usable”.

Load Files

By producing static images indiscriminately merged together without a “load file”, the “Day Laborer” defendants had effectively “dumped” the documents on the plaintiffs. Judge Shira Scheindlin ruled that “it is by now well accepted that when a collection of static images are produced, load files must also be produced in order to make the production searchable and therefore reasonably usable.”

The load file, as its name suggests, contains the data that enables the documents to be loaded into electronic document review software (e.g. Ringtail or Relativity). These technologies use a database (and text search indexes) to reference their electronic documents; the load file simply provides the database with the information it needs to reference each document.
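To make this concrete, here is a minimal Python sketch of how review software might use an image load file to rebuild the document boundaries that were lost in the “Day Laborer” production. It assumes a comma-delimited layout modelled on the widely used Opticon (.opt) image load file, where each line names one page image and a “Y” in the fourth field marks the first page of a new document. The page IDs, volume name and paths are hypothetical.

```python
import csv
import io

# Hypothetical image load file, assuming an Opticon-style layout:
# page ID, volume, image path, document break ("Y" = first page of a document),
# folder break, box break, page count (given on the first page only).
SAMPLE_OPT = """\
ABC0001,VOL01,IMAGES\\0001\\ABC0001.tif,Y,,,2
ABC0002,VOL01,IMAGES\\0001\\ABC0002.tif,,,,
ABC0003,VOL01,IMAGES\\0001\\ABC0003.tif,Y,,,1
ABC0004,VOL01,IMAGES\\0001\\ABC0004.tif,Y,,,3
ABC0005,VOL01,IMAGES\\0001\\ABC0005.tif,,,,
ABC0006,VOL01,IMAGES\\0001\\ABC0006.tif,,,,
"""


def group_pages_into_documents(opt_text):
    """Return a list of documents, each a list of page IDs, using the break flags."""
    documents = []
    for row in csv.reader(io.StringIO(opt_text)):
        if not row:
            continue  # skip blank lines
        page_id, doc_break = row[0], row[3].strip().upper()
        if doc_break == "Y" or not documents:
            documents.append([])  # a "Y" flag starts a new document
        documents[-1].append(page_id)
    return documents


docs = group_pages_into_documents(SAMPLE_OPT)
for doc in docs:
    print(f"{doc[0]}-{doc[-1]}: {len(doc)} page(s)")
```

Run against the sample, this yields three documents of two, one and three pages. Take away the break flags, which is effectively what the plaintiffs received in the merged PDFs, and the loop has nothing to key on: every page falls into a single undifferentiated document.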

This requirement is specifically addressed by the Advanced Document Management Protocol (ADMP) in Australia’s Federal Court Practice Note CM6, which states that load file information should be provided in a “prearranged or standardised format.”

The load file or its equivalent is not only crucial to giving proper ediscovery; it is also critical to ensuring that technology delivers efficient document management throughout litigation. The initial processing of documents into a document review system comes at a significant cost. One of the benefits of electronic exchange is that processed documents can be transferred to other parties, saving them the cost of processing the same documents again.

In the United States, in the absence of a specified production format, the responding party may produce electronically stored information (ESI) either in the form in which it is “ordinarily maintained” or in a “reasonably usable form”.

Producing ESI in a “reasonably usable form” does not mean the responding party may convert it from the form in which it is ordinarily maintained to a different form that “makes it more difficult or burdensome for the requesting party to use the information efficiently”.

The biggest error still being made by lawyers working in Australian ediscovery today is failing to demand the equivalent of a load file, defined under a proper electronic exchange protocol.

An important aspect of the “Day Laborer” case concerned the extent to which metadata should have been provided in that load file. Metadata is either native, for documents that begin their life digitally in a computer program, or created when a physical document is scanned. An email is a good example of a native document with useful metadata, including from, to, cc, subject, date sent and attachments. A loose file, say a Word document, would have metadata including filename, path, file type, file created date and last modified date.
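As a rough illustration of what native email metadata looks like, the Python sketch below uses the standard library's `email` package to pull the fields mentioned above out of a raw message. The message and the output field names are hypothetical; real ediscovery processing tools extract many more fields than this.

```python
from email import policy
from email.parser import BytesParser

# A hypothetical raw email with one PDF attachment.
RAW_EMAIL = b"""\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>
Cc: Carol Example <carol@example.com>
Subject: Site inspection
Date: Tue, 13 Nov 2007 09:30:00 +1100
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

Please see the attached report.
--XYZ
Content-Type: application/pdf
Content-Disposition: attachment; filename="report.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQK
--XYZ--
"""


def email_metadata(raw_bytes):
    """Extract a handful of native metadata fields from a raw RFC 5322 message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    return {
        "From": str(msg["From"]),
        "To": str(msg["To"]),
        "Cc": str(msg["Cc"] or ""),
        "Subject": str(msg["Subject"]),
        # With policy.default the Date header exposes a parsed, timezone-aware datetime.
        "DateSent": msg["Date"].datetime.isoformat(),
        "Attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


for field, value in email_metadata(RAW_EMAIL).items():
    print(f"{field}: {value}")
```

None of these values is visible on a flattened image of the email, which is why they have to travel separately.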

This metadata offers real benefits, both for searching (all email between person x and person y during November 2007) and for forensic evidentiary purposes (file created and last modified dates that fall months after the date on a letter suggest that the letter was fabricated after the event).
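Both uses can be sketched in a few lines of Python, assuming the metadata has already been read from a load file into simple records. The records, field names and the 30-day threshold below are hypothetical choices for illustration; a flagged document is a lead for further inquiry rather than proof of fabrication.

```python
from datetime import date, datetime

# Hypothetical metadata records, as they might look once read from a load file.
RECORDS = [
    {"id": "DOC001", "type": "email", "from": "x@example.com",
     "to": ["y@example.com"], "sent": datetime(2007, 11, 5, 10, 15)},
    {"id": "DOC002", "type": "email", "from": "y@example.com",
     "to": ["x@example.com", "z@example.com"], "sent": datetime(2007, 11, 28, 16, 2)},
    {"id": "DOC003", "type": "email", "from": "x@example.com",
     "to": ["z@example.com"], "sent": datetime(2007, 11, 12, 9, 0)},
    {"id": "DOC004", "type": "email", "from": "x@example.com",
     "to": ["y@example.com"], "sent": datetime(2007, 12, 3, 8, 45)},
    {"id": "DOC005", "type": "letter", "doc_date": date(2007, 3, 1),
     "created": datetime(2007, 8, 20, 14, 0), "modified": datetime(2007, 8, 21, 9, 30)},
    {"id": "DOC006", "type": "letter", "doc_date": date(2007, 3, 1),
     "created": datetime(2007, 2, 28, 11, 0), "modified": datetime(2007, 3, 1, 15, 0)},
]


def emails_between(records, a, b, start, end):
    """IDs of emails sent from a to b, or from b to a, with start <= sent < end."""
    hits = []
    for r in records:
        if r["type"] != "email" or not (start <= r["sent"] < end):
            continue
        if (r["from"] == a and b in r["to"]) or (r["from"] == b and a in r["to"]):
            hits.append(r["id"])
    return hits


def suspicious_dates(records, min_gap_days=30):
    """IDs of documents whose created and modified dates both fall well after their written date."""
    flagged = []
    for r in records:
        stated = r.get("doc_date")
        if stated is None:
            continue  # only documents that carry a written date can be checked
        earliest_system_date = min(r["created"], r["modified"]).date()
        if (earliest_system_date - stated).days > min_gap_days:
            flagged.append(r["id"])
    return flagged


november = (datetime(2007, 11, 1), datetime(2007, 12, 1))
print(emails_between(RECORDS, "x@example.com", "y@example.com", *november))  # ['DOC001', 'DOC002']
print(suspicious_dates(RECORDS))  # ['DOC005']
```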

A native electronic document converted to an electronic image does not retain its metadata. The metadata is therefore extracted and kept in a separate file, much like a load file, and the original native file is also retained.
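For loose files, the extraction step can be sketched in Python by reading file-system metadata with `pathlib` (which calls `os.stat`) and writing it to a separate CSV "sidecar" file that travels with the images. The column names, the CSV format and the choice to normalise times to UTC are assumptions for illustration, not a prescribed protocol. File created dates are deliberately left out: Python only exposes a true creation time (`st_birthtime`) on some platforms, and on Linux `st_ctime` records the last metadata change rather than creation.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical sidecar columns; a real exchange protocol would fix these names.
FIELDS = ["FileName", "SourcePath", "FileType", "SizeBytes",
          "ModifiedDate", "ModifiedTime", "TimeOffset"]


def file_metadata(path):
    """Collect basic file-system metadata for one loose file (times reported in UTC)."""
    p = Path(path)
    st = p.stat()
    modified = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
    return {
        "FileName": p.name,
        "SourcePath": str(p.resolve().parent),
        "FileType": p.suffix.lstrip(".").lower(),
        "SizeBytes": st.st_size,
        "ModifiedDate": modified.strftime("%Y-%m-%d"),
        "ModifiedTime": modified.strftime("%H:%M:%S"),
        "TimeOffset": "+00:00",  # all times above are normalised to UTC
    }


def write_metadata_sidecar(paths, out_csv):
    """Write one metadata row per file to a separate CSV, kept alongside the images."""
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for path in paths:
            writer.writerow(file_metadata(path))


# Demo: create a throwaway "letter" and stamp it with a known modification time.
workdir = Path(tempfile.mkdtemp())
letter = workdir / "Letter.DOCX"
letter.write_bytes(b"placeholder content")
stamp = datetime(2007, 11, 5, 10, 15, tzinfo=timezone.utc).timestamp()
os.utime(letter, (stamp, stamp))

sidecar = workdir / "metadata.csv"
write_metadata_sidecar([letter], sidecar)
with open(sidecar, newline="", encoding="utf-8") as fh:
    rows = list(csv.DictReader(fh))
print(rows[0])
```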

Judge Scheindlin set out a list of metadata fields that she considered the minimum for any production of a significant collection of ESI. The fields required in load files accompanying any future production of ESI were:
1. Identifier (a unique production identifier for the document);
2. File Name;
3. Custodian;
4. Source Device;
5. Source Path;
6. Production Path;
7. Modified Date;
8. Modified Time; and
9. Time Offset Value.
Judge Scheindlin also indicated that the following metadata fields should be produced with any emails:
1. To;
2. From;
3. Cc;
4. Bcc;
5. Date Sent;
6. Time Sent;
7. Subject;
8. Date Received;
9. Time Received; and
10. Attachments.

Finally, Judge Scheindlin indicated that productions of paper records should also include the following data:
1. Bates_begin (Page_start);
2. Bates_end (Page_end);
3. Attach_begin; and
4. Attach_end.
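A receiving party could check a load file against these lists before review begins. The Python sketch below is one hypothetical way to do that. The column spellings in the sample header are invented, and matching ignores case, spaces and underscores because productions rarely agree on exact names. Which lists apply to a given production is left to the parties' protocol; the function simply checks whichever lists it is given.

```python
# Field lists from the National Day Laborer decision, as summarised in this article.
ESI_FIELDS = ["Identifier", "File Name", "Custodian", "Source Device", "Source Path",
              "Production Path", "Modified Date", "Modified Time", "Time Offset Value"]
EMAIL_FIELDS = ["To", "From", "Cc", "Bcc", "Date Sent", "Time Sent", "Subject",
                "Date Received", "Time Received", "Attachments"]
PAPER_FIELDS = ["Bates_begin", "Bates_end", "Attach_begin", "Attach_end"]


def _normalise(name):
    """Compare column names loosely: ignore case, spaces and underscores."""
    return name.replace("_", "").replace(" ", "").lower()


def missing_fields(header, *required_lists):
    """Return required field names that have no matching column in the load file header."""
    present = {_normalise(column) for column in header}
    missing = []
    for required in required_lists:
        for field in required:
            if _normalise(field) not in present and field not in missing:
                missing.append(field)
    return missing


# Hypothetical header row from the load file of an email production.
email_header = ["Identifier", "FileName", "Custodian", "SourcePath", "ProductionPath",
                "ModifiedDate", "ModifiedTime", "From", "To", "CC", "Subject",
                "DateSent", "TimeSent", "Attachments"]
print(missing_fields(email_header, ESI_FIELDS, EMAIL_FIELDS))
# ['Source Device', 'Time Offset Value', 'Bcc', 'Date Received', 'Time Received']
```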

Although these fields reflect US ediscovery standards, the list still sets a useful precedent for other courts and legal teams around the world. Some believe it will become the starting point for all conversations about the fields that must be included in future US cases.

Fortunately, in Australia we already have a starting point for these discussions: Federal Court Practice Note CM6. Its twin document management protocols include directions and suggestions for everything from the document format (searchable image files) to where page number labels are placed on documents. They set out at least 17 core information fields, with a further 11 suggested extras. This makes it easier for parties to arrive at an exchange protocol suited to their requirements.

However, those who want more metadata will have to justify their request. In a 2006 Federal Court case (Jarra Creek Central Packing Shed Pty Ltd v Amcor Limited), Justice Brian Tamberlin held that embedded electronic information in relevant documents was discoverable, but he also said that the normal discovery process had to be observed and that discovery should only be ordered where it is necessary. In that case, an application for discovery of nine additional metadata fields (the Equivio, or de-duplication, fields) on top of the 14 already agreed in the protocol was found not to be necessary, at least at that point in time, and was denied.

This is not to say that it can’t be done. Each case needs to be considered on its merits, and there is a growing list of Australian cases in which parties have been obliged to go beyond the usual discovery requirements to ensure that the other party receives the information in a form it can use.
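The nine fields in Jarra Creek are not listed here, so the Python sketch below does not try to reproduce them. It only illustrates the general idea behind a de-duplication field: compute a hash of each document's content (SHA-256 here, an assumption; a protocol may specify another algorithm) and group documents that share the same value. The document IDs and contents are hypothetical, and near-duplicate detection, which compares documents that are similar but not identical, is not shown.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """SHA-256 of the raw bytes; identical content always yields the same value."""
    return hashlib.sha256(data).hexdigest()


def duplicate_groups(documents):
    """Group document IDs whose content is byte-for-byte identical."""
    groups = defaultdict(list)
    for doc_id, data in documents.items():
        groups[content_hash(data)].append(doc_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical collection: the same attachment held by two custodians.
docs = {
    "SMITH-0001": b"Quarterly pricing schedule v3",
    "JONES-0042": b"Quarterly pricing schedule v3",
    "JONES-0043": b"Quarterly pricing schedule v4",
}
print(duplicate_groups(docs))  # [['SMITH-0001', 'JONES-0042']]
```

Because a single changed byte produces a completely different hash, this approach only catches exact duplicates.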

Few judges would be capable of wading as deeply into the area of digital metadata as US Federal District Court Judge Shira Scheindlin. Her actions in calling the defendants to account over a wholly inadequate electronic discovery effort will prove to be an important step in the development of more acceptable practices in electronic exchange.

It gives lawyers a clear precedent and guide as to what constitutes acceptable and unacceptable electronic production. The more that lawyers understand what is required, the easier it will be for everybody.