The Future of Information Retrieval

Month Date, 2006: Jason R. Baron, Director of Litigation at the National Archives and Records Administration in the United States, has called for universal collaboration and an open-minded approach to Information Retrieval (IR).

Speaking at the KM Australia conference in Sydney this week, Baron outlined the masses of information organisations are generating that could come out in a litigation process. “Literally everything is up for grabs in discovery now,” he said.

IR is a term that is bandied around a great deal nowadays, yet it covers so broad a range of techniques, and often such large bodies of data, that it may not be overused at all. Large-scale IR is often associated with monstrous databases holding many millions of emails, the most famous being the 32 million emails that were handed over to the National Archives from the Clinton Administration.

However, smaller-scale IR comes into play once you factor in how much information even a small business can generate. It is not uncommon for smaller businesses to receive in excess of three or four hundred emails per day, with larger businesses running into the thousands and even tens of thousands in a single day. When you consider that each of those emails can be called on in the legal process of discovery, every business is generating massive amounts of information that could one day wind up in court.

Because of this, it’s easy to see why IR methods are becoming more and more important. Normally, when contemplating legal process and documents, one thinks of a team of archivists riffling through papers. When you consider the 32 million emails of the Clinton administration, the 20 million emails from the tobacco giant Philip Morris versus the USA case, or the anticipated 100 million or more emails from the current Bush administration, it is easy to see that reviewing such databases by hand is impossible.

This is nothing new, however: IR methodology is used each and every day by popular search engines such as Google and Yahoo. Applying IR methodology to litigation means being able to narrow down millions of emails and other document sources into a workable body of evidence.

Traditionally, litigation has been a case of what the prosecution and defence can come up with separately. However, Baron is reporting unprecedented levels of cooperation and collaboration between the two sides in efforts to reduce the sheer scale of data to something usable. The best results come from using both Boolean and ranked methods of IR, along with as many other methods as resources permit.

This is because, with the sheer amount of information being accessed, the risk of human error and of keyword searches missing important documents is high. Using multiple search engines and methods greatly reduces that risk. Baron has called for people involved in mass IR to “Buy all the search engines you can afford” as well as to have an “Open mind when it comes to new IR methods”. Indeed, in a revealing chart displayed at his managing litigation risk talks, Baron demonstrated that nearly half the latest IR successes come from non-Boolean search models, a result that is radically different from modern trends and beliefs.
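To make the difference between Boolean and ranked searching concrete, here is a minimal sketch in Python. It is an illustration only, not a method Baron presented: the four sample emails, the query and the function names are all invented for the example, and the ranking uses a simple TF-IDF score as a stand-in for whatever ranked model a real search engine might use.

```python
import math
from collections import Counter

# Hypothetical mini-corpus standing in for an email collection.
EMAILS = {
    1: "please shred the tobacco marketing memo before the audit",
    2: "lunch on friday",
    3: "the memo about marketing budgets is attached",
    4: "destroy those cigarette advertising papers before the auditors arrive",
}


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def boolean_and(terms, docs):
    """Boolean search: ids of documents containing every query term."""
    return {
        doc_id
        for doc_id, text in docs.items()
        if all(term in tokenize(text) for term in terms)
    }


def ranked(terms, docs, top_k=2):
    """Ranked search: the top_k document ids by a simple TF-IDF score."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, doc_counts in counts.items():
        score = 0.0
        for term in terms:
            # Number of documents that contain the term at least once.
            doc_freq = sum(1 for c in counts.values() if term in c)
            if doc_freq:
                score += doc_counts[term] * math.log(n_docs / doc_freq)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]


def combined(terms, docs):
    """Pool the results of both methods."""
    return boolean_and(terms, docs) | set(ranked(terms, docs))


query = ["tobacco", "memo", "audit"]
print(boolean_and(query, EMAILS))  # only email 1 contains all three terms
print(ranked(query, EMAILS))       # email 3 also surfaces on a partial match
print(combined(query, EMAILS))
```

In this toy collection the Boolean search finds only email 1, while the ranked search also surfaces email 3. Email 4 describes the same activity in different words and is missed by both, which is the kind of gap that additional search methods are meant to close.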

Whilst it is unlikely that your business or organisation will have to face the prospect of searching a 20 million email database, as Philip Morris did, the huge number of emails and documents transmitted electronically in every business means that mass IR is set to escalate, as even the smallest business produces huge document databases without realising it.
