It’s all about the data

By Angela Bunting

Although overused and over-hyped, the phrase ‘big data’ has at its core a promise: with enough data, processing power and intelligence, organisations can extract insight, predict the future, make better decisions and gain a competitive edge.
However, when it comes to unstructured data (human-generated information found in emails, documents, photos and other formats), it’s typically not about predicting the future but about reacting to a complex event such as litigation, regulatory compliance or a cybersecurity breach. Often, the biggest struggle is not getting enough data but having too much.

Business problems including litigation, investigation, records management, privacy, cybersecurity and storage optimisation all require organisations to ask difficult questions of their unstructured data and receive comprehensive and timely answers. These questions include:
• What is our risk exposure?
• Who did what, when, and can we prove it?
• How did the bad guys get in and what did they steal?
• Are these documents or emails company records?
• Where is our intellectual property?
• Does this data we’re storing have business value or does it pose a risk?
• Where do we keep high-risk or high-value data and how can we find out if it escapes?

Answering these questions quickly enough to be valuable to the business is hard work. Unstructured data formats are much harder to search and analyse than databases or plain text. In this context, ‘big data’ is a challenge rather than an advantage, because organisations create and store massive volumes of unstructured data and much of that information is irrelevant to the task at hand. To solve these disparate problems, most organisations invest in point solutions such as eDiscovery, forensic investigation, records management and information security applications, and IT departments must provide technical and logistical support for many of these tools at considerable cost.

Even though the questions being asked of the data are quite similar, the answers are required by different parts of the business, so a coordinated response is hard to achieve. The result is that organisations use several different, very expensive hammers to drive the same nail. Although people in different parts of the business may not realise it, they all face common challenges: hard-to-understand data formats, too much data, important data stored inappropriately, and multiple tools and point solutions.

A data-centric approach
Litigation, investigation, records management, privacy, cybersecurity and storage optimisation share a common thread: each depends on asking difficult questions of unstructured data and getting comprehensive, timely answers. By taking a data-centric approach and using the right technology, organisations can crack open the content of their unstructured data and develop processes and competencies that reduce costs, improve efficiency and deliver new sources of business value.

Workflow automation - eDiscovery and investigation tools are highly complex, with large numbers of options for processing data in different ways. Inconsistent handling of evidence sources is risky, especially for matters that could end up in court. With the right technology, organisations can capture their processing workflows and settings in a template or series of templates, so that staff members with limited expertise can process data consistently and defensibly.
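To make the idea of a processing template more concrete, here is a minimal Python sketch. It does not describe how any particular eDiscovery product works: `ProcessingTemplate`, its settings and `process_source` are hypothetical names chosen for illustration. The template is frozen so its settings cannot be altered partway through a case, and each processed source is logged with a fingerprint of the exact settings used, which is what makes the handling repeatable and easy to demonstrate later.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ProcessingTemplate:
    """Hypothetical processing settings; frozen so they cannot be changed per source."""
    name: str
    hash_algorithm: str = "sha256"
    extract_text: bool = True
    ocr_images: bool = False
    deduplicate: bool = True


def fingerprint(template: ProcessingTemplate) -> str:
    """Stable SHA-256 digest of the template's settings, for the audit trail."""
    payload = json.dumps(asdict(template), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def process_source(source_name: str, data: bytes, template: ProcessingTemplate) -> dict:
    """Hash one evidence source under a template and record exactly how it was handled."""
    content_hash = hashlib.new(template.hash_algorithm, data).hexdigest()
    return {
        "source": source_name,
        "content_hash": content_hash,
        "template": template.name,
        "template_fingerprint": fingerprint(template),
        "settings": asdict(template),
    }


# Two custodians' mailboxes processed under the same hypothetical template.
standard = ProcessingTemplate(name="standard-email-review")
log_a = process_source("custodian_a.pst", b"mailbox A bytes", standard)
log_b = process_source("custodian_b.pst", b"mailbox B bytes", standard)
print(log_a["template_fingerprint"] == log_b["template_fingerprint"])  # True
```

Because both sources were processed under the same frozen template, their logs carry the same fingerprint; a source processed with different settings would show a different fingerprint.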

Collaboration - Certain technologies make it possible to provide compartmentalised and secure access to case data for external parties such as lawyers and subject matter experts. It’s easy to divide up tasks along whatever lines make sense, including date ranges, custodians, locations, languages or content types. For example, you could pass financial records to a forensic accountant or internet activity records to a technical specialist.

Sharing the workload - Choose software that gives you flexibility to share the same case file format across desktop, server and web-based applications. This means that sharing work internally or with external providers at any stage of the discovery or investigation process is as simple as transferring a case file.

Text and visual analytics - Use tools with built-in text analytics such as auto-classification, clustering, topic modelling, text summarisation, deduplication and near-duplicate management to search, understand, classify and minimise data sets. Interactive graphical tools including timelines, communication network diagrams, commonality network diagrams and trend, pivot and intersection charts make it easy to slice, dice and visualise data so you can quickly identify trends, locate information of interest and drill down to specifics.
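Deduplication and near-duplicate management can be illustrated with a short Python sketch. It assumes two common textbook techniques rather than any vendor's actual method: exact duplicates are found by hashing whitespace-normalised, lower-cased text with SHA-256, and near-duplicates by comparing overlapping three-word "shingles" with Jaccard similarity. The 0.8 threshold and the sample documents are arbitrary choices for illustration.

```python
import hashlib
from itertools import combinations


def content_hash(text: str) -> str:
    """Exact-duplicate key: SHA-256 of whitespace-normalised, lower-cased text."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences used for near-duplicate comparison."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Overlap between two shingle sets: 0.0 means disjoint, 1.0 means identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: dict[str, str], threshold: float = 0.8):
    """Return (unique ids, exact-duplicate pairs, near-duplicate pairs)."""
    first_seen: dict[str, str] = {}  # content hash -> first doc id with that hash
    unique: list[str] = []
    exact: list[tuple[str, str]] = []
    for doc_id, text in docs.items():
        key = content_hash(text)
        if key in first_seen:
            exact.append((doc_id, first_seen[key]))
        else:
            first_seen[key] = doc_id
            unique.append(doc_id)
    # Compare every remaining pair; fine for a sketch, quadratic at scale.
    sets = {doc_id: shingles(docs[doc_id]) for doc_id in unique}
    near = [
        (a, b) for a, b in combinations(unique, 2)
        if jaccard(sets[a], sets[b]) >= threshold
    ]
    return unique, exact, near


docs = {
    "email1": "Please send the Q3 forecast to the board by Friday",
    "email2": "please send the Q3 forecast to the board by   Friday",
    "email3": "Please send the Q3 forecast to the board by Friday afternoon",
    "memo1": "Lunch menu for the staff canteen this week",
}
unique, exact, near = deduplicate(docs)
print(unique)  # ['email1', 'email3', 'memo1']
print(exact)   # [('email2', 'email1')]
print(near)    # [('email1', 'email3')]
```

Comparing every pair of documents grows quadratically with collection size, so large-scale systems typically approximate this comparison with techniques such as MinHash and locality-sensitive hashing.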

Living index - Organisations that are frequently litigated or investigated use powerful collection and discovery technologies to maintain a regularly updated index of all their files and emails. Automated collection runs scheduled updates that add only the most recent data to the index. Because the index is always searchable, there is no lag between someone asking a question and the organisation starting to find the answers.
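The incremental update behind a living index can be sketched in Python as an inverted index with a high-water mark: each scheduled run indexes only records modified since the previous run, and replaces the entries of any document that has changed. `Record`, `LivingIndex` and the timestamp-based change detection are assumptions for illustration; real collection tools may detect changes in other ways.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a collected file or email."""
    doc_id: str
    modified: float  # e.g. a Unix timestamp reported by the source system
    text: str


class LivingIndex:
    """Inverted index that ingests only records changed since the last scheduled run."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)  # term -> doc ids
        self.terms_by_doc: dict[str, set[str]] = {}             # doc id -> its terms
        self.last_run: float = float("-inf")                    # high-water mark

    def update(self, records: list[Record]) -> int:
        """Index records newer than the high-water mark; return how many were indexed."""
        fresh = [r for r in records if r.modified > self.last_run]
        for rec in fresh:
            # Remove stale postings if this document was indexed in an earlier run.
            for term in self.terms_by_doc.pop(rec.doc_id, set()):
                self.postings[term].discard(rec.doc_id)
            terms = set(rec.text.lower().split())
            self.terms_by_doc[rec.doc_id] = terms
            for term in terms:
                self.postings[term].add(rec.doc_id)
        if fresh:
            self.last_run = max(r.modified for r in fresh)
        return len(fresh)

    def search(self, *terms: str) -> set[str]:
        """Return ids of documents containing every query term."""
        if not terms:
            return set()
        matches = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*matches)


index = LivingIndex()
batch = [Record("a", 1.0, "Merger draft agreement"),
         Record("b", 2.0, "Quarterly sales figures")]
first = index.update(batch)    # first run indexes a and b
batch.append(Record("c", 3.0, "Final merger agreement signed"))
second = index.update(batch)   # only c is new
batch[1] = Record("b", 4.0, "Revised quarterly forecast")
third = index.update(batch)    # only the changed b is re-indexed
print(first, second, third)                          # 2 1 1
print(sorted(index.search("merger", "agreement")))   # ['a', 'c']
print(sorted(index.search("sales")))                 # []
```

A timestamp watermark is the simplest design, but it would miss a record that arrives late carrying an older timestamp; a production system might instead track per-document versions or content hashes.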

Information governance - The proliferation of data is a major driver of costs in litigation, investigation and many other information-gathering activities. Some organisations are seeking to minimise their storage volumes by eliminating data that is duplicated, trivial, obsolete, past its retention period or even potentially harmful.

Rather than waiting for a trigger event such as a lawsuit or an email migration, some organisations are initiating information governance processes to do this as a matter of course. Such projects can quickly become self-funding through smaller litigation budgets, reduced storage spending and improved risk management. They can also become a source of business value as employees work more effectively and organisations apply what they have learned from understanding their own data.

Angela Bunting is Director of User Experience at Nuix.