Beyond Big Data with Recommind CEO Bob Tennant
On a visit to Australia earlier this year, Recommind CEO Bob Tennant sat down with Image & Data Manager to discuss the evolution of the company's CORE platform beyond the ediscovery market.
IDM: Many regard Recommind as an ediscovery company. Is that the extent of what you do?
BT: Ediscovery is just one of the things that we do. It happens to be an area that’s getting a lot of attention right now, so it’s an area that we have put a lot of effort behind. The best way to think of Recommind is as a company that provides software that helps you manage, analyse and structure information. One of the areas that has a lot of unstructured information is lawsuits, and so ediscovery is an important area.
IDM: The term “Big Data” is dominating a lot of discussion of information management and ediscovery these days. What do you understand this to mean?
BT: The first thing that I’ll say about big data is that I don’t like the term, and the reason is that there’s no technology of which you’d say this is a big data product or a big data application. What big data gets at is that fundamentally there is a problem, and the problem is that the volume of data has gone crazy. So there is a real trend that underlies all of this stuff, and when you think about products or software to help you manage it, you need to think about things that help you with large volumes of data. So when I think big data, I think first off of unstructured information, because it’s the unstructured stuff, the text, that’s growing much more quickly than anything else. The second thing is that the amount of data accumulated requires you to manage it, so it creates a problem: how do I manage this stuff? It also creates an opportunity, and the opportunity is that now you’ve got all this data you can analyse it and get insight that you wouldn’t otherwise have gotten out of it. So: unstructured information, management of large volumes of data, and getting insight from it. That’s really what big data is about.
IDM: So is the answer to be reactive or proactive? Or is it both?
BT: I think it’s got to be both. If you want to get insight from the data you have to have it somewhere to start with, and if you’ve got it somewhere then there are a bunch of reasons why you need to manage it. In order to get insight you need to manage it, but you also need to manage it for compliance reasons, for business reference purposes and for discovery purposes. And again, the analysis comes after you’ve got the management under control.
IDM: And, where does it differ from traditional business intelligence (BI)?
BT: Again it goes back to the unstructured information part. Traditional business intelligence helps you to visualise and better understand the relationships and information contained within your structured data set. Unstructured information doesn’t sit in a structured schema. There’s no map in the form of a database schema that says look here when you want this piece of information, so the first thing you need to do is understand the context, or what the content of the text is actually saying. Being able to take unstructured information and structure it is not something that existed with traditional business intelligence, but it’s a really important part of being able to get good insight out of big data.
IDM: Some are suggesting that new platforms such as Hadoop provide a way to store unstructured data and analyse it without structuring it. Is there a magic wand that can give structure to unstructured data and allow analysis, or is it a matter of giving every bit of unstructured data a tag so you can give it some structure, or identifying it and applying metadata?
BT: There’s a lot of hullaballoo around Hadoop, which is basically just a distribution strategy. It provides an infrastructure that allows you to take a computing problem and break it up over many different computers. That’s effectively all it does. So that’s a useful tool when you’re trying to deal with very high volumes of data, but it doesn’t actually give you anything in and of itself. If you want to manage data or you want to get insight from the data, you need a set of tools that allow you to do that. And when you’re dealing with unstructured information, which again is where most of the growth is and where a lot of the opportunity lies, the first thing you need to do is to understand what’s within the text. So processing data is not enough. Just loading it into a system is not helpful. The way the typical keyword search engine works, for example, is that it helps you count things. You can count the number of words, the number of pages, the number of appearances of a term in a document; none of that is useful. What you really want to understand when you load the information in is: what does this mean? And so you need to be able to marry up the understanding of context, or the predictive element of it, with the ability to handle large volumes of data, which is where something like Hadoop comes in. The marriage of those kinds of technologies gives you a pretty powerful tool set.
IDM: It has been noted that dealing with a problem of big data requires a change in mindset from trying to manage storage systems to trying to manage information. Where do you see Recommind helping companies that deal with the explosion of different data types?
BT: That’s where our platform comes in. What’s unique about Recommind is our ability to combine the understanding of the content and the context with the ability to manage data at very large scale. Our platform, which we call “Core”, is effectively a form of database that allows you to manage unstructured information along with structured information at great scale. But it combines that at a very deep level with machine learning, which provides the understanding of the context. If you make it easy to access that understanding then it’s very easy to build applications on top of it, and so we have a set of applications that help people with productivity. It’s not just about storing or even processing data. It’s not about adjusting it. It’s about understanding it, and that’s where the information part comes from.
What we’re seeing in our client base is that there are a lot of organisations looking to get their arms around data governance generally. We really have noticed this over the course of our history. For a long time we have had a vision that essentially boils down to indexing: get the data in, understand what it’s all about and then build many different applications off that one index of data. Seven or eight years ago we’d describe that to people and their eyes would glaze over and they’d say, “That’s great you can do this wonderful stuff, but we’re just busy trying to make sure that our servers don’t fall down.” That’s one of the things that I think is actually different now, and why people are talking about big data: they have all the baseline infrastructure in place, they’ve got their servers, they’ve got their routers, they have networks that work, and now they’re thinking, okay, that stuff is done, how do we actually take advantage of this data that we’ve got? And that’s new.
IDM: Is Enterprise Search still an important plank of the future for Recommind?
BT: The reason that I'm really excited about where we’re at as an industry and as a company is that the technology is now at a point where we can go beyond just what search provides. So, if you think about any time you’re searching for something, what you’re really trying to do is get an answer to a question, you want to understand something that you can get an answer for and perhaps take some action on. And what Enterprise Search or search generally does is it gets you only partway there. So, you type in your query, you’re looking for an answer to a question, what you get back is a long list of documents, and then from that list of documents you have to go do your own research to get the answer to the question. Where we’re at now, and what Core is providing is the ability to take you a lot further along that path. So, you can still do Enterprise search, you can still get back the list of documents but it can also get you to the point where you’re actually getting an answer to your question which personally is something that I would prefer.
IDM: E-discovery in 2012 was all about the rise of predictive coding, providing a way to help legal teams navigate massive numbers of documents. Do you see the technology reaching out to other areas in 2013?
BT: There are nine billion internet-connected devices out there right now. By the end of this decade there’ll be fifty billion, and each of these is creating some kind of content. We speak to companies that have 4.5 billion email records in their systems. It’s a giant haystack. What our software does is give you a way, through probabilistic, semantic and contextual analysis, to find the needles and get rid of the haystack. What we’re seeing with ediscovery is that predictive coding has become mission critical for a law department, and next it will become mission critical for enterprises. Once we have our arms around this large amount of data, there are many business problems that surround it. Ediscovery is one of them, but there are several adjacent apps that we’re looking at, and we’ll start to announce those in 2013, to provide a framework that allows you to build on it and solve these business problems very easily.
We always look for the killer app. I don’t believe it exists in unstructured data, so we’re building a framework that lets us carry over into the multitude of business problems that we can solve, whether it’s risk assessment or governing what I have: how do I remove the unstructured data that I no longer need? How do I even know what I can take off-line? Those are all very good problems within the corporate as well as the legal area.