Turning plain data into knowledge: a 10 year case study

By Maggie Ball

January/February Edition, 2008: Mergers, acquisitions, remote offices, and hundreds of boxes of paper-based documents from around the globe. Orica's mission to turn available but inaccessible data into workable knowledge was an epic journey.

Making history and filling space along the way

Orica has been an independent Australian company for just over ten years, operating in more than 50 countries with customers in more than 100. Ranked among the top 40 companies listed on the Australian Stock Exchange, the company actually has a much longer history, stretching back more than 130 years.

Orica began as a small company called Jones, Scott and Co, which supplied explosives to the Victorian goldfields. It was bought out by Nobel Explosives Company, which eventually merged with Brunner Mond and Co, United Alkali Company and British Dyestuffs Corporation to form Imperial Chemical Industries Plc (ICI Plc).

In July 1997, ICI Australia became an independent Australasian company after its parent company, ICI Plc, divested its 62.4 per cent shareholding, and it changed its name to Orica. That adds up to more than a century of expertise, market leadership and innovative product development, and an awful lot of data.

Data from an office in Ardeer

A few years before the divestment, an office in Ardeer that held a large proportion of this information was closed, and a container full of paper was shipped en masse to our office in Kurri Kurri, Australia. Not surprisingly, that office did not have the space to store it.

However, knowing there was more than just data in the reports, we were well aware that, with the right solution, this disparate and largely inaccessible information could become a key part of our progressive blasting solutions.

Still, knowing the data is valuable does not solve a lack of space. There was simply no room available to store these reports in hard copy. The existing library was needed for more office space, and the archive space was both inaccessible and full. The only solution was to move this information off-site.

More boxes in the pile

The Ardeer information was only part of the problem. There were also more than 200 boxes in our Denver, USA, office, 150 boxes in McMasterville, Canada, and numerous other filing cabinets and archive boxes in need of attention.

Because none of it was catalogued, the material was difficult to identify. Some of it was stapled or awkwardly bound, there were odd sizes and different types of paper, and some of it was damaged and damp. Despite that, much of the information was business critical, containing important R&D information. It was more than just our corporate history; it was a corporate lifeline. The documents recorded solutions to problems in one country that could be applied to similar problems in other countries. But in its paper state it was all but useless: inaccessible, and slowly decaying in less than ideal conditions.

Making the data available, but still secure

The first challenge was making the data available to our 14,000 or so employees (there were about 9,000 ten years ago, but numbers grew quickly). Packed tightly and poorly indexed, the information needed both scanning and cataloguing. Well aware of the massive job ahead, we set out to find the best scanning and cataloguing services for the documents in each country.

In the US, we found Mountain States Imaging (http://www.msimaging.com) and in Australia, Docuvan (http://www.docuvan.com.au/). Because we couldn't find a suitable scanning service in Canada, we decided it would be cheaper and easier to ship the Canadian documents to the US and scan them together with our US information. The Ardeer information was already in Australia.

In both instances, pilot tests were conducted, with each provider given a local copy of a Lotus Notes library database built in-house. Lotus Notes is a good medium for finding and archiving existing, multi-format and often unstructured information. It also had excellent built-in security and was already used for our email system, so it could support automated, mail-based filing. More importantly, it was already available to all our staff.

Notes also allowed us to make the information available through Internet Explorer and our intranet. The library database consisted primarily of a series of forms and views, with one form per document containing the title, the author, the date the document was created, a security category, and a place for the scanned attachment. Once the pilot, which consisted of one full box, had been completed successfully and checked, work began on the full project. After scanning, the boxes were archived off-site, freeing up key office space. When the information had been fully scanned and catalogued, our IT staff uploaded the local copies onto our Notes servers.
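To make the catalogue form concrete, the sketch below models one record per scanned document, with a security filter applied before any search. It is a minimal, hypothetical illustration, not Orica's actual Notes design: the field names, the three security levels, the sample records and the in-memory keyword match are all assumptions. In the real system, Lotus Notes itself handled storage, access control and full-text indexing.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical security levels (assumption): the article only says each
# record carried "a security category", not what the categories were.
CLEARANCE_ORDER = {"general": 0, "confidential": 1, "restricted": 2}


@dataclass
class LibraryRecord:
    """One catalogue entry, mirroring the form's fields: one record per document."""
    title: str
    author: str
    created: date            # date the original document was created
    security_category: str   # a key of CLEARANCE_ORDER (assumed scheme)
    attachment_text: str     # text recovered from the scanned attachment (assumed)


def visible_records(records, user_clearance):
    """Return only the records a user with `user_clearance` is allowed to see."""
    level = CLEARANCE_ORDER[user_clearance]
    return [r for r in records if CLEARANCE_ORDER[r.security_category] <= level]


def search(records, user_clearance, term):
    """Case-insensitive match on title or attachment text, after security filtering."""
    term = term.lower()
    return [
        r for r in visible_records(records, user_clearance)
        if term in r.title.lower() or term in r.attachment_text.lower()
    ]


# Hypothetical sample data for illustration only.
library = [
    LibraryRecord("Blast vibration trials", "Author A", date(1985, 3, 1),
                  "general", "Ground vibration was measured at three distances."),
    LibraryRecord("Emulsion stability study", "Author B", date(1979, 6, 12),
                  "restricted", "Emulsion samples were stored at several temperatures."),
]

print([r.title for r in search(library, "general", "emulsion")])     # -> []
print([r.title for r in search(library, "restricted", "emulsion")])  # -> ['Emulsion stability study']
```

Filtering by security category before matching means a search can never reveal even the title of a document the user is not cleared to see.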

Making the information searchable

One of the key objectives of this project was to make the documents available for full-text searching, which meant they had to be scanned at a minimum of 300 dpi so the text could be reliably recognised and indexed. The database was already full-text indexed, with attachments searchable. Most staff already had basic training, but we developed further training in search skills for the library.

Ongoing issues for 2008

There are some 400 boxes of paper-based reports from our Dyno Nobel acquisition in Europe, mainly from Sweden and Norway, that we will need to scan. Assessing the exact nature of this information, prioritising it, finding a service provider and coordinating the local database for upload will be a challenge for 2008.

Adding translation capabilities so the database can be rolled out to all staff is another challenge. With reports being produced in Spanish, Swedish, Norwegian and German, language issues are becoming more prevalent. Cross-fertilisation across these areas is critical, but crossing the language barrier to achieve it is tricky. A number of commercial translation products are available, and these are being assessed and tested for incorporation into the Notes/intranet database.

Looking ahead: New opportunities

A key opportunity we are looking to explore is the use of "Knowbots" and other automated, proactive search tools. There is so much information on our systems now that it's almost overwhelming. Full-text searching is terrific and it isn't hard to filter results, but it would also be useful to run complex searches automatically and on a regular basis. This sounds easy, but the criteria for relevance aren't always obvious or explicit, so the 'bots' would need to do a lot more than simply match a series of keywords. They would have to 'learn' about the nature of the work and then look for similar problems, analogies and subtle impacts that aren't all that tangible. In other words, we would want the system to behave like a cluey researcher. That's a big ask!

For now, however, we're very happy with what we've been able to accomplish over the past ten years, particularly over the last 12 months. We've moved from huge office spaces with large staff numbers and full services to, basically, a single spot in cyberspace. We've gone from inaccessible bits of information in rapidly disintegrating files to information that is far more than merely accessible. Our information is now a source of knowledge and synergies and, ultimately, one of the key enablers for turning science into profitable customer solutions.
