contentCrawler helps exploration giant find critical documents and drawings

Oil and gas exploration firm PTTEP AA has deployed DocsCorp contentCrawler to help users deal with a document management system holding approximately 500,000 documents.

PTTEP AA is a wholly owned subsidiary of PTT Exploration and Production (PTTEP), the Thai national petroleum exploration and production company. In Australia, PTTEP AA is the operator of the producing Montara oil field and the Cash Maple gas field in the Timor Sea. PTTEP AA employs more than 300 people based in Perth, Darwin and the Timor Sea.

The Montara oil field is located in the Timor Sea, 180 km from the north Kimberley coast off Western Australia. The development includes an unmanned four-legged well-head platform and the Montara Venture, a Floating Production Storage and Offloading (FPSO) vessel with up to 850,000 barrels of storage.

PTTEP AA had commissioned the construction of the Montara Venture FPSO vessel, awarding contracts to various suppliers. Documents from contractors and suppliers were stored in a leading document management system. All in all, there were approximately 500,000 documents relating to the project.

To access documents, drawings or information relating to the project, engineers would enter a part or tag number into a search field in the document management system and be presented with all the relevant documentation. 

Image roadblock

“At least that was how it was supposed to work,” recalled Trina Ireland, PTTEP AA Information Management Team Leader. 

“We very quickly discovered that many of the documents supplied by vendors, contractors and subcontractors were in fact image-based documents – JPEG, TIFF, PNG and image-based PDFs.” 

These file types cannot be indexed because they contain no text. They are essentially pictures and were, in effect, “invisible” to the document management system's index engine.

The engineers grew increasingly frustrated with the system, to the point where they were starting to lose confidence in it. They turned to PTTEP AA’s Document Controllers to find the documents they needed, which inevitably led to delays.

“We looked to our document management system consultants for a solution. They recommended contentCrawler from DocsCorp,” explained Trina.

contentCrawler is an integrated analysis, processing and reporting framework. It intelligently assesses image-based documents in a content repository for batch conversion to text-searchable PDF documents, which can be saved back into the content repository as a new version or as a replacement for the original.

Converting image-based documents to text-searchable PDFs can be an automated end-to-end process or a manual one with built-in “Hold for Review” stages. Equally, processing can run in either or both of two modes: Convert Backlog (legacy documents) and Active Monitoring (just-profiled documents).
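The triage step described above can be pictured with a short Python sketch. It is an illustration only and not contentCrawler's actual logic: the Document class, the has_text_layer flag and the plan_batch helper are hypothetical names, and the list of image extensions is simply taken from the formats named earlier in this article.

```python
import os
from dataclasses import dataclass

# Image-only formats named in the article; an assumption, not a product setting.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff", ".png"}


@dataclass
class Document:
    name: str
    has_text_layer: bool  # hypothetical flag: True if the file already holds searchable text


def needs_conversion(doc: Document) -> bool:
    """Return True if the document is image-based and so invisible to a text index."""
    ext = os.path.splitext(doc.name)[1].lower()
    if ext in IMAGE_EXTENSIONS:
        return True
    if ext == ".pdf":
        # A PDF only needs conversion when it carries no text layer.
        return not doc.has_text_layer
    return False


def plan_batch(docs, hold_for_review=False):
    """Split a batch into documents to convert (or hold for review) and documents to skip."""
    queue = "hold_for_review" if hold_for_review else "convert"
    plan = {queue: [], "skip": []}
    for doc in docs:
        plan[queue if needs_conversion(doc) else "skip"].append(doc.name)
    return plan
```

In this sketch, a backlog run would pass every legacy document through plan_batch once, while a monitoring run would call it only on documents profiled since the last pass.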

PTTEP AA set up a development lab to test contentCrawler. Non-searchable documents were added to the library as control documents. Testing ran for one month, at the end of which PTTEP AA decided to deploy contentCrawler in the production environment.

The document management system environment consisted of three libraries. PTTEP AA decided to run contentCrawler on each of them in turn to clear the backlog. Once the backlog was complete, the team switched to Active Monitoring mode for newly profiled documents. contentCrawler can also run in both modes simultaneously.

Automated solution

PTTEP AA ran contentCrawler as an automated process, replacing each original with a text-searchable PDF. Trina recalls: “It really was a set and forget operation – it just worked in the background with little or no intervention from the team.”

In addition, IT Administrators can install and configure contentCrawler from the centralized monitoring and reporting dashboard. Administrators can also set up email notifications on the progress of the crawl, so that Service Statistics and Error reports are emailed to them. They can export processing reports as CSV files for analysis and review.
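As a rough idea of what reviewing an exported report could look like, the Python sketch below tallies rows by library and status. The column names (library, document, status) and the sample rows are invented for illustration; the real export layout is not described here and may differ.

```python
import csv
import io
from collections import Counter


def summarize_report(csv_text: str) -> dict:
    """Count rows per (library, status) pair in an exported processing report."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[(row["library"], row["status"])] += 1
    return dict(counts)


# Made-up sample rows, used only to show the shape of the input.
SAMPLE = """library,document,status
LibraryA,doc-1.tif,converted
LibraryA,doc-2.pdf,error
LibraryB,doc-3.png,converted
"""

if __name__ == "__main__":
    for (library, status), n in sorted(summarize_report(SAMPLE).items()):
        print(f"{library}: {n} {status}")
```

A summary like this would let an administrator spot which library is producing errors without opening the full report.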

While solving the problem with contentCrawler proved to be a fairly straightforward process, PTTEP AA had a much bigger challenge ahead. Many of the engineers had given up on the document management system as they couldn’t find what they were looking for. The company had to “re-educate” and reassure engineers that the issue had been resolved and that they would be able to find everything they were looking for.

“contentCrawler really complemented the document management system product, whose reputation had taken a beating. contentCrawler went a long way to restoring everyone’s faith in the product,” concluded Trina.