How law firm Stibbe used contentCrawler to index 28 million documents and emails

Stibbe is an internationally oriented Benelux law firm with over 375 lawyers. From its main offices in Amsterdam, Brussels, and Luxembourg, together with its branch offices in Dubai, London, and New York, Stibbe handles complex legal challenges for its clients both locally and cross-border.

The ability to search for and find 100% of documents is required to meet data return, erasure, and portability requirements under the GDPR.

“We invested in an enterprise search engine to be future-proof before the GDPR came into force in May 2018,” explained Olivier Van Eesbeecq, Head of ICT & Facilities at Stibbe Belgium.

“Several products we were using – including our document management system – came with their own search engines, but we found them to be lacking. So, we decided to invest in enterprise search technology.”

To work effectively, enterprise search relies on the existence of a text layer in every file in your system. But scanned files, TIFFs, JPEGs, and image-based PDFs (of which Stibbe Brussels had many) don’t have that layer.

Full-text search in your documents is important because a) people don’t always remember the name of a file, so it’s essential that on-page content can be searched, and b) under the GDPR, you need to be able to search for and find every document that contains a name, email address, bank account number, or other personal data.
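
To make point b) concrete, here is a minimal Python sketch of what searching document text for personal data can look like. It is an illustration only and does not describe how Stibbe's enterprise search engine or contentCrawler works internally. The function name find_personal_data, the loose email and IBAN patterns, and the sample name are all assumptions made for this example.

```python
import re

# Loose illustrative patterns (assumptions for this sketch, not validators).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# IBAN-like shape: country code, two check digits, then 11-30 characters,
# optionally separated by single spaces.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b")


def find_personal_data(text, names=()):
    """Return the personal data found in a document's text layer.

    `text` is the text already extracted from the document; `names` is an
    optional list of data-subject names to look for (case-insensitive).
    """
    lowered = text.lower()
    return {
        "emails": EMAIL_RE.findall(text),
        "ibans": IBAN_RE.findall(text),
        "names": [name for name in names if name.lower() in lowered],
    }


# A document with a text layer can be matched...
searchable = "Please refund Jan Peeters (jan.peeters@example.com) on BE68 5390 0754 7034."
with_text = find_personal_data(searchable, names=["Jan Peeters"])

# ...but an image-only scan yields no text, so nothing can be found.
without_text = find_personal_data("", names=["Jan Peeters"])
```

The second call is the important one: when a scanned file has no text layer, every search comes back empty even though the personal data is visibly on the page.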

To get the maximum benefit from its enterprise search investment, Stibbe needed a solution that could find non-searchable files and process them so that they had the text layer needed for indexing.

Bulk conversion into searchable PDFs 

Search and assess technologies using OCR software can find non-searchable content and automatically convert it into text-searchable PDFs. Stibbe required a solution that could work “in the background,” so it wouldn’t impact staff workflows or processes.
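
The "assess" step can be pictured with a short Python sketch. This is a simplified assumption about how such a triage might work, not a description of contentCrawler's actual logic. The function needs_ocr, the extension list, and the min_chars threshold are hypothetical, and the extracted_text argument stands in for the output of whatever text extractor runs upstream.

```python
from pathlib import PurePath

# Pure image formats never carry a text layer (illustrative list).
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".png", ".bmp"}


def needs_ocr(filename, extracted_text, min_chars=20):
    """Decide whether a file should be queued for OCR.

    `extracted_text` is whatever text an upstream extractor returned for
    the file (hypothetical step). `min_chars` is an arbitrary threshold:
    a PDF yielding less text than this is treated as image-based.
    """
    suffix = PurePath(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return True  # scans and photos always need a text layer added
    if suffix == ".pdf":
        return len(extracted_text.strip()) < min_chars
    return False  # other formats are assumed to be searchable already


# Example triage of a mixed batch of files.
batch = {
    "scan_0001.TIFF": "",
    "signed_contract.pdf": "",
    "engagement_letter.pdf": "This engagement letter sets out the terms of our services.",
    "memo.docx": "",
}
ocr_queue = [name for name, text in batch.items() if needs_ocr(name, text)]
```

Running a check like this as a scheduled background job, rather than at upload time, is one way to keep the process invisible to staff.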

“We were already using the DocsCorp desktop productivity solutions,” said Olivier, “so when we learned there was an automated OCR solution as well, choosing it was a no-brainer for us.”

contentCrawler is configured at Stibbe to be a set-and-forget solution. Staff continue to upload documents into the document management system, for example, without worrying about whether they need to be OCRed.

“If our lawyers photocopy or scan a file they simply add it to the document management system, and it’s automatically made searchable. That’s a big advantage,” Olivier commented.

“contentCrawler connected to all our document sources – like file servers, email servers, the document management system, SharePoint – and converted all the content into searchable PDFs,” Olivier continued. “Once contentCrawler processed the files the search engine picked it up and indexed it within minutes.”

“We now have more than 28 million documents and emails indexed by our enterprise search engine. All that content is now searchable thanks to contentCrawler.”

“Our staff have certainly noticed a difference since having contentCrawler,” said Olivier. “Although it’s a background process, they really see the value because they trust that their documents will be automatically indexed and made searchable. It also saves them time since they no longer need to use desktop scanners to manually OCR files.”

Stibbe used contentCrawler to unlock the benefits of its enterprise search engine, because non-searchable documents were limiting its performance. Now, the firm has a solution that works silently behind the scenes, automatically catching every new document added to its file systems and adding a text layer when needed. Staff are able to search for and find content across 28 million documents and emails, and the firm can comply with GDPR requirements for data storage and handling.