DocsCorp releases contentCrawler 2.1 | Information & Data Manager

DocsCorp has announced the release of contentCrawler 2.1, the newest version of its integrated analysis, processing and reporting software, which ensures image-based content is 100% searchable.

contentCrawler’s automated end-to-end process intelligently examines image-based documents in a content repository and converts them to searchable PDFs, making them available to search technologies for indexing.

contentCrawler was developed to address the very real and serious issue of non-searchable content in enterprise content management systems. More than 20% of documents in a content repository are "invisible" to search technology.

These documents are often profiled as a result of ingestion of legacy or litigation documents, saving emails with attachments, mobile technology and employee workarounds that bypass the OCR'ing process. Failure to produce documents on demand impacts the bottom line, workplace efficiency, regulatory compliance, and productivity, and exposes an organisation to unnecessary risks.

The contentCrawler 2.1 release includes several usability and performance enhancements:

Multi-OCR processing - contentCrawler uses multi-threading for faster processing, with optimised support for 4, 8, 16 and 32 CPU cores. For example, with 4 CPU cores, contentCrawler will be able to OCR 1 page per second, or 85,000 pages per day. With 16 CPU cores, it will be capable of OCR'ing 4 pages per second, or up to 350,000 pages per day. DocsCorp says this represents a significant improvement over other OCR solutions and that contentCrawler remains unique in its ability to OCR documents already stored in a DMS.

File type filters - new file type search filters provide users with greater control over document types that can be processed. Users can exclude certain document types from the search to decrease processing time, including those saved as email message attachments.

Set up Service email notifications - Users can establish various email notifications to report on the progress of the crawl and request that the Service Statistics and Error reporting be emailed to them.

Monitor progress status - Users can instantly see the progress status of individual documents being processed at the OCR stage. This information is displayed to the user as a percentage.

Document information display - Provides document information such as the total page count and size of documents being processed, including the overall total size of documents requiring OCR.

Configurable Multilingual OCR - Users can easily configure multilingual OCR’ing across all services. contentCrawler supports over 180 languages.

Export Report - Users can export processing reports as CSV files for analysis and review.

Configurable minimum disk space limit - Users can specify a minimum free-space threshold for the document cache directory.
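
The throughput figures quoted for multi-OCR processing can be sanity-checked with simple arithmetic. The short Python sketch below is illustrative only and is not part of contentCrawler. It assumes the quoted per-second rates are sustained around the clock, and the names `pages_per_day` and `quoted_rates` are invented for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def pages_per_day(pages_per_second: float) -> int:
    """Theoretical daily page count at a sustained OCR rate (assumes 24-hour operation)."""
    return round(pages_per_second * SECONDS_PER_DAY)


# Pages-per-second rates quoted in the release, keyed by CPU core count.
quoted_rates = {4: 1.0, 16: 4.0}

for cores, rate in quoted_rates.items():
    print(f"{cores} cores: {pages_per_day(rate):,} pages per day")
```

Under that assumption the theoretical ceilings come to 86,400 and 345,600 pages per day, which is in line with the rounded figures of 85,000 and "up to 350,000" given above.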

To obtain a contentCrawler 2.1 trial and see how much non-searchable content is in your content repositories, email info@docscorp.com.

contentCrawler integrates with HP Autonomy WorkSite, HP Records Manager (formerly HP TRIM), OpenText eDOCS DM, ProLaw and MS SharePoint, as well as MS Windows file systems. Integration with OpenText Content Server and Worldox will be available soon.
