Prevent data leak mistakes with OCR

With almost half of data leaks now coming from accidental exposure by employees rather than outside hacks, ABBYY has moved to plug a gap in traditional Data Loss Prevention (DLP) by releasing an integration with Symantec’s DLP solution. Symantec DLP can now incorporate ABBYY’s Recognition Server OCR technology to recognise text in scanned images and screenshots.

DLP has grown in prominence in recent years as organisations have attempted to monitor or block the transfer of sensitive or critical information outside the corporate network. The term is also used to describe software products that help a network administrator control what data end users can transfer.

The importance of this sector was demonstrated in Symantec’s recently reported financial results, where DLP product revenues grew by 37% year-on-year.

The InfoWatch Analytical Center described 2014 as the "year of personal data megaleaks". InfoWatch is a leading DLP vendor that also produces the annual Global Data Leakage Report, which in 2014 registered 1395 cases worldwide, 22% more than in 2013. It found that banks - along with Internet services, retailers, and health care institutions - are the biggest sources of personal data leaks. The report’s authors found that almost half of all reported leaks in 2014 were accidental.

Many DLP vendors still struggle with preventing leaks via images, screengrabs and scanned documents. One reason few DLP vendors offer integrated OCR support is the potential impact on network performance of real-time scanning of every image embedded in emails or sent as an attachment.

The ABBYY solution offloads the processing to the company’s Recognition Server product. Presently this must be located on premises, on single or multiple servers, although ABBYY has announced plans to also integrate with its cloud offering at http://ocrsdk.com in the future. More than 190 languages can be recognised, including CJK and Arabic.

A common application of DLP is to flag or block documents that contain certain keywords or standard formats such as credit card numbers, Medicare Numbers, etc. ABBYY Recognition Server 4 is able to extend this with its ability to recognise mathematical and chemical notation in technical drawings. One of the first international customers for the combined Symantec-ABBYY DLP solution is Russia’s HOST Group of Companies, which is using it to protect critical data at a nuclear power plant.
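As a rough illustration of the format matching mentioned above, the Python sketch below looks for credit card numbers in text that an OCR engine has already extracted. It is not Symantec's or ABBYY's implementation: the function names and the regular expression are assumptions made for this example, and the Luhn checksum is simply a common way to weed out random digit runs.

```python
import re

# Assumed pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(ocr_text: str) -> list[str]:
    """Return digit strings in the OCR text that look like valid card numbers."""
    hits = []
    for match in CARD_CANDIDATE.finditer(ocr_text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number, not a real card.
    print(find_card_numbers("Card: 4111 1111 1111 1111 exp 12/27"))
```

A production policy would also have to cope with OCR misreads (for example the letter O in place of a zero), which this sketch ignores.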

HOST works with sensitive information in multiple formats, including technical drawings, specifications, reports, and industry regulations. Any attempt by users to upload such information, including document scans, to file storage, or remove it from the company’s premises must be tracked and dealt with. Fast optical recognition of documents reduces the risk of leakage.

The joint solution offered by HOST, Symantec, and ABBYY monitors information placed in the company’s file storage systems.

Documents are checked for certain key words and phrases (e.g. “confidential,” “restricted information”). Documents found to contain sensitive information are removed from file storage and replaced with an administrator-defined text label.

Responsible personnel are notified whenever a restricted document is detected and access history by all users is automatically recorded.
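The check-and-replace behaviour described above can be sketched in a few lines of Python. This is a hypothetical illustration only, not the HOST, Symantec or ABBYY implementation: the function names, the quarantine folder and the ".txt" placeholder naming are all assumptions, and the recognised text is assumed to have been produced by the OCR step already.

```python
from pathlib import Path

# The two phrases quoted in the article; a real policy would hold many more.
KEYWORDS = ("confidential", "restricted information")


def matched_keywords(text: str, keywords=KEYWORDS) -> list[str]:
    """Return the keywords found in the text, ignoring case."""
    lowered = text.lower()
    return [keyword for keyword in keywords if keyword in lowered]


def quarantine_if_sensitive(path: Path, recognised_text: str,
                            quarantine_dir: Path, label: str) -> bool:
    """Move a sensitive file out of storage and leave a text label behind.

    Returns True if the file was quarantined, False if it was left alone.
    Moving to a quarantine folder is an assumption; the article only says
    the document is removed and replaced with a label.
    """
    if not matched_keywords(recognised_text):
        return False
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    path.replace(quarantine_dir / path.name)  # move the original file
    placeholder = path.with_name(path.name + ".txt")
    placeholder.write_text(label, encoding="utf-8")  # administrator-defined label
    return True
```

Notification of responsible personnel and access logging are left out here; in practice they would hang off the True branch.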

A spokesperson, the Technical Director of the HOST Group of Companies, said the “solution doesn’t depend on the file storage server’s productivity as the resource-intensive OCR process takes place on the ABBYY server. Within this architecture any amount of processing stations can be connected in order to establish a desired level of productivity.

“The OCR process will not generate a delay for Symantec DLP net blockage modules. There is no need for modifications in architecture, no impact on the detection servers’ productivity or need for creation of DLP special policies.”

ABBYY’s Server Manager can manage dozens of Processing Stations connected to it and effectively distribute the workload among them. The processing speed is dependent on the number of CPUs that are licensed, and is quoted at around 8 pages per minute per CPU by ABBYY.
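For capacity planning, the quoted figure lends itself to a back-of-the-envelope calculation. The Python sketch below assumes throughput scales linearly with the number of licensed CPUs, which the article does not state and which real deployments may not achieve; the function names are invented for this example.

```python
import math

PAGES_PER_MINUTE_PER_CPU = 8  # figure quoted by ABBYY in the article


def minutes_to_process(pages: int, licensed_cpus: int,
                       rate: float = PAGES_PER_MINUTE_PER_CPU) -> float:
    """Estimate minutes needed to OCR a backlog, assuming linear scaling."""
    if licensed_cpus < 1:
        raise ValueError("at least one licensed CPU is required")
    return pages / (rate * licensed_cpus)


def cpus_needed(pages_per_hour: int,
                rate: float = PAGES_PER_MINUTE_PER_CPU) -> int:
    """Estimate the licensed CPUs needed to keep up with an hourly page volume."""
    return math.ceil(pages_per_hour / (rate * 60))


if __name__ == "__main__":
    print(minutes_to_process(4800, 4))  # 4800 pages / (8 * 4) pages per minute
    print(cpus_needed(1000))            # one CPU handles 8 * 60 = 480 pages per hour
```

On these assumptions a single licensed CPU handles about 480 pages an hour, so a site scanning 1000 pages an hour would need three.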

Recognition Server also provides the flexibility to allocate single cores on specific CPUs of network computers for OCR processing. This allows OCR tasks to be prioritised while leaving dedicated DLP servers free for DLP analysis.

Henry Patishman, Sales Director for ABBYY Australia, said, “The addition of ABBYY OCR technology to the Symantec DLP solution opens up new possibilities in protecting the privacy of confidential information.

“Organisations will now be able to prevent the flow of graphical information from their networks without impacting users with annoying time lags.”

Contact ABBYY at sales@abbyy.com.au or on 02 9004 7401 for any further information.

An example of DLP with OCR blocking the transmission of a PDF containing credit card numbers