Skilja unlocks the power of Intelligent Document Processing (IDP)

The year 2020 was a wake-up call for cloud adoption and digital transformation. Intelligent Document Processing, or IDP, is increasingly seen as a critical next step in increasing process automation and handling the huge volume of incoming data. But what exactly is IDP? To learn more, IDM spoke with Alexander Goerke, CEO and Founder of Skilja, a company developing essential technologies for understanding documents.

IDM: What is IDP and how does it differ from data capture, classification and RPA?

AG: Intelligent Document Processing is an extension of classical data capture technology. It aims to take documents that were created for human readers and make them intelligible to machines.

Typically, data capture has been used in the past to automate predefined, document-driven business processes using a variety of technologies, including OCR, forms recognition and anchor-based detection of keys in a document.

Data capture is applied to fixed forms and semi-structured documents such as invoices and order confirmations to automate repetitive cognitive tasks. With the help of rules or machine learning, the patterns that allow humans to recognize entities in documents are identified so that the IDP software can repeat a task over and over again. We hesitate to call this artificial intelligence (AI) and prefer the term “cognitive technologies”, as the software actually tries to mimic human understanding and learn from human input.

IDP is the next generation of data capture and is typically applied to all kinds of documents, even totally unstructured contracts and correspondence, learning from human input how to understand content. In contrast to enterprise data capture in the past, IDP typically does not require dedicated setup; instead it “learns” from the “human in the loop” and gets better over time. The technologies that enable IDP include a variety of syntactic and semantic analysis steps combined with statistical evaluation and deep learning neural networks. These run in a complex background validation and learning service that continuously creates and enhances the knowledge needed for document understanding.

Classification is of course a vital part of this.

RPA, on the other hand, is a consumer of the results. RPA performs tasks that can be codified (scripted), often interfacing between two systems, and is therefore rules-based. It sends documents to an IDP server for interpretation and can then use the structured output of IDP to make automated rules-based decisions or simply transfer the data into another system.
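To make the division of labour concrete, here is a minimal Python sketch of the rules-based side. Everything in it is an assumption made for illustration: the shape of the `extracted` dictionary, the field names, the action names and the thresholds are hypothetical and do not describe Skilja's or any RPA vendor's actual interface.

```python
def route_invoice(extracted, approval_limit=1000.0, min_confidence=0.9):
    """Hypothetical rules an RPA script might apply to structured IDP output."""
    # Rule 1: anything that is not an invoice goes to another queue.
    if extracted["doc_type"] != "invoice":
        return "forward_to_mailroom"
    # Rule 2: if any field was read with low confidence, ask a human.
    lowest = min(f["confidence"] for f in extracted["fields"].values())
    if lowest < min_confidence:
        return "human_review"
    # Rule 3: small amounts are posted automatically, large ones need approval.
    amount = float(extracted["fields"]["total"]["value"])
    return "auto_post" if amount <= approval_limit else "manager_approval"


# Assumed example of what an IDP server might return for one document.
sample = {
    "doc_type": "invoice",
    "fields": {
        "invoice_number": {"value": "INV-001", "confidence": 0.98},
        "total": {"value": "250.00", "confidence": 0.95},
    },
}
action = route_invoice(sample)
```

The script itself never interprets the document; it only applies fixed rules to the fields and confidences that the IDP step has already produced.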

IDM: How does IDP accelerate value to processes?

AG: IDP is a vital part of scaling and speeding up processes that require human interaction to understand data in documents. Intelligent automation is the combination of IDP with RPA because it bridges the gap between unstructured data, human verification/validation activities and the structured data that is needed to automate processes. Not only are processes accelerated, they can also run round the clock and be scaled almost arbitrarily (in the cloud almost without additional cost). With IDP and RPA, even small tedious tasks on a desktop can now be automated. Even if only a few dozen surveys or time sheets need to be entered into a backend every day, that will still take up an hour each day of somebody’s time. Nowadays with IDP it is easy to set up a system that will take this task over from an office worker and allow them to focus on other tasks.

IDM: There are many IDP solutions in the market today promoting proprietary machine learning models. How do you benchmark the performance of IDP engines?

AG: Humans are still the gold standard, so we want to mimic human understanding. IDP software aims to reproduce the data fields and indexes without having seen the documents before. The level of automatic recognition is described as “Recall”: the percentage of all fields that are automatically found and correctly recognized. The other side of the coin is the error rate, captured by “Precision”: the percentage of correctly recognized fields relative to all recognized fields, so the error rate is 100% minus Precision. Humans do make errors entering data, so in a project we adjust an IDP system to match the human error rate using recognition confidences. Typically, human error rates are about 1%, so we allow the same for the machine. This typically yields a “Recall” rate greater than 90%, meaning that more than 90% of all fields do not need to be read or touched by anybody. Modern IDP systems have built-in benchmark tools that allow us to measure the quality of IDP against a ground truth sample.
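The two measures and the confidence adjustment described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FieldResult`, `recall_precision` and `pick_threshold` are hypothetical names, the sample data is invented, and real benchmark tools will differ in detail.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    confidence: float  # engine's confidence for the extracted value
    correct: bool      # whether the value matches the ground truth sample


def recall_precision(results, threshold):
    """Recall and Precision, as defined above, at a given confidence cutoff."""
    accepted = [r for r in results if r.confidence >= threshold]
    correct = sum(r.correct for r in accepted)
    recall = correct / len(results) if results else 0.0
    precision = correct / len(accepted) if accepted else 1.0
    return recall, precision


def pick_threshold(results, max_error=0.01):
    """Lowest cutoff whose error rate (1 - Precision) stays within max_error."""
    for t in sorted({r.confidence for r in results}):
        recall, precision = recall_precision(results, t)
        if 1.0 - precision <= max_error:
            return t, recall, precision
    return None  # no cutoff reaches the target error rate


# Invented ground truth sample: five fields, one of them read incorrectly.
sample = [
    FieldResult(0.99, True),
    FieldResult(0.95, True),
    FieldResult(0.90, True),
    FieldResult(0.60, False),
    FieldResult(0.50, True),
]
best = pick_threshold(sample, max_error=0.01)
```

Because raising the cutoff can only remove accepted fields, the lowest cutoff that meets the error target is also the one with the highest Recall. Fields below the cutoff are the ones that still go to a human.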

IDM: Do you feel automated classification technology has now evolved to the level where regulated industry professionals can rely on it?

AG: Yes, of course. Most of the projects we see are in the financial and insurance industries. All big public insurance companies in Germany use IDP to process their customer correspondence. Our largest customer is TK (Techniker Krankenkasse Public Insurance) in Hamburg, which processes 600,000 documents every day with an automation rate of 87%. Think about the savings! Many banks are using it, and we see big demand in insurance claims management. Many insurers are now using IDP to provide no-touch claims processing, which uses automatic predictions based on the extracted data to decide whether a claim is valid or not.

IDM: What new capabilities do you expect the industry to deliver in the next 5 years?

AG: I expect that the adoption of IDP will grow vertically and horizontally. Vertically, it will become a vital part of many business processes that are presently out of reach. The ongoing evolution of intelligent algorithms will also allow IDP to tackle complex problems with minimal setup, going beyond the traditional high-volume processes we see today. Online machine learning plays a vital role here, as it allows you to start with basically no setup and let the system improve over time based on human feedback. Of course, many precautions need to be taken so that the system learns the correct things and does not deteriorate. There are many infamous examples of bots that went south because of wrong training data. Machine learning systems require an elaborate infrastructure of checks and balances that work silently in the background to build incremental improvements. We have built this as a set of services that are used by a lot of customers with amazing success.
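One simple form such a check could take is sketched below in Python. It is not a description of Skilja's services: the toy token-count classifier, the class name `GuardedOnlineClassifier` and the trusted hold-out sample are all assumptions chosen to show the idea that a piece of human feedback is kept only if measured quality does not drop.

```python
from collections import Counter, defaultdict


class GuardedOnlineClassifier:
    """Toy token-count classifier that rolls back updates that hurt accuracy."""

    def __init__(self, holdout):
        self.counts = defaultdict(Counter)  # label -> token frequencies
        self.holdout = holdout              # trusted (text, label) pairs

    def predict(self, text, counts=None):
        counts = self.counts if counts is None else counts
        tokens = text.lower().split()
        scores = {label: sum(c[t] for t in tokens) for label, c in counts.items()}
        return max(scores, key=scores.get) if scores else None

    def _accuracy(self, counts):
        if not self.holdout:
            return 0.0
        hits = sum(self.predict(text, counts) == label for text, label in self.holdout)
        return hits / len(self.holdout)

    def learn(self, text, label):
        """Apply one piece of feedback; keep it only if accuracy does not drop."""
        candidate = defaultdict(Counter, {k: Counter(v) for k, v in self.counts.items()})
        candidate[label].update(text.lower().split())
        if self._accuracy(candidate) >= self._accuracy(self.counts):
            self.counts = candidate
            return True
        return False  # update rejected, previous knowledge stays in place


holdout = [
    ("invoice total amount due", "invoice"),
    ("claim for car damage", "claim"),
]
clf = GuardedOnlineClassifier(holdout)
first = clf.learn("invoice amount due", "invoice")    # starts with no setup
second = clf.learn("car damage claim", "claim")
bad = clf.learn("claim car damage claim car damage", "invoice")  # mislabelled
```

In this sketch the mislabelled third sample would make the hold-out claim letter look like an invoice, so the update is rejected and the classifier keeps its earlier state.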

Technologically, we expect that online learning with an elaborate quality-checking infrastructure will lead to even more machine learning and automation of setup. It does not really matter whether neural networks, statistical analysis or semantic methods are used. What is vital is that the machine can detect and validate the correct patterns and forget the wrong ones. So, interestingly enough, “sagacious forgetting” will be a major challenge for these semi-automatic bots in the future.


After a lengthy career in the global document and information capture industry, Alexander Goerke founded Skilja (Icelandic for “to understand”) in 2012. Skilja has built its own cloud-enabled document processing platform that incorporates unique machine learning and intelligent algorithms. Skilja also provides Data Classification and Extraction software that is utilised by third-party IDP solutions.