5 Ways To Tell AI-Based IDP Systems Apart from Traditional Ones

by Amit Jnagal, Infrrd

The Intelligent Document Processing (IDP) market is going through an interesting phase - a lot of new players have entered the market, and all of them make similar claims about their AI and use of machine learning technologies. We have worked with customers who were led to believe that they were using an AI platform for IDP but found out very late in the process that the product used very little machine learning: just enough to check the box for using ML, but hardly enough to make a difference for the customer.

Our prospects often ask how they can tell an AI/ML-first product from a traditional, regular expression-based, templatized solution. It is not that hard, and we have put together a list of simple tests to help you find out. Let’s dig in...

1. New Document Journey

Most IDP systems come with some ready structures for extracting data from a fixed set of documents - invoices, tax forms, etc. Borrowing the ML nomenclature, everyone calls these structures ‘models’, whether they are backed by an ML model or by logic-based code. A true AI-based document understanding engine can train itself to understand any new document type. So if you try to configure a new document type that the system has not seen before, you will get one of two responses from the vendor - “give me your data and let me come back after a couple of weeks” or “here is our system, go ahead and train it yourself”. The former usually indicates a non-ML-based system and the latter a true AI-based engine.

2. Accuracy Improvement Over Time 

The fundamental theory that all ML systems are based on is that their accuracy improves as more data is processed with them. Most traditional systems give you the following accuracy curve:

[Chart: accuracy curve of a traditional system]

A machine learning-based system is supposed to yield the following accuracy curve:

[Chart: accuracy curve of a machine learning-based system]

Now, it is not practical for every customer to invest 3 months in figuring out whether a platform uses ML for data extraction. But you can run smaller experiments with a much more limited data set. Pick information that is difficult to extract, something that you usually extract with only 10%-20% accuracy.

Run an incremental training round and watch this number move. It is much easier to move the accuracy needle from 20% to 30% than from 80% to 85%. Even a small experiment like this will tell you whether the IDP engine is backed by machine learning or not.
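The experiment above can be tracked with a few lines of Python. This is only a rough sketch, not any vendor's API: the function names, the exact-match scoring rule, and the 5-point threshold are all assumptions made for illustration. It scores one hard-to-extract field against hand-labelled values after each training round and reports how far the accuracy moved.

```python
from typing import List, Sequence


def field_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of documents where the extracted value exactly matches
    the hand-labelled value (whitespace at the ends is ignored)."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(p.strip() == e.strip() for p, e in zip(predicted, expected))
    return hits / len(expected)


def accuracy_gains(rounds: Sequence[float]) -> List[float]:
    """Change in accuracy between each consecutive pair of training rounds."""
    return [later - earlier for earlier, later in zip(rounds, rounds[1:])]


def looks_like_learning(rounds: Sequence[float], min_total_gain: float = 0.05) -> bool:
    """Hypothetical rule of thumb: accuracy rose by at least
    min_total_gain between the first and the last round."""
    return len(rounds) >= 2 and rounds[-1] - rounds[0] >= min_total_gain


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from a real system.
    per_round = [0.25, 0.5, 0.75]
    print(accuracy_gains(per_round))
    print(looks_like_learning(per_round))
```

Use the same held-out set of documents for every round; otherwise a change in the number may reflect easier documents rather than learning.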

3. ML is Data Hungry

Fundamentally, all Machine Learning algorithms need a large set of data to learn from before they can start making predictions. One reliable signal for detecting ML potency is the need for training data. Most AI-based training engines will require you to provide training data at the beginning of the IDP implementation. If you need to provide little or no data to start with, chances are that no learning models are used by the system.

4. Handling Variations

There is one thing that a logic-based IDP system cannot handle - complex variations. If you have a document that does not have a fixed format and comes in a lot of variations, then a heuristics-based system will not be able to do a good job. This is a good test to validate the machine learning foundation of the system. Take it for a spin with a document that has varying layouts and vocabulary, and the difference in accuracy will be noticeable.
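One way to make this test concrete is to break accuracy down by layout. The sketch below is illustrative only: the layout labels, the function names, and the idea of using the gap between the best and worst layout as a signal are assumptions, not part of any particular product. It takes one (layout, was the extraction correct) pair per test document.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_layout(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Group per-document outcomes by layout label and return the
    fraction of correct extractions for each layout."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for layout, correct in results:
        totals[layout] += 1
        hits[layout] += int(correct)
    return {layout: hits[layout] / totals[layout] for layout in totals}


def accuracy_spread(per_layout: Dict[str, float]) -> float:
    """Gap between the best- and worst-handled layout; a large gap
    suggests the system only copes with formats it was built for."""
    if not per_layout:
        return 0.0
    return max(per_layout.values()) - min(per_layout.values())


if __name__ == "__main__":
    # Hypothetical outcomes for two invoice layouts.
    outcomes = [
        ("vendor_a", True), ("vendor_a", True),
        ("vendor_b", True), ("vendor_b", False),
    ]
    per_layout = accuracy_by_layout(outcomes)
    print(per_layout)
    print(accuracy_spread(per_layout))
```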

5. Employee Base

Finally, if you do not want to invest time in checking any of this, here is a quick check. Go to LinkedIn and search for Machine Learning. Filter it down to People and set the current companies to the name of the company that you are evaluating.

The number of Machine Learning people will give you a good idea of how much machine learning has gone into the platform that this company has built.

Machine learning-based IDP systems solve a lot of challenges that traditional solutions have not been able to solve, from handling variations, complex tables, computer-vision-based pre-processing, and sorting of documents to automated ongoing improvements.

Your return on investment from an IDP system that is built from the ground up on machine learning can be 10X that of a traditional system over 3 to 5 years. I hope this write-up gave you some pointers on how to choose carefully - all the best with your IDP platform selection.

Amit Jnagal is CEO of Infrrd, where he is responsible for overall business strategy, growth and culture. Prior to Infrrd, he was an Architect and Consultant for over a decade at IBM and Infosys.