Skilja LLM Improves Document Understanding

Business process automation developer Skilja has announced the integration of LLM-based extraction into its new software releases, LAERA 3.0 and Tegra 3.0, with Classifier 6.0 and Information Extraction 5.0.

CEO and Founder Alexander Goerke said, “This is not a simple API call that we use, but we have trained our own (smaller) LLM based on numerous examples from the industry, which is used as a base model for token generation.”

The Skilja LLM, called LaBERTa, accurately represents the language used in business because it was trained on millions of snippets from real business documents.

“With the new LLM feature, significantly fewer examples are now needed, and recognition rates in the tests up to now are greatly improved,” said Goerke.

Training of the model is undertaken by simple labelling, without any specific configuration except for field names and types, thus reducing the overall complexity of setting up an extraction project.
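To make this concrete, the setup described above amounts to declaring only field names and types. The sketch below is purely illustrative: the schema keys and the prompt-rendering helper are assumptions, not Skilja's documented configuration format.

```python
# Hypothetical sketch of an extraction project definition. Per the article,
# only field names and types need to be configured; everything else here
# (key names, the prompt helper) is an assumption for illustration.
invoice_fields = [
    {"name": "InvoiceNumber", "type": "string"},
    {"name": "InvoiceDate",   "type": "date"},
    {"name": "TotalAmount",   "type": "amount"},
]

def to_prompt(fields):
    """Render the field list as a plain-text instruction for an LLM."""
    lines = [f"- {f['name']} ({f['type']})" for f in fields]
    return "Extract the following fields:\n" + "\n".join(lines)
```

In this style of setup, labelled example documents would then drive training, with no locator rules or per-field logic to configure.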

LAERA LLM can be run on the GPU for optimal speed and performance, or in CPU mode.
The company says that when combined with the new LESA OCR model 2024-08, trained with additional examples, unprecedented automation rates can be achieved.

Classification and extraction are fully configurable through web interfaces, can be installed on-premises or in the cloud, and are fully multi-tenant with API key protection for each tenant.
LaBERTa, a layout-aware LLM, is optimized for German and English documents and is provided in different sizes for multi-language support (104 languages).

It supports classification and extraction, and the base model can be extended by adding training examples for fine-tuning.

Multi-tenant and permissions features include:

  • Classification and extraction projects can now be hosted by the Designer Service (back-end service of the web designer)
  • Can also be accessed from any Windows Designer or SDK application via https and username/password or API key
  • Enforced access permission, like read/write permission per project and project visibility per user
  • Basic support for multi-tenant hosting of classification and extraction solutions
  • Multi-tenant access supported for projects and online learning service (classification and extraction)
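The access model above (HTTPS with either username/password or a per-tenant API key) can be sketched as follows. This is not Skilja's documented API: the header names and URL are assumptions, shown only to illustrate how per-tenant credentials keep projects isolated.

```python
# Hypothetical sketch: the host and header format are placeholders,
# not Skilja's documented API.
import base64

DESIGNER_URL = "https://designer.example.com/api/projects"  # placeholder

def auth_headers(api_key=None, username=None, password=None):
    """Build HTTPS auth headers for one tenant: API key or username/password."""
    if api_key:
        # Each tenant presents its own key, so its projects stay isolated.
        return {"Authorization": f"ApiKey {api_key}"}
    if username and password:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        return {"Authorization": f"Basic {token}"}
    raise ValueError("provide an API key or username/password")
```

A Windows Designer or SDK client would attach such headers to every request against the Designer Service, with read/write permissions enforced per project on the server side.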

LLM-based extraction is now possible in the Paragraph Locator, Trainable Field Locator, and Role Schema. Role Schema now allows a simpler extraction definition (prompt extraction) without any locator, using the LaBERTa LLM instead.

https://skilja.com/