What is IES?
IES (Information Extraction Service) is OpenText's OCR solution, designed to work with the Vendor Invoice Management (VIM) system.
The sole task of IES is to extract data from documents and transfer it back to SAP, which in practice reduces manual data entry and automates the process.
Unlike the previous OCR systems dedicated to VIM (ICC, Invoice Capture Center, and BCC, Business Capture Center), IES is based primarily on a patented machine learning mechanism that improves itself as documents are processed. The mechanism learns during validation or after the SAP approval process has finished.
Validation = an intermediate step between OCR and entering documents into VIM. At this stage, the user can manually correct the data recognized by OCR.
Vendor Invoice Management = a dedicated SAP solution for the circulation of invoices, as well as other types of documents.
What are the differences between IES and previous OCR systems?
At this point, IES is available in two formats:
-> Intelligent Capture for SAP (hereinafter referred to as IC4S) and
-> Core Capture for SAP (CC4S)
The first of these, IC4S, is an on-premise solution installed on the customer's own infrastructure, while CC4S is a cloud-based solution, available as part of a subscription purchased from OpenText or SAP. There are some differences between the two solutions, but in terms of technology and configuration they are very similar.
The situation is different compared to previous generations of OCR offered by OpenText, as can be seen in the table below.
Issue | ICC/BCC | IES | IES advantage |
Machine Learning | ART training - the ability for users to teach the system after selecting the appropriate option and configuring the solution for this. | Continuous, self-adaptive learning mechanism, requiring no additional action on the users' part beyond the typical use of a validation client (although this too is no longer required) | Automatic continuous learning process for all document types. |
Configuration | Customizing client available on the ICC/BCC server. Configuration both on the Customizing client side and inside SAP. Ability to extend functionality by writing scripts in C#. | All configuration moved to VIM. Ability to extend functionality by writing scripts in ABAP. | All configuration transferred to SAP. |
Data transfer | Regularly downloading data from SAP (supplier data, order numbers) and storing it in a database accessible to the solution. | IES has no need to download data; it uses information from SAP tables. | Less data needs to be synchronized and distributed. More secure architecture. |
SQL database | Database required | Database not required; IC4S uses the SAP database, CC4S uses its own database managed by OpenText | Lower service costs |
Transport of learning data | Not possible (in theory possible only within a single application, by exporting and overwriting the profile). | Transport between SAP/VIM and CC4S and IC4S is possible. It is also possible to download data from ICC/BCC. | A flexible tool for transporting learning data. |
Scenario for invoices | Preconfigured standard fields for 32 countries. Adding new fields possible; the logic for new fields requires OCR engine configuration. | Preconfigured fields for invoices with built-in processing logic for countries using the Latin alphabet. Built-in knowledge base for many countries, giving a good recognition result from the beginning. Adding new fields possible; the logic for handling these fields is learned automatically during the learning process. | A learning mechanism, requiring no additional action on the user's part. |
Supported languages (from the point of view of supported alphabets/character sets) | All of Western Europe and Central Europe, Scandinavia, Russia (including Cyrillic), Greece, Simplified Chinese and Mandarin Chinese, Korea, Thailand, Japan and Vietnam. | All languages used by ICC/BCC plus Hebrew. Extension to new countries available with future system updates. | |
Validation options | Windows-based validation client, Single Click Entry available in SAP GUI. | Same as in ICC/BCC plus the possibility of validation in Fiori. | Support for Fiori. |
As can be seen, IES is a simpler solution that requires less configuration and less customer involvement in maintaining the service later on. In terms of efficiency, both solutions produce a similar result, and the final recognition result in both cases depends on many variables.
How exactly does this solution work?
The IES solution is designed to learn continuously as it processes documents.
The VIM Inbound component archives and processes new documents, which includes sending them to OCR, where data extraction is performed. The decision engine in VIM then checks the recognition result: if mandatory fields are not filled in or validation rules are not met, the document goes to manual handling so the errors can be corrected.
Manual corrections are sent as a reply from VIM to IES, which learns at this point how this information should be obtained for a given document. The information gained during this entire process is then reused for the next similar case.
When a similar document, for example, from the same supplier comes into VIM, the appearance of the document will be recognized as something already known, existing in the knowledge base. In this case, the manual validation step can be skipped - the system itself will deal with the data that was not filled in the first time or was filled in incorrectly.
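The routing decision described above can be illustrated with a minimal Python sketch. It is not the VIM decision engine itself: the field names, the `ExtractionResult` type, the `amount_is_positive` rule and the `route` function are all hypothetical and only show the idea that missing mandatory fields or failed validation rules send a document to manual handling.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical field names; the real mandatory fields are configured in VIM.
MANDATORY_FIELDS = ("vendor_id", "invoice_number", "invoice_date", "gross_amount")


@dataclass
class ExtractionResult:
    """Field values returned by the OCR step; unrecognized fields are absent or empty."""
    fields: dict[str, str] = field(default_factory=dict)


def amount_is_positive(result: ExtractionResult) -> bool:
    """Example validation rule (an assumption): the gross amount must be a positive number."""
    try:
        return float(result.fields.get("gross_amount", "")) > 0
    except ValueError:
        return False


def route(result: ExtractionResult,
          rules: list[Callable[[ExtractionResult], bool]]) -> str:
    """Send the document to manual validation if any mandatory field is empty or any rule fails."""
    missing = [name for name in MANDATORY_FIELDS if not result.fields.get(name)]
    failed = [rule.__name__ for rule in rules if not rule(result)]
    return "manual_validation" if missing or failed else "automatic_processing"
```

For example, a result containing only `vendor_id` would be routed to `manual_validation`, while a result with all four fields filled and a positive amount would continue to `automatic_processing`.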
For most documents, learning is effective after performing up to 5 manual validations. However, in exceptional cases, for example, for documents with a complex table structure, up to 20 manual revisions may be required.
The knowledge base available in IES from the beginning of use should allow the recognition score for fields to be within 70-80%. After a period of time, when the learning mechanism works, the recognition score for fields should increase to more than 90%. After some time, new documents can be processed automatically without the need for manual validation, as IES has learned enough similar layouts.
IES currently relies on several main concepts:
- Business Entity Determination (BED) - an algorithm that compares the data on the document with the data in SAP (specifically, SAP Master Data) and is itself subject to the learning mechanism. This mechanism is used to recognize supplier and recipient data.
- Single Click Entry (SCE) - An interface that allows the end user to capture information from documents using the mouse. The learning mechanism requires that the information be pointed this way (rather than, for example, typed manually from the keyboard). This is a convenient and fast method of validating documents.
- Table Auto Complete (TAC) - function that allows you to automatically fill in the table. It works in such a way that the user, using the mouse, first fills in the first row of the table, and then selects the appropriate option, after which the system fills in the rest of the rows.
- Recognition based on context (what type of document we recognize), the layout of the document, its structure, keywords, the relationship between text elements on the document.
- Voting mechanism based on the confidence index (a mechanism for selecting the best result from the entire list of alternatives sent by IES).
- Knowledge transfer from other suppliers; for example, the data used in recognizing the invoice date for supplier X can be helpful in determining the invoice date for another supplier.
- A knowledge base delivered from the start.
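As an illustration of the voting idea in the list above, the sketch below picks the best alternative for a single field by its confidence index. This is only an assumption about how such a selection could look, not the actual IES algorithm: the `Candidate` type, the 0.0 to 1.0 confidence scale and the `threshold` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One alternative value for a field, with an assumed confidence between 0.0 and 1.0."""
    value: str
    confidence: float


def pick_best(candidates: list[Candidate], threshold: float = 0.8) -> Optional[Candidate]:
    """Return the highest-confidence candidate, or None if the list is empty
    or the best candidate falls below the (hypothetical) threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    return best if best.confidence >= threshold else None
```

Returning `None` when no alternative is convincing enough mirrors the workflow described earlier: a field without a reliable value stays empty and the document goes to manual validation.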
Summary
Information Extraction Service is the next-generation OCR solution available from OpenText. It is designed to recognize different types of documents, although by far the most popular application is the recognition of invoice data. It is currently offered in two configurations with VIM, IC4S and CC4S, as well as in a cloud solution available under the name Core Capture (which is something different from CC4S). The solution is constantly being updated with new features, such as new languages, so that the recognition result itself keeps getting better; after all, that is what OCR is all about. It looks like the near future of OpenText OCR for SAP will be tied to IES.
Tomasz Tyrała
OpenText consultant at Lukardi S.A.