Tuning Abbyy FlexiCapture Layouts and Document Definitions

So you have spent many hours analyzing and creating the layouts and definitions for the documents you need to process through Abbyy.  You should now be almost ready for production, except that you still need to tune.  Run many samples of the documents in question through the system and check the results very carefully to find and fix all the little issues that will be present.

Tuning is less about finding bugs in your definitions than about finding the little differences among the printed documents being processed.  These differences may be due to printing offsets, where a preprinted form is run through a printer that adds the actual data to extract and the data lands slightly off from its expected position.  In addition, there can be cases where the Header or Footer elements are not extracted correctly.  All these differences can add up to Abbyy not detecting the correct document definition to apply to the scanned images.

To correct these issues, analyze the results very carefully in the Design Studio.  Import the document in question into the Studio and then process it.  Look carefully at what was missed.  Many times the cause is a Search Area that is not large enough to cover all the letters and numbers to be extracted.  Also, within a group, the required and optional flags have a lot to do with whether the group is found or not.  All it takes is one search element within the group that is not found and the entire group may be marked as not found, so be sure to check the flags over carefully.
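
The effect of the required and optional flags on a group can be pictured with a small Python sketch.  This is only an illustrative model, not Abbyy's API or its actual matching logic: the `Element` class, the `group_found` and `missing_required` functions, and the field names are all hypothetical, and the sketch assumes the simple rule that a group counts as found only when every required element in it is found.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for one search element inside a group."""
    name: str
    required: bool  # True = required flag set, False = optional
    found: bool     # whether the element was located on the page


def group_found(elements):
    """Assumed rule: a group is found only if every required element is found."""
    return all(e.found for e in elements if e.required)


def missing_required(elements):
    """List the required elements that caused the group to fail."""
    return [e.name for e in elements if e.required and not e.found]


# Hypothetical header group where a smudged date was not located.
header = [
    Element("StudentName", required=True, found=True),
    Element("IssueDate", required=True, found=False),
    Element("Seal", required=False, found=False),
]

print(group_found(header))       # False
print(missing_required(header))  # ['IssueDate']

# Marking the date optional lets the rest of the group match.
header[1].required = False
print(group_found(header))       # True
```

Listing the missing required elements first, as `missing_required` does here, is a quick way to decide whether to enlarge a Search Area or relax a flag.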

When you have multiple Document Definitions, there are going to be times when a specific document matches some other definition instead of the one it should.  This can be caused by the error percentage on the wrong document definition being set too high when both document definitions share a similar field to extract.  To fix this, take the error percentage down a few points and try the recognition again.
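
To see why a loose error percentage lets the wrong definition win, consider the following Python sketch.  It does not reproduce how Abbyy computes matches internally; it assumes, purely for illustration, that the error percentage is the share of characters that may differ between the text a definition expects and the OCR result, measured here with edit distance.  The function names and sample strings are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def error_percent(expected, ocr_text):
    """Assumed metric: edit distance as a percentage of the expected length."""
    return 100.0 * edit_distance(expected, ocr_text) / len(expected)


def matches(expected, ocr_text, max_error_percent):
    """True when the OCR text is within the allowed error percentage."""
    return error_percent(expected, ocr_text) <= max_error_percent


# Two hypothetical definitions keyed on similar title text.
ocr_title = "Official Transcript"
wrong_definition_text = "Unofficial Transcript"

print(round(error_percent(wrong_definition_text, ocr_title), 1))           # 14.3
print(matches(wrong_definition_text, ocr_title, max_error_percent=15))     # True
print(matches(wrong_definition_text, ocr_title, max_error_percent=5))      # False
```

With the tolerance at 15 the wrong definition accepts the page; lowering it to 5 rejects the page, which mirrors the fix of taking the error percentage down a few points.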

Tuning a document definition takes a lot more effort when you are dealing with multiple document definitions and paper documents that are difficult to scan with enough clarity for the OCR engine to work properly.  This is especially true for Transcript type documents, where each transcript has its own copy protection mechanism that the scan software must try to compensate for.  It does work out in the end, though, so be prepared to spend the time and effort to get the document definitions to the point where they work most of the time.

Christopher J. Hillenburg
Senior System Engineer
ImageSource, Inc.

Kevin Neal - September 26, 2011

Chris,

Excellent article. The importance of fine-tuning cannot be overstated. It’s best to test and re-test, then adjust for optimal results.
