PDF Document Classifier - Usage Guide
PDF Document Classifier checks an input PDF, JPG, PNG or TIFF file using the pdf/classifier
endpoint and returns the recognized document classes based on keyword-based rules.
See also: How to Create and Test Classification Rules
Classification rules are saved in CSV format with one line per class in the following format:
className, logic, keyword1, keyword2, keyword3 ..
where:
- className - the name of a class. It will be returned if the rules from this class matched the document.
- logic - the logic to use for keywords. Can be OR (default) or AND. OR means that to identify the class, the document should match 1 or more keywords from the list. AND means that all keywords should match to identify this class.
- keyword1 (also keyword2, keyword3, etc.) - the keyword or phrase to check. Plain text by default, but it can also contain a regular expression, for example /\d+/i
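The OR/AND logic and the /regex/ keyword form described above can be illustrated with a small Python sketch. This is not the classifier's actual implementation: the function names `keyword_matches` and `rule_matches` are hypothetical, only the `i` flag is handled, and plain-text keywords are assumed to be case-sensitive substrings, which the guide does not specify.

```python
import re
from typing import Sequence

# Assumption: a keyword written as /pattern/flags is a regular expression.
# Only the "i" (ignore case) flag is handled in this sketch.
REGEX_KEYWORD = re.compile(r"^/(.+)/([a-z]*)$")


def keyword_matches(keyword: str, text: str) -> bool:
    """Return True if one keyword (plain text or /regex/) occurs in text."""
    m = REGEX_KEYWORD.match(keyword)
    if m:
        pattern, flags = m.group(1), m.group(2)
        re_flags = re.IGNORECASE if "i" in flags else 0
        return re.search(pattern, text, re_flags) is not None
    # Assumption: plain-text keywords are matched as case-sensitive substrings.
    return keyword in text


def rule_matches(logic: str, keywords: Sequence[str], text: str) -> bool:
    """Apply OR (any keyword matches) or AND (all keywords match)."""
    hits = [keyword_matches(k, text) for k in keywords]
    if logic.strip().upper() == "AND":
        return bool(hits) and all(hits)
    # Anything other than AND is treated as OR, the documented default.
    return any(hits)


if __name__ == "__main__":
    text = "Invoice # 1001"
    print(rule_matches("OR", ["Invoice Number", "Invoice #"], text))
    print(rule_matches("AND", ["Invoice Number", "Invoice #"], text))
```

With OR, a single matching keyword is enough to identify the class; with AND, the same text is rejected unless every keyword is present.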
Sample Rules:
Invoice,OR,Invoice Number,Invoice #,Invoice No,Tax Invoice,,
Purchase Order,OR,PO Number,Order Number,Order No,,,
Bill,OR,Bill Date,Billing Period,Bill Number,,,
Bank Statement,OR,/Account Statement/i,/Statement of Account/i,Business Checking,Accounts Payable,/Statement No/i,
Income Statement,OR,/Income Statement/i,,,,,
Has US Number,OR,"/\b-?(\d+,?)+(\.\d\d)\b/",,,,,
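For readers who want to load such a rules file in their own scripts, the following is a minimal Python sketch using the standard `csv` module. The function name `parse_rules` is hypothetical. It assumes that empty trailing fields are padding and that an empty logic field means OR; note how the quoted regular expression containing a comma stays in one field.

```python
import csv
import io

# Two of the sample rules from above, kept verbatim in a raw string.
SAMPLE_RULES = r'''Invoice,OR,Invoice Number,Invoice #,Invoice No,Tax Invoice,,
Has US Number,OR,"/\b-?(\d+,?)+(\.\d\d)\b/",,,,,
'''


def parse_rules(rules_csv):
    """Parse rules text into a list of (class_name, logic, keywords) tuples."""
    rules = []
    for row in csv.reader(io.StringIO(rules_csv)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        class_name = row[0].strip()
        # Assumption: an empty or missing logic field falls back to OR.
        logic = (row[1].strip().upper() if len(row) > 1 else "") or "OR"
        # Trailing commas produce empty fields; drop them.
        keywords = [k.strip() for k in row[2:] if k.strip()]
        rules.append((class_name, logic, keywords))
    return rules


if __name__ == "__main__":
    for class_name, logic, keywords in parse_rules(SAMPLE_RULES):
        print(class_name, logic, keywords)
```

Using a CSV parser rather than splitting on commas matters here because a regular expression keyword may itself contain a comma, as in the last sample rule.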
Classifier Testing Tool
Composing and testing rules by hand can be tedious. That is why we’ve created the PDF Classifier Testing Tool,
which you can use to compose, test and update rules. It can also generate a ready-to-use JSON request for PDF.co or API Server API calls.
How to Use Classifier Testing Tool in PDF Multitool:
- Download the ByteScout PDF Multitool app from this page (requires Windows 7 or higher).
- Install ByteScout PDF Multitool.
- Run PDF Multitool. Then run Classifier Testing Tool from the side panel at the left, under Document Parser - Classifier Test Tool.
- Classifier Test Tool will run. Select a PDF/JPG/PNG document or a folder with documents in the left panel with the list of files.
- Click on the Test Rules button to test the rules. Review the results and adjust the rules.
You can save the rules as CSV, or click on Copy JSON to generate a JSON request for use with PDF.co (cloud) or API Server (on-prem).
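As a rough illustration of what such a JSON request might contain, here is a Python sketch that assembles a request body from a document URL and the rules text. The endpoint URL and the field names `url` and `rulescsv` are assumptions made for this example and are not confirmed by this guide; the JSON produced by Copy JSON should be treated as the authoritative form. The function name `build_classifier_request` is hypothetical, and the sketch only builds the body without sending it.

```python
import json

# Assumption: illustrative endpoint URL, not taken from this guide.
ASSUMED_ENDPOINT = "https://api.pdf.co/v1/pdf/classifier"


def build_classifier_request(document_url, rules_csv):
    """Return a JSON string with the document URL and inline CSV rules.

    The "url" and "rulescsv" field names are assumptions for illustration.
    """
    payload = {
        "url": document_url,    # assumed field: where the input file lives
        "rulescsv": rules_csv,  # assumed field: rules in the CSV format above
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    rules = "Invoice,OR,Invoice Number,Invoice #\nBill,OR,Bill Date,Bill Number\n"
    body = build_classifier_request("https://example.com/sample.pdf", rules)
    print("POST", ASSUMED_ENDPOINT)
    print(body)
```

Serializing with `json.dumps` takes care of escaping the newlines, quotes and backslashes that regular expression keywords introduce into the rules text.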
Copyright © 2016 - 2022 PDF.co