
Document Classifier - Usage Guide

Document Classifier examines the content of an input PDF, JPG, PNG, or TIFF file through the pdf/classifier endpoint. It then uses AI to determine the document's class automatically (for example, finance or invoice) and returns that class to the user. It can also classify documents using custom-defined rules.

Use Document Classifier to quickly build a workflow for sorting incoming documents and PDF files.


How to use built-in AI rules

Call the pdf/classifier endpoint with the URL of your input document (see Upload Files if you need to upload a local file first).
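
As a rough illustration, the call could be prepared in Python as sketched below. The base URL, the `x-api-key` header, and the `url` field name are assumptions not stated in this guide, and `build_classifier_request` is a hypothetical helper; check the API reference for the exact values before relying on them.

```python
import json
import urllib.request

# Assumptions (not confirmed by this guide): base URL, header name, field name.
API_BASE = "https://api.pdf.co/v1"
API_KEY = "YOUR_API_KEY"  # placeholder, replace with your own key


def build_classifier_request(document_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the pdf/classifier endpoint."""
    body = json.dumps({"url": document_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/pdf/classifier",
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": API_KEY},
        method="POST",
    )


if __name__ == "__main__":
    req = build_classifier_request("https://example.com/sample-invoice.pdf")
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req), then JSON-decode the response body.
```

The sketch stops short of sending the request so that it can be inspected without a valid key.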

How to create and test custom classification rules

Classification rules are stored in CSV format, one line per class, in the following format:

className, logicType, keyword1, keyword2, keyword3 ..

where:

  • className - the name of the class. It is returned if the rules for this class match the document.
  • logicType - (optional) the logic to apply to the keywords: OR or AND. OR means the document must match at least one keyword from the list; AND means all keywords must be found. If the logic column is not set, OR is used by default.
  • keyword1 (also keyword2, keyword3, etc.) - a keyword or phrase to look for. It can be a regular expression, for example /\d+/ or /Medical Report|Med Report/i

Sample Rules:

Invoice,OR,Invoice Number,Invoice #,Invoice No,Tax Invoice,,
Purchase Order,OR,PO Number,Order Number,Order No,,,
Bill,OR,Bill Date,Billing Period,Bill Number,,,
Bank Statement,OR,/Account Statement/i,/Statement of Account/i,Business Checking,Accounts Payable,/Statement No/i,
Income Statement,OR,/Income Statement/i,,,,,
Has US Number,OR,"/\b-?(\d+,?)+(\.\d\d)\b/",,,,,
Medical Report,AND,/Medical Report|Med Report/i
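
To make the OR/AND semantics concrete, here is a minimal local sketch in Python that parses rules in this format and applies them to already-extracted text. It is an illustration only, not the service's implementation. Several behaviors are assumptions: plain keywords are treated as case-sensitive substrings, only the `i` regex flag is handled, and every matching class is returned. The helpers `parse_rules`, `keyword_matches`, and `classify` are hypothetical names.

```python
import csv
import io
import re

# A subset of the sample rules above (raw string keeps the regex backslashes).
SAMPLE_RULES = r'''Invoice,OR,Invoice Number,Invoice #,Invoice No,Tax Invoice,,
Medical Report,AND,/Medical Report|Med Report/i
Has US Number,OR,"/\b-?(\d+,?)+(\.\d\d)\b/",,,,,
'''

# Keywords written as /pattern/flags are treated as regular expressions.
REGEX_KEYWORD = re.compile(r"^/(.*)/([a-z]*)$", re.DOTALL)


def parse_rules(rules_csv):
    """Return a list of (class_name, logic, keywords) tuples."""
    rules = []
    for row in csv.reader(io.StringIO(rules_csv)):
        cells = [c.strip() for c in row]
        if not cells or not cells[0]:
            continue  # skip blank lines
        name, rest = cells[0], cells[1:]
        logic = "OR"  # default when the logic column is missing or empty
        if rest and rest[0].upper() in ("OR", "AND"):
            logic, rest = rest[0].upper(), rest[1:]
        rules.append((name, logic, [k for k in rest if k]))
    return rules


def keyword_matches(keyword, text):
    m = REGEX_KEYWORD.match(keyword)
    if m:
        flags = re.IGNORECASE if "i" in m.group(2) else 0
        return re.search(m.group(1), text, flags) is not None
    return keyword in text  # assumption: case-sensitive substring match


def classify(text, rules):
    """Return the names of all classes whose rules match the text."""
    matched = []
    for name, logic, keywords in rules:
        if not keywords:
            continue
        hits = [keyword_matches(k, text) for k in keywords]
        if all(hits) if logic == "AND" else any(hits):
            matched.append(name)
    return matched


if __name__ == "__main__":
    rules = parse_rules(SAMPLE_RULES)
    print(classify("Tax Invoice\nInvoice No 42\nTotal 1,234.56", rules))
```

Using the `csv` module rather than splitting on commas matters here, because a quoted keyword such as the "Has US Number" pattern contains a comma of its own.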

Classifier Testing Tool

Composing and testing rules by hand can be tedious, so we created the PDF Classifier Testing Tool for composing, testing, and updating rules. It can also generate a ready-to-use JSON request for PDF.co or API Server API calls.

How to Use Classifier Testing Tool in PDF Multitool:

  • Download the ByteScout PDF Multitool app from this page (requires Windows 7 or higher).
  • Install ByteScout PDF Multitool.
  • Run PDF Multitool, then open Classifier Testing Tool from the side panel on the left under Document Parser - Classifier Test Tool.
  • Classifier Test Tool opens. In the file list panel on the left, select a PDF, JPG, or PNG document, or a folder of documents.
  • Click the Test Rules button to test the rules. Review the results and adjust the rules as needed.

You can save the rules as CSV, or click Copy JSON to generate a JSON request for use with PDF.co (cloud) or API Server (on-prem).
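
If you prefer to assemble such a request body yourself from a saved CSV, a sketch along the following lines may help. The field names `url` and `rulescsv` are assumptions that this guide does not confirm, and `build_classifier_payload` is a hypothetical helper; compare the result with the JSON that Copy JSON produces.

```python
import json


def build_classifier_payload(document_url, rules_csv):
    """Serialize a request body carrying the document URL and custom rules."""
    # Assumed field names: "url" and "rulescsv".
    return json.dumps({"url": document_url, "rulescsv": rules_csv}, indent=2)


if __name__ == "__main__":
    rules = "Invoice,OR,Invoice Number,Invoice No\nMedical Report,AND,/Medical Report|Med Report/i"
    print(build_classifier_payload("https://example.com/sample.pdf", rules))
```

Letting `json.dumps` do the serialization takes care of escaping the newlines, quotes, and backslashes that rules with regular expressions tend to contain.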