Link Search Menu Expand Document

Document Classifer

Auto classification Of Incoming Documents

Use /pdf/classifer endpoint (see below) to automatically sort / detect the class of the document based on keywords-based rules. For example, you can define rules to find which vendor provided the document to find which template to apply accordingly.

Available methods

[POST] /pdf/classifier

Description: PDF classifier to sort PDF, JPG, PNG files by their content using the set of rules with keywords.

Parameters

  • url required. URL to the source file. Supports links from Google Drive, Dropbox and from built-in PDF.co files storage. For uploading files via API please check Files Upload section. If you are randomly getting Too Many Requests or Access Denied error for your input url, please try to add cache: to enable built-in url caching.
  • rulescsv. required. Sets classification rules in CSV format where each row contains class name and keywords separated by |. Each row is separated by \n symbol. Example: Amazon AWS,Amazon Web Services Invoice|Amazon CloudFront\nDigital Ocean,DigitalOcean|DOInvoice\nACME,ACME Inc.|1540 Long Street, Jacksonville, 32099
  • rulescsvurl. optional. Alternatively you can provide the link a file with CSV containing classification rules.
  • caseSensitive. optional (default to true). Defines if keywords in rules are case sensitive or not.
  • inline. optional. Set to true to return results inside the response. Otherwise endpoint will return a link to output file generated.
  • password optional. Password of PDF file. Must be a String
  • async optional. Runs processing asynchronously. Returns Use JobId that you may use with /job/check to check state of the processing (possible states: working, failed, aborted and success). Must be one of: true, false.
  • encrypt optional. Enable encryption for output file. Must be one of: true, false.
  • name optional. File name for generated output. Must be a String.
  • profiles optional. Must be a String. You can set additional and extra options using this parameter that allows you to set custom configuration. See profiles samples for examples.

Description

  • Method: POST
  • URL: /v1/pdf/classifier

Query parameters

No query parameters accepted.

Body payload

{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
    "rulescsv": "Amazon,Amazon Web Services Invoice|Amazon CloudFront\nDigital Ocean,DigitalOcean|DOInvoice\nAcme,ACME Inc.|1540 Long Street, Jacksonville, 32099",
    "caseSensitive": "true",
    "async": false,
    "encrypt": "false",
    "inline": "true",
    "password": "",
    "profiles": ""
} 

Example responses

/pdf/classifier
{
    "body": {
        "classes": [
            {
                "class": "ACME"
            }
        ]
    },
    "pageCount": 1,
    "error": false,
    "status": 200,
    "remainingCredits": 99972360,
    "credits": 42
}

Code Snippets

CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/classifier' \
--header 'Content-Type: application/json' \
--header 'x-api-key: ' \
--data-raw '{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
    "rulescsv": "Amazon,Amazon Web Services Invoice|Amazon CloudFront\nDigital Ocean,DigitalOcean|DOInvoice\nAcme,ACME Inc.|1540 Long Street, Jacksonville, 32099",
    "caseSensitive": "true",
    "async": false,
    "encrypt": "false",
    "inline": "true",
    "password": "",
    "profiles": ""
} '
JavaScript
var myHeaders = new Headers();
myHeaders.append("Content-Type", "application/json");
myHeaders.append("x-api-key", "");

var raw = JSON.stringify({
 "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
 "rulescsv": "Amazon,Amazon Web Services Invoice|Amazon CloudFront\nDigital Ocean,DigitalOcean|DOInvoice\nAcme,ACME Inc.|1540 Long Street, Jacksonville, 32099",
 "caseSensitive": "true",
 "async": false,
 "encrypt": "false",
 "inline": "true",
 "password": "",
 "profiles": ""
});

var requestOptions = {
	method: 'POST',
	headers: myHeaders,
	body: raw,
	redirect: 'follow'
};

fetch("https://api.pdf.co/v1/pdf/classifier", requestOptions)
	.then(response => response.text())
	.then(result => console.log(result))
	.catch(error => console.log('error', error));
NodeJs
var request = require('request');
var options = {
	'method': 'POST',
	'url': 'https://api.pdf.co/v1/pdf/classifier',
	'headers': {
		'Content-Type': 'application/json',
		'x-api-key': ''
	},
	body: JSON.stringify({
	 "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
	 "rulescsv": "Amazon,Amazon Web Services Invoice|Amazon CloudFront\nDigital Ocean,DigitalOcean|DOInvoice\nAcme,ACME Inc.|1540 Long Street, Jacksonville, 32099",
	 "caseSensitive": "true",
	 "async": false,
	 "encrypt": "false",
	 "inline": "true",
	 "password": "",
	 "profiles": ""
	})

};
request(options, function (error, response) {
	if (error) throw new Error(error);
	console.log(response.body);
});

PHP
<?php

$curl = curl_init();

curl_setopt_array($curl, array(
	CURLOPT_URL => 'https://api.pdf.co/v1/pdf/classifier',
	CURLOPT_RETURNTRANSFER => true,
	CURLOPT_ENCODING => '',
	CURLOPT_MAXREDIRS => 10,
	CURLOPT_TIMEOUT => 0,
	CURLOPT_FOLLOWLOCATION => true,
	CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
	CURLOPT_CUSTOMREQUEST => 'POST',
	CURLOPT_POSTFIELDS =>'{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
    "rulescsv": "Amazon,Amazon Web Services Invoice|Amazon CloudFront\\nDigital Ocean,DigitalOcean|DOInvoice\\nAcme,ACME Inc.|1540 Long Street, Jacksonville, 32099",
    "caseSensitive": "true",
    "async": false,
    "encrypt": "false",
    "inline": "true",
    "password": "",
    "profiles": ""
} ',
	CURLOPT_HTTPHEADER => array(
		'Content-Type: application/json',
		'x-api-key: '
	),
));

$response = curl_exec($curl);

curl_close($curl);
echo $response;

Java
import java.io.*;
import okhttp3.*;
public class main {
	public static void main(String []args) throws IOException{
		OkHttpClient client = new OkHttpClient().newBuilder()
			.build();
		MediaType mediaType = MediaType.parse("application/json");
		RequestBody body = RequestBody.create(mediaType, "{\n    \"url\": \"https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf\",\n    \"rulescsv\": \"Amazon,Amazon Web Services Invoice|Amazon CloudFront\\nDigital Ocean,DigitalOcean|DOInvoice\\nAcme,ACME Inc.|1540 Long Street, Jacksonville, 32099\",\n    \"caseSensitive\": \"true\",\n    \"async\": false,\n    \"encrypt\": \"false\",\n    \"inline\": \"true\",\n    \"password\": \"\",\n    \"profiles\": \"\"\n} ");
		Request request = new Request.Builder()
			.url("https://api.pdf.co/v1/pdf/classifier")
			.method("POST", body)
			.addHeader("Content-Type", "application/json")
			.addHeader("x-api-key", "")
			.build();
		Response response = client.newCall(request).execute();
		System.out.println(response.body().string());
	}
}

C#
using System;
using RestSharp;
namespace HelloWorldApplication {
	class HelloWorld {
		static void Main(string[] args) {
			var client = new RestClient("https://api.pdf.co/v1/pdf/classifier");
			client.Timeout = -1;
			var request = new RestRequest(Method.POST);
			request.AddHeader("Content-Type", "application/json");
			request.AddHeader("x-api-key", "");
			var body = @"{" + "\n" +
			@"    ""url"": ""https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf""," + "\n" +
			@"    ""rulescsv"": ""Amazon,Amazon Web Services Invoice|Amazon CloudFront\nDigital Ocean,DigitalOcean|DOInvoice\nAcme,ACME Inc.|1540 Long Street, Jacksonville, 32099""," + "\n" +
			@"    ""caseSensitive"": ""true""," + "\n" +
			@"    ""async"": false," + "\n" +
			@"    ""encrypt"": ""false""," + "\n" +
			@"    ""inline"": ""true""," + "\n" +
			@"    ""password"": """"," + "\n" +
			@"    ""profiles"": """"" + "\n" +
			@"} ";
			request.AddParameter("application/json", body,  ParameterType.RequestBody);
			IRestResponse response = client.Execute(request);
			Console.WriteLine(response.Content);
		}
	}
}

Python
import requests
import json

url = "https://api.pdf.co/v1/pdf/classifier"

payload = json.dumps({
 "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
 "rulescsv": "Amazon,Amazon Web Services Invoice|Amazon CloudFront\nDigital Ocean,DigitalOcean|DOInvoice\nAcme,ACME Inc.|1540 Long Street, Jacksonville, 32099",
 "caseSensitive": "true",
 "async": False,
 "encrypt": "false",
 "inline": "true",
 "password": "",
 "profiles": ""
})
headers = {
	'Content-Type': 'application/json',
	'x-api-key': ''
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

Powershell
$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("Content-Type", "application/json")
$headers.Add("x-api-key", "")

$body = "{`n    `"url`": `"https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf`",`n    `"rulescsv`": `"Amazon,Amazon Web Services Invoice|Amazon CloudFront`\nDigital Ocean,DigitalOcean|DOInvoice`\nAcme,ACME Inc.|1540 Long Street, Jacksonville, 32099`",`n    `"caseSensitive`": `"true`",`n    `"async`": false,`n    `"encrypt`": `"false`",`n    `"inline`": `"true`",`n    `"password`": `"`",`n    `"profiles`": `"`"`n} "

$response = Invoke-RestMethod 'https://api.pdf.co/v1/pdf/classifier' -Method 'POST' -Headers $headers -Body $body
$response | ConvertTo-Json