Document Parser


Document Parser automatically extracts fields, tables, values, and barcodes from PDF, JPG, and PNG documents such as invoices, statements, orders, and other PDF and scanned documents.

Built-in document parser templates:

General Invoice Template: parses English-language invoices and extracts the invoice ID, invoice date, total, tax, and line items. To use this template, set the `templateId` parameter to `1`.

How do I classify incoming documents before parsing them?

Use the /pdf/classifier endpoint to automatically detect the class of a document, either with AI or with custom keyword-based rules.

For example, you can define rules that identify which vendor issued a document and then apply the matching template. See Document Classifier for more details.

Additional Information and Tools

Document Parser Template Editor (also available as a standalone version)
Document Parser Template Objects Guide

Available Methods

[POST] /pdf/documentparser (output as JSON)
[POST] /pdf/documentparser (output as XML)
[POST] /pdf/documentparser (output as CSV)
[POST] /pdf/documentparser (output as JSON, custom template code)
[GET] /pdf/documentparser/templates
[GET] /pdf/documentparser/templates/:id

[POST] /pdf/documentparser (output as JSON)

Description: Extracts data from documents based on a Document Parser extraction template and returns the results as JSON. You can extract data from custom areas, form fields, and tables, including content that spans multiple pages.

Attributes

Hint: pass the attributes as a JSON object in the POST request body:

{
    "url": "url-input-link"
}

url (required): URL of the source file. Links from Google Drive, Dropbox, and PDF.co built-in files storage are supported. To upload files via the API, see the Files Upload section. Note: if you get intermittent `Too Many Requests` or `Access Denied` errors, try prefixing the URL with `cache:` to enable built-in URL caching (e.g. `cache:https://example.com/file1.pdf`). For data security, you can encrypt output files and decrypt input files; learn more about user-controlled data encryption.
httpusername (optional): HTTP auth user name, if required to access the source `url`.
httppassword (optional): HTTP auth password, if required to access the source `url`.
templateId (required unless `template` is provided): ID of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser.
template (optional): Code of the document parser template to use, passed directly as a string.
inline (optional): Set to `true` to return the results inside the response. Otherwise, the endpoint returns a link to the generated output file.
outputFormat (optional): `JSON` by default. Set to `CSV` or `XML` to generate CSV or XML output instead.
password (optional): Password of the PDF file. Must be a string.
async (optional): Set to `true` to run long processes in the background. The API then returns a `jobId`, which you can pass to the `/job/check` endpoint to check the status of the process and retrieve the output. Meanwhile, you can continue with other tasks without waiting for the process to finish.
name (optional): File name for the generated output. Must be a string.
expiration (optional): Expiration time of the output link, in minutes (default: `60`, i.e. 1 hour). After this period, any generated output files are automatically deleted from PDF.co temporary files storage. The maximum expiration depends on your subscription plan. To store permanent input files (e.g. reusable images, PDF templates, documents), consider using PDF.co built-in Files Storage.
profiles (optional): Additional configuration for fine-tuning and extra options. See the PDF.co knowledge base for profile examples. Must be a string.
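With `async` set to `true`, the call returns a `jobId` that you poll through `/job/check`. The Python sketch below shows one way to structure that polling loop. It rests on assumptions you should verify against the `/job/check` reference: `check_status` is a hypothetical function you supply that calls `/job/check` for the given job and returns the decoded JSON, and the status values `working`, `success`, `failed`, and `aborted` are assumed rather than taken from this page.

```python
import time
from typing import Any, Callable, Dict

# Assumed status values; confirm them against the /job/check documentation.
DONE_STATES = {"success"}
FAILED_STATES = {"failed", "aborted"}


def wait_for_job(check_status: Callable[[str], Dict[str, Any]], job_id: str,
                 interval: float = 2.0, max_attempts: int = 60) -> Dict[str, Any]:
    """Poll a background job until it succeeds, fails, or runs out of attempts.

    check_status is a hypothetical caller-supplied function that queries
    /job/check for job_id and returns the decoded JSON response.
    """
    for _ in range(max_attempts):
        result = check_status(job_id)
        status = result.get("status")
        if status in DONE_STATES:
            return result  # the final response should point to the output
        if status in FAILED_STATES:
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        time.sleep(interval)  # any other status: still running, wait and retry
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} checks")
```

Passing the status check in as a function keeps the loop independent of any particular HTTP client and makes it easy to test with canned responses.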

Method: POST
URL: /v1/pdf/documentparser

Query parameters

No query parameters accepted.

Body payload

{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
    "outputFormat": "JSON",
    "templateId": "1",
    "async": false,
    "inline": "true",
    "password": "",
    "profiles": ""
}

Example responses

/pdf/documentparser (output as JSON)

{
    "body": {
        "objects": [
            {
                "name": "companyName",
                "objectType": "field",
                "value": "Amazon Web Services, Inc",
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "companyName2",
                "objectType": "field",
                "value": "Amazon Web Services, Inc",
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "invoiceId",
                "objectType": "field",
                "value": "123456789",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "dateIssued",
                "objectType": "field",
                "value": "2018-04-03T00:00:00",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "dateDue",
                "objectType": "field",
                "value": "2018-04-03T00:00:00",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "bankAccount",
                "objectType": "field",
                "value": "123456789012",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "total",
                "objectType": "field",
                "value": 6.58,
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "subTotal",
                "objectType": "field",
                "value": ""
            },
            {
                "name": "tax",
                "objectType": "field",
                "value": 1.01,
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "objectType": "table",
                "name": "table",
                "rows": []
            }
        ],
        "templateName": "Generic Invoice [en]",
        "templateVersion": "4",
        "timestamp": "2020-08-21T19:23:31"
    },
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "sample-invoice.json",
    "remainingCredits": 60803
}
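When `inline` is `true`, the parsed results arrive in `body.objects`, where each entry is either a `field` or a `table`. The Python sketch below, an illustrative helper rather than part of any PDF.co SDK, collapses the field objects into a simple name-to-value mapping. In the sample response above, a field the template did not find (`subTotal`) comes back as an empty string, so check for that before using a value.

```python
from typing import Any, Dict


def extract_fields(response: Dict[str, Any]) -> Dict[str, Any]:
    """Map each field object's name to its value in an inline JSON response."""
    objects = response["body"]["objects"]
    return {
        obj["name"]: obj.get("value")
        for obj in objects
        if obj.get("objectType") == "field"  # skip table objects
    }


# A trimmed copy of the sample response above.
sample = {"body": {"objects": [
    {"name": "invoiceId", "objectType": "field", "value": "123456789"},
    {"name": "total", "objectType": "field", "value": 6.58},
    {"name": "subTotal", "objectType": "field", "value": ""},
    {"objectType": "table", "name": "table", "rows": []},
]}}

fields = extract_fields(sample)
# {'invoiceId': '123456789', 'total': 6.58, 'subTotal': ''}
```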

Code Snippet

CURL

curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--data-raw '{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
    "outputFormat": "JSON",
    "templateId": "1",
    "async": false,
    "inline": "true",
    "password": "",
    "profiles": ""
}'
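The same request can be built with Python's standard library. This sketch only constructs the `urllib.request.Request` that mirrors the curl call above (same URL, headers, and body fields); the function name and its parameters are illustrative. `inline` is sent as the string `"true"`, exactly as in the example payload.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/documentparser"


def build_parse_request(api_key: str, file_url: str, template_id: str = "1",
                        output_format: str = "JSON") -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    payload = {
        "url": file_url,
        "templateId": template_id,      # "1" = built-in invoice template
        "outputFormat": output_format,  # "JSON", "CSV", or "XML"
        "async": False,
        "inline": "true",               # sent as a string, as in the example
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen()` and decode the response with `json.load()`. Changing `output_format` to `"XML"` or `"CSV"` produces the requests used in the next two sections.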

[POST] /pdf/documentparser (output as XML)

Description: Extracts data from PDF and scanned documents using a data extraction template (called a Document Parser template) and returns the results as XML. You can extract data from custom areas, form fields, and tables, including content that spans multiple pages.

Attributes

Hint: pass the attributes as a JSON object in the POST request body:

{
    "url": "url-input-link"
}

url (required): URL of the source file. Links from Google Drive, Dropbox, and PDF.co built-in files storage are supported. To upload files via the API, see the Files Upload section. Note: if you get intermittent `Too Many Requests` or `Access Denied` errors, try prefixing the URL with `cache:` to enable built-in URL caching (e.g. `cache:https://example.com/file1.pdf`). For data security, you can encrypt output files and decrypt input files; learn more about user-controlled data encryption.
httpusername (optional): HTTP auth user name, if required to access the source `url`.
httppassword (optional): HTTP auth password, if required to access the source `url`.
templateId (required unless `template` is provided): ID of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser.
template (optional): Code of the document parser template to use, passed directly as a string.
inline (optional): Set to `true` to return the results inside the response. Otherwise, the endpoint returns a link to the generated output file.
outputFormat (optional): `JSON` by default. Set to `CSV` or `XML` to generate CSV or XML output instead.
password (optional): Password of the PDF file. Must be a string.
async (optional): Set to `true` to run long processes in the background. The API then returns a `jobId`, which you can pass to the `/job/check` endpoint to check the status of the process and retrieve the output. Meanwhile, you can continue with other tasks without waiting for the process to finish.
name (optional): File name for the generated output. Must be a string.
expiration (optional): Expiration time of the output link, in minutes (default: `60`, i.e. 1 hour). After this period, any generated output files are automatically deleted from PDF.co temporary files storage. The maximum expiration depends on your subscription plan. To store permanent input files (e.g. reusable images, PDF templates, documents), consider using PDF.co built-in Files Storage.
profiles (optional): Additional configuration for fine-tuning and extra options. See the PDF.co knowledge base for profile examples. Must be a string.

Method: POST
URL: /v1/pdf/documentparser

Query parameters

No query parameters accepted.

Body payload

{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
    "outputFormat": "XML",
    "templateId": "1",
    "async": false,
    "inline": "true",
    "password": "",
    "profiles": ""
}

Example responses

/pdf/documentparser (output as XML)

{
    "body": "<?xml version=\"1.0\" encoding=\"utf-16\"?>\r\n<parsingResult>\r\n  <objects>\r\n    <name>companyName</name>\r\n    <objectType>field</objectType>\r\n    <value>ACME Inc.</value>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <name>companyName2</name>\r\n    <objectType>field</objectType>\r\n    <value>Lanny Lane Ltd.</value>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <name>invoiceId</name>\r\n    <objectType>field</objectType>\r\n    <value>67893566</value>\r\n    <pageIndex>0</pageIndex>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <name>dateIssued</name>\r\n    <objectType>field</objectType>\r\n    <value>2019-01-05T00:00:00</value>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <name>dateDue</name>\r\n    <objectType>field</objectType>\r\n    <value>2019-01-05T00:00:00</value>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <name>bankAccount</name>\r\n    <objectType>field</objectType>\r\n    <value>\r\n    </value>\r\n  </objects>\r\n  <objects>\r\n    <name>total</name>\r\n    <objectType>field</objectType>\r\n    <value>1272.35</value>\r\n    <pageIndex>0</pageIndex>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <name>subTotal</name>\r\n    <objectType>field</objectType>\r\n    <value>1262.35</value>\r\n    <pageIndex>0</pageIndex>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <name>tax</name>\r\n    <objectType>field</objectType>\r\n    <value>10</value>\r\n    <pageIndex>0</pageIndex>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <objectType>table</objectType>\r\n    <name>table</name>\r\n    <rows>\r\n      <column1>\r\n        <pageIndex>0</pageIndex>\r\n        <value>2</value>\r\n      </column1>\r\n      <column2>\r\n        <pageIndex>0</pageIndex>\r\n        <value>Item 1</value>\r\n      </column2>\r\n      <column3>\r\n        <pageIndex>0</pageIndex>\r\n        <value>9.95</value>\r\n      </column3>\r\n      <column4>\r\n        <pageIndex>0</pageIndex>\r\n        <value>19.90</value>\r\n      </column4>\r\n    </rows>\r\n    <rows>\r\n      <column1>\r\n        <pageIndex>0</pageIndex>\r\n        <value>5</value>\r\n      </column1>\r\n      <column2>\r\n        <pageIndex>0</pageIndex>\r\n        <value>Item 2</value>\r\n      </column2>\r\n      <column3>\r\n        <pageIndex>0</pageIndex>\r\n        <value>20.00</value>\r\n      </column3>\r\n      <column4>\r\n        <pageIndex>0</pageIndex>\r\n        <value>100.00</value>\r\n      </column4>\r\n    </rows>\r\n    <rows>\r\n      <column1>\r\n        <pageIndex>0</pageIndex>\r\n        <value>1</value>\r\n      </column1>\r\n      <column2>\r\n        <pageIndex>0</pageIndex>\r\n        <value>Item 3</value>\r\n      </column2>\r\n      <column3>\r\n        <pageIndex>0</pageIndex>\r\n        <value>19.95</value>\r\n      </column3>\r\n      <column4>\r\n        <pageIndex>0</pageIndex>\r\n        <value>19.95</value>\r\n      </column4>\r\n    </rows>\r\n    <rows>\r\n      <column1>\r\n        <pageIndex>0</pageIndex>\r\n        <value>1</value>\r\n      </column1>\r\n      <column2>\r\n        <pageIndex>0</pageIndex>\r\n        <value>Item 4</value>\r\n      </column2>\r\n      <column3>\r\n        <pageIndex>0</pageIndex>\r\n        <value>123.00</value>\r\n      </column3>\r\n      <column4>\r\n        <pageIndex>0</pageIndex>\r\n        <value>123.00</value>\r\n      </column4>\r\n    </rows>\r\n    <rows>\r\n      <column1>\r\n        <pageIndex>0</pageIndex>\r\n        <value>10</value>\r\n      </column1>\r\n      <column2>\r\n        <pageIndex>0</pageIndex>\r\n        <value>Item 5</value>\r\n      </column2>\r\n      <column3>\r\n        <pageIndex>0</pageIndex>\r\n        <value>99.95</value>\r\n      </column3>\r\n      <column4>\r\n        <pageIndex>0</pageIndex>\r\n        <value>999.50</value>\r\n      </column4>\r\n    </rows>\r\n  </objects>\r\n  <elapsed>0.320434</elapsed>\r\n  <templateName>Generic Invoice [en]</templateName>\r\n  <templateVersion>4</templateVersion>\r\n  <timestamp>2021-12-31T14:54:31</timestamp>\r\n</parsingResult>\r\n",
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "sample-invoice.xml",
    "remainingCredits": 99046120,
    "credits": 42
}
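In XML mode the whole parsing result is returned as one string in `body`. The Python sketch below, an illustrative helper rather than an official client, reads that string with `xml.etree.ElementTree`: each `<objects>` element becomes either a field value or, for tables, a list of rows built from the `<value>` of each column element. Because the string is already decoded text, the sketch drops the `encoding="utf-16"` declaration before parsing so it cannot conflict with the actual encoding.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_xml_result(xml_text: str) -> Dict[str, object]:
    """Turn the XML parsing result into {field name: value, table name: rows}."""
    # Drop the <?xml ...?> declaration; the text is already a decoded string.
    if xml_text.startswith("<?xml"):
        xml_text = xml_text.split("?>", 1)[1]
    root = ET.fromstring(xml_text.strip())
    result: Dict[str, object] = {}
    for obj in root.findall("objects"):
        name = obj.findtext("name")
        if obj.findtext("objectType") == "table":
            rows: List[List[str]] = []
            for row in obj.findall("rows"):
                # Each child of <rows> is a <columnN> element holding a <value>.
                rows.append([(col.findtext("value") or "").strip() for col in row])
            result[name] = rows
        else:
            # Empty values (such as bankAccount above) contain only whitespace.
            result[name] = (obj.findtext("value") or "").strip()
    return result
```

For the sample above, `parse_xml_result(response["body"])["table"]` would yield one list per line item, starting with `["2", "Item 1", "9.95", "19.90"]`. All values stay strings; convert them to numbers as needed.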

Code Snippet

CURL

curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--data-raw '{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
    "outputFormat": "XML",
    "templateId": "1",
    "async": false,
    "inline": "true",
    "password": "",
    "profiles": ""
}'

[POST] /pdf/documentparser (output as CSV)

Description: Extracts data from documents using a data extraction template and returns the results as CSV. You can extract data from custom areas, form fields, and tables, including content that spans multiple pages.

Attributes

Hint: pass the attributes as a JSON object in the POST request body:

{
    "url": "url-input-link"
}

url (required): URL of the source file. Links from Google Drive, Dropbox, and PDF.co built-in files storage are supported. To upload files via the API, see the Files Upload section. Note: if you get intermittent `Too Many Requests` or `Access Denied` errors, try prefixing the URL with `cache:` to enable built-in URL caching (e.g. `cache:https://example.com/file1.pdf`). For data security, you can encrypt output files and decrypt input files; learn more about user-controlled data encryption.
httpusername (optional): HTTP auth user name, if required to access the source `url`.
httppassword (optional): HTTP auth password, if required to access the source `url`.
templateId (required unless `template` is provided): ID of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser.
template (optional): Code of the document parser template to use, passed directly as a string.
inline (optional): Set to `true` to return the results inside the response. Otherwise, the endpoint returns a link to the generated output file.
outputFormat (optional): `JSON` by default. Set to `CSV` or `XML` to generate CSV or XML output instead.
password (optional): Password of the PDF file. Must be a string.
async (optional): Set to `true` to run long processes in the background. The API then returns a `jobId`, which you can pass to the `/job/check` endpoint to check the status of the process and retrieve the output. Meanwhile, you can continue with other tasks without waiting for the process to finish.
name (optional): File name for the generated output. Must be a string.
expiration (optional): Expiration time of the output link, in minutes (default: `60`, i.e. 1 hour). After this period, any generated output files are automatically deleted from PDF.co temporary files storage. The maximum expiration depends on your subscription plan. To store permanent input files (e.g. reusable images, PDF templates, documents), consider using PDF.co built-in Files Storage.
profiles (optional): Additional configuration for fine-tuning and extra options. See the PDF.co knowledge base for profile examples. Must be a string.

Method: POST
URL: /v1/pdf/documentparser

Query parameters

No query parameters accepted.

Body payload

{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
    "templateId": "1",
    "outputFormat": "CSV",
    "generateCsvHeaders": true,
    "async": false,
    "inline": "true",
    "password": ""
}

Example responses

/pdf/documentparser (output as CSV)

{
    "body": "companyName,companyName2,invoiceId,dateIssued,dateDue,bankAccount,total,subTotal,tax,tableNames,tables\r\n\"Amazon Web Services, Inc\",\"Amazon Web Services, Inc\",123456789,2018-04-03T00:00:00,2018-04-03T00:00:00,123456789012,6.58,,1.01,table,\r\n\r\n",
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "sample-invoice.csv",
    "remainingCredits": 60804
}
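Because the request sets `generateCsvHeaders` to `true`, the CSV body starts with a header row of field names, followed by one row of values. The Python sketch below (an illustrative helper) reads it with the standard `csv` module, which also handles quoted values that contain commas, such as `"Amazon Web Services, Inc"`.

```python
import csv
import io
from typing import Dict, List


def parse_csv_result(csv_text: str) -> List[Dict[str, str]]:
    """Parse a CSV body that starts with a header row into a list of dicts."""
    # newline="" leaves the \r\n line endings for the csv module to handle.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    # DictReader skips the trailing blank line at the end of the body.
    return list(reader)
```

Every value comes back as a string, and fields the template did not find (here `subTotal`) are empty strings.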

Code Snippet

CURL

curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--data-raw '{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
    "templateId": "1",
    "outputFormat": "CSV",
    "generateCsvHeaders": true,
    "async": false,
    "inline": "true",
    "password": ""
}'

[POST] /pdf/documentparser (output as JSON, custom template code)

Description: Extracts data from documents using custom template code passed directly in the `template` parameter, instead of a saved template ID. You can extract data from custom areas, form fields, and tables, including content that spans multiple pages.

Attributes

Hint: pass the attributes as a JSON object in the POST request body:

{
    "url": "url-input-link"
}

url (required): URL of the source file. Links from Google Drive, Dropbox, and PDF.co built-in files storage are supported. To upload files via the API, see the Files Upload section. Note: if you get intermittent `Too Many Requests` or `Access Denied` errors, try prefixing the URL with `cache:` to enable built-in URL caching (e.g. `cache:https://example.com/file1.pdf`). For data security, you can encrypt output files and decrypt input files; learn more about user-controlled data encryption.
httpusername (optional): HTTP auth user name, if required to access the source `url`.
httppassword (optional): HTTP auth password, if required to access the source `url`.
templateId (required unless `template` is provided): ID of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser.
template (optional): Code of the document parser template to use, passed directly as a string.
inline (optional): Set to `true` to return the results inside the response. Otherwise, the endpoint returns a link to the generated output file.
outputFormat (optional): `JSON` by default. Set to `CSV` or `XML` to generate CSV or XML output instead.
password (optional): Password of the PDF file. Must be a string.
async (optional): Set to `true` to run long processes in the background. The API then returns a `jobId`, which you can pass to the `/job/check` endpoint to check the status of the process and retrieve the output. Meanwhile, you can continue with other tasks without waiting for the process to finish.
name (optional): File name for the generated output. Must be a string.
expiration (optional): Expiration time of the output link, in minutes (default: `60`, i.e. 1 hour). After this period, any generated output files are automatically deleted from PDF.co temporary files storage. The maximum expiration depends on your subscription plan. To store permanent input files (e.g. reusable images, PDF templates, documents), consider using PDF.co built-in Files Storage.
profiles (optional): Additional configuration for fine-tuning and extra options. See the PDF.co knowledge base for profile examples. Must be a string.

Method: POST
URL: /v1/pdf/documentparser

Query parameters

No query parameters accepted.

Body payload

{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/MultiPageTable.pdf",
    "template": "{\r\n  \"templateVersion\": 3,\r\n  \"templatePriority\": 0,\r\n  \"sourceId\": \"Multipage Table Test\",\r\n  \"detectionRules\": {\r\n    \"keywords\": [\r\n      \"Sample document with multi-page table\"\r\n    ]\r\n  },\r\n  \"fields\": {\r\n    \"total\": {\r\n      \"type\": \"regex\",\r\n      \"expression\": \"TOTAL \",\r\n      \"dataType\": \"decimal\"\r\n    }\r\n  },\r\n  \"tables\": [\r\n    {\r\n      \"name\": \"table1\",\r\n      \"start\": {\r\n        \"expression\": \"Item\\\\s+Description\\\\s+Price\\\\s+Qty\\\\s+Extended Price\"\r\n      },\r\n      \"end\": {\r\n        \"expression\": \"TOTAL\\\\s+\\\\d+\\\\.\\\\d\\\\d\"\r\n      },\r\n      \"row\": {\r\n        \"expression\": \"^\\\\s*(?<itemNo>\\\\d+)\\\\s+(?<description>.+?)\\\\s+(?<price>\\\\d+\\\\.\\\\d\\\\d)\\\\s+(?<qty>\\\\d+)\\\\s+(?<extPrice>\\\\d+\\\\.\\\\d\\\\d)\"\r\n      },\r\n      \"columns\": [\r\n        {\r\n          \"name\": \"itemNo\",\r\n          \"type\": \"integer\"\r\n        },\r\n        {\r\n          \"name\": \"description\",\r\n          \"type\": \"string\"\r\n        },\r\n        {\r\n          \"name\": \"price\",\r\n          \"type\": \"decimal\"\r\n        },\r\n        {\r\n          \"name\": \"qty\",\r\n          \"type\": \"integer\"\r\n        },\r\n        {\r\n          \"name\": \"extPrice\",\r\n          \"type\": \"decimal\"\r\n        }\r\n      ],\r\n      \"multipage\": true\r\n    }\r\n  ]\r\n}",
    "outputFormat": "JSON",
    "async": false,
    "inline": "true",
    "profiles": "",
    "password": ""
}
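Hand-escaping template code inside the `template` string is error-prone: every quote needs one escape and every regex backslash ends up doubled twice in the request body. A simpler approach, sketched below in Python, is to keep the template as a normal dictionary (the same definition as in the payload above) and let `json.dumps` produce the escaped string. The variable names are illustrative; only the payload fields come from the example.

```python
import json

# The template from the request above, written as a plain Python dict.
# Raw strings (r"...") keep each regex exactly as the parser should see it.
template = {
    "templateVersion": 3,
    "templatePriority": 0,
    "sourceId": "Multipage Table Test",
    "detectionRules": {"keywords": ["Sample document with multi-page table"]},
    "fields": {
        "total": {"type": "regex", "expression": "TOTAL ", "dataType": "decimal"},
    },
    "tables": [{
        "name": "table1",
        "start": {"expression": r"Item\s+Description\s+Price\s+Qty\s+Extended Price"},
        "end": {"expression": r"TOTAL\s+\d+\.\d\d"},
        "row": {"expression": r"^\s*(?<itemNo>\d+)\s+(?<description>.+?)\s+"
                              r"(?<price>\d+\.\d\d)\s+(?<qty>\d+)\s+(?<extPrice>\d+\.\d\d)"},
        "columns": [
            {"name": "itemNo", "type": "integer"},
            {"name": "description", "type": "string"},
            {"name": "price", "type": "decimal"},
            {"name": "qty", "type": "integer"},
            {"name": "extPrice", "type": "decimal"},
        ],
        "multipage": True,
    }],
}

payload = {
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/MultiPageTable.pdf",
    # The first json.dumps turns the template into the string "template" expects.
    "template": json.dumps(template, indent=2),
    "outputFormat": "JSON",
    "async": False,
    "inline": "true",
}

# The second json.dumps escapes that string again for the request body.
request_body = json.dumps(payload)
```

Note that no `templateId` is sent: the template code itself tells the parser what to extract.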

Example responses

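/pdf/documentparser (output as JSON, custom template code)

The response has the same structure as for saved templates: one object per field or table defined in the template. Note that the sample below was generated from sample-invoice.pdf with the built-in invoice template and only illustrates that structure; for the template above, the objects would be the `total` field and the `table1` table.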

{
    "body": {
        "objects": [
            {
                "name": "companyName",
                "objectType": "field",
                "value": "Amazon Web Services, Inc",
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "companyName2",
                "objectType": "field",
                "value": "Amazon Web Services, Inc",
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "invoiceId",
                "objectType": "field",
                "value": "123456789",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "dateIssued",
                "objectType": "field",
                "value": "2018-04-03T00:00:00",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "dateDue",
                "objectType": "field",
                "value": "2018-04-03T00:00:00",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "total",
                "objectType": "field",
                "value": 6.58,
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "subTotal",
                "objectType": "field",
                "value": ""
            },
            {
                "name": "tax",
                "objectType": "field",
                "value": 1.01,
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "objectType": "table",
                "name": "table",
                "rows": []
            }
        ],
        "templateName": "Generic Invoice [en]",
        "templateVersion": "4",
        "timestamp": "2020-07-16T22:04:25"
    },
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "sample-invoice.json",
    "remainingCredits": 77731
}
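Table objects carry their data in `rows`. The sample above has no rows, so the row shape used below is an assumption: each JSON row is taken to map a column name to an object with a `value` key, mirroring the `<rows>` / `<columnN>` / `<value>` nesting shown in the XML output. Treat this Python helper as a sketch to check against a real response before relying on it.

```python
from typing import Any, Dict, List


def extract_tables(response: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Return {table name: [row, ...]} with each row flattened to {column: value}.

    Assumes each row maps a column name to a cell object such as
    {"pageIndex": 0, "value": ...}; verify this against an actual response.
    """
    tables: Dict[str, List[Dict[str, Any]]] = {}
    for obj in response["body"]["objects"]:
        if obj.get("objectType") != "table":
            continue  # fields are handled separately
        rows = []
        for row in obj.get("rows", []):
            rows.append({col: cell.get("value") for col, cell in row.items()})
        tables[obj["name"]] = rows
    return tables
```

With the template above, each flattened row would be keyed by `itemNo`, `description`, `price`, `qty`, and `extPrice`.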

Code Snippet

CURL

curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--data-raw '{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/MultiPageTable.pdf",
    "template": "{\r\n  \"templateVersion\": 3,\r\n  \"templatePriority\": 0,\r\n  \"sourceId\": \"Multipage Table Test\",\r\n  \"detectionRules\": {\r\n    \"keywords\": [\r\n      \"Sample document with multi-page table\"\r\n    ]\r\n  },\r\n  \"fields\": {\r\n    \"total\": {\r\n      \"type\": \"regex\",\r\n      \"expression\": \"TOTAL \",\r\n      \"dataType\": \"decimal\"\r\n    }\r\n  },\r\n  \"tables\": [\r\n    {\r\n      \"name\": \"table1\",\r\n      \"start\": {\r\n        \"expression\": \"Item\\\\s+Description\\\\s+Price\\\\s+Qty\\\\s+Extended Price\"\r\n      },\r\n      \"end\": {\r\n        \"expression\": \"TOTAL\\\\s+\\\\d+\\\\.\\\\d\\\\d\"\r\n      },\r\n      \"row\": {\r\n        \"expression\": \"^\\\\s*(?<itemNo>\\\\d+)\\\\s+(?<description>.+?)\\\\s+(?<price>\\\\d+\\\\.\\\\d\\\\d)\\\\s+(?<qty>\\\\d+)\\\\s+(?<extPrice>\\\\d+\\\\.\\\\d\\\\d)\"\r\n      },\r\n      \"columns\": [\r\n        {\r\n          \"name\": \"itemNo\",\r\n          \"type\": \"integer\"\r\n        },\r\n        {\r\n          \"name\": \"description\",\r\n          \"type\": \"string\"\r\n        },\r\n        {\r\n          \"name\": \"price\",\r\n          \"type\": \"decimal\"\r\n        },\r\n        {\r\n          \"name\": \"qty\",\r\n          \"type\": \"integer\"\r\n        },\r\n        {\r\n          \"name\": \"extPrice\",\r\n          \"type\": \"decimal\"\r\n        }\r\n      ],\r\n      \"multipage\": true\r\n    }\r\n  ]\r\n}",
    "outputFormat": "JSON",
    "async": false,
    "inline": "true",
    "profiles": "",
    "password": ""
}'

[GET] /pdf/documentparser/templates

Returns all Document Parser data extraction templates available to the current user, including both your own (`user`) and built-in (`system`) templates.

Manage your Document Parser templates at https://app.pdf.co/document-parser/templates

Method: GET
URL: /v1/pdf/documentparser/templates

Query parameters

No query parameters accepted.

Body payload

No body parameters accepted.

Example responses

/pdf/documentparser/templates

{
    "templates": [
        {
            "id": 40,
            "type": "user",
            "title": "Untitled",
            "description": "Untitled"
        },
        {
            "id": 1,
            "type": "system",
            "title": "Invoice Parser",
            "description": "Parses invoices and extracts invoice number, company name, due date, amount, tax"
        }
    ],
    "remainingCredits": 94229
}
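A common use of this list is to look up a `templateId` before calling /pdf/documentparser. The Python sketch below (illustrative helper names) filters templates by their `type` and finds an ID by title. Titles are not necessarily unique (several templates can be named "Untitled"), so the lookup returns the first match.

```python
from typing import Any, Dict, List, Optional


def templates_by_type(response: Dict[str, Any], kind: str) -> List[Dict[str, Any]]:
    """Return templates whose "type" equals kind, e.g. "user" or "system"."""
    return [t for t in response.get("templates", []) if t.get("type") == kind]


def find_template_id(response: Dict[str, Any], title: str) -> Optional[int]:
    """Return the id of the first template with the given title, or None."""
    for template in response.get("templates", []):
        if template.get("title") == title:
            return template.get("id")
    return None
```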

Code Snippet

CURL

curl --location --request GET 'https://api.pdf.co/v1/pdf/documentparser/templates' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY'

[GET] /pdf/documentparser/templates/:id

Returns detailed information about a single Document Parser template, identified by its ID.

Manage your Document Parser templates at https://app.pdf.co/document-parser/templates

Method: GET
URL: /v1/pdf/documentparser/templates/:id

Query parameters

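No query parameters accepted. Pass the template ID as the `:id` path parameter (for example, `/v1/pdf/documentparser/templates/1`).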

Body payload

No body parameters accepted.

Example responses

No example responses saved.

Code Snippet

CURL

curl --location --request GET 'https://api.pdf.co/v1/pdf/documentparser/templates/1' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY'
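The same lookup in Python: the sketch below builds a GET request for `/v1/pdf/documentparser/templates/:id` without a request body, URL-quoting the ID so it always forms a single path segment. The helper name is illustrative, and since no example response is documented here, the sketch stops at building the request; send it with `urllib.request.urlopen()` and decode the JSON yourself.

```python
import urllib.parse
import urllib.request

TEMPLATES_URL = "https://api.pdf.co/v1/pdf/documentparser/templates"


def build_template_request(api_key: str, template_id) -> urllib.request.Request:
    """Build (but do not send) GET /v1/pdf/documentparser/templates/:id."""
    # Quote the ID so characters such as "/" cannot change the path.
    path_id = urllib.parse.quote(str(template_id), safe="")
    return urllib.request.Request(
        f"{TEMPLATES_URL}/{path_id}",
        headers={"x-api-key": api_key},
        method="GET",
    )
```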