Document Parser

Document Parser automatically parses PDF, JPG, and PNG documents and extracts fields, tables, values, and barcodes from invoices, statements, orders, and other PDF and scanned documents.

Built-in document parser templates:

  • General Invoice Template parses invoices (English only) and extracts the invoice id, invoice date, total, tax, and line items. Set the templateId parameter to 1 to use this template.

How to classify incoming documents before parsing them?

Use the /pdf/classifier endpoint (see below) to automatically detect the class of a document, either with AI or with custom keyword-based rules.

For example, you can define rules that identify which vendor issued a document and then apply the matching template. See Document Classifier for more details.

Available Methods

[POST] /pdf/documentparser (output as JSON)

Description: Extracts data from documents using a data extraction template. With this API method you can extract data from custom areas, by search, and from form fields, tables, multiple pages, and more.

Parameters

  • url (required). URL of the source file. Supports links from Google Drive, Dropbox, built-in PDF.co files storage, and any publicly accessible files. To upload files via the API, see the Files Upload section. If you intermittently get Too Many Requests or Access Denied errors for your input URL, try adding cache: to enable built-in URL caching. You can also encrypt output files and decrypt input files with user-controlled data encryption (strong AES encryption with your own keys).
  • httpusername (optional). HTTP auth user name, if required to access the source URL.
  • httppassword (optional). HTTP auth password, if required to access the source URL.
  • templateId (required unless template is supplied). Id of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser
  • template (optional). Code of the document parser template, passed directly in the request.
  • inline (optional). Set to true to return results inside the response. Otherwise the endpoint returns a link to the generated output file.
  • outputFormat (optional). Default is JSON. Set to CSV or XML to generate CSV or XML output instead.
  • password (optional). Password of the PDF file. Must be a String.
  • async (optional). Runs processing asynchronously. Returns a JobId that you can use with /job/check to check the state of the background job (possible states: working, failed, aborted, and success). Must be one of: true, false.
  • name (optional). File name for the generated output. Must be a String.
  • expiration (optional). Output link expiration in minutes. Default is 60 (1 hour). After this period, any generated output files are automatically removed from PDF.co temporary files storage. The maximum allowed expiration period depends on your current subscription plan. To store input files permanently (e.g. reusable images, PDFs, documents), use PDF.co built-in Files Storage instead.
  • profiles (optional). Must be a String. Sets additional configuration for fine-tuning and extra options. Explore the PDF.co knowledgebase for profile examples.
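
The parameter rules above can be checked on the client before a request is sent. The following is a minimal Python sketch, not an official client: the helper name build_parse_payload and its validation rules are assumptions made for illustration, based only on the parameter list and the sample payloads in this document.

```python
import json

ALLOWED_FORMATS = {"JSON", "CSV", "XML"}

def build_parse_payload(url, template_id=None, template=None,
                        output_format="JSON", inline=True, **extra):
    """Assemble a JSON body for POST /v1/pdf/documentparser (hypothetical helper)."""
    if not url:
        raise ValueError("url is required")
    if template_id is None and template is None:
        raise ValueError("pass either template_id or template")
    fmt = output_format.upper()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    payload = {"url": url, "outputFormat": fmt, "inline": inline}
    if template_id is not None:
        # The sample payloads in this document pass templateId as a string.
        payload["templateId"] = str(template_id)
    if template is not None:
        payload["template"] = template
    # Remaining documented parameters (password, name, expiration, profiles, ...)
    payload.update(extra)
    return payload

payload = build_parse_payload(
    "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
    template_id=1,
)
print(json.dumps(payload, indent=4))
```

Because async is a reserved word in Python, it would have to be passed as `**{"async": True}` with this sketch.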

  • Method: POST
  • URL: /v1/pdf/documentparser

Query parameters

No query parameters accepted.

Body payload

{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
    "outputFormat": "JSON",
    "templateId": "1",
    "async": false,
    "encrypt": "false",
    "inline": "true",
    "password": "",
    "profiles": ""
}

Example responses

/pdf/documentparser (output as JSON)
{
    "body": {
        "objects": [
            {
                "name": "companyName",
                "objectType": "field",
                "value": "Amazon Web Services, Inc",
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "companyName2",
                "objectType": "field",
                "value": "Amazon Web Services, Inc",
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "invoiceId",
                "objectType": "field",
                "value": "123456789",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "dateIssued",
                "objectType": "field",
                "value": "2018-04-03T00:00:00",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "dateDue",
                "objectType": "field",
                "value": "2018-04-03T00:00:00",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "bankAccount",
                "objectType": "field",
                "value": "123456789012",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "total",
                "objectType": "field",
                "value": 6.58,
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "subTotal",
                "objectType": "field",
                "value": ""
            },
            {
                "name": "tax",
                "objectType": "field",
                "value": 1.01,
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "objectType": "table",
                "name": "table",
                "rows": []
            }
        ],
        "templateName": "Generic Invoice [en]",
        "templateVersion": "4",
        "timestamp": "2020-08-21T19:23:31"
    },
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "sample-invoice.json",
    "remainingCredits": 60803
}
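
A JSON response like the one above is often easier to work with once the objects array is flattened. The sketch below assumes only the response shape shown in this example; the helper names extract_fields and extract_tables are illustrative and not part of the API.

```python
def extract_fields(response):
    """Map field name -> value for every object whose objectType is "field"."""
    if response.get("error"):
        raise RuntimeError(f"parser reported an error (status {response.get('status')})")
    objects = response["body"]["objects"]
    return {o["name"]: o.get("value") for o in objects if o.get("objectType") == "field"}

def extract_tables(response):
    """Map table name -> list of rows for every object whose objectType is "table"."""
    objects = response["body"]["objects"]
    return {o["name"]: o.get("rows", []) for o in objects if o.get("objectType") == "table"}

# Trimmed copy of the example response above.
sample = {
    "body": {
        "objects": [
            {"name": "invoiceId", "objectType": "field", "value": "123456789", "pageIndex": 0},
            {"name": "total", "objectType": "field", "value": 6.58, "pageIndex": 0},
            {"name": "subTotal", "objectType": "field", "value": ""},
            {"objectType": "table", "name": "table", "rows": []},
        ]
    },
    "error": False,
    "status": 200,
}

fields = extract_fields(sample)
tables = extract_tables(sample)
print(fields["invoiceId"], fields["total"], tables)
```

Note that a field the template could not find, such as subTotal here, comes back as an empty string rather than being omitted.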

Code Snippet

CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: ' \
--data-raw '{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
    "outputFormat": "JSON",
    "templateId": "1",
    "async": false,
    "encrypt": "false",
    "inline": "true",
    "password": "",
    "profiles": ""
}'

[POST] /pdf/documentparser (output as XML)

Description: Extracts data from documents using a data extraction template. With this API method you can extract data from custom areas, by search, and from form fields, tables, multiple pages, and more.

Parameters

  • url (required). URL of the source file. Supports links from Google Drive, Dropbox, and built-in PDF.co files storage. To upload files via the API, see the Files Upload section. If you intermittently get Too Many Requests or Access Denied errors for your input URL, try adding cache: to enable built-in URL caching. You can also encrypt output files and decrypt input files with user-controlled data encryption (strong AES encryption with your own keys).
  • httpusername (optional). HTTP auth user name, if required to access the source URL.
  • httppassword (optional). HTTP auth password, if required to access the source URL.
  • templateId (required unless template is supplied). Id of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser
  • template (optional). Code of the document parser template, passed directly in the request.
  • inline (optional). Set to true to return results inside the response. Otherwise the endpoint returns a link to the generated output file.
  • outputFormat (optional). Default is JSON. Set to CSV or XML to generate CSV or XML output instead.
  • password (optional). Password of the PDF file. Must be a String.
  • async (optional). Runs processing asynchronously. Returns a JobId that you can use with /job/check to check the state of the background job (possible states: working, failed, aborted, and success). Must be one of: true, false.
  • name (optional). File name for the generated output. Must be a String.
  • expiration (optional). Output link expiration in minutes. Default is 60 (1 hour). After this period, any generated output files are automatically removed from PDF.co temporary files storage. The maximum allowed expiration period depends on your current subscription plan. To store input files permanently (e.g. reusable images, PDFs, documents), use PDF.co built-in Files Storage instead.
  • profiles (optional). Must be a String. Sets additional configuration for fine-tuning and extra options. Explore the PDF.co knowledgebase for profile examples.
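
When async is set to true, the caller has to poll /job/check until the job leaves the working state. The Python sketch below shows one possible polling loop built around the four states listed above. The function wait_for_job and its check_job callback are hypothetical: check_job stands in for whatever code calls /job/check and returns the state string, since the request and response format of that endpoint is not described in this section.

```python
import time

TERMINAL_STATES = {"success", "failed", "aborted"}

def wait_for_job(job_id, check_job, interval=2.0, max_checks=30, sleep=time.sleep):
    """Poll until the background job reaches a terminal state.

    check_job is a caller-supplied function (an assumption of this sketch)
    that queries /job/check for job_id and returns the state as a string.
    """
    for _ in range(max_checks):
        state = check_job(job_id)
        if state in TERMINAL_STATES:
            return state
        if state != "working":
            raise ValueError(f"unexpected job state: {state!r}")
        sleep(interval)
    raise TimeoutError(f"job {job_id} still working after {max_checks} checks")

# Simulated run: two "working" answers, then "success". No network is involved.
states = iter(["working", "working", "success"])
result = wait_for_job("demo-job", lambda _job_id: next(states), sleep=lambda _s: None)
print(result)
```

Capping the number of checks keeps a stuck job from blocking the caller indefinitely; the interval and cap used here are arbitrary.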

  • Method: POST
  • URL: /v1/pdf/documentparser

Query parameters

No query parameters accepted.

Body payload

{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
    "outputFormat": "XML",
    "templateId": "1",
    "async": false,
    "encrypt": "false",
    "inline": "true",
    "password": "",
    "profiles": ""
}

Example responses

/pdf/documentparser (output as XML)
{
    "body": "<?xml version=\"1.0\" encoding=\"utf-16\"?>\r\n<parsingResult>\r\n  <objects>\r\n    <name>companyName</name>\r\n    <objectType>field</objectType>\r\n    <value>ACME Inc.</value>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <name>companyName2</name>\r\n    <objectType>field</objectType>\r\n    <value>Lanny Lane Ltd.</value>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <name>invoiceId</name>\r\n    <objectType>field</objectType>\r\n    <value>67893566</value>\r\n    <pageIndex>0</pageIndex>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <name>dateIssued</name>\r\n    <objectType>field</objectType>\r\n    <value>2019-01-05T00:00:00</value>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <name>dateDue</name>\r\n    <objectType>field</objectType>\r\n    <value>2019-01-05T00:00:00</value>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <name>bankAccount</name>\r\n    <objectType>field</objectType>\r\n    <value>\r\n    </value>\r\n  </objects>\r\n  <objects>\r\n    <name>total</name>\r\n    <objectType>field</objectType>\r\n    <value>1272.35</value>\r\n    <pageIndex>0</pageIndex>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <name>subTotal</name>\r\n    <objectType>field</objectType>\r\n    <value>1262.35</value>\r\n    <pageIndex>0</pageIndex>\r\n    
<rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <name>tax</name>\r\n    <objectType>field</objectType>\r\n    <value>10</value>\r\n    <pageIndex>0</pageIndex>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n    <rectangle>0</rectangle>\r\n  </objects>\r\n  <objects>\r\n    <objectType>table</objectType>\r\n    <name>table</name>\r\n    <rows>\r\n      <column1>\r\n        <pageIndex>0</pageIndex>\r\n        <value>2</value>\r\n      </column1>\r\n      <column2>\r\n        <pageIndex>0</pageIndex>\r\n        <value>Item 1</value>\r\n      </column2>\r\n      <column3>\r\n        <pageIndex>0</pageIndex>\r\n        <value>9.95</value>\r\n      </column3>\r\n      <column4>\r\n        <pageIndex>0</pageIndex>\r\n        <value>19.90</value>\r\n      </column4>\r\n    </rows>\r\n    <rows>\r\n      <column1>\r\n        <pageIndex>0</pageIndex>\r\n        <value>5</value>\r\n      </column1>\r\n      <column2>\r\n        <pageIndex>0</pageIndex>\r\n        <value>Item 2</value>\r\n      </column2>\r\n      <column3>\r\n        <pageIndex>0</pageIndex>\r\n        <value>20.00</value>\r\n      </column3>\r\n      <column4>\r\n        <pageIndex>0</pageIndex>\r\n        <value>100.00</value>\r\n      </column4>\r\n    </rows>\r\n    <rows>\r\n      <column1>\r\n        <pageIndex>0</pageIndex>\r\n        <value>1</value>\r\n      </column1>\r\n      <column2>\r\n        <pageIndex>0</pageIndex>\r\n        <value>Item 3</value>\r\n      </column2>\r\n      <column3>\r\n        <pageIndex>0</pageIndex>\r\n        <value>19.95</value>\r\n      </column3>\r\n      <column4>\r\n        <pageIndex>0</pageIndex>\r\n        <value>19.95</value>\r\n      </column4>\r\n    </rows>\r\n    <rows>\r\n      <column1>\r\n        <pageIndex>0</pageIndex>\r\n        <value>1</value>\r\n      </column1>\r\n      <column2>\r\n     
   <pageIndex>0</pageIndex>\r\n        <value>Item 4</value>\r\n      </column2>\r\n      <column3>\r\n        <pageIndex>0</pageIndex>\r\n        <value>123.00</value>\r\n      </column3>\r\n      <column4>\r\n        <pageIndex>0</pageIndex>\r\n        <value>123.00</value>\r\n      </column4>\r\n    </rows>\r\n    <rows>\r\n      <column1>\r\n        <pageIndex>0</pageIndex>\r\n        <value>10</value>\r\n      </column1>\r\n      <column2>\r\n        <pageIndex>0</pageIndex>\r\n        <value>Item 5</value>\r\n      </column2>\r\n      <column3>\r\n        <pageIndex>0</pageIndex>\r\n        <value>99.95</value>\r\n      </column3>\r\n      <column4>\r\n        <pageIndex>0</pageIndex>\r\n        <value>999.50</value>\r\n      </column4>\r\n    </rows>\r\n  </objects>\r\n  <elapsed>0.320434</elapsed>\r\n  <templateName>Generic Invoice [en]</templateName>\r\n  <templateVersion>4</templateVersion>\r\n  <timestamp>2021-12-31T14:54:31</timestamp>\r\n</parsingResult>\r\n",
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "sample-invoice.xml",
    "remainingCredits": 99046120,
    "credits": 42
}
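
With outputFormat set to XML and inline enabled, the body arrives as an XML string inside the JSON response, as shown above. The following sketch reads such a string with Python's standard library. It assumes the element layout from this example (objects, name, objectType, value, rows); parse_xml_body is an illustrative name, and the sample string is a shortened version of the body above.

```python
import re
import xml.etree.ElementTree as ET

def parse_xml_body(xml_text):
    """Return (fields, tables) extracted from a parsingResult XML string."""
    # Drop the XML declaration: the text is already a decoded Python str,
    # so the utf-16 encoding it declares no longer applies.
    xml_text = re.sub(r"^\s*<\?xml[^>]*\?>", "", xml_text).strip()
    root = ET.fromstring(xml_text)
    fields, tables = {}, {}
    for obj in root.findall("objects"):
        name = obj.findtext("name")
        kind = obj.findtext("objectType")
        if kind == "field":
            fields[name] = (obj.findtext("value") or "").strip()
        elif kind == "table":
            # Each <rows> element holds one row; its children are the columns.
            tables[name] = [
                [(col.findtext("value") or "").strip() for col in row]
                for row in obj.findall("rows")
            ]
    return fields, tables

sample_body = (
    '<?xml version="1.0" encoding="utf-16"?>\r\n<parsingResult>\r\n'
    '  <objects><name>invoiceId</name><objectType>field</objectType>'
    '<value>67893566</value></objects>\r\n'
    '  <objects><objectType>table</objectType><name>table</name>'
    '<rows><column1><pageIndex>0</pageIndex><value>2</value></column1>'
    '<column2><pageIndex>0</pageIndex><value>Item 1</value></column2></rows>'
    '</objects>\r\n</parsingResult>\r\n'
)

fields, tables = parse_xml_body(sample_body)
print(fields, tables)
```

All values come back as strings in this sketch; converting totals or quantities to numbers is left to the caller.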

Code Snippet

CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: ' \
--data-raw '{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
    "outputFormat": "XML",
    "templateId": "1",
    "async": false,
    "encrypt": "false",
    "inline": "true",
    "password": "",
    "profiles": ""
}'

[POST] /pdf/documentparser (output as CSV)

Description: Extracts data from documents using a data extraction template. With this API method you can extract data from custom areas, by search, and from form fields, tables, multiple pages, and more.

Parameters

  • url (required). URL of the source file. Supports links from Google Drive, Dropbox, and built-in PDF.co files storage. To upload files via the API, see the Files Upload section. If you intermittently get Too Many Requests or Access Denied errors for your input URL, try adding cache: to enable built-in URL caching. You can also encrypt output files and decrypt input files with user-controlled data encryption (strong AES encryption with your own keys).
  • httpusername (optional). HTTP auth user name, if required to access the source URL.
  • httppassword (optional). HTTP auth password, if required to access the source URL.
  • templateId (required unless template is supplied). Id of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser
  • template (optional). Code of the document parser template, passed directly in the request.
  • inline (optional). Set to true to return results inside the response. Otherwise the endpoint returns a link to the generated output file.
  • outputFormat (optional). Default is JSON. Set to CSV or XML to generate CSV or XML output instead.
  • password (optional). Password of the PDF file. Must be a String.
  • async (optional). Runs processing asynchronously. Returns a JobId that you can use with /job/check to check the state of the background job (possible states: working, failed, aborted, and success). Must be one of: true, false.
  • name (optional). File name for the generated output. Must be a String.
  • expiration (optional). Output link expiration in minutes. Default is 60 (1 hour). After this period, any generated output files are automatically removed from PDF.co temporary files storage. The maximum allowed expiration period depends on your current subscription plan. To store input files permanently (e.g. reusable images, PDFs, documents), use PDF.co built-in Files Storage instead.
  • profiles (optional). Must be a String. Sets additional configuration for fine-tuning and extra options. Explore the PDF.co knowledgebase for profile examples.

  • Method: POST
  • URL: /v1/pdf/documentparser

Query parameters

No query parameters accepted.

Body payload

{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
    "templateId": "1",
    "outputFormat": "CSV",
    "generateCsvHeaders": true,
    "async": false,
    "encrypt": "false",
    "inline": "true",
    "password": ""
}

Example responses

/pdf/documentparser (output as CSV)
{
    "body": "companyName,companyName2,invoiceId,dateIssued,dateDue,bankAccount,total,subTotal,tax,tableNames,tables\r\n\"Amazon Web Services, Inc\",\"Amazon Web Services, Inc\",123456789,2018-04-03T00:00:00,2018-04-03T00:00:00,123456789012,6.58,,1.01,table,\r\n\r\n",
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "sample-invoice.csv",
    "remainingCredits": 60804
}
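
The CSV body above contains quoted values with embedded commas, so splitting on commas by hand would break the company name. A short sketch using Python's csv module follows; it uses the body string from this example verbatim, and parse_csv_body is an illustrative helper name.

```python
import csv
import io

def parse_csv_body(body):
    """Parse the CSV text from the response body into a list of dicts."""
    # newline="" lets the csv module handle the \r\n line endings itself.
    reader = csv.DictReader(io.StringIO(body, newline=""))
    return list(reader)

body = (
    'companyName,companyName2,invoiceId,dateIssued,dateDue,bankAccount,'
    'total,subTotal,tax,tableNames,tables\r\n'
    '"Amazon Web Services, Inc","Amazon Web Services, Inc",123456789,'
    '2018-04-03T00:00:00,2018-04-03T00:00:00,123456789012,6.58,,1.01,table,\r\n\r\n'
)

rows = parse_csv_body(body)
print(rows[0]["companyName"], rows[0]["total"])
```

The header row is present here because the request set generateCsvHeaders to true. Every value is returned as a string, and empty cells such as subTotal become empty strings.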

Code Snippet

CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: ' \
--data-raw '{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
    "templateId": "1",
    "outputFormat": "CSV",
    "generateCsvHeaders": true,
    "async": false,
    "encrypt": "false",
    "inline": "true",
    "password": ""
}'

[POST] /pdf/documentparser (output as JSON, custom template code)

Description: Parses documents and extracts data using custom data extraction template code passed directly in the request. With this API method you can extract data from custom areas, by search, and from form fields, tables, multiple pages, and more.

Parameters

  • url (required). URL of the source file. Supports links from Google Drive, Dropbox, and built-in PDF.co files storage. To upload files via the API, see the Files Upload section. If you intermittently get Too Many Requests or Access Denied errors for your input URL, try adding cache: to enable built-in URL caching. You can also encrypt output files and decrypt input files with user-controlled data encryption (strong AES encryption with your own keys).
  • httpusername (optional). HTTP auth user name, if required to access the source URL.
  • httppassword (optional). HTTP auth password, if required to access the source URL.
  • templateId (required unless template is supplied). Id of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser
  • template (optional). Code of the document parser template, passed directly in the request.
  • inline (optional). Set to true to return results inside the response. Otherwise the endpoint returns a link to the generated output file.
  • outputFormat (optional). Default is JSON. Set to CSV or XML to generate CSV or XML output instead.
  • password (optional). Password of the PDF file. Must be a String.
  • async (optional). Runs processing asynchronously. Returns a JobId that you can use with /job/check to check the state of the background job (possible states: working, failed, aborted, and success). Must be one of: true, false.
  • name (optional). File name for the generated output. Must be a String.
  • expiration (optional). Output link expiration in minutes. Default is 60 (1 hour). After this period, any generated output files are automatically removed from PDF.co temporary files storage. The maximum allowed expiration period depends on your current subscription plan. To store input files permanently (e.g. reusable images, PDFs, documents), use PDF.co built-in Files Storage instead.
  • profiles (optional). Must be a String. Sets additional configuration for fine-tuning and extra options. Explore the PDF.co knowledgebase for profile examples.

  • Method: POST
  • URL: /v1/pdf/documentparser

Query parameters

No query parameters accepted.

Body payload

{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/MultiPageTable.pdf",
    "template": "{\r\n  \"templateVersion\": 3,\r\n  \"templatePriority\": 0,\r\n  \"sourceId\": \"Multipage Table Test\",\r\n  \"detectionRules\": {\r\n    \"keywords\": [\r\n      \"Sample document with multi-page table\"\r\n    ]\r\n  },\r\n  \"fields\": {\r\n    \"total\": {\r\n      \"type\": \"regex\",\r\n      \"expression\": \"TOTAL \",\r\n      \"dataType\": \"decimal\"\r\n    }\r\n  },\r\n  \"tables\": [\r\n    {\r\n      \"name\": \"table1\",\r\n      \"start\": {\r\n        \"expression\": \"Item\\\\s+Description\\\\s+Price\\\\s+Qty\\\\s+Extended Price\"\r\n      },\r\n      \"end\": {\r\n        \"expression\": \"TOTAL\\\\s+\\\\d+\\\\.\\\\d\\\\d\"\r\n      },\r\n      \"row\": {\r\n        \"expression\": \"^\\\\s*(?<itemNo>\\\\d+)\\\\s+(?<description>.+?)\\\\s+(?<price>\\\\d+\\\\.\\\\d\\\\d)\\\\s+(?<qty>\\\\d+)\\\\s+(?<extPrice>\\\\d+\\\\.\\\\d\\\\d)\"\r\n      },\r\n      \"columns\": [\r\n        {\r\n          \"name\": \"itemNo\",\r\n          \"type\": \"integer\"\r\n        },\r\n        {\r\n          \"name\": \"description\",\r\n          \"type\": \"string\"\r\n        },\r\n        {\r\n          \"name\": \"price\",\r\n          \"type\": \"decimal\"\r\n        },\r\n        {\r\n          \"name\": \"qty\",\r\n          \"type\": \"integer\"\r\n        },\r\n        {\r\n          \"name\": \"extPrice\",\r\n          \"type\": \"decimal\"\r\n        }\r\n      ],\r\n      \"multipage\": true\r\n    }\r\n  ]\r\n}",
    "outputFormat": "JSON",
    "async": false,
    "encrypt": "false",
    "inline": "true",
    "profiles": "",
    "password": ""
}
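
In the payload above, template is a JSON document serialized into a string, which is why every regular-expression backslash appears four times. Writing that escaping by hand is error-prone, so one option is to keep the template as a native structure and serialize it twice. The Python sketch below does this for a trimmed-down version of the template above (the fields section and most columns are omitted for brevity); the variable names are illustrative.

```python
import json

# Trimmed copy of the template from the payload above, as a Python dict.
# Raw strings keep the regular expressions readable: one backslash each.
template = {
    "templateVersion": 3,
    "templatePriority": 0,
    "sourceId": "Multipage Table Test",
    "detectionRules": {"keywords": ["Sample document with multi-page table"]},
    "tables": [
        {
            "name": "table1",
            "start": {"expression": r"Item\s+Description\s+Price\s+Qty\s+Extended Price"},
            "end": {"expression": r"TOTAL\s+\d+\.\d\d"},
            "row": {
                "expression": r"^\s*(?<itemNo>\d+)\s+(?<description>.+?)\s+"
                              r"(?<price>\d+\.\d\d)\s+(?<qty>\d+)\s+(?<extPrice>\d+\.\d\d)"
            },
            "columns": [
                {"name": "itemNo", "type": "integer"},
                {"name": "description", "type": "string"},
            ],
            "multipage": True,
        }
    ],
}

payload = {
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/MultiPageTable.pdf",
    "template": json.dumps(template),  # first serialization: template -> string
    "outputFormat": "JSON",
    "inline": True,
}

body = json.dumps(payload)  # second serialization: whole request body
print(body[:120])
```

The two json.dumps calls produce the doubled escaping automatically, and decoding the body twice returns the original expressions unchanged.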

Example responses

POST /pdf/documentparser
{
    "body": {
        "objects": [
            {
                "name": "companyName",
                "objectType": "field",
                "value": "Amazon Web Services, Inc",
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "companyName2",
                "objectType": "field",
                "value": "Amazon Web Services, Inc",
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "invoiceId",
                "objectType": "field",
                "value": "123456789",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "dateIssued",
                "objectType": "field",
                "value": "2018-04-03T00:00:00",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "dateDue",
                "objectType": "field",
                "value": "2018-04-03T00:00:00",
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "total",
                "objectType": "field",
                "value": 6.58,
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "name": "subTotal",
                "objectType": "field",
                "value": ""
            },
            {
                "name": "tax",
                "objectType": "field",
                "value": 1.01,
                "pageIndex": 0,
                "rectangle": [
                    0,
                    0,
                    0,
                    0
                ]
            },
            {
                "objectType": "table",
                "name": "table",
                "rows": []
            }
        ],
        "templateName": "Generic Invoice [en]",
        "templateVersion": "4",
        "timestamp": "2020-07-16T22:04:25"
    },
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "sample-invoice.json",
    "remainingCredits": 77731
}

Code Snippet

CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: ' \
--data-raw '{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/MultiPageTable.pdf",
    "template": "{\r\n  \"templateVersion\": 3,\r\n  \"templatePriority\": 0,\r\n  \"sourceId\": \"Multipage Table Test\",\r\n  \"detectionRules\": {\r\n    \"keywords\": [\r\n      \"Sample document with multi-page table\"\r\n    ]\r\n  },\r\n  \"fields\": {\r\n    \"total\": {\r\n      \"type\": \"regex\",\r\n      \"expression\": \"TOTAL \",\r\n      \"dataType\": \"decimal\"\r\n    }\r\n  },\r\n  \"tables\": [\r\n    {\r\n      \"name\": \"table1\",\r\n      \"start\": {\r\n        \"expression\": \"Item\\\\s+Description\\\\s+Price\\\\s+Qty\\\\s+Extended Price\"\r\n      },\r\n      \"end\": {\r\n        \"expression\": \"TOTAL\\\\s+\\\\d+\\\\.\\\\d\\\\d\"\r\n      },\r\n      \"row\": {\r\n        \"expression\": \"^\\\\s*(?<itemNo>\\\\d+)\\\\s+(?<description>.+?)\\\\s+(?<price>\\\\d+\\\\.\\\\d\\\\d)\\\\s+(?<qty>\\\\d+)\\\\s+(?<extPrice>\\\\d+\\\\.\\\\d\\\\d)\"\r\n      },\r\n      \"columns\": [\r\n        {\r\n          \"name\": \"itemNo\",\r\n          \"type\": \"integer\"\r\n        },\r\n        {\r\n          \"name\": \"description\",\r\n          \"type\": \"string\"\r\n        },\r\n        {\r\n          \"name\": \"price\",\r\n          \"type\": \"decimal\"\r\n        },\r\n        {\r\n          \"name\": \"qty\",\r\n          \"type\": \"integer\"\r\n        },\r\n        {\r\n          \"name\": \"extPrice\",\r\n          \"type\": \"decimal\"\r\n        }\r\n      ],\r\n      \"multipage\": true\r\n    }\r\n  ]\r\n}",
    "outputFormat": "JSON",
    "async": false,
    "encrypt": "false",
    "inline": "true",
    "profiles": "",
    "password": ""
}'

[GET] /pdf/documentparser/templates

Returns all Document Parser data extraction templates for the current user. Use a GET request.

Manage your Document Parser templates at https://app.pdf.co/document-parser/templates

  • Method: GET
  • URL: /v1/pdf/documentparser/templates

Query parameters

No query parameters accepted.

Body payload

No body parameters accepted.

Example responses

/pdf/documentparser/templates
{
    "templates": [
        {
            "id": 40,
            "type": "user",
            "title": "Untitled",
            "description": "Untitled"
        },
        {
            "id": 1,
            "type": "system",
            "title": "Invoice Parser",
            "description": "Parses invoices and extracts invoice number, company name, due date, amount, tax"
        }
    ],
    "remainingCredits": 94229
}
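
The listing above can be filtered to find the id to pass as templateId. A minimal sketch follows, assuming the response shape shown in this example; find_templates is an illustrative helper, not an API call.

```python
def find_templates(listing, type_=None, title=None):
    """Return the templates that match the given type and/or title."""
    matches = []
    for tpl in listing.get("templates", []):
        if type_ is not None and tpl.get("type") != type_:
            continue
        if title is not None and tpl.get("title") != title:
            continue
        matches.append(tpl)
    return matches

# Copy of the example response above.
listing = {
    "templates": [
        {"id": 40, "type": "user", "title": "Untitled", "description": "Untitled"},
        {
            "id": 1,
            "type": "system",
            "title": "Invoice Parser",
            "description": "Parses invoices and extracts invoice number, company name, due date, amount, tax",
        },
    ],
    "remainingCredits": 94229,
}

system_ids = [tpl["id"] for tpl in find_templates(listing, type_="system")]
user_ids = [tpl["id"] for tpl in find_templates(listing, type_="user")]
print(system_ids, user_ids)
```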

Code Snippet

CURL
curl --location --request GET 'https://api.pdf.co/v1/pdf/documentparser/templates' \
--header 'Content-Type: application/json' \
--header 'x-api-key: '

[GET] /pdf/documentparser/templates/:id

Returns detailed information about a document parser template, looked up by the template's id. Use a GET request.

Manage your Document Parser templates at https://app.pdf.co/document-parser/templates

  • Method: GET
  • URL: /v1/pdf/documentparser/templates/:id

Query parameters

No query parameters accepted.

Body payload

No body parameters accepted.

Example responses

No example responses saved.

Code Snippet

CURL
curl --location --request GET 'https://api.pdf.co/v1/pdf/documentparser/templates/1' \
--header 'Content-Type: application/json' \
--header 'x-api-key: '
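
The same request can be prepared from Python with the standard library. The sketch below only builds the request object and does not send it. The base URL and the x-api-key header come from the curl example above; the helper name template_request, the "YOUR_API_KEY" placeholder, and the assumption that template ids are integers (as in the listing example) are illustrative.

```python
import urllib.request

API_BASE = "https://api.pdf.co/v1"

def template_request(template_id, api_key):
    """Build (but do not send) a GET request for one template's details."""
    # int() rejects ids that are not whole numbers before they reach the URL.
    url = f"{API_BASE}/pdf/documentparser/templates/{int(template_id)}"
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")

req = template_request(1, "YOUR_API_KEY")
print(req.get_method(), req.full_url)

# To send it, pass the request to urllib.request.urlopen and decode the
# JSON response; that step needs a valid API key and network access.
```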
