Document Parser
Document Parser automatically parses PDF, JPG, and PNG documents to extract fields, tables, values, and barcodes from invoices, statements, orders, and other PDF and scanned documents.
Built-in document parser templates:
- General Invoice Template: parses invoices (English only) and extracts the invoice ID, invoice date, total, tax, and line items. Set the templateId parameter to 1 to use this template.
How to classify incoming documents before parsing them?
Use the /pdf/classifier endpoint to automatically detect the class of a document, using either AI or custom keyword-based rules.
For example, you can define rules that identify which vendor issued the document and then apply the matching template. See Document Classifier for more details.
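The routing step described above can be sketched as a lookup from a detected document class to a template ID. This is only an illustration: the class names and the template ID 40 are hypothetical, the classifier call itself is not shown because its response format is not covered here, and the fallback of "1" is the General Invoice Template mentioned earlier.

```python
def choose_template_id(document_class: str, routes: dict[str, str],
                       default_template_id: str = "1") -> str:
    """Return the template ID registered for a document class.

    Falls back to default_template_id when the class has no route.
    """
    return routes.get(document_class.strip().lower(), default_template_id)


if __name__ == "__main__":
    # Hypothetical routes: class names and IDs are placeholders.
    routes = {"acme invoice": "40"}
    print(choose_template_id("ACME Invoice", routes))
    print(choose_template_id("unknown vendor", routes))
```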
Additional Information and Tools
- Document Parser Template Editor (a standalone version is also available)
- Document Parser Template Objects Guide
Available Methods
- [POST] /pdf/documentparser (output as JSON)
- [POST] /pdf/documentparser (output as XML)
- [POST] /pdf/documentparser (output as CSV)
- [POST] /pdf/documentparser (output as JSON, custom template code)
- [GET] /pdf/documentparser/templates
- [GET] /pdf/documentparser/templates/:id
[POST] /pdf/documentparser (output as JSON)
Description: Extracts data from documents using a data extraction template. With this API method you can extract data from custom areas, search results, form fields, tables, multiple pages, and more.
Tools and Guides:
- Document Parser Template Editor (online version)
- Document Parser Template Coding Guide
- Document Parser Template Editor (offline desktop version)
Parameters
- url: required. URL of the source file. Supports links from Google Drive, Dropbox, and the built-in PDF.co files storage. To upload files via the API, see the Files Upload section. If you intermittently get a Too Many Requests or Access Denied error for your input URL, try adding cache: to enable built-in URL caching. You can also encrypt output files and decrypt input files with user-controlled data encryption (strong AES encryption with your own keys).
- httpusername: optional. HTTP auth user name, if required to access the source url.
- httppassword: optional. HTTP auth password, if required to access the source url.
- templateId: required. ID of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser
- template: optional. Code of the document parser template to use, passed directly in the request.
- inline: optional. Set to true to return results inside the response. Otherwise the endpoint returns a link to the generated output file.
- outputFormat: optional. Default is JSON. Set to CSV or XML to generate CSV or XML output instead.
- password: optional. Password of the PDF file. Must be a String.
- async: optional. Runs processing asynchronously. Returns a JobId that you can use with /job/check to check the state of the background job (possible states: working, failed, aborted, and success). Must be one of: true, false.
- name: optional. File name for the generated output. Must be a String.
- expiration: optional. Output link expiration in minutes. Default is 60 (1 hour). After this period, generated output files (if any) are automatically removed from PDF.co temporary files storage. The maximum allowed expiration period depends on your current subscription plan. To store input files permanently (e.g. reusable images, PDFs, documents), use PDF.co built-in Files Storage instead.
- profiles: optional. Must be a String. Sets additional configuration for fine-tuning and extra options. Explore the PDF.co knowledgebase for profile examples.
- Method: POST
- URL: /v1/pdf/documentparser
Query parameters
No query parameters accepted.
Body payload
{
"url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
"outputFormat": "JSON",
"templateId": "1",
"async": false,
"encrypt": "false",
"inline": "true",
"password": "",
"profiles": ""
}
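As a rough Python counterpart to the body payload above, the sketch below builds the same kind of request with the standard library only. The function name and the example.com URL are illustrative assumptions; the payload values mirror the sample payload, including the string "true" for inline. The request is built but not sent, since sending requires a valid API key.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/documentparser"


def build_parser_request(api_key: str, source_url: str, template_id: str = "1",
                         output_format: str = "JSON") -> urllib.request.Request:
    """Build (but do not send) a POST request for /pdf/documentparser."""
    payload = {
        "url": source_url,
        "outputFormat": output_format,
        "templateId": template_id,
        "async": False,
        "inline": "true",  # same value as in the sample payload
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


if __name__ == "__main__":
    request = build_parser_request("YOUR_API_KEY", "https://example.com/invoice.pdf")
    print(request.get_method(), request.full_url)
    # To send it (needs network access and a real key):
    # with urllib.request.urlopen(request) as response:
    #     result = json.load(response)
```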
Example responses
/pdf/documentparser (output as JSON)
{
"body": {
"objects": [
{
"name": "companyName",
"objectType": "field",
"value": "Amazon Web Services, Inc",
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "companyName2",
"objectType": "field",
"value": "Amazon Web Services, Inc",
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "invoiceId",
"objectType": "field",
"value": "123456789",
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "dateIssued",
"objectType": "field",
"value": "2018-04-03T00:00:00",
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "dateDue",
"objectType": "field",
"value": "2018-04-03T00:00:00",
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "bankAccount",
"objectType": "field",
"value": "123456789012",
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "total",
"objectType": "field",
"value": 6.58,
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "subTotal",
"objectType": "field",
"value": ""
},
{
"name": "tax",
"objectType": "field",
"value": 1.01,
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"objectType": "table",
"name": "table",
"rows": []
}
],
"templateName": "Generic Invoice [en]",
"templateVersion": "4",
"timestamp": "2020-08-21T19:23:31"
},
"pageCount": 1,
"error": false,
"status": 200,
"name": "sample-invoice.json",
"remainingCredits": 60803
}
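A response shaped like the one above can be flattened into plain dictionaries for easier use. The following is a minimal sketch that assumes the response has already been decoded into a Python dict and follows the structure of the example (body.objects, with objectType of field or table); the function name is an illustrative choice.

```python
from typing import Any


def split_objects(result: dict[str, Any]) -> tuple[dict[str, Any], dict[str, list]]:
    """Separate parsed objects into {field name: value} and {table name: rows}."""
    if result.get("error"):
        raise ValueError(f"parser reported an error (status {result.get('status')})")
    fields: dict[str, Any] = {}
    tables: dict[str, list] = {}
    for obj in result["body"]["objects"]:
        if obj.get("objectType") == "table":
            tables[obj["name"]] = obj.get("rows", [])
        else:
            fields[obj["name"]] = obj.get("value")
    return fields, tables


if __name__ == "__main__":
    # Trimmed copy of the example response above.
    sample = {
        "body": {
            "objects": [
                {"name": "invoiceId", "objectType": "field", "value": "123456789"},
                {"name": "total", "objectType": "field", "value": 6.58},
                {"objectType": "table", "name": "table", "rows": []},
            ]
        },
        "error": False,
        "status": 200,
    }
    fields, tables = split_objects(sample)
    print(fields)
    print(tables)
```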
Code Snippet
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--data-raw '{
"url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
"outputFormat": "JSON",
"templateId": "1",
"async": false,
"encrypt": "false",
"inline": "true",
"password": "",
"profiles": ""
}'
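When async is set to true, the parameters above say the call returns a JobId whose state can be checked with /job/check until it leaves the working state. The sketch below shows only the polling loop. The request and response format of /job/check is not covered in this section, so the status lookup is passed in as a caller-supplied function (a hypothetical check_status callable that returns one of the documented state strings).

```python
import time
from typing import Callable

# States after which polling should stop, per the async parameter description.
TERMINAL_STATES = {"success", "failed", "aborted"}


def wait_for_job(job_id: str, check_status: Callable[[str], str],
                 interval_seconds: float = 2.0, max_checks: int = 30) -> str:
    """Poll check_status(job_id) until the job reaches a terminal state."""
    for _ in range(max_checks):
        status = check_status(job_id)
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} still not finished after {max_checks} checks")


if __name__ == "__main__":
    # Stand-in for a real /job/check call: reports "working" twice, then "success".
    replies = iter(["working", "working", "success"])
    print(wait_for_job("demo-job", lambda _job_id: next(replies), interval_seconds=0))
```

Capping the number of checks keeps a stuck job from blocking the caller indefinitely; the interval and cap are arbitrary defaults.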
[POST] /pdf/documentparser (output as XML)
Description: Extracts data from documents using a data extraction template. With this API method you can extract data from custom areas, search results, form fields, tables, multiple pages, and more.
Tools and Guides:
- Document Parser Template Editor (online version)
- Document Parser Template Coding Guide
- Document Parser Template Editor (offline desktop version)
Parameters
- url: required. URL of the source file. Supports links from Google Drive, Dropbox, and the built-in PDF.co files storage. To upload files via the API, see the Files Upload section. If you intermittently get a Too Many Requests or Access Denied error for your input URL, try adding cache: to enable built-in URL caching. You can also encrypt output files and decrypt input files with user-controlled data encryption (strong AES encryption with your own keys).
- httpusername: optional. HTTP auth user name, if required to access the source url.
- httppassword: optional. HTTP auth password, if required to access the source url.
- templateId: required. ID of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser
- template: optional. Code of the document parser template to use, passed directly in the request.
- inline: optional. Set to true to return results inside the response. Otherwise the endpoint returns a link to the generated output file.
- outputFormat: optional. Default is JSON. Set to CSV or XML to generate CSV or XML output instead.
- password: optional. Password of the PDF file. Must be a String.
- async: optional. Runs processing asynchronously. Returns a JobId that you can use with /job/check to check the state of the background job (possible states: working, failed, aborted, and success). Must be one of: true, false.
- name: optional. File name for the generated output. Must be a String.
- expiration: optional. Output link expiration in minutes. Default is 60 (1 hour). After this period, generated output files (if any) are automatically removed from PDF.co temporary files storage. The maximum allowed expiration period depends on your current subscription plan. To store input files permanently (e.g. reusable images, PDFs, documents), use PDF.co built-in Files Storage instead.
- profiles: optional. Must be a String. Sets additional configuration for fine-tuning and extra options. Explore the PDF.co knowledgebase for profile examples.
- Method: POST
- URL: /v1/pdf/documentparser
Query parameters
No query parameters accepted.
Body payload
{
"url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
"outputFormat": "XML",
"templateId": "1",
"async": false,
"encrypt": "false",
"inline": "true",
"password": "",
"profiles": ""
}
Example responses
/pdf/documentparser (output as XML)
{
"body": "<?xml version=\"1.0\" encoding=\"utf-16\"?>\r\n<parsingResult>\r\n <objects>\r\n <name>companyName</name>\r\n <objectType>field</objectType>\r\n <value>ACME Inc.</value>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <name>companyName2</name>\r\n <objectType>field</objectType>\r\n <value>Lanny Lane Ltd.</value>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <name>invoiceId</name>\r\n <objectType>field</objectType>\r\n <value>67893566</value>\r\n <pageIndex>0</pageIndex>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <name>dateIssued</name>\r\n <objectType>field</objectType>\r\n <value>2019-01-05T00:00:00</value>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <name>dateDue</name>\r\n <objectType>field</objectType>\r\n <value>2019-01-05T00:00:00</value>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <name>bankAccount</name>\r\n <objectType>field</objectType>\r\n <value>\r\n </value>\r\n </objects>\r\n <objects>\r\n <name>total</name>\r\n <objectType>field</objectType>\r\n <value>1272.35</value>\r\n <pageIndex>0</pageIndex>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <name>subTotal</name>\r\n <objectType>field</objectType>\r\n <value>1262.35</value>\r\n <pageIndex>0</pageIndex>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <name>tax</name>\r\n 
<objectType>field</objectType>\r\n <value>10</value>\r\n <pageIndex>0</pageIndex>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <objectType>table</objectType>\r\n <name>table</name>\r\n <rows>\r\n <column1>\r\n <pageIndex>0</pageIndex>\r\n <value>2</value>\r\n </column1>\r\n <column2>\r\n <pageIndex>0</pageIndex>\r\n <value>Item 1</value>\r\n </column2>\r\n <column3>\r\n <pageIndex>0</pageIndex>\r\n <value>9.95</value>\r\n </column3>\r\n <column4>\r\n <pageIndex>0</pageIndex>\r\n <value>19.90</value>\r\n </column4>\r\n </rows>\r\n <rows>\r\n <column1>\r\n <pageIndex>0</pageIndex>\r\n <value>5</value>\r\n </column1>\r\n <column2>\r\n <pageIndex>0</pageIndex>\r\n <value>Item 2</value>\r\n </column2>\r\n <column3>\r\n <pageIndex>0</pageIndex>\r\n <value>20.00</value>\r\n </column3>\r\n <column4>\r\n <pageIndex>0</pageIndex>\r\n <value>100.00</value>\r\n </column4>\r\n </rows>\r\n <rows>\r\n <column1>\r\n <pageIndex>0</pageIndex>\r\n <value>1</value>\r\n </column1>\r\n <column2>\r\n <pageIndex>0</pageIndex>\r\n <value>Item 3</value>\r\n </column2>\r\n <column3>\r\n <pageIndex>0</pageIndex>\r\n <value>19.95</value>\r\n </column3>\r\n <column4>\r\n <pageIndex>0</pageIndex>\r\n <value>19.95</value>\r\n </column4>\r\n </rows>\r\n <rows>\r\n <column1>\r\n <pageIndex>0</pageIndex>\r\n <value>1</value>\r\n </column1>\r\n <column2>\r\n <pageIndex>0</pageIndex>\r\n <value>Item 4</value>\r\n </column2>\r\n <column3>\r\n <pageIndex>0</pageIndex>\r\n <value>123.00</value>\r\n </column3>\r\n <column4>\r\n <pageIndex>0</pageIndex>\r\n <value>123.00</value>\r\n </column4>\r\n </rows>\r\n <rows>\r\n <column1>\r\n <pageIndex>0</pageIndex>\r\n <value>10</value>\r\n </column1>\r\n <column2>\r\n <pageIndex>0</pageIndex>\r\n <value>Item 5</value>\r\n </column2>\r\n <column3>\r\n <pageIndex>0</pageIndex>\r\n <value>99.95</value>\r\n </column3>\r\n <column4>\r\n <pageIndex>0</pageIndex>\r\n 
<value>999.50</value>\r\n </column4>\r\n </rows>\r\n </objects>\r\n <elapsed>0.320434</elapsed>\r\n <templateName>Generic Invoice [en]</templateName>\r\n <templateVersion>4</templateVersion>\r\n <timestamp>2021-12-31T14:54:31</timestamp>\r\n</parsingResult>\r\n",
"pageCount": 1,
"error": false,
"status": 200,
"name": "sample-invoice.xml",
"remainingCredits": 99046120,
"credits": 42
}
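With XML output, the body field of the response is an XML document delivered as a string. A minimal sketch for reading it with Python's standard library follows, assuming the element layout of the example above (parsingResult containing repeated objects elements, with table rows made of column elements that each hold a value). The function name is illustrative.

```python
import re
import xml.etree.ElementTree as ET


def parse_xml_body(xml_text: str) -> tuple[dict[str, str], dict[str, list[list[str]]]]:
    """Return ({field name: value}, {table name: rows of cell values})."""
    # The declaration names utf-16, but the text is already a decoded str,
    # so drop the declaration before handing the text to the parser.
    xml_text = re.sub(r"^\s*<\?xml[^>]*\?>", "", xml_text)
    root = ET.fromstring(xml_text)
    fields: dict[str, str] = {}
    tables: dict[str, list[list[str]]] = {}
    for obj in root.findall("objects"):
        name = obj.findtext("name")
        if obj.findtext("objectType") == "table":
            # Each <rows> element is one row; its children are the columns.
            tables[name] = [
                [cell.findtext("value") for cell in row]
                for row in obj.findall("rows")
            ]
        else:
            fields[name] = (obj.findtext("value") or "").strip()
    return fields, tables


if __name__ == "__main__":
    sample = (
        '<?xml version="1.0" encoding="utf-16"?>\r\n<parsingResult>\r\n'
        " <objects><name>total</name><objectType>field</objectType>"
        "<value>1272.35</value></objects>\r\n"
        " <objects><objectType>table</objectType><name>table</name>"
        "<rows><column1><value>2</value></column1>"
        "<column2><value>Item 1</value></column2></rows></objects>\r\n"
        "</parsingResult>\r\n"
    )
    print(parse_xml_body(sample))
```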
Code Snippet
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--data-raw '{
"url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
"outputFormat": "XML",
"templateId": "1",
"async": false,
"encrypt": "false",
"inline": "true",
"password": "",
"profiles": ""
}'
[POST] /pdf/documentparser (output as CSV)
Description: Extracts data from documents using a data extraction template. With this API method you can extract data from custom areas, search results, form fields, tables, multiple pages, and more.
Tools and Guides:
- Document Parser Template Editor (online version)
- Document Parser Template Coding Guide
- Document Parser Template Editor (offline desktop version)
Parameters
- url: required. URL of the source file. Supports links from Google Drive, Dropbox, and the built-in PDF.co files storage. To upload files via the API, see the Files Upload section. If you intermittently get a Too Many Requests or Access Denied error for your input URL, try adding cache: to enable built-in URL caching. You can also encrypt output files and decrypt input files with user-controlled data encryption (strong AES encryption with your own keys).
- httpusername: optional. HTTP auth user name, if required to access the source url.
- httppassword: optional. HTTP auth password, if required to access the source url.
- templateId: required. ID of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser
- template: optional. Code of the document parser template to use, passed directly in the request.
- inline: optional. Set to true to return results inside the response. Otherwise the endpoint returns a link to the generated output file.
- outputFormat: optional. Default is JSON. Set to CSV or XML to generate CSV or XML output instead.
- password: optional. Password of the PDF file. Must be a String.
- async: optional. Runs processing asynchronously. Returns a JobId that you can use with /job/check to check the state of the background job (possible states: working, failed, aborted, and success). Must be one of: true, false.
- name: optional. File name for the generated output. Must be a String.
- expiration: optional. Output link expiration in minutes. Default is 60 (1 hour). After this period, generated output files (if any) are automatically removed from PDF.co temporary files storage. The maximum allowed expiration period depends on your current subscription plan. To store input files permanently (e.g. reusable images, PDFs, documents), use PDF.co built-in Files Storage instead.
- profiles: optional. Must be a String. Sets additional configuration for fine-tuning and extra options. Explore the PDF.co knowledgebase for profile examples.
- Method: POST
- URL: /v1/pdf/documentparser
Query parameters
No query parameters accepted.
Body payload
{
"url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
"templateId": "1",
"outputFormat": "CSV",
"generateCsvHeaders": true,
"async": false,
"encrypt": "false",
"inline": "true",
"password": ""
}
Example responses
/pdf/documentparser (output as CSV)
{
"body": "companyName,companyName2,invoiceId,dateIssued,dateDue,bankAccount,total,subTotal,tax,tableNames,tables\r\n\"Amazon Web Services, Inc\",\"Amazon Web Services, Inc\",123456789,2018-04-03T00:00:00,2018-04-03T00:00:00,123456789012,6.58,,1.01,table,\r\n\r\n",
"pageCount": 1,
"error": false,
"status": 200,
"name": "sample-invoice.csv",
"remainingCredits": 60804
}
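The CSV body above contains a header row (requested with generateCsvHeaders) followed by one data row, with quoted values wherever a field contains a comma. A short sketch for reading it with Python's csv module, assuming the body string has already been taken out of the JSON response:

```python
import csv
import io


def parse_csv_body(body: str) -> list[dict[str, str]]:
    """Turn the CSV body into one dict per data row, keyed by the header names."""
    # newline="" lets the csv module handle the \r\n line endings itself.
    reader = csv.DictReader(io.StringIO(body, newline=""))
    return list(reader)


if __name__ == "__main__":
    body = (
        "companyName,companyName2,invoiceId,dateIssued,dateDue,bankAccount,"
        "total,subTotal,tax,tableNames,tables\r\n"
        '"Amazon Web Services, Inc","Amazon Web Services, Inc",123456789,'
        "2018-04-03T00:00:00,2018-04-03T00:00:00,123456789012,6.58,,1.01,table,\r\n\r\n"
    )
    for record in parse_csv_body(body):
        print(record["companyName"], record["total"], record["tax"])
```

All values come back as strings, so numeric fields such as total need an explicit conversion if arithmetic is required.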
Code Snippet
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--data-raw '{
"url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf",
"templateId": "1",
"outputFormat": "CSV",
"generateCsvHeaders": true,
"async": false,
"encrypt": "false",
"inline": "true",
"password": ""
}'
[POST] /pdf/documentparser (output as JSON, custom template code)
Description: Parses documents and extracts data using custom data extraction template code passed directly in the request. With this API method you can extract data from custom areas, search results, form fields, tables, multiple pages, and more.
Tools and Guides:
- Document Parser Template Editor (online version)
- Document Parser Template Coding Guide
- Document Parser Template Editor (offline desktop version)
Parameters
- url: required. URL of the source file. Supports links from Google Drive, Dropbox, and the built-in PDF.co files storage. To upload files via the API, see the Files Upload section. If you intermittently get a Too Many Requests or Access Denied error for your input URL, try adding cache: to enable built-in URL caching. You can also encrypt output files and decrypt input files with user-controlled data encryption (strong AES encryption with your own keys).
- httpusername: optional. HTTP auth user name, if required to access the source url.
- httppassword: optional. HTTP auth password, if required to access the source url.
- templateId: required. ID of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser
- template: optional. Code of the document parser template to use, passed directly in the request.
- inline: optional. Set to true to return results inside the response. Otherwise the endpoint returns a link to the generated output file.
- outputFormat: optional. Default is JSON. Set to CSV or XML to generate CSV or XML output instead.
- password: optional. Password of the PDF file. Must be a String.
- async: optional. Runs processing asynchronously. Returns a JobId that you can use with /job/check to check the state of the background job (possible states: working, failed, aborted, and success). Must be one of: true, false.
- name: optional. File name for the generated output. Must be a String.
- expiration: optional. Output link expiration in minutes. Default is 60 (1 hour). After this period, generated output files (if any) are automatically removed from PDF.co temporary files storage. The maximum allowed expiration period depends on your current subscription plan. To store input files permanently (e.g. reusable images, PDFs, documents), use PDF.co built-in Files Storage instead.
- profiles: optional. Must be a String. Sets additional configuration for fine-tuning and extra options. Explore the PDF.co knowledgebase for profile examples.
- Method: POST
- URL: /v1/pdf/documentparser
Query parameters
No query parameters accepted.
Body payload
{
"url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/MultiPageTable.pdf",
"template": "{\r\n \"templateVersion\": 3,\r\n \"templatePriority\": 0,\r\n \"sourceId\": \"Multipage Table Test\",\r\n \"detectionRules\": {\r\n \"keywords\": [\r\n \"Sample document with multi-page table\"\r\n ]\r\n },\r\n \"fields\": {\r\n \"total\": {\r\n \"type\": \"regex\",\r\n \"expression\": \"TOTAL \",\r\n \"dataType\": \"decimal\"\r\n }\r\n },\r\n \"tables\": [\r\n {\r\n \"name\": \"table1\",\r\n \"start\": {\r\n \"expression\": \"Item\\\\s+Description\\\\s+Price\\\\s+Qty\\\\s+Extended Price\"\r\n },\r\n \"end\": {\r\n \"expression\": \"TOTAL\\\\s+\\\\d+\\\\.\\\\d\\\\d\"\r\n },\r\n \"row\": {\r\n \"expression\": \"^\\\\s*(?<itemNo>\\\\d+)\\\\s+(?<description>.+?)\\\\s+(?<price>\\\\d+\\\\.\\\\d\\\\d)\\\\s+(?<qty>\\\\d+)\\\\s+(?<extPrice>\\\\d+\\\\.\\\\d\\\\d)\"\r\n },\r\n \"columns\": [\r\n {\r\n \"name\": \"itemNo\",\r\n \"type\": \"integer\"\r\n },\r\n {\r\n \"name\": \"description\",\r\n \"type\": \"string\"\r\n },\r\n {\r\n \"name\": \"price\",\r\n \"type\": \"decimal\"\r\n },\r\n {\r\n \"name\": \"qty\",\r\n \"type\": \"integer\"\r\n },\r\n {\r\n \"name\": \"extPrice\",\r\n \"type\": \"decimal\"\r\n }\r\n ],\r\n \"multipage\": true\r\n }\r\n ]\r\n}",
"outputFormat": "JSON",
"async": false,
"encrypt": "false",
"inline": "true",
"profiles": "",
"password": ""
}
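In the payload above, the template parameter is itself a JSON document serialized into a string, which is why its regular expressions appear with doubled backslashes. Rather than escaping by hand, the template can be written as a normal data structure and serialized. The sketch below does this in Python with a shortened copy of the template from the payload; the function name and the example.com URL are illustrative.

```python
import json

# Shortened copy of the template from the payload above (fewer columns).
TEMPLATE = {
    "templateVersion": 3,
    "templatePriority": 0,
    "sourceId": "Multipage Table Test",
    "detectionRules": {"keywords": ["Sample document with multi-page table"]},
    "tables": [
        {
            "name": "table1",
            "start": {"expression": r"Item\s+Description\s+Price\s+Qty\s+Extended Price"},
            "end": {"expression": r"TOTAL\s+\d+\.\d\d"},
            "row": {"expression": r"^\s*(?<itemNo>\d+)\s+(?<description>.+?)\s+(?<price>\d+\.\d\d)"},
            "columns": [
                {"name": "itemNo", "type": "integer"},
                {"name": "description", "type": "string"},
                {"name": "price", "type": "decimal"},
            ],
            "multipage": True,
        }
    ],
}


def build_custom_template_payload(source_url: str, template: dict) -> dict:
    """Embed the template as a JSON string inside the request payload."""
    return {
        "url": source_url,
        "template": json.dumps(template),  # first level of escaping
        "outputFormat": "JSON",
        "async": False,
        "inline": "true",
    }


if __name__ == "__main__":
    payload = build_custom_template_payload("https://example.com/table.pdf", TEMPLATE)
    # Serializing the payload escapes the template string a second time.
    print(json.dumps(payload)[:200])
```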
Example responses
/pdf/documentparser (output as JSON, custom template code)
{
"body": {
"objects": [
{
"name": "companyName",
"objectType": "field",
"value": "Amazon Web Services, Inc",
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "companyName2",
"objectType": "field",
"value": "Amazon Web Services, Inc",
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "invoiceId",
"objectType": "field",
"value": "123456789",
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "dateIssued",
"objectType": "field",
"value": "2018-04-03T00:00:00",
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "dateDue",
"objectType": "field",
"value": "2018-04-03T00:00:00",
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "total",
"objectType": "field",
"value": 6.58,
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "subTotal",
"objectType": "field",
"value": ""
},
{
"name": "tax",
"objectType": "field",
"value": 1.01,
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"objectType": "table",
"name": "table",
"rows": []
}
],
"templateName": "Generic Invoice [en]",
"templateVersion": "4",
"timestamp": "2020-07-16T22:04:25"
},
"pageCount": 1,
"error": false,
"status": 200,
"name": "sample-invoice.json",
"remainingCredits": 77731
}
Code Snippet
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--data-raw '{
"url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/MultiPageTable.pdf",
"template": "{\r\n \"templateVersion\": 3,\r\n \"templatePriority\": 0,\r\n \"sourceId\": \"Multipage Table Test\",\r\n \"detectionRules\": {\r\n \"keywords\": [\r\n \"Sample document with multi-page table\"\r\n ]\r\n },\r\n \"fields\": {\r\n \"total\": {\r\n \"type\": \"regex\",\r\n \"expression\": \"TOTAL \",\r\n \"dataType\": \"decimal\"\r\n }\r\n },\r\n \"tables\": [\r\n {\r\n \"name\": \"table1\",\r\n \"start\": {\r\n \"expression\": \"Item\\\\s+Description\\\\s+Price\\\\s+Qty\\\\s+Extended Price\"\r\n },\r\n \"end\": {\r\n \"expression\": \"TOTAL\\\\s+\\\\d+\\\\.\\\\d\\\\d\"\r\n },\r\n \"row\": {\r\n \"expression\": \"^\\\\s*(?<itemNo>\\\\d+)\\\\s+(?<description>.+?)\\\\s+(?<price>\\\\d+\\\\.\\\\d\\\\d)\\\\s+(?<qty>\\\\d+)\\\\s+(?<extPrice>\\\\d+\\\\.\\\\d\\\\d)\"\r\n },\r\n \"columns\": [\r\n {\r\n \"name\": \"itemNo\",\r\n \"type\": \"integer\"\r\n },\r\n {\r\n \"name\": \"description\",\r\n \"type\": \"string\"\r\n },\r\n {\r\n \"name\": \"price\",\r\n \"type\": \"decimal\"\r\n },\r\n {\r\n \"name\": \"qty\",\r\n \"type\": \"integer\"\r\n },\r\n {\r\n \"name\": \"extPrice\",\r\n \"type\": \"decimal\"\r\n }\r\n ],\r\n \"multipage\": true\r\n }\r\n ]\r\n}",
"outputFormat": "JSON",
"async": false,
"encrypt": "false",
"inline": "true",
"profiles": "",
"password": ""
}'
[GET] /pdf/documentparser/templates
Returns all Document Parser data extraction templates for the current user. Use a GET request.
Manage your Document Parser templates at https://app.pdf.co/document-parser/templates
- Method: GET
- URL: /v1/pdf/documentparser/templates
Query parameters
No query parameters accepted.
Body payload
No body parameters accepted.
Example responses
/pdf/documentparser/templates
{
"templates": [
{
"id": 40,
"type": "user",
"title": "Untitled",
"description": "Untitled"
},
{
"id": 1,
"type": "system",
"title": "Invoice Parser",
"description": "Parses invoices and extracts invoice number, company name, due date, amount, tax"
}
],
"remainingCredits": 94229
}
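The listing above mixes built-in templates (type system) with your own (type user). A small sketch for picking out the user templates, assuming the response has been decoded into a Python dict with the structure shown; the function name is illustrative.

```python
def user_templates(listing: dict) -> dict[int, str]:
    """Map template id to title for templates whose type is "user"."""
    return {
        entry["id"]: entry["title"]
        for entry in listing["templates"]
        if entry["type"] == "user"
    }


if __name__ == "__main__":
    # Copy of the example response above.
    listing = {
        "templates": [
            {"id": 40, "type": "user", "title": "Untitled", "description": "Untitled"},
            {"id": 1, "type": "system", "title": "Invoice Parser",
             "description": "Parses invoices and extracts invoice number, "
                            "company name, due date, amount, tax"},
        ],
        "remainingCredits": 94229,
    }
    print(user_templates(listing))
```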
Code Snippet
CURL
curl --location --request GET 'https://api.pdf.co/v1/pdf/documentparser/templates' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY'
[GET] /pdf/documentparser/templates/:id
Returns detailed information for a document parser template, identified by the template's ID. Use a GET request.
Manage your Document Parser templates at https://app.pdf.co/document-parser/templates
- Method: GET
- URL: /v1/pdf/documentparser/templates/:id
Query parameters
No query parameters accepted.
Body payload
No body parameters accepted.
Example responses
No example responses saved.
Code Snippet
CURL
curl --location --request GET 'https://api.pdf.co/v1/pdf/documentparser/templates/1' \
--header 'x-api-key: YOUR_API_KEY'
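For reference, a Python sketch that builds the same GET requests with the standard library: one for the full template list and one for a single template by ID. The function name is an illustrative assumption, and the requests are only constructed here, not sent.

```python
import urllib.request
from typing import Optional

TEMPLATES_URL = "https://api.pdf.co/v1/pdf/documentparser/templates"


def build_template_request(api_key: str,
                           template_id: Optional[int] = None) -> urllib.request.Request:
    """Build a GET request for all templates, or for one template when an ID is given."""
    url = TEMPLATES_URL if template_id is None else f"{TEMPLATES_URL}/{int(template_id)}"
    # GET requests carry no body, so only the API key header is needed.
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


if __name__ == "__main__":
    print(build_template_request("YOUR_API_KEY").full_url)
    print(build_template_request("YOUR_API_KEY", 1).full_url)
```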
Knowledgebase
Samples
- C# - Blood Test Results to JSON
- C# - Census table from life and annuity quote request pdf
- C# - Create Custom Template
- C# - Extract line items from tables on multiple pages
- C# - Parse From URL
- C# - Parse From URL Asynchronously
- C# - Parse Multipage Table
- C# - Parse Simple Document
- C# - Parse Uploaded File
- C# - Parse Uploaded File Asynchronously
- C# - Parse Uploaded File Asynchronously (Using TemplateId)
- C# - Parse and Generate HL7 Output
- C# - Parse with OCR
- C# - Parsing and reading data from Airline Tickets
- GoogleAppScript - Convert PDF Invoice to Google Sheet
- Java - Blood Test Results to JSON
- Java - Create Custom Template
- Java - Extract line items from tables on multiple pages
- Java - Parse From URL Asynchronously
- Java - Parse From Url
- Java - Parse Multipage Table
- Java - Parse Simple Document
- Java - Parse Uploaded File
- Java - Parse Uploaded File Asynchronously
- Java - Parse with OCR
- Java - Parsing and reading data from Airline Tickets
- JavaScript - Parse From Url (Node.js)
- JavaScript - Parse Uploaded File (Node.js)
- PHP - Blood Test Results to JSON
- PHP - Create Custom Template
- PHP - Extract line items from tables on multiple pages
- PHP - Parse From URL Asynchronously
- PHP - Parse Invoice and Fill Database (SQL Server)
- PHP - Parse Invoice and Save Table Data to mySql Database
- PHP - Parse Multipage Table
- PHP - Parse Simple Document
- PHP - Parse Uploaded File Asynchronously
- PHP - Parse with OCR
- PHP - Parsing and reading data from Airline Tickets
- Powershell - Parse From Uploaded File
- Powershell - Parse From Url
- Python - Parse From Uploaded File
- Python - Parse From Url
- Python - Parse PDF Invoice
- Salesforce - Document Parser Demo
- Salesforce - Parse Document and Get CSV Output
- SharePoint - Parse Invoice Information
- TEMPLATES-SAMPLES - Amazon Shipment Label
- TEMPLATES-SAMPLES - Auto Detect Table
- TEMPLATES-SAMPLES - Auto Find Table and Extract Borderless Table
- TEMPLATES-SAMPLES - Bank of America Statement
- TEMPLATES-SAMPLES - Blood Test Results to JSON
- TEMPLATES-SAMPLES - Census table from life and annuity quote request pdf
- TEMPLATES-SAMPLES - Create Custom Template
- TEMPLATES-SAMPLES - Extract line items from tables on multiple pages
- TEMPLATES-SAMPLES - Invoice table with some empty columns
- TEMPLATES-SAMPLES - Invoice with few line items in EUR
- TEMPLATES-SAMPLES - Invoice with line items in bordered table
- TEMPLATES-SAMPLES - JPMorgan Chase Statement
- TEMPLATES-SAMPLES - Key Value Fields From Echocardiogram Report
- TEMPLATES-SAMPLES - ManyChat Invoice
- TEMPLATES-SAMPLES - Multiline Items Without Borders
- TEMPLATES-SAMPLES - Order form with line items and total
- TEMPLATES-SAMPLES - Parse Email Address
- TEMPLATES-SAMPLES - Parse Hanging Rows In Invoice
- TEMPLATES-SAMPLES - Parse IRS Form 1040
- TEMPLATES-SAMPLES - Parse IRS Form 1099-DIV
- TEMPLATES-SAMPLES - Parse IRS Form 1099-K
- TEMPLATES-SAMPLES - Parse IRS Form W2
- TEMPLATES-SAMPLES - Parse Multipage Table
- TEMPLATES-SAMPLES - Parse PandaDoc Sample Invoice
- TEMPLATES-SAMPLES - Parse Simple Document
- TEMPLATES-SAMPLES - Parse and Generate HL7 Output
- TEMPLATES-SAMPLES - Parse with OCR
- TEMPLATES-SAMPLES - Parsing and reading data from Airline Tickets
- TEMPLATES-SAMPLES - Read values in columns 2 and 3
- TEMPLATES-SAMPLES - Statement of Assets
- TEMPLATES-SAMPLES - Tax Invoice with Line Items
- TEMPLATES-SAMPLES - Total and Vat tax
- TEMPLATES-SAMPLES - US Postal Shipping Label
- TEMPLATES-SAMPLES - Wells Fargo Statement
- VB.NET - Blood Test Results to JSON
- VB.NET - Census table from life and annuity quote request pdf
- VB.NET - Create Custom Template
- VB.NET - Extract line items from tables on multiple pages
- VB.NET - Parse From Url
- VB.NET - Parse Multipage Table
- VB.NET - Parse Simple Document
- VB.NET - Parse Uploaded File
- VB.NET - Parse Uploaded File Asynchronously
- VB.NET - Parse with OCR
- VB.NET - Parsing and reading data from Airline Tickets
- cURL - Document Parser Custom Template Code
- cURL - Document Parser Output as CSV
- cURL - Document Parser Output as JSON
- cURL - Document Parser Results
Copyright © 2016 - 2022 PDF.co