Document Parser
Document Parser automatically parses PDF, JPG, and PNG files to extract fields, tables, values, and barcodes from invoices, statements, orders, and other native or scanned documents.
Built-in document parser templates:
General Invoice Template
Parses invoices (English only) and extracts the invoice ID, invoice date, total, tax, and line items. Set the templateId parameter to 1 to use this template.
How to classify incoming documents before parsing them?
Use the /pdf/classifier endpoint (see below) to automatically detect the class of a document, using either AI or custom keyword-based rules.
For example, you can define rules that identify which vendor issued a document and then apply the matching template. See Document Classifier for more details.
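The routing step described above can be sketched as a simple lookup from a class label to a template ID. This is a minimal illustration only: the class labels and the `pick_template_id` helper are hypothetical, ID "40" is a placeholder for a user-defined template, and only ID "1" (the General Invoice Template) comes from this page.

```python
# Hypothetical mapping from a class label (as produced by your own
# classifier rules) to a Document Parser template ID.
TEMPLATE_BY_CLASS = {
    "generic-invoice": "1",   # built-in General Invoice Template
    "acme-invoice": "40",     # placeholder for a user-defined template
}


def pick_template_id(document_class, templates=TEMPLATE_BY_CLASS, default="1"):
    """Return the template ID for a class label, or the default if unknown."""
    return templates.get(document_class.strip().lower(), default)


if __name__ == "__main__":
    print(pick_template_id("ACME-Invoice"))
    print(pick_template_id("something-else"))
```

Falling back to a default template keeps unclassified documents flowing through the parser instead of failing; whether that is desirable depends on your workflow.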
Additional Information and Tools
- Document Parser Template Editor (or check a standalone version here)
- Document Parser Template Objects Guide
Available Methods
- [POST] /pdf/documentparser (output as JSON)
- [POST] /pdf/documentparser (output as XML)
- [POST] /pdf/documentparser (output as CSV)
- [POST] /pdf/documentparser (output as JSON, custom template code)
- [GET] /pdf/documentparser/templates
- [GET] /pdf/documentparser/templates/:id
[POST] /pdf/documentparser (output as JSON)
Description: This API method extracts data from documents based on a document parser extraction template. Templates can pull data from custom areas, by text search, and from form fields, tables, multi-page documents, and more.
Attributes
Hint: attributes should be inside JSON for POST request:
{
"url": "url-input-link"
}
Attribute | Required | Description |
---|---|---|
url | required | URL of the source file. Supports links from Google Drive, Dropbox, and PDF.co built-in files storage. To upload files via the API, see the Files Upload section. Note: if you experience intermittent Too Many Requests or Access Denied errors, add the cache: prefix to enable built-in URL caching (e.g. cache:https://example.com/file1.pdf). For data security, you can encrypt output files and decrypt input files; learn more about user-controlled data encryption. |
httpusername | optional | HTTP auth user name, if required to access the source url. |
httppassword | optional | HTTP auth password, if required to access the source url. |
templateId | required | ID of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser |
template | optional | Code of the document parser template to use, passed directly in the request. |
inline | optional | Set to true to return the results inside the response. Otherwise, the endpoint returns a link to the generated output file. |
outputFormat | optional | Output format. Default is JSON; set to CSV or XML to generate CSV or XML output instead. |
password | optional | Password of the PDF file. Must be a string. |
async | optional | Set to true to run long processes in the background. The API then returns a jobId, which you can pass to the /job/check endpoint to check the status of the process and retrieve the output, so you can proceed with other tasks without waiting for the process to finish. |
name | optional | File name for the generated output. Must be a string. |
expiration | optional | Expiration time for the output link, in minutes (default is 60, i.e. 1 hour). After this duration, any generated output files are automatically deleted from PDF.co temporary files storage. The maximum duration varies with your subscription plan. To store permanent input files (e.g. reusable images, PDF templates, documents), consider using PDF.co built-in Files Storage. |
profiles | optional | Additional configuration for fine-tuning and extra options. See the PDF.co knowledge base for profile examples. Must be a string. |
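For the async attribute in the table above, a polling loop might look like the sketch below. It is deliberately generic: `fetch_status` is a hypothetical callable that you would implement to call the /job/check endpoint with the returned jobId, and the "status" field name and its "working" value are assumptions about that endpoint's response, not something stated on this page.

```python
import time


def wait_for_job(fetch_status, job_id, interval=2.0, max_attempts=30,
                 sleep=time.sleep):
    """Poll until the job is no longer reported as running.

    fetch_status(job_id) must return a dict. The "status" key and the
    "working" value checked here are assumptions for illustration.
    """
    for _ in range(max_attempts):
        result = fetch_status(job_id)
        if result.get("status") != "working":
            return result          # finished (successfully or not)
        sleep(interval)            # wait before the next check
    raise TimeoutError(f"job {job_id} still running after {max_attempts} checks")


if __name__ == "__main__":
    # Simulated status source: two "working" replies, then a final one.
    replies = iter([{"status": "working"}, {"status": "working"},
                    {"status": "success"}])
    print(wait_for_job(lambda job_id: next(replies), "demo-job", interval=0))
```

Passing the status fetcher and the sleep function as parameters keeps the loop testable without network access.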
- Method: POST
- URL: /v1/pdf/documentparser
Query parameters
No query parameters accepted.
Body payload
{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
"outputFormat": "JSON",
"templateId": "1",
"async": false,
"inline": "true",
"password": "",
"profiles": ""
}
Example responses
/pdf/documentparser (output as JSON)
{
"body": {
"objects": [
{
"name": "companyName",
"objectType": "field",
"value": "Amazon Web Services, Inc",
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "companyName2",
"objectType": "field",
"value": "Amazon Web Services, Inc",
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "invoiceId",
"objectType": "field",
"value": "123456789",
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "dateIssued",
"objectType": "field",
"value": "2018-04-03T00:00:00",
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "dateDue",
"objectType": "field",
"value": "2018-04-03T00:00:00",
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "bankAccount",
"objectType": "field",
"value": "123456789012",
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "total",
"objectType": "field",
"value": 6.58,
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "subTotal",
"objectType": "field",
"value": ""
},
{
"name": "tax",
"objectType": "field",
"value": 1.01,
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"objectType": "table",
"name": "table",
"rows": []
}
],
"templateName": "Generic Invoice [en]",
"templateVersion": "4",
"timestamp": "2020-08-21T19:23:31"
},
"pageCount": 1,
"error": false,
"status": 200,
"name": "sample-invoice.json",
"remainingCredits": 60803
}
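Once the JSON response above has been decoded, the objects array can be flattened into plain dictionaries. The sketch below assumes only the response shape shown in the example; the helper names `fields_from_response` and `tables_from_response` are illustrative, not part of the API.

```python
def fields_from_response(response):
    """Map field name -> value for every object whose objectType is "field"."""
    if response.get("error"):
        raise ValueError(f"parser reported an error (status {response.get('status')})")
    return {
        obj["name"]: obj.get("value")
        for obj in response["body"]["objects"]
        if obj.get("objectType") == "field"
    }


def tables_from_response(response):
    """Map table name -> list of rows for every object of type "table"."""
    return {
        obj["name"]: obj.get("rows", [])
        for obj in response["body"]["objects"]
        if obj.get("objectType") == "table"
    }


# Trimmed copy of the example response shown above.
SAMPLE_RESPONSE = {
    "body": {
        "objects": [
            {"name": "invoiceId", "objectType": "field", "value": "123456789"},
            {"name": "total", "objectType": "field", "value": 6.58},
            {"name": "subTotal", "objectType": "field", "value": ""},
            {"objectType": "table", "name": "table", "rows": []},
        ]
    },
    "error": False,
    "status": 200,
}

if __name__ == "__main__":
    print(fields_from_response(SAMPLE_RESPONSE))
    print(tables_from_response(SAMPLE_RESPONSE))
```

Note that fields the template could not find, such as subTotal in the example, come back as an empty string rather than being omitted, so callers should check for that.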
Code Snippet
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: ' \
--data-raw '{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
"outputFormat": "JSON",
"templateId": "1",
"async": false,
"inline": "true",
"password": "",
"profiles": ""
}'
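The same request can be issued from Python using only the standard library. This is a sketch that mirrors the cURL call above; the function names are illustrative, and the API key is whatever you would put in the x-api-key header.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/documentparser"


def build_parse_request(api_key, source_url, template_id="1", output_format="JSON"):
    """Build (but do not send) the POST request shown in the cURL example."""
    payload = {
        "url": source_url,
        "outputFormat": output_format,
        "templateId": template_id,
        "async": False,
        "inline": "true",  # sent as a string, mirroring the cURL example
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def parse_document(api_key, source_url, **options):
    """Send the request and return the decoded JSON response."""
    request = build_parse_request(api_key, source_url, **options)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes it easy to inspect or log the payload before any credits are spent.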
[POST] /pdf/documentparser (output as XML)
Description: Extracts data from PDF and scanned documents using a data extraction template (called a Document Parser Template). Templates can pull data from custom areas, by text search, and from form fields, tables, multi-page documents, and more.
Attributes
Hint: attributes should be inside JSON for POST request:
{
"url": "url-input-link"
}
Attribute | Required | Description |
---|---|---|
url | required | URL of the source file. Supports links from Google Drive, Dropbox, and PDF.co built-in files storage. To upload files via the API, see the Files Upload section. Note: if you experience intermittent Too Many Requests or Access Denied errors, add the cache: prefix to enable built-in URL caching (e.g. cache:https://example.com/file1.pdf). For data security, you can encrypt output files and decrypt input files; learn more about user-controlled data encryption. |
httpusername | optional | HTTP auth user name, if required to access the source url. |
httppassword | optional | HTTP auth password, if required to access the source url. |
templateId | required | ID of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser |
template | optional | Code of the document parser template to use, passed directly in the request. |
inline | optional | Set to true to return the results inside the response. Otherwise, the endpoint returns a link to the generated output file. |
outputFormat | optional | Output format. Default is JSON; set to CSV or XML to generate CSV or XML output instead. |
password | optional | Password of the PDF file. Must be a string. |
async | optional | Set to true to run long processes in the background. The API then returns a jobId, which you can pass to the /job/check endpoint to check the status of the process and retrieve the output, so you can proceed with other tasks without waiting for the process to finish. |
name | optional | File name for the generated output. Must be a string. |
expiration | optional | Expiration time for the output link, in minutes (default is 60, i.e. 1 hour). After this duration, any generated output files are automatically deleted from PDF.co temporary files storage. The maximum duration varies with your subscription plan. To store permanent input files (e.g. reusable images, PDF templates, documents), consider using PDF.co built-in Files Storage. |
profiles | optional | Additional configuration for fine-tuning and extra options. See the PDF.co knowledge base for profile examples. Must be a string. |
- Method: POST
- URL: /v1/pdf/documentparser
Query parameters
No query parameters accepted.
Body payload
{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
"outputFormat": "XML",
"templateId": "1",
"async": false,
"inline": "true",
"password": "",
"profiles": ""
}
Example responses
/pdf/documentparser (output as XML)
{
"body": "<?xml version=\"1.0\" encoding=\"utf-16\"?>\r\n<parsingResult>\r\n <objects>\r\n <name>companyName</name>\r\n <objectType>field</objectType>\r\n <value>ACME Inc.</value>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <name>companyName2</name>\r\n <objectType>field</objectType>\r\n <value>Lanny Lane Ltd.</value>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <name>invoiceId</name>\r\n <objectType>field</objectType>\r\n <value>67893566</value>\r\n <pageIndex>0</pageIndex>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <name>dateIssued</name>\r\n <objectType>field</objectType>\r\n <value>2019-01-05T00:00:00</value>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <name>dateDue</name>\r\n <objectType>field</objectType>\r\n <value>2019-01-05T00:00:00</value>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <name>bankAccount</name>\r\n <objectType>field</objectType>\r\n <value>\r\n </value>\r\n </objects>\r\n <objects>\r\n <name>total</name>\r\n <objectType>field</objectType>\r\n <value>1272.35</value>\r\n <pageIndex>0</pageIndex>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <name>subTotal</name>\r\n <objectType>field</objectType>\r\n <value>1262.35</value>\r\n <pageIndex>0</pageIndex>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <name>tax</name>\r\n 
<objectType>field</objectType>\r\n <value>10</value>\r\n <pageIndex>0</pageIndex>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n <rectangle>0</rectangle>\r\n </objects>\r\n <objects>\r\n <objectType>table</objectType>\r\n <name>table</name>\r\n <rows>\r\n <column1>\r\n <pageIndex>0</pageIndex>\r\n <value>2</value>\r\n </column1>\r\n <column2>\r\n <pageIndex>0</pageIndex>\r\n <value>Item 1</value>\r\n </column2>\r\n <column3>\r\n <pageIndex>0</pageIndex>\r\n <value>9.95</value>\r\n </column3>\r\n <column4>\r\n <pageIndex>0</pageIndex>\r\n <value>19.90</value>\r\n </column4>\r\n </rows>\r\n <rows>\r\n <column1>\r\n <pageIndex>0</pageIndex>\r\n <value>5</value>\r\n </column1>\r\n <column2>\r\n <pageIndex>0</pageIndex>\r\n <value>Item 2</value>\r\n </column2>\r\n <column3>\r\n <pageIndex>0</pageIndex>\r\n <value>20.00</value>\r\n </column3>\r\n <column4>\r\n <pageIndex>0</pageIndex>\r\n <value>100.00</value>\r\n </column4>\r\n </rows>\r\n <rows>\r\n <column1>\r\n <pageIndex>0</pageIndex>\r\n <value>1</value>\r\n </column1>\r\n <column2>\r\n <pageIndex>0</pageIndex>\r\n <value>Item 3</value>\r\n </column2>\r\n <column3>\r\n <pageIndex>0</pageIndex>\r\n <value>19.95</value>\r\n </column3>\r\n <column4>\r\n <pageIndex>0</pageIndex>\r\n <value>19.95</value>\r\n </column4>\r\n </rows>\r\n <rows>\r\n <column1>\r\n <pageIndex>0</pageIndex>\r\n <value>1</value>\r\n </column1>\r\n <column2>\r\n <pageIndex>0</pageIndex>\r\n <value>Item 4</value>\r\n </column2>\r\n <column3>\r\n <pageIndex>0</pageIndex>\r\n <value>123.00</value>\r\n </column3>\r\n <column4>\r\n <pageIndex>0</pageIndex>\r\n <value>123.00</value>\r\n </column4>\r\n </rows>\r\n <rows>\r\n <column1>\r\n <pageIndex>0</pageIndex>\r\n <value>10</value>\r\n </column1>\r\n <column2>\r\n <pageIndex>0</pageIndex>\r\n <value>Item 5</value>\r\n </column2>\r\n <column3>\r\n <pageIndex>0</pageIndex>\r\n <value>99.95</value>\r\n </column3>\r\n <column4>\r\n <pageIndex>0</pageIndex>\r\n 
<value>999.50</value>\r\n </column4>\r\n </rows>\r\n </objects>\r\n <elapsed>0.320434</elapsed>\r\n <templateName>Generic Invoice [en]</templateName>\r\n <templateVersion>4</templateVersion>\r\n <timestamp>2021-12-31T14:54:31</timestamp>\r\n</parsingResult>\r\n",
"pageCount": 1,
"error": false,
"status": 200,
"name": "sample-invoice.xml",
"remainingCredits": 99046120,
"credits": 42
}
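With XML output and inline enabled, the XML document arrives as a string inside the JSON body field, as in the response above. A minimal sketch for reading the field values with Python's standard library follows; the `fields_from_xml` name and the shortened sample are illustrative.

```python
import re
import xml.etree.ElementTree as ET


def fields_from_xml(xml_text):
    """Map field name -> text value for each <objects> element of type field."""
    # Drop the XML declaration: the text is already a decoded string, so the
    # encoding="utf-16" hint in the declaration no longer applies.
    xml_text = re.sub(r"^\s*<\?xml[^>]*\?>", "", xml_text, count=1)
    root = ET.fromstring(xml_text.strip())
    fields = {}
    for obj in root.findall("objects"):
        if obj.findtext("objectType") == "field":
            fields[obj.findtext("name")] = (obj.findtext("value") or "").strip()
    return fields


# Shortened version of the "body" string from the example response.
SAMPLE_XML = (
    '<?xml version="1.0" encoding="utf-16"?>\r\n<parsingResult>\r\n'
    " <objects>\r\n <name>invoiceId</name>\r\n <objectType>field</objectType>\r\n"
    " <value>67893566</value>\r\n </objects>\r\n"
    " <objects>\r\n <name>bankAccount</name>\r\n <objectType>field</objectType>\r\n"
    " <value>\r\n </value>\r\n </objects>\r\n"
    " <objects>\r\n <objectType>table</objectType>\r\n <name>table</name>\r\n </objects>\r\n"
    "</parsingResult>\r\n"
)

if __name__ == "__main__":
    print(fields_from_xml(SAMPLE_XML))
```

All values come back as text in XML output, so numeric fields such as total need an explicit conversion on the caller's side.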
Code Snippet
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: ' \
--data-raw '{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
"outputFormat": "XML",
"templateId": "1",
"async": false,
"inline": "true",
"password": "",
"profiles": ""
}'
[POST] /pdf/documentparser (output as CSV)
Description: Gets data from documents using a data extraction template. Templates can pull data from custom areas, by text search, and from form fields, tables, multi-page documents, and more.
Attributes
Hint: attributes should be inside JSON for POST request:
{
"url": "url-input-link"
}
Attribute | Required | Description |
---|---|---|
url | required | URL of the source file. Supports links from Google Drive, Dropbox, and PDF.co built-in files storage. To upload files via the API, see the Files Upload section. Note: if you experience intermittent Too Many Requests or Access Denied errors, add the cache: prefix to enable built-in URL caching (e.g. cache:https://example.com/file1.pdf). For data security, you can encrypt output files and decrypt input files; learn more about user-controlled data encryption. |
httpusername | optional | HTTP auth user name, if required to access the source url. |
httppassword | optional | HTTP auth password, if required to access the source url. |
templateId | required | ID of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser |
template | optional | Code of the document parser template to use, passed directly in the request. |
inline | optional | Set to true to return the results inside the response. Otherwise, the endpoint returns a link to the generated output file. |
outputFormat | optional | Output format. Default is JSON; set to CSV or XML to generate CSV or XML output instead. |
password | optional | Password of the PDF file. Must be a string. |
async | optional | Set to true to run long processes in the background. The API then returns a jobId, which you can pass to the /job/check endpoint to check the status of the process and retrieve the output, so you can proceed with other tasks without waiting for the process to finish. |
name | optional | File name for the generated output. Must be a string. |
expiration | optional | Expiration time for the output link, in minutes (default is 60, i.e. 1 hour). After this duration, any generated output files are automatically deleted from PDF.co temporary files storage. The maximum duration varies with your subscription plan. To store permanent input files (e.g. reusable images, PDF templates, documents), consider using PDF.co built-in Files Storage. |
profiles | optional | Additional configuration for fine-tuning and extra options. See the PDF.co knowledge base for profile examples. Must be a string. |
- Method: POST
- URL: /v1/pdf/documentparser
Query parameters
No query parameters accepted.
Body payload
{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
"templateId": "1",
"outputFormat": "CSV",
"generateCsvHeaders": true,
"async": false,
"inline": "true",
"password": ""
}
Example responses
/pdf/documentparser (output as CSV)
{
"body": "companyName,companyName2,invoiceId,dateIssued,dateDue,bankAccount,total,subTotal,tax,tableNames,tables\r\n\"Amazon Web Services, Inc\",\"Amazon Web Services, Inc\",123456789,2018-04-03T00:00:00,2018-04-03T00:00:00,123456789012,6.58,,1.01,table,\r\n\r\n",
"pageCount": 1,
"error": false,
"status": 200,
"name": "sample-invoice.csv",
"remainingCredits": 60804
}
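The CSV text in the body field of the response above can be read with Python's csv module, which handles the quoted company names that contain commas. This is a small sketch; `rows_from_csv` is an illustrative name, and the sample string is copied from the example response.

```python
import csv
import io


def rows_from_csv(csv_text):
    """Parse the CSV body into a list of dicts keyed by the header row."""
    # newline="" leaves the \r\n line endings for the csv module to handle.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return list(reader)


# The "body" value from the example response (header row plus one data row).
SAMPLE_CSV = (
    "companyName,companyName2,invoiceId,dateIssued,dateDue,bankAccount,"
    "total,subTotal,tax,tableNames,tables\r\n"
    '"Amazon Web Services, Inc","Amazon Web Services, Inc",123456789,'
    "2018-04-03T00:00:00,2018-04-03T00:00:00,123456789012,6.58,,1.01,table,\r\n"
    "\r\n"
)

if __name__ == "__main__":
    for row in rows_from_csv(SAMPLE_CSV):
        print(row["companyName"], row["total"], row["tax"])
```

The header row is present because the request sets generateCsvHeaders to true; without it, a plain csv.reader and positional indexing would be the safer choice.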
Code Snippet
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: ' \
--data-raw '{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/sample-invoice.pdf",
"templateId": "1",
"outputFormat": "CSV",
"generateCsvHeaders": true,
"async": false,
"inline": "true",
"password": ""
}'
[POST] /pdf/documentparser (output as JSON, custom template code)
Description: Parses documents and extracts data using previously prepared custom data extraction templates. Templates can pull data from custom areas, by text search, and from form fields, tables, multi-page documents, and more.
Attributes
Hint: attributes should be inside JSON for POST request:
{
"url": "url-input-link"
}
Attribute | Required | Description |
---|---|---|
url | required | URL of the source file. Supports links from Google Drive, Dropbox, and PDF.co built-in files storage. To upload files via the API, see the Files Upload section. Note: if you experience intermittent Too Many Requests or Access Denied errors, add the cache: prefix to enable built-in URL caching (e.g. cache:https://example.com/file1.pdf). For data security, you can encrypt output files and decrypt input files; learn more about user-controlled data encryption. |
httpusername | optional | HTTP auth user name, if required to access the source url. |
httppassword | optional | HTTP auth password, if required to access the source url. |
templateId | required | ID of the document parser template to use. View and manage your templates at https://app.pdf.co/document-parser. The example request below omits it and passes the template code in template instead. |
template | optional | Code of the document parser template to use, passed directly in the request. |
inline | optional | Set to true to return the results inside the response. Otherwise, the endpoint returns a link to the generated output file. |
outputFormat | optional | Output format. Default is JSON; set to CSV or XML to generate CSV or XML output instead. |
password | optional | Password of the PDF file. Must be a string. |
async | optional | Set to true to run long processes in the background. The API then returns a jobId, which you can pass to the /job/check endpoint to check the status of the process and retrieve the output, so you can proceed with other tasks without waiting for the process to finish. |
name | optional | File name for the generated output. Must be a string. |
expiration | optional | Expiration time for the output link, in minutes (default is 60, i.e. 1 hour). After this duration, any generated output files are automatically deleted from PDF.co temporary files storage. The maximum duration varies with your subscription plan. To store permanent input files (e.g. reusable images, PDF templates, documents), consider using PDF.co built-in Files Storage. |
profiles | optional | Additional configuration for fine-tuning and extra options. See the PDF.co knowledge base for profile examples. Must be a string. |
- Method: POST
- URL: /v1/pdf/documentparser
Query parameters
No query parameters accepted.
Body payload
{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/MultiPageTable.pdf",
"template": "{\r\n \"templateVersion\": 3,\r\n \"templatePriority\": 0,\r\n \"sourceId\": \"Multipage Table Test\",\r\n \"detectionRules\": {\r\n \"keywords\": [\r\n \"Sample document with multi-page table\"\r\n ]\r\n },\r\n \"fields\": {\r\n \"total\": {\r\n \"type\": \"regex\",\r\n \"expression\": \"TOTAL \",\r\n \"dataType\": \"decimal\"\r\n }\r\n },\r\n \"tables\": [\r\n {\r\n \"name\": \"table1\",\r\n \"start\": {\r\n \"expression\": \"Item\\\\s+Description\\\\s+Price\\\\s+Qty\\\\s+Extended Price\"\r\n },\r\n \"end\": {\r\n \"expression\": \"TOTAL\\\\s+\\\\d+\\\\.\\\\d\\\\d\"\r\n },\r\n \"row\": {\r\n \"expression\": \"^\\\\s*(?<itemNo>\\\\d+)\\\\s+(?<description>.+?)\\\\s+(?<price>\\\\d+\\\\.\\\\d\\\\d)\\\\s+(?<qty>\\\\d+)\\\\s+(?<extPrice>\\\\d+\\\\.\\\\d\\\\d)\"\r\n },\r\n \"columns\": [\r\n {\r\n \"name\": \"itemNo\",\r\n \"type\": \"integer\"\r\n },\r\n {\r\n \"name\": \"description\",\r\n \"type\": \"string\"\r\n },\r\n {\r\n \"name\": \"price\",\r\n \"type\": \"decimal\"\r\n },\r\n {\r\n \"name\": \"qty\",\r\n \"type\": \"integer\"\r\n },\r\n {\r\n \"name\": \"extPrice\",\r\n \"type\": \"decimal\"\r\n }\r\n ],\r\n \"multipage\": true\r\n }\r\n ]\r\n}",
"outputFormat": "JSON",
"async": false,
"inline": "true",
"profiles": "",
"password": ""
}
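Hand-escaping the template string as in the payload above is error-prone, because every backslash in a regular expression has to be doubled twice. One way around this, sketched below, is to keep the template as a Python dictionary and let json.dumps do the escaping. The template content is copied from the example payload (including the "TOTAL " expression exactly as it appears there); the helper name is illustrative.

```python
import json

# The template from the example payload, written as a plain dictionary.
# Raw strings keep the regular expressions readable.
TEMPLATE = {
    "templateVersion": 3,
    "templatePriority": 0,
    "sourceId": "Multipage Table Test",
    "detectionRules": {"keywords": ["Sample document with multi-page table"]},
    "fields": {
        # Expression copied verbatim from the example payload.
        "total": {"type": "regex", "expression": "TOTAL ", "dataType": "decimal"}
    },
    "tables": [
        {
            "name": "table1",
            "start": {"expression": r"Item\s+Description\s+Price\s+Qty\s+Extended Price"},
            "end": {"expression": r"TOTAL\s+\d+\.\d\d"},
            "row": {
                "expression": r"^\s*(?<itemNo>\d+)\s+(?<description>.+?)\s+"
                              r"(?<price>\d+\.\d\d)\s+(?<qty>\d+)\s+(?<extPrice>\d+\.\d\d)"
            },
            "columns": [
                {"name": "itemNo", "type": "integer"},
                {"name": "description", "type": "string"},
                {"name": "price", "type": "decimal"},
                {"name": "qty", "type": "integer"},
                {"name": "extPrice", "type": "decimal"},
            ],
            "multipage": True,
        }
    ],
}


def build_custom_template_payload(source_url, template):
    """Return the request body with the template serialized into a string."""
    return {
        "url": source_url,
        "template": json.dumps(template, indent=2),  # template travels as a string
        "outputFormat": "JSON",
        "async": False,
        "inline": "true",
    }


if __name__ == "__main__":
    body = build_custom_template_payload(
        "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/MultiPageTable.pdf",
        TEMPLATE,
    )
    print(json.dumps(body)[:200])
```

Serializing the outer payload with json.dumps a second time produces the doubly escaped form seen in the raw request body.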
Example responses
POST /pdf/documentparser
{
"body": {
"objects": [
{
"name": "companyName",
"objectType": "field",
"value": "Amazon Web Services, Inc",
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "companyName2",
"objectType": "field",
"value": "Amazon Web Services, Inc",
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "invoiceId",
"objectType": "field",
"value": "123456789",
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "dateIssued",
"objectType": "field",
"value": "2018-04-03T00:00:00",
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "dateDue",
"objectType": "field",
"value": "2018-04-03T00:00:00",
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "total",
"objectType": "field",
"value": 6.58,
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"name": "subTotal",
"objectType": "field",
"value": ""
},
{
"name": "tax",
"objectType": "field",
"value": 1.01,
"pageIndex": 0,
"rectangle": [
0,
0,
0,
0
]
},
{
"objectType": "table",
"name": "table",
"rows": []
}
],
"templateName": "Generic Invoice [en]",
"templateVersion": "4",
"timestamp": "2020-07-16T22:04:25"
},
"pageCount": 1,
"error": false,
"status": 200,
"name": "sample-invoice.json",
"remainingCredits": 77731
}
Code Snippet
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/documentparser' \
--header 'Content-Type: application/json' \
--header 'x-api-key: ' \
--data-raw '{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/document-parser/MultiPageTable.pdf",
"template": "{\r\n \"templateVersion\": 3,\r\n \"templatePriority\": 0,\r\n \"sourceId\": \"Multipage Table Test\",\r\n \"detectionRules\": {\r\n \"keywords\": [\r\n \"Sample document with multi-page table\"\r\n ]\r\n },\r\n \"fields\": {\r\n \"total\": {\r\n \"type\": \"regex\",\r\n \"expression\": \"TOTAL \",\r\n \"dataType\": \"decimal\"\r\n }\r\n },\r\n \"tables\": [\r\n {\r\n \"name\": \"table1\",\r\n \"start\": {\r\n \"expression\": \"Item\\\\s+Description\\\\s+Price\\\\s+Qty\\\\s+Extended Price\"\r\n },\r\n \"end\": {\r\n \"expression\": \"TOTAL\\\\s+\\\\d+\\\\.\\\\d\\\\d\"\r\n },\r\n \"row\": {\r\n \"expression\": \"^\\\\s*(?<itemNo>\\\\d+)\\\\s+(?<description>.+?)\\\\s+(?<price>\\\\d+\\\\.\\\\d\\\\d)\\\\s+(?<qty>\\\\d+)\\\\s+(?<extPrice>\\\\d+\\\\.\\\\d\\\\d)\"\r\n },\r\n \"columns\": [\r\n {\r\n \"name\": \"itemNo\",\r\n \"type\": \"integer\"\r\n },\r\n {\r\n \"name\": \"description\",\r\n \"type\": \"string\"\r\n },\r\n {\r\n \"name\": \"price\",\r\n \"type\": \"decimal\"\r\n },\r\n {\r\n \"name\": \"qty\",\r\n \"type\": \"integer\"\r\n },\r\n {\r\n \"name\": \"extPrice\",\r\n \"type\": \"decimal\"\r\n }\r\n ],\r\n \"multipage\": true\r\n }\r\n ]\r\n}",
"outputFormat": "JSON",
"async": false,
"inline": "true",
"profiles": "",
"password": ""
}'
[GET] /pdf/documentparser/templates
Returns all Document Parser data extraction templates for the current user. Use a GET request.
Manage your Document Parser templates at https://app.pdf.co/document-parser/templates
- Method: GET
- URL: /v1/pdf/documentparser/templates
Query parameters
No query parameters accepted.
Body payload
No body parameters accepted.
Example responses
pdf/documentparser/templates
{
"templates": [
{
"id": 40,
"type": "user",
"title": "Untitled",
"description": "Untitled"
},
{
"id": 1,
"type": "system",
"title": "Invoice Parser",
"description": "Parses invoices and extracts invoice number, company name, due date, amount, tax"
}
],
"remainingCredits": 94229
}
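A listing like the one above can be searched for the ID to pass as templateId. The sketch below relies only on the fields shown in the example response (id, type, title); `find_template_id` is an illustrative helper, not part of the API.

```python
def find_template_id(listing, title, template_type=None):
    """Return the id of the first template with a matching title, else None.

    template_type, if given, restricts the search to "user" or "system"
    templates, the two type values that appear in the example response.
    """
    for template in listing.get("templates", []):
        if template.get("title") != title:
            continue
        if template_type is not None and template.get("type") != template_type:
            continue
        return template["id"]
    return None


# Copy of the example response shown above.
SAMPLE_LISTING = {
    "templates": [
        {"id": 40, "type": "user", "title": "Untitled", "description": "Untitled"},
        {"id": 1, "type": "system", "title": "Invoice Parser",
         "description": "Parses invoices and extracts invoice number, company name, due date, amount, tax"},
    ],
    "remainingCredits": 94229,
}

if __name__ == "__main__":
    print(find_template_id(SAMPLE_LISTING, "Invoice Parser"))
```

Titles are not guaranteed to be unique (the example includes a template still named "Untitled"), so storing the numeric ID once it is known is more reliable than looking it up by title each time.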
Code Snippet
CURL
curl --location --request GET 'https://api.pdf.co/v1/pdf/documentparser/templates' \
--header 'Content-Type: application/json' \
--header 'x-api-key: '
[GET] /pdf/documentparser/templates/:id
Returns detailed information for a document parser template, identified by the template’s ID. Use a GET request.
Manage your Document Parser templates at https://app.pdf.co/document-parser/templates
- Method: GET
- URL: /v1/pdf/documentparser/templates/:id
Query parameters
No query parameters accepted.
Body payload
No body parameters accepted.
Example responses
No example responses saved.
Code Snippet
CURL
curl --location --request GET 'https://api.pdf.co/v1/pdf/documentparser/templates/1' \
--header 'Content-Type: application/json' \
--header 'x-api-key: '
Samples
- C# - Blood Test Results to JSON
- C# - Census table from life and annuity quote request pdf
- C# - Create Custom Template
- C# - Extract line items from tables on multiple pages
- C# - Parse From URL
- C# - Parse From URL Asynchronously
- C# - Parse Multipage Table
- C# - Parse Simple Document
- C# - Parse Uploaded File
- C# - Parse Uploaded File Asynchronously
- C# - Parse Uploaded File Asynchronously (Using TemplateId)
- C# - Parse and Generate HL7 Output
- C# - Parse with OCR
- C# - Parsing and reading data from Airline Tickets
- GoogleAppScript - Convert PDF Invoice to Google Sheet
- Java - Blood Test Results to JSON
- Java - Create Custom Template
- Java - Extract line items from tables on multiple pages
- Java - Parse From URL Asynchronously
- Java - Parse From Url
- Java - Parse Multipage Table
- Java - Parse Simple Document
- Java - Parse Uploaded File
- Java - Parse Uploaded File Asynchronously
- Java - Parse with OCR
- Java - Parsing and reading data from Airline Tickets
- JavaScript - Parse From Url (Node.js)
- JavaScript - Parse Uploaded File (Node.js)
- PHP - Blood Test Results to JSON
- PHP - Create Custom Template
- PHP - Extract line items from tables on multiple pages
- PHP - Parse From URL Asynchronously
- PHP - Parse Invoice and Fill Database (SQL Server)
- PHP - Parse Invoice and Save Table Data to mySql Database
- PHP - Parse Multipage Table
- PHP - Parse Simple Document
- PHP - Parse Uploaded File Asynchronously
- PHP - Parse with OCR
- PHP - Parsing and reading data from Airline Tickets
- Powershell - Parse From Uploaded File
- Powershell - Parse From Url
- Python - Parse From Uploaded File
- Python - Parse From Url
- Python - Parse PDF Invoice
- Salesforce - Document Parser Demo
- Salesforce - Parse Document and Get CSV Output
- SharePoint - Parse Invoice Information
- TEMPLATES-SAMPLES - Form IRS Form 1040
- TEMPLATES-SAMPLES - Form IRS Form 1099-DIV
- TEMPLATES-SAMPLES - Form IRS Form 1099-K
- TEMPLATES-SAMPLES - Form IRS Form W2
- TEMPLATES-SAMPLES - Invoice Get Email Address
- TEMPLATES-SAMPLES - Invoice Get Total And Tax
- TEMPLATES-SAMPLES - Invoice Simple Invoice
- TEMPLATES-SAMPLES - Invoice from Amazon AWS
- TEMPLATES-SAMPLES - Invoice from Digital Ocean Scanned
- TEMPLATES-SAMPLES - Invoice from Digital Ocean
- TEMPLATES-SAMPLES - Invoice from Google
- TEMPLATES-SAMPLES - Invoice from ManyChat
- TEMPLATES-SAMPLES - Invoice from PandaDoc
- TEMPLATES-SAMPLES - Invoice table with empty columns
- TEMPLATES-SAMPLES - Invoice with Hanging Rows
- TEMPLATES-SAMPLES - Invoice with Tax and Line Items
- TEMPLATES-SAMPLES - Invoice with line items in EUR
- TEMPLATES-SAMPLES - Invoice with line items in bordered table
- TEMPLATES-SAMPLES - Order form with line items and total
- TEMPLATES-SAMPLES - Report - Blood Test Results
- TEMPLATES-SAMPLES - Report Echocardiogram - Key Value Fields
- TEMPLATES-SAMPLES - Report HL7
- TEMPLATES-SAMPLES - Shipment Label from Amazon
- TEMPLATES-SAMPLES - Shipping Label from USPS
- TEMPLATES-SAMPLES - Statement from Bank of America
- TEMPLATES-SAMPLES - Statement from JPMorgan Chase
- TEMPLATES-SAMPLES - Statement from Wells Fargo
- TEMPLATES-SAMPLES - Statement of Assets
- TEMPLATES-SAMPLES - Table Auto Detection
- TEMPLATES-SAMPLES - Table Multiline Items Without Borders
- TEMPLATES-SAMPLES - Table Multiple Pages - Approach 2 - Define Column Coordinates
- TEMPLATES-SAMPLES - Table Multiple pages - Approach 1 - Detect Columns Automatically
- TEMPLATES-SAMPLES - Table Read From columns 2 and 3
- TEMPLATES-SAMPLES - Table Without Borders Auto Detection
- TEMPLATES-SAMPLES - Table from census table life and annuity quote request pdf
- TEMPLATES-SAMPLES - Table with Multiple Subitems
- TEMPLATES-SAMPLES - Text Extraction from Foldable Brochure Booklet
- TEMPLATES-SAMPLES - Ticket Airline
- VB.NET - Blood Test Results to JSON
- VB.NET - Census table from life and annuity quote request pdf
- VB.NET - Create Custom Template
- VB.NET - Extract line items from tables on multiple pages
- VB.NET - Parse From Url
- VB.NET - Parse Multipage Table
- VB.NET - Parse Simple Document
- VB.NET - Parse Uploaded File
- VB.NET - Parse Uploaded File Asynchronously
- VB.NET - Parse with OCR
- VB.NET - Parsing and reading data from Airline Tickets
- cURL - Document Parser Custom Template Code
- cURL - Document Parser Output as CSV
- cURL - Document Parser Output as JSON
- cURL - Document Parser Results
Copyright © 2016 - 2023 PDF.co