PDF Find Table
AI-powered document analysis scans your document for tables and returns an array of the tables found, with the page index, coordinates, and column information for each detected table.
Available Methods
[POST] /pdf/find/table (AI powered)
This function finds tables in documents using an AI-powered table detection engine.
This endpoint locates tables in an input PDF document and returns JSON with:
- The array of tables objects;
- X, Y, Width, and Height coordinates for every table found;
- rect param for every table that you can re-use with pdf/convert/to/json, pdf/convert/to/csv, and other endpoints to extract a selected table only;
- PageIndex page index for a page with a table. The very first page is 0 (zero);
- Columns array with the set of X coordinates for every column inside the table that was found.
To extract a table into CSV, JSON, or XML, use the pdf/convert/to/csv, pdf/convert/to/json2, and pdf/convert/to/xml endpoints respectively, passing the rect value returned for that table as the rect parameter.
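As a rough illustration of re-using the rect output, the sketch below turns a find/table response into one request body per table for a pdf/convert/to/* call. The helper name build_convert_payloads is hypothetical, the response is a trimmed copy of the example shown later in this section, and passing pages alongside rect is an assumption rather than something this page confirms.

```python
import json

# Trimmed copy of the example response from this section.
find_response = {
    "body": {"tables": [
        {"PageIndex": 0, "rect": "36, 34.4400024, 523.44, 160.82"},
        {"PageIndex": 0, "rect": "36, 316.249969, 523.44, 120.620026"},
    ]},
    "error": False,
}

def build_convert_payloads(source_url, response):
    """Return one request body per detected table (hypothetical helper)."""
    payloads = []
    for table in response["body"]["tables"]:
        payloads.append({
            "url": source_url,
            # Assumption: the convert endpoints accept the same pages format.
            "pages": str(table["PageIndex"]),
            # The rect string is passed through unchanged.
            "rect": table["rect"],
        })
    return payloads

payloads = build_convert_payloads("https://example.com/file1.pdf", find_response)
print(json.dumps(payloads[1], indent=2))
```

Each resulting body could then be posted to pdf/convert/to/csv, pdf/convert/to/json2, or pdf/convert/to/xml, depending on the format you need.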
Attribute | Required | Description |
---|---|---|
url | required | URL to the source file. Supports links from Google Drive, Dropbox, and PDF.co built-in files storage. To upload files via API, check out the Files Upload section. Note: if you experience intermittent Too Many Requests or Access Denied errors, try adding the cache: prefix to enable built-in URL caching (e.g. cache:https://example.com/file1.pdf). For data security, you have the option to encrypt output files and decrypt input files. Learn more about user-controlled data encryption. |
httpusername | optional | HTTP auth user name if required to access source url. |
httppassword | optional | HTTP auth password if required to access source url. |
pages | optional | Comma-separated list of page indices (or ranges) to process. IMPORTANT: the very first page is 0 (zero). Use a dash - to set a range, and an open-ended range such as 2- to run from that index to the last page (here, from page #3 to the end of the document). Example: 0,2-5,7- means the first page, then the 3rd through 6th pages, and then from the 8th page (index = 7) to the end of the document. To process ALL pages, leave this param empty. The input must be in string format. |
inline | optional | Must be one of: true or false. When false, the endpoint returns a link to the .json file with the output. |
password | optional | Password of the PDF file. The input must be in string format. |
async | optional | Set async to true to run long processes in the background. The API then returns a jobId that you can use with the /job/check endpoint to check the status of the process and retrieve the output, so you can proceed with other tasks without waiting for the process to finish. |
name | optional | File name for the generated output. The input must be in string format. |
expiration | optional | Expiration time for the output link, in minutes (default is 60, i.e. 1 hour). After this duration, any generated output files are automatically deleted from PDF.co temporary files storage. The maximum duration for link expiration varies based on your current subscription plan. To store permanent input files (e.g. re-usable images, PDF templates, documents), consider using PDF.co built-in Files Storage. |
profiles | optional | Use this parameter to set additional configurations for fine-tuning and extra options. Explore the PDF.co knowledgebase for profile examples. The input must be in string format. |
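The pages syntax can be easier to follow with a small worked example. The sketch below is a client-side helper (expand_pages is a hypothetical name, not part of the API) that expands a pages string into zero-based indices, assuming you already know the page count. How the service itself treats out-of-range indices is not stated here, so the helper simply drops them.

```python
def expand_pages(spec, page_count):
    """Expand a pages string such as '0,2-5,7-' into zero-based page indices."""
    if not spec.strip():
        return list(range(page_count))  # empty means all pages
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, _, end = part.partition("-")
            first = int(start)
            # An open-ended range like '7-' runs to the last page.
            last = int(end) if end else page_count - 1
            indices.extend(range(first, last + 1))
        else:
            indices.append(int(part))
    # Drop anything outside the document (a choice made by this sketch).
    return [i for i in indices if 0 <= i < page_count]

print(expand_pages("0,2-5,7-", 10))
```

For a 10-page document, the example string selects the first page, the 3rd through 6th pages, and the 8th through 10th pages.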
- Method: POST
- URL: /v1/pdf/find/table
Query parameters
No query parameters accepted.
Body payload
{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
"async": "false",
"inline": "true",
"password": ""
}
Example responses
/pdf/find/table
{
"body": {
"tables": [
{
"PageIndex": 0,
"X": 36,
"Y": 34.4400024,
"Width": 523.44,
"Height": 160.82,
"Columns": [
357.675
],
"rect": "36, 34.4400024, 523.44, 160.82"
},
{
"PageIndex": 0,
"X": 36,
"Y": 316.249969,
"Width": 523.44,
"Height": 120.620026,
"Columns": [
157.117,
340.68,
475.84
],
"rect": "36, 316.249969, 523.44, 120.620026"
}
]
},
"pageCount": 1,
"error": false,
"status": 200,
"name": "sample.json",
"remainingCredits": 98892697,
"credits": 21
}
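One way to read the Columns array is as the X positions that separate neighbouring columns. Under that assumption (this page does not spell it out), the sketch below derives a left and right X bound for each column of the second table in the response above. The function name column_spans is hypothetical.

```python
def column_spans(table):
    """Return (left, right) X ranges per column, assuming Columns holds separators."""
    left_edge = table["X"]
    right_edge = table["X"] + table["Width"]
    # Table edges plus the interior separators, in left-to-right order.
    edges = [left_edge] + sorted(table["Columns"]) + [right_edge]
    return list(zip(edges[:-1], edges[1:]))

# Values copied from the second table in the example response.
table = {"X": 36, "Width": 523.44, "Columns": [157.117, 340.68, 475.84]}
for left, right in column_spans(table):
    print(f"{left:.2f} .. {right:.2f}")
```

With three separators this yields four spans, which would correspond to a four-column table if the assumption holds.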
Code Snippet
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/find/table' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
"async": "false",
"inline": "true",
"password": ""
}'
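For readers who prefer Python to cURL, the same request could be sketched with the standard library as follows. The function names build_request and find_tables are hypothetical, the body mirrors the payload above, and the error check relies only on the error field visible in the example response. Note that urlopen raises urllib.error.HTTPError on non-2xx responses, which this sketch does not catch.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/find/table"

def build_request(api_key, source_url):
    """Build the POST request without sending it."""
    body = {"url": source_url, "async": "false", "inline": "true", "password": ""}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

def find_tables(api_key, source_url):
    """Send the request and return the list of detected tables."""
    with urllib.request.urlopen(build_request(api_key, source_url)) as resp:
        result = json.load(resp)
    if result.get("error"):
        raise RuntimeError(f"find/table failed: {result}")
    return result["body"]["tables"]
```

Calling find_tables with your API key and a source URL performs the network request; build_request alone is useful for inspecting what would be sent.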
[POST] /pdf/find/table (legacy table finder)
This function finds tables in documents using an AI-powered table detection engine by default, but you can enable the legacy table finder mode via the profiles parameter (see below).
This endpoint locates tables in the input PDF document and returns JSON with:
- an array of tables objects;
- X, Y, Width, and Height coordinates for every table found;
- rect param for every table that you can re-use with pdf/convert/to/json, pdf/convert/to/csv, and other endpoints to extract a selected table only;
- PageIndex page index for a page with a table. The very first page is 0 (zero);
- Columns array with the set of X coordinates for every column inside a table that was found.
To extract a table into CSV, JSON, or XML, use the pdf/convert/to/csv, pdf/convert/to/json2, and pdf/convert/to/xml endpoints respectively, passing the rect value returned for that table as the rect parameter.
Attribute | Required | Description |
---|---|---|
url | required | URL to the source file. Supports links from Google Drive, Dropbox, and PDF.co built-in files storage. To upload files via API, check out the Files Upload section. Note: if you experience intermittent Too Many Requests or Access Denied errors, try adding the cache: prefix to enable built-in URL caching (e.g. cache:https://example.com/file1.pdf). For data security, you have the option to encrypt output files and decrypt input files. Learn more about user-controlled data encryption. |
httpusername | optional | HTTP auth user name if required to access source url. |
httppassword | optional | HTTP auth password if required to access source url. |
pages | optional | Comma-separated list of page indices (or ranges) to process. IMPORTANT: the very first page is 0 (zero). Use a dash - to set a range, and an open-ended range such as 2- to run from that index to the last page (here, from page #3 to the end of the document). Example: 0,2-5,7- means the first page, then the 3rd through 6th pages, and then from the 8th page (index = 7) to the end of the document. To process ALL pages, leave this param empty. The input must be in string format. |
inline | optional | Must be one of: true or false. When false, the endpoint returns a link to the .json file with the output. |
password | optional | Password of the PDF file. The input must be in string format. |
async | optional | Set async to true to run long processes in the background. The API then returns a jobId that you can use with the /job/check endpoint to check the status of the process and retrieve the output, so you can proceed with other tasks without waiting for the process to finish. |
name | optional | File name for the generated output. The input must be in string format. |
expiration | optional | Expiration time for the output link, in minutes (default is 60, i.e. 1 hour). After this duration, any generated output files are automatically deleted from PDF.co temporary files storage. The maximum duration for link expiration varies based on your current subscription plan. To store permanent input files (e.g. re-usable images, PDF templates, documents), consider using PDF.co built-in Files Storage. |
profiles | optional | Use this parameter to set additional configurations for fine-tuning and extra options. Explore the PDF.co knowledgebase for profile examples. The input must be in string format. |
Legacy mode can be enabled like this:
"profiles": "{ 'Mode': 'Legacy'}"
or with a more detailed config that sets the minimum required rows, the minimum number of columns, and the column detection mode:
"profiles": "{ 'Mode': 'Legacy', 'ColumnDetectionMode': 'BorderedTables', 'DetectionMinNumberOfRows': 1, 'DetectionMinNumberOfColumns': 1, 'DetectionMaxNumberOfInvalidSubsequentRowsAllowed': 0, 'DetectionMinNumberOfLineBreaksBetweenTables': 0, 'EnhanceTableBorders': false }"
- Method: POST
- URL: /v1/pdf/find/table
Query parameters
No query parameters accepted.
Body payload
{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
"async": "false",
"inline": "true",
"password": "",
"profiles": "{ 'Mode': 'Legacy', 'ColumnDetectionMode': 'BorderedTables', 'DetectionMinNumberOfRows': 1, 'DetectionMinNumberOfColumns': 1, 'DetectionMaxNumberOfInvalidSubsequentRowsAllowed': 0, 'DetectionMinNumberOfLineBreaksBetweenTables': 0, 'EnhanceTableBorders': false }"
}
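Because profiles is a string that itself contains a JSON-like object, writing it by hand invites quoting mistakes. The sketch below builds that string from keyword options in the single-quoted style used in the payload above. The helper name build_profiles is hypothetical, and the sketch deliberately rejects values containing quotes or backslashes rather than guess how the service expects them to be escaped.

```python
import json

def build_profiles(**options):
    """Serialize options into a single-quoted profiles string (hypothetical helper)."""
    text = json.dumps(options)
    if "'" in text or "\\" in text:
        raise ValueError("values with quotes or backslashes are not handled by this sketch")
    # Swap JSON double quotes for single quotes to match the documented style.
    return text.replace('"', "'")

profiles = build_profiles(
    Mode="Legacy",
    ColumnDetectionMode="BorderedTables",
    DetectionMinNumberOfRows=1,
    DetectionMinNumberOfColumns=1,
    EnhanceTableBorders=False,
)
body = {"url": "https://example.com/file1.pdf", "profiles": profiles}
print(json.dumps(body, indent=2))
```

Serializing the outer body with json.dumps leaves the single-quoted profiles string untouched, so no manual escaping is needed.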
Example responses
/pdf/find/table (legacy table finder)
{
"body": {
"tables": [
{
"PageIndex": 0,
"X": 30.72,
"Y": 309.36,
"Width": 533.76,
"Height": 134.16,
"Columns": [
163.92,
297.36,
431.039978
],
"rect": "30.72, 309.36, 533.76, 134.16"
}
]
},
"pageCount": 1,
"error": false,
"status": 200,
"name": "sample.json",
"remainingCredits": 98892760,
"credits": 21
}
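In the responses shown here, the rect string repeats the X, Y, Width, and Height values in that order. A small parser such as the hypothetical parse_rect below makes that relationship explicit and can serve as a sanity check before passing rect to another endpoint. The ordering is inferred from the example values, not from a stated specification.

```python
def parse_rect(rect):
    """Split a rect string like '30.72, 309.36, 533.76, 134.16' into four floats."""
    x, y, width, height = (float(part) for part in rect.split(","))
    return x, y, width, height

# Values copied from the legacy example response above.
table = {
    "X": 30.72, "Y": 309.36, "Width": 533.76, "Height": 134.16,
    "rect": "30.72, 309.36, 533.76, 134.16",
}
matches = parse_rect(table["rect"]) == (
    table["X"], table["Y"], table["Width"], table["Height"]
)
print("rect matches X/Y/Width/Height:", matches)
```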
Code Snippet
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/find/table' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
"async": "false",
"inline": "true",
"password": "",
"profiles": "{ '\''Mode'\'': '\''Legacy'\'', '\''ColumnDetectionMode'\'': '\''BorderedTables'\'', '\''DetectionMinNumberOfRows'\'': 1, '\''DetectionMinNumberOfColumns'\'': 1, '\''DetectionMaxNumberOfInvalidSubsequentRowsAllowed'\'': 0, '\''DetectionMinNumberOfLineBreaksBetweenTables'\'': 0, '\''EnhanceTableBorders'\'': false }"
}'
Samples
- C# - PDF Table Search from URL
- C# - PDF Table Search from URL Asynchronously
- C# - PDF Table Search from Uploaded File
- C# - PDF Table Search from Uploaded File Asynchronously
- GoogleAppsScript - ExtractTablesWithText
- Java - PDF Table Search from URL
- Java - PDF Table Search from URL Asynchronously
- Java - PDF Table Search from Uploaded File
- Java - PDF Table Search from Uploaded File Asynchronously
- JavaScript - PDF Get Search Table JSON (Node js)
- JavaScript - PDF Table Search from URL (Node js)
- JavaScript - PDF Table Search from URL (Node js) - Async API
- JavaScript - PDF Table Search from Uploaded File (Node js)
- JavaScript - PDF Table Search from Uploaded File (Node js) - Async API
- PowerShell - PDF Table Search from URL
- PowerShell - PDF Table Search from URL Asynchronously
- PowerShell - PDF Table Search from Uploaded File
- PowerShell - PDF Table Search from Uploaded File Asynchronously
- Python - PDF Get Search Table Data
- Python - PDF Table Search from Uploaded File
- Python - PDF Table Search from Uploaded File Asynchronously
- Salesforce - Search Table From URL
- VB.NET - PDF Table Search from URL
- VB.NET - PDF Table Search from URL Asynchronously
- VB.NET - PDF Table Search from Uploaded File
- VB.NET - PDF Table Search from Uploaded File Asynchronously
- cURL - Search Tables From PDF
Copyright © 2016 - 2024 PDF.co