PDF Find Table
AI-powered document analysis can scan your document for tables and return an array of the tables found on its pages, with their coordinates and information about the columns detected in these tables.
Available Methods
[POST] /pdf/find/table (AI powered)
This function finds tables in documents using an AI-powered table detection engine.
This endpoint locates tables in the input PDF document and returns JSON with:
- tables: array of table objects;
- X, Y, Width, Height: coordinates for every table found;
- rect: param for every table that you can re-use with pdf/convert/to/json, pdf/convert/to/csv and other endpoints to extract a selected table only;
- PageIndex: index of the page containing the table. The very first page is 0 (zero);
- Columns: array with the set of X coordinates for every column inside the table that was found.
To extract a table into CSV, JSON or XML, use the pdf/convert/to/csv, pdf/convert/to/json2 or pdf/convert/to/xml endpoint with its rect parameter set to the rect output value for that table.
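The rect re-use described above can be sketched in Python as follows. This is only an illustration: build_extract_payload is a hypothetical helper name, and passing pages alongside rect is an assumption (the text above confirms only the rect parameter for the convert endpoints).

```python
def build_extract_payload(source_url: str, table: dict) -> dict:
    """Build a request body for a convert endpoint (e.g. pdf/convert/to/csv)
    that targets a single table returned by /pdf/find/table."""
    return {
        "url": source_url,
        # Assumption: limit processing to the page the table was found on.
        # PageIndex is zero-based, matching the pages parameter convention.
        "pages": str(table["PageIndex"]),
        # Re-use the rect string exactly as returned for this table.
        "rect": table["rect"],
    }
```

The returned dictionary would then be serialized to JSON and posted to the chosen convert endpoint.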
Input Parameters:
- url: required. URL to the source file. Supports links from Google Drive, Dropbox and from built-in PDF.co files storage. For uploading files via API please check the Files Upload section. If you are randomly getting Too Many Requests or Access Denied errors for your input url, please try to add cache: to enable built-in url caching. You can also encrypt data for output files and decrypt data for input files with user-controlled data encryption (uses strong AES encryption with your own keys).
- httpusername: optional. HTTP auth user name if required to access the source url.
- httppassword: optional. HTTP auth password if required to access the source url.
- pages: optional. Comma-separated list of page indices (or ranges) to process. IMPORTANT: the very first page starts at 0 (zero). To set a range use the dash -, for example: 0,2-5,7-. To set a range from an index to the last page use a range like this: 2- (from page #3, as the index starts at zero, till the end of the document). For ALL pages just leave this param empty. Example: 0,2-5,7- means first page, then 3rd page to 6th page, and then the range from the 8th (index = 7) page till the end of the document. Must be a String.
- inline: optional. Must be one of: true, false. When false, the endpoint returns a link to a .json file with the output.
- password: optional. Password of the PDF file. Must be a String.
- async: optional. Runs processing asynchronously. Returns JobId that you may use with /job/check to check the state of the processing (possible states: working, failed, aborted and success). Must be one of: true, false.
- encrypt: legacy; all files are now stored in encrypted cloud storage by default.
- name: optional. File name for generated output. Must be a String.
- expiration: optional. Output link expiration in minutes. Default is 60 (i.e. 60 minutes or 1 hour). After this delay generated output file(s) (if any) will be auto-removed from PDF.co temporary files storage. Max allowed expiration period depends on your current subscription plan. To store permanent input files (e.g. re-usable images, pdf, documents), please use PDF.co built-in Files Storage instead.
- profiles: optional. Must be a String. Use this param to set additional configuration for fine tuning and extra options. Explore PDF.co knowledgebase for profile examples.

- Method: POST
- URL: /v1/pdf/find/table
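To make the pages syntax concrete, here is a small client-side sketch that expands a pages string into zero-based page indices. It only illustrates the semantics described for the pages parameter; expand_pages is a hypothetical helper, not part of the API, and clamping ranges to the document length is an assumption.

```python
def expand_pages(spec: str, page_count: int) -> list:
    """Expand a pages string such as "0,2-5,7-" into zero-based page indices."""
    if not spec.strip():
        # An empty value means all pages.
        return list(range(page_count))
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An open-ended range like "7-" runs to the last page.
            end = int(end_text) if end_text.strip() else page_count - 1
        else:
            start = end = int(part)
        # Assumption: indices past the last page are ignored.
        indices.extend(range(start, min(end, page_count - 1) + 1))
    return indices
```

For a 10-page document, expand_pages("0,2-5,7-", 10) yields the first page, the 3rd to 6th pages, and the 8th page through the last.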
Query parameters
No query parameters accepted.
Body payload
{
  "url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
  "async": "false",
  "encrypt": "false",
  "inline": "true",
  "password": ""
}
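The body payload above could be sent from Python roughly as follows. This is a sketch using only the standard library; the function names are illustrative, while the host name and the x-api-key header are taken from the cURL snippet on this page.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/find/table"


def build_find_table_request(api_key: str, source_url: str) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    payload = {
        "url": source_url,
        "async": "false",
        "encrypt": "false",
        "inline": "true",
        "password": "",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def find_tables(api_key: str, source_url: str) -> dict:
    """Send the request and decode the JSON response (makes a network call)."""
    request = build_find_table_request(api_key, source_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect before any credits are spent.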
Example responses
/pdf/find/table
{
  "body": {
    "tables": [
      {
        "PageIndex": 0,
        "X": 36,
        "Y": 34.4400024,
        "Width": 523.44,
        "Height": 160.82,
        "Columns": [
          357.675
        ],
        "rect": "36, 34.4400024, 523.44, 160.82"
      },
      {
        "PageIndex": 0,
        "X": 36,
        "Y": 316.249969,
        "Width": 523.44,
        "Height": 120.620026,
        "Columns": [
          157.117,
          340.68,
          475.84
        ],
        "rect": "36, 316.249969, 523.44, 120.620026"
      }
    ]
  },
  "pageCount": 1,
  "error": false,
  "status": 200,
  "name": "sample.json",
  "remainingCredits": 98892697,
  "credits": 21
}
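A response shaped like the example above can be turned into typed objects with a few lines of Python. This sketch assumes the request used inline set to true, so the tables arrive in body rather than as a link to a .json file; FoundTable and parse_tables are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class FoundTable:
    page_index: int
    x: float
    y: float
    width: float
    height: float
    columns: list
    rect: str


def parse_tables(response: dict) -> list:
    """Convert the decoded JSON response into a list of FoundTable objects."""
    if response.get("error"):
        raise RuntimeError(f"find/table failed with status {response.get('status')}")
    return [
        FoundTable(
            page_index=t["PageIndex"],
            x=t["X"],
            y=t["Y"],
            width=t["Width"],
            height=t["Height"],
            columns=list(t["Columns"]),
            rect=t["rect"],
        )
        for t in response["body"]["tables"]
    ]
```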
Code Snippet
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/find/table' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
"url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
"async": "false",
"encrypt": "false",
"inline": "true",
"password": ""
}'
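When async is set to true, the parameter description says the call returns a JobId to be checked through /job/check until processing finishes. Below is a minimal polling sketch. It assumes a hypothetical check_status callable, supplied by you, that wraps the /job/check request and returns one of the documented state strings; the request and response format of /job/check is not described on this page, so it is deliberately left out.

```python
import time
from typing import Callable


def wait_for_job(check_status: Callable[[], str],
                 interval: float = 2.0,
                 max_checks: int = 60) -> str:
    """Poll until the job leaves the "working" state and return the final state."""
    for _ in range(max_checks):
        status = check_status()
        if status != "working":
            # One of the terminal states: "success", "failed" or "aborted".
            return status
        time.sleep(interval)
    raise TimeoutError("job did not finish within the allowed number of checks")
```

The interval and max_checks defaults are arbitrary; tune them to the size of your documents.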
[POST] /pdf/find/table (legacy table finder)
This function finds tables in documents using the AI-powered table detection engine by default, but you can enable the legacy table finder mode via the profiles parameter (see below).
This endpoint locates tables in the input PDF document and returns JSON with:
- tables: array of table objects;
- X, Y, Width, Height: coordinates for every table found;
- rect: param for every table that you can re-use with pdf/convert/to/json, pdf/convert/to/csv and other endpoints to extract a selected table only;
- PageIndex: index of the page containing the table. The very first page is 0 (zero);
- Columns: array with the set of X coordinates for every column inside the table that was found.
To extract a table into CSV, JSON or XML, use the pdf/convert/to/csv, pdf/convert/to/json2 or pdf/convert/to/xml endpoint with its rect parameter set to the rect output value for that table.
Input Parameters:
- url: required. URL to the source file. Supports links from Google Drive, Dropbox and from built-in PDF.co files storage. For uploading files via API please check the Files Upload section. If you are randomly getting Too Many Requests or Access Denied errors for your input url, please try to add cache: to enable built-in url caching. You can also encrypt data for output files and decrypt data for input files with user-controlled data encryption (uses strong AES encryption with your own keys).
- httpusername: optional. HTTP auth user name if required to access the source url.
- httppassword: optional. HTTP auth password if required to access the source url.
- pages: optional. Comma-separated list of page indices (or ranges) to process. IMPORTANT: the very first page starts at 0 (zero). To set a range use the dash -, for example: 0,2-5,7-. To set a range from an index to the last page use a range like this: 2- (from page #3, as the index starts at zero, till the end of the document). For ALL pages just leave this param empty. Example: 0,2-5,7- means first page, then 3rd page to 6th page, and then the range from the 8th (index = 7) page till the end of the document. Must be a String.
- inline: optional. Must be one of: true, false. When false, the endpoint returns a link to a .json file with the output.
- password: optional. Password of the PDF file. Must be a String.
- async: optional. Runs processing asynchronously. Returns JobId that you may use with /job/check to check the state of the processing (possible states: working, failed, aborted and success). Must be one of: true, false.
- encrypt: legacy; all files are now stored in encrypted cloud storage by default.
- name: optional. File name for generated output. Must be a String.
- expiration: optional. Output link expiration in minutes. Default is 60 (i.e. 60 minutes or 1 hour). After this delay generated output file(s) (if any) will be auto-removed from PDF.co temporary files storage. Max allowed expiration period depends on your current subscription plan. To store permanent input files (e.g. re-usable images, pdf, documents), please use PDF.co built-in Files Storage instead.
- profiles: optional. Must be a String. Use this param to set additional configuration for fine tuning and extra options. Explore PDF.co knowledgebase for profile examples.
Legacy mode can be enabled like this:
"profiles": "{ 'Mode': 'Legacy'}"
or with a more detailed config that sets the minimum required rows, the minimum columns and the column detection mode:
"profiles": "{ 'Mode': 'Legacy', 'ColumnDetectionMode': 'BorderedTables', 'DetectionMinNumberOfRows': 1, 'DetectionMinNumberOfColumns': 1, 'DetectionMaxNumberOfInvalidSubsequentRowsAllowed': 0, 'DetectionMinNumberOfLineBreaksBetweenTables': 0, 'EnhanceTableBorders': false }"
- Method: POST
- URL: /v1/pdf/find/table
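Because profiles is a string that itself contains a single-quoted, JSON-like object, writing it by hand is error-prone. One possible way to generate it from a Python dictionary is sketched below. build_profiles is a hypothetical helper, and the quote swap assumes that no option value contains a quote character.

```python
import json


def build_profiles(options: dict) -> str:
    """Serialize options in the single-quoted style used by the profiles examples."""
    # json.dumps emits standard JSON (double quotes, lowercase true/false);
    # swapping the quotes reproduces the style shown in the examples.
    return json.dumps(options).replace('"', "'")


legacy_profiles = build_profiles({
    "Mode": "Legacy",
    "ColumnDetectionMode": "BorderedTables",
    "DetectionMinNumberOfRows": 1,
    "DetectionMinNumberOfColumns": 1,
    "EnhanceTableBorders": False,
})
```

The resulting string can be assigned to the profiles key of the request body before the body is serialized to JSON.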
Query parameters
No query parameters accepted.
Body payload
{
  "url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
  "async": "false",
  "encrypt": "false",
  "inline": "true",
  "password": "",
  "profiles": "{ 'Mode': 'Legacy', 'ColumnDetectionMode': 'BorderedTables', 'DetectionMinNumberOfRows': 1, 'DetectionMinNumberOfColumns': 1, 'DetectionMaxNumberOfInvalidSubsequentRowsAllowed': 0, 'DetectionMinNumberOfLineBreaksBetweenTables': 0, 'EnhanceTableBorders': false }"
}
Example responses
/pdf/find/table (legacy table finder)
{
  "body": {
    "tables": [
      {
        "PageIndex": 0,
        "X": 30.72,
        "Y": 309.36,
        "Width": 533.76,
        "Height": 134.16,
        "Columns": [
          163.92,
          297.36,
          431.039978
        ],
        "rect": "30.72, 309.36, 533.76, 134.16"
      }
    ]
  },
  "pageCount": 1,
  "error": false,
  "status": 200,
  "name": "sample.json",
  "remainingCredits": 98892760,
  "credits": 21
}
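The Columns values in a response like the one above can be combined with X and Width to estimate where each column starts and ends. The sketch below rests on an assumption that is not confirmed by this page: that Columns holds the absolute X coordinates of the boundaries between adjacent columns. column_spans is an illustrative helper name.

```python
def column_spans(table: dict) -> list:
    """Return (left, right) X ranges for each column of a found table.

    Assumption: "Columns" lists the X coordinates that separate adjacent
    columns, measured in the same coordinate space as "X" and "Width".
    """
    left = table["X"]
    right = table["X"] + table["Width"]
    edges = [left, *sorted(table["Columns"]), right]
    # Pair each edge with the next one to form the column ranges.
    return list(zip(edges[:-1], edges[1:]))
```

Under that assumption, the table in the example response (three separators) would be split into four column ranges.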
Code Snippet
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/find/table' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
"url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
"async": "false",
"encrypt": "false",
"inline": "true",
"password": "",
"profiles": "{ '\''Mode'\'': '\''Legacy'\'', '\''ColumnDetectionMode'\'': '\''BorderedTables'\'', '\''DetectionMinNumberOfRows'\'': 1, '\''DetectionMinNumberOfColumns'\'': 1, '\''DetectionMaxNumberOfInvalidSubsequentRowsAllowed'\'': 0, '\''DetectionMinNumberOfLineBreaksBetweenTables'\'': 0, '\''EnhanceTableBorders'\'': false }"
}'
Samples
- C# - PDF Table Search from Uploaded File
- C# - PDF Table Search from Uploaded File Asynchronously
- C# - PDF Table Search from URL
- C# - PDF Table Search from URL Asynchronously
- cURL - Search Tables From PDF
- Java - PDF Table Search from Uploaded File
- Java - PDF Table Search from Uploaded File Asynchronously
- Java - PDF Table Search from URL
- Java - PDF Table Search from URL Asynchronously
- JavaScript - PDF Get Search Table JSON (Node js)
- JavaScript - PDF Table Search from Uploaded File (Node js)
- JavaScript - PDF Table Search from Uploaded File (Node js) - Async API
- JavaScript - PDF Table Search from URL (Node js)
- JavaScript - PDF Table Search from URL (Node js) - Async API
- PowerShell - PDF Table Search from Uploaded File
- PowerShell - PDF Table Search from Uploaded File Asynchronously
- PowerShell - PDF Table Search from URL
- PowerShell - PDF Table Search from URL Asynchronously
- Python - PDF Table Search from Uploaded File
- Python - PDF Table Search from Uploaded File Asynchronously
- VB.NET - PDF Table Search from Uploaded File
- VB.NET - PDF Table Search from Uploaded File Asynchronously
- VB.NET - PDF Table Search from URL
- VB.NET - PDF Table Search from URL Asynchronously
Copyright © 2016 - 2022 PDF.co