
PDF Find Table

AI-powered document analysis scans your document for tables and returns an array of the tables found on each page, with their coordinates and the columns detected in each table.

Available Methods


[POST] /pdf/find/table (AI-powered)

This function finds tables in documents using the AI-powered table detection engine.

This endpoint locates tables in the input PDF document and returns JSON with:

  • an array of table objects;
  • X, Y, Width, and Height coordinates of every table found;
  • rect: a value for every table that you can reuse with pdf/convert/to/json, pdf/convert/to/csv, pdf/convert/to/xml, and other endpoints to extract only the selected table;
  • PageIndex: index of the page containing the table. The very first page is 0 (zero);
  • Columns: an array of X coordinates of the columns detected inside the table.
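The Columns values are easiest to understand next to the table box. The sketch below, assuming each Columns value is an internal separator between two neighbouring columns (so N values describe N+1 columns between X and X+Width), prints the horizontal span of every column. The function name column_ranges is hypothetical and not part of the PDF.co API.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print per-column X ranges for one table entry.
# ASSUMPTION: each "Columns" value is an internal separator between two
# columns, so N values describe N+1 columns spanning X .. X+Width.
column_ranges() {
  local x="$1" width="$2"
  shift 2
  if [ "$#" -eq 0 ]; then
    # No separators: the whole table is treated as a single column.
    awk -v x="$x" -v w="$width" 'BEGIN { printf "column 1: %g .. %g\n", x, x + w }'
    return
  fi
  # Remaining arguments are the Columns values, left to right.
  printf '%s\n' "$@" | awk -v x="$x" -v w="$width" '
    { sep[NR] = $1 }
    END {
      left = x
      for (i = 1; i <= NR; i++) {
        printf "column %d: %g .. %g\n", i, left, sep[i]
        left = sep[i]
      }
      printf "column %d: %g .. %g\n", NR + 1, left, x + w
    }'
}

# Second table from the AI-powered example response. Prints:
# column 1: 36 .. 157.117
# column 2: 157.117 .. 340.68
# column 3: 340.68 .. 475.84
# column 4: 475.84 .. 559.44
column_ranges 36 523.44 157.117 340.68 475.84
```

If the server uses the Columns values differently (for example as column start positions), only the awk loop needs to change.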

To extract a table into CSV, JSON, or XML, use the pdf/convert/to/csv, pdf/convert/to/json2, or pdf/convert/to/xml endpoint respectively, passing the table's rect output value as the rect parameter.
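As a sketch of that hand-off, the function below builds a request body for pdf/convert/to/csv from one table's PageIndex and rect. It assumes the convert endpoint accepts url, pages, rect, and inline as strings, as described for this endpoint; restricting pages to the table's PageIndex is an illustrative choice, and csv_payload_for_table is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: JSON body for pdf/convert/to/csv that extracts one table.
# ASSUMPTION: the convert endpoint takes "url", "pages", "rect" and "inline"
# as strings, like the parameters listed on this page.
csv_payload_for_table() {
  local url="$1" page_index="$2" rect="$3"
  # Limit processing to the table's page and crop to its rectangle.
  printf '{ "url": "%s", "pages": "%s", "rect": "%s", "inline": "true" }\n' \
    "$url" "$page_index" "$rect"
}

# Second table from the AI-powered example response.
csv_payload_for_table \
  "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf" \
  0 "36, 316.249969, 523.44, 120.620026"
```

The output can be piped into curl with --data-binary @- against https://api.pdf.co/v1/pdf/convert/to/csv, using the same x-api-key and Content-Type headers as the snippets on this page.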

Input Parameters:

  • url required. URL of the source file. Supports links from Google Drive, Dropbox, and the built-in PDF.co files storage. To upload files via the API, see the Files Upload section. If you randomly get Too Many Requests or Access Denied errors for your input URL, try prefixing the URL with cache: to enable built-in URL caching.
  • httpusername optional. HTTP auth user name, if required to access the source URL.
  • httppassword optional. HTTP auth password, if required to access the source URL.
  • pages optional. Comma-separated list of page indices (or ranges) to process. IMPORTANT: the very first page has index 0 (zero). To set a range, use a dash, for example 2-5. To set a range from an index to the last page, omit the end of the range, for example 2- (from the 3rd page, since indices start at zero, to the end of the document). To process ALL pages, leave this parameter empty. Example: 0,2-5,7- means the first page, then the 3rd to 6th pages, and then the 8th page (index 7) to the end of the document. Must be a String.
  • inline optional. Must be one of: true, false. When true, the output is returned inline in the response; when false, the endpoint returns a link to a .json file with the output.
  • password optional. Password of the PDF file. Must be a String.
  • async optional. Runs processing asynchronously. Returns a JobId that you can use with /job/check to check the state of the processing (possible states: working, failed, aborted, and success). Must be one of: true, false.
  • encrypt optional. Enables encryption of the output file. Must be one of: true, false.
  • name optional. File name for the generated output. Must be a String.
  • expiration optional. Output link expiration in minutes. Default is 60 (1 hour). After this delay, the generated output file(s), if any, are automatically removed from PDF.co temporary files storage. The maximum allowed expiration period depends on your subscription plan. To store input files permanently (e.g. reusable images, PDFs, documents), use the PDF.co built-in Files Storage instead.
  • profiles optional. Must be a String. Sets additional options for a custom configuration. See the profiles samples for examples.
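The pages syntax is compact, so a small local sketch can help check which zero-based indices a value selects for a document with a known page count. This only mirrors the rules described above; the server does its own parsing, and expand_pages is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand a "pages" value into zero-based page indices.
# Illustrates the syntax only; the API performs its own parsing.
expand_pages() {
  local spec="$1" count="$2" part start end i
  local -a parts=() out=()
  # An empty value means all pages.
  if [ -z "$spec" ]; then
    spec="0-"
  fi
  IFS=',' read -r -a parts <<< "$spec"
  for part in "${parts[@]}"; do
    if [[ "$part" == *-* ]]; then
      start="${part%%-*}"
      end="${part#*-}"
      # An open-ended range such as "7-" runs to the last page.
      if [ -z "$end" ]; then
        end=$((count - 1))
      fi
    else
      start="$part"
      end="$part"
    fi
    # Never go past the last page of the document.
    for ((i = start; i <= end && i < count; i++)); do
      out+=("$i")
    done
  done
  echo "${out[*]}"
}

# For a 10-page document this prints: 0 2 3 4 5 7 8 9
expand_pages "0,2-5,7-" 10
```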

  • Method: POST
  • URL: /v1/pdf/find/table

Query parameters

No query parameters accepted.

Body payload

{
    "url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
    "async": "false",
    "encrypt": "false",
    "inline": "true",
    "password": ""
}

Example responses

/pdf/find/table
{
    "body": {
        "tables": [
            {
                "PageIndex": 0,
                "X": 36,
                "Y": 34.4400024,
                "Width": 523.44,
                "Height": 160.82,
                "Columns": [
                    357.675
                ],
                "rect": "36, 34.4400024, 523.44, 160.82"
            },
            {
                "PageIndex": 0,
                "X": 36,
                "Y": 316.249969,
                "Width": 523.44,
                "Height": 120.620026,
                "Columns": [
                    157.117,
                    340.68,
                    475.84
                ],
                "rect": "36, 316.249969, 523.44, 120.620026"
            }
        ]
    },
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "sample.json",
    "remainingCredits": 98892697,
    "credits": 21
}

Code Snippet

CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/find/table' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
    "url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
    "async": "false",
    "encrypt": "false",
    "inline": "true",
    "password": ""
}'
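With "async": "true" the request returns immediately and the job must be polled through /job/check, as described for the async parameter. The sketch below assumes the submit response carries a jobId field and that /v1/job/check accepts {"jobid": "..."} and reports a status field with the states listed above; verify these field names against the Job Check documentation. It needs curl and jq at run time, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: submit /pdf/find/table asynchronously, then poll /job/check.
# ASSUMPTIONS: the submit response has a "jobId" field; /v1/job/check takes
# {"jobid": "..."} and answers with a "status" field. Requires curl and jq.
PDFCO_API="https://api.pdf.co/v1"

# Succeeds once the job has left the "working" state.
job_finished() {
  case "$1" in
    success | failed | aborted) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: find_tables_async API_KEY SOURCE_URL; prints the last job state seen.
find_tables_async() {
  local key="$1" url="$2" job_id state="" attempt
  job_id="$(curl -s -X POST "$PDFCO_API/pdf/find/table" \
    -H "x-api-key: $key" -H 'Content-Type: application/json' \
    -d "{\"url\": \"$url\", \"async\": \"true\"}" |
    jq -r '.jobId')"
  # Poll every 3 seconds and give up after 60 attempts (about 3 minutes).
  for ((attempt = 0; attempt < 60; attempt++)); do
    state="$(curl -s -X POST "$PDFCO_API/job/check" \
      -H "x-api-key: $key" -H 'Content-Type: application/json' \
      -d "{\"jobid\": \"$job_id\"}" | jq -r '.status')"
    if job_finished "$state"; then
      break
    fi
    sleep 3
  done
  echo "$state"
}
```

The attempt limit keeps the loop from spinning forever if the job reports an unexpected state or the field name differs.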

[POST] /pdf/find/table (legacy table finder)

By default, this function finds tables in documents using the AI-powered table detection engine, but you can enable the legacy table finder mode via the profiles parameter (see below).

This endpoint locates tables in the input PDF document and returns JSON with:

  • an array of table objects;
  • X, Y, Width, and Height coordinates of every table found;
  • rect: a value for every table that you can reuse with pdf/convert/to/json, pdf/convert/to/csv, pdf/convert/to/xml, and other endpoints to extract only the selected table;
  • PageIndex: index of the page containing the table. The very first page is 0 (zero);
  • Columns: an array of X coordinates of the columns detected inside the table.

To extract a table into CSV, JSON, or XML, use the pdf/convert/to/csv, pdf/convert/to/json2, or pdf/convert/to/xml endpoint respectively, passing the table's rect output value as the rect parameter.

Input Parameters:

  • url required. URL of the source file. Supports links from Google Drive, Dropbox, and the built-in PDF.co files storage. To upload files via the API, see the Files Upload section. If you randomly get Too Many Requests or Access Denied errors for your input URL, try prefixing the URL with cache: to enable built-in URL caching.
  • httpusername optional. HTTP auth user name, if required to access the source URL.
  • httppassword optional. HTTP auth password, if required to access the source URL.
  • pages optional. Comma-separated list of page indices (or ranges) to process. IMPORTANT: the very first page has index 0 (zero). To set a range, use a dash, for example 2-5. To set a range from an index to the last page, omit the end of the range, for example 2- (from the 3rd page, since indices start at zero, to the end of the document). To process ALL pages, leave this parameter empty. Example: 0,2-5,7- means the first page, then the 3rd to 6th pages, and then the 8th page (index 7) to the end of the document. Must be a String.
  • inline optional. Must be one of: true, false. When true, the output is returned inline in the response; when false, the endpoint returns a link to a .json file with the output.
  • password optional. Password of the PDF file. Must be a String.
  • async optional. Runs processing asynchronously. Returns a JobId that you can use with /job/check to check the state of the processing (possible states: working, failed, aborted, and success). Must be one of: true, false.
  • encrypt optional. Enables encryption of the output file. Must be one of: true, false.
  • name optional. File name for the generated output. Must be a String.
  • expiration optional. Output link expiration in minutes. Default is 60 (1 hour). After this delay, the generated output file(s), if any, are automatically removed from PDF.co temporary files storage. The maximum allowed expiration period depends on your subscription plan. To store input files permanently (e.g. reusable images, PDFs, documents), use the PDF.co built-in Files Storage instead.
  • profiles optional. Must be a String. Sets additional options for a custom configuration. See the profiles samples for examples.

Legacy mode can be enabled like this:

"profiles": "{ 'Mode': 'Legacy'}"

or with a more detailed configuration that sets the minimum number of rows, the minimum number of columns, and the column detection mode:

"profiles": "{ 'Mode': 'Legacy', 'ColumnDetectionMode': 'BorderedTables', 'DetectionMinNumberOfRows': 1, 'DetectionMinNumberOfColumns': 1, 'DetectionMaxNumberOfInvalidSubsequentRowsAllowed': 0, 'DetectionMinNumberOfLineBreaksBetweenTables': 0, 'EnhanceTableBorders': false }"

  • Method: POST
  • URL: /v1/pdf/find/table

Query parameters

No query parameters accepted.

Body payload

{
    "url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
    "async": "false",
    "encrypt": "false",
    "inline": "true",
    "password": "",
    "profiles": "{ 'Mode': 'Legacy', 'ColumnDetectionMode': 'BorderedTables', 'DetectionMinNumberOfRows': 1, 'DetectionMinNumberOfColumns': 1, 'DetectionMaxNumberOfInvalidSubsequentRowsAllowed': 0, 'DetectionMinNumberOfLineBreaksBetweenTables': 0, 'EnhanceTableBorders': false }"
}

Example responses

/pdf/find/table (legacy table finder)
{
    "body": {
        "tables": [
            {
                "PageIndex": 0,
                "X": 30.72,
                "Y": 309.36,
                "Width": 533.76,
                "Height": 134.16,
                "Columns": [
                    163.92,
                    297.36,
                    431.039978
                ],
                "rect": "30.72, 309.36, 533.76, 134.16"
            }
        ]
    },
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "sample.json",
    "remainingCredits": 98892760,
    "credits": 21
}

Code Snippet

CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/find/table' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
    "url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
    "async": "false",
    "encrypt": "false",
    "inline": "true",
    "password": "",
    "profiles": "{ '\''Mode'\'': '\''Legacy'\'', '\''ColumnDetectionMode'\'': '\''BorderedTables'\'', '\''DetectionMinNumberOfRows'\'': 1, '\''DetectionMinNumberOfColumns'\'': 1, '\''DetectionMaxNumberOfInvalidSubsequentRowsAllowed'\'': 0, '\''DetectionMinNumberOfLineBreaksBetweenTables'\'': 0, '\''EnhanceTableBorders'\'': false }"
}'
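The single quotes inside profiles force the '\'' escaping seen in the snippet above. One way to avoid it, sketched below, is to keep the body in a quoted here-document and let curl read it from standard input with --data-binary @-. The function name legacy_find_table_payload and the PDFCO_API_KEY variable are hypothetical; the request is only sent when that variable is set.

```shell
#!/usr/bin/env bash
# Sketch: legacy-mode request body in a quoted here-document, so the single
# quotes inside "profiles" need no '\'' escaping.
legacy_find_table_payload() {
  cat <<'EOF'
{
    "url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
    "async": "false",
    "encrypt": "false",
    "inline": "true",
    "password": "",
    "profiles": "{ 'Mode': 'Legacy', 'ColumnDetectionMode': 'BorderedTables', 'DetectionMinNumberOfRows': 1, 'DetectionMinNumberOfColumns': 1, 'DetectionMaxNumberOfInvalidSubsequentRowsAllowed': 0, 'DetectionMinNumberOfLineBreaksBetweenTables': 0, 'EnhanceTableBorders': false }"
}
EOF
}

# Send the request only when an API key is provided (hypothetical variable name).
if [ -n "${PDFCO_API_KEY:-}" ]; then
  legacy_find_table_payload | curl --location --request POST 'https://api.pdf.co/v1/pdf/find/table' \
    --header "x-api-key: $PDFCO_API_KEY" \
    --header 'Content-Type: application/json' \
    --data-binary @-
fi
```

Because the here-document delimiter is quoted ('EOF'), the shell performs no expansion inside the body, so the JSON is sent exactly as written.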

Samples