PDF Search Text

Searches for text in a PDF and returns the coordinates of each match. Supports regular expressions.

Available Methods

[POST] /pdf/find

Attributes
url required URL to the source file. Supports links from Google Drive, Dropbox, and PDF.co built-in file storage. To upload files via the API, check out the Files Upload section. Note: if you experience intermittent `Too Many Requests` or `Access Denied` errors, try adding the `cache:` prefix to enable built-in URL caching (e.g. `cache:https://example.com/file1.pdf`). For data security, you have the option to encrypt output files and decrypt input files. Learn more about user-controlled data encryption.
httpusername optional HTTP auth user name if required to access source `url`.
httppassword optional HTTP auth password if required to access source `url`.
searchString required Text to search for. Regular expressions are supported when the `regexSearch` param is set to `true`.
pages optional Comma-separated list of page indices (or ranges) to process. IMPORTANT: the very first page starts at `0` (zero). To set a range, use the dash `-`, for example `0,2-5,7-`. To set a range from an index to the last page, leave the end of the range open, like this: `2-` (from page #3, as the index starts at zero, until the end of the document). For ALL pages, just leave this param empty. Example: `0,2-5,7-` means the first page, then the 3rd page to the 6th page, and then the range from the 8th page (index = `7`) until the end of the document. The input must be in string format.
inline optional Must be one of: `true` or `false`.
wordMatchingMode optional Must be one of: `SmartMatch`, `ExactMatch`, or `None`.
password optional Password of the PDF file. The input must be in string format.
regexSearch optional Set to `true` to treat `searchString` as a regular expression. Must be one of: `true` or `false`.
async optional Set `async` to `true` to run long processes in the background. The API will then return a `jobId`, which you can use with the `/job/check` endpoint to check the status of the process and retrieve the output; meanwhile, you can proceed with other tasks without waiting for the process to finish. IMPORTANT: also set the `inline` param to `true` to get a direct link to the final output PDF in both sync and async modes. Otherwise, you will get a direct link to the PDF in sync mode but a link to a `.json` file in async mode.
profiles optional Use this parameter to set additional configurations for fine-tuning and extra options. Explore the PDF.co knowledge base for profile examples. The input must be in string format.
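
The `pages` rules above can be previewed locally with a small Python sketch. `expand_pages` is a hypothetical helper, not part of the PDF.co API; it assumes you already know the document's page count, and it neither sorts nor de-duplicates overlapping ranges. It only illustrates how zero-based indices and open-ended ranges such as `7-` are meant to be read.

```python
def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a `pages` value such as "0,2-5,7-" into zero-based page indices.

    Hypothetical local helper that mirrors the documented rules; it is not
    part of the PDF.co API. An empty string selects every page.
    """
    spec = spec.strip()
    if not spec:
        return list(range(page_count))
    selected: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An open-ended range such as "7-" runs to the last page.
            end = int(end_text) if end_text else page_count - 1
            selected.extend(range(start, min(end, page_count - 1) + 1))
        else:
            index = int(part)
            if index < page_count:
                selected.append(index)
    return selected


print(expand_pages("0,2-5,7-", page_count=10))
```

For a 10-page document, `0,2-5,7-` expands to indices 0, 2 through 5, and 7 through 9, i.e. the 1st page, the 3rd to 6th pages, and the 8th page to the end.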

Method: POST
URL: /v1/pdf/find

Query parameters

No query parameters accepted.

Body payload

{
    "async": "false",
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
    "searchString": "Invoice Date \\d+/\\d+/\\d+",
    "regexSearch": "true",
    "name": "output",
    "pages": "0-",
    "inline": "true",
    "wordMatchingMode": "",
    "password": ""
}
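
In the payload above, `\\d` is JSON escaping for the regex token `\d`. Building the body with a JSON encoder avoids escaping backslashes by hand. This is a minimal Python sketch that assumes the same field values as the example payload, with the optional fields that were left empty omitted. The local `re` check is only a sanity test: Python's regex dialect is not necessarily the one the service uses.

```python
import json
import re

# Raw string: the regex itself contains single backslashes (\d).
search_pattern = r"Invoice Date \d+/\d+/\d+"

payload = {
    "async": "false",
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
    "searchString": search_pattern,
    "regexSearch": "true",
    "name": "output",
    "pages": "0-",
    "inline": "true",
}

# json.dumps escapes each backslash, producing "\\d" in the JSON text.
body = json.dumps(payload, indent=4)
print(body)

# Local sanity check only: Python's re module may differ from the regex
# engine the service uses, so a match here is not a guarantee.
if re.search(search_pattern, "Invoice Date 01/01/2016") is None:
    raise ValueError("searchString does not match the sample text")
```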

Example responses

/pdf/find

{
    "body": [
        {
            "text": "Invoice Date 01/01/2016",
            "left": 436.5400085449219,
            "top": 130.4599995137751,
            "width": 122.85311957550027,
            "height": 11.040000486224898,
            "pageIndex": 0,
            "bounds": {
                "location": {
                    "isEmpty": false,
                    "x": 436.54,
                    "y": 130.46
                },
                "size": "122.853119, 11.0400009",
                "x": 436.54,
                "y": 130.46,
                "width": 122.853119,
                "height": 11.0400009,
                "left": 436.54,
                "top": 130.46,
                "right": 559.3931,
                "bottom": 141.5,
                "isEmpty": false
            },
            "elementCount": 1,
            "elements": [
                {
                    "index": 0,
                    "left": 436.5400085449219,
                    "top": 130.4599995137751,
                    "width": 122.85311957550027,
                    "height": 11.040000486224898,
                    "angle": 0,
                    "text": "Invoice Date 01/01/2016",
                    "isNewLine": true,
                    "fontIsBold": true,
                    "fontIsItalic": false,
                    "fontName": "Helvetica-Bold",
                    "fontSize": 11,
                    "fontColor": "0, 0, 0",
                    "fontColorAsOleColor": 0,
                    "fontColorAsHtmlColor": "#000000",
                    "bounds": {
                        "location": {
                            "isEmpty": false,
                            "x": 436.54,
                            "y": 130.46
                        },
                        "size": "122.853119, 11.0400009",
                        "x": 436.54,
                        "y": 130.46,
                        "width": 122.853119,
                        "height": 11.0400009,
                        "left": 436.54,
                        "top": 130.46,
                        "right": 559.3931,
                        "bottom": 141.5,
                        "isEmpty": false
                    }
                }
            ]
        }
    ],
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "output",
    "remainingCredits": 59970
}
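
Each entry in `body` gives the matched `text`, its `pageIndex`, and its position as `left`, `top`, `width`, and `height`; the nested `bounds` objects repeat these values rounded, along with `right` and `bottom`. The Python sketch below is a hypothetical client-side helper, not part of any PDF.co SDK. It assumes `inline` was `true` so that matches arrive directly in `body`, flattens each match, and derives the right and bottom edges. For the example match, 436.54 + 122.853119 and 130.46 + 11.04 reproduce the `right` (559.3931) and `bottom` (141.5) values shown in `bounds`.

```python
import json


def summarize_matches(response_text: str) -> list[dict]:
    """Flatten each match in a /pdf/find response into page, text, and edges.

    Hypothetical helper; assumes the request used "inline": "true" so the
    matches are returned directly in the "body" array.
    """
    data = json.loads(response_text)
    if data.get("error"):
        raise RuntimeError(f"PDF.co returned an error (status {data.get('status')})")
    matches = []
    for item in data.get("body", []):
        matches.append({
            "page": item["pageIndex"],
            "text": item["text"],
            "left": item["left"],
            "top": item["top"],
            # Derive the right and bottom edges from position plus size.
            "right": item["left"] + item["width"],
            "bottom": item["top"] + item["height"],
        })
    return matches


# Trimmed copy of the example response above (nested fields omitted).
sample = """{
    "body": [{"text": "Invoice Date 01/01/2016", "left": 436.5400085449219,
              "top": 130.4599995137751, "width": 122.85311957550027,
              "height": 11.040000486224898, "pageIndex": 0}],
    "pageCount": 1, "error": false, "status": 200
}"""

for match in summarize_matches(sample):
    print(match["page"], match["text"], round(match["right"], 2), round(match["bottom"], 2))
```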

Code Snippet

CURL

curl --location --request POST 'https://api.pdf.co/v1/pdf/find' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
    "async": "false",
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
    "searchString": "Invoice Date \\d+/\\d+/\\d+",
    "regexSearch": "true",
    "name": "output",
    "pages": "0-",
    "inline": "true",
    "wordMatchingMode": "",
    "password": ""
}'
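
The same call can be sketched in Python using only the standard library. This is not an official PDF.co client: `build_find_request` and `find_text` are hypothetical helper names, and `YOUR_API_KEY` is a placeholder for your own key. `build_find_request` only constructs the request, so it can be inspected without network access; `find_text` sends it, and `urllib` raises `urllib.error.HTTPError` on HTTP error statuses.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/find"


def build_find_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build, but do not send, a POST request equivalent to the cURL call above."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def find_text(api_key: str, payload: dict) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    request = build_find_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


payload = {
    "async": "false",
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
    "searchString": r"Invoice Date \d+/\d+/\d+",
    "regexSearch": "true",
    "name": "output",
    "pages": "0-",
    "inline": "true",
}

# "YOUR_API_KEY" is a placeholder; substitute your own key.
request = build_find_request("YOUR_API_KEY", payload)
# result = find_text("YOUR_API_KEY", payload)  # uncomment to call the API
```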
