PDF Split By Text Search

Split a PDF into multiple PDF files based on a text search (regular expressions are supported).

Available Methods

[POST] /pdf/split2 (split by text search)

Attributes
url required URL of the source file. Supports links from Google Drive, Dropbox, and PDF.co built-in file storage. To upload files via the API, check out the Files Upload section. Note: if you experience intermittent `Too Many Requests` or `Access Denied` errors, try adding the `cache:` prefix to enable built-in URL caching (e.g. `cache:https://example.com/file1.pdf`). For data security, you have the option to encrypt output files and decrypt input files. Learn more about user-controlled data encryption.
httpusername optional HTTP auth user name if required to access source `url`.
httppassword optional HTTP auth password if required to access source `url`.
searchString required Text to search for on each page. When `regexSearch` is `true`, this value is treated as a regular expression. Must be a string.
excludeKeyPages optional, `false` by default Set to `true` if you want to exclude pages where text was found.
regexSearch optional, `false` by default Set to `true` to enable regular expressions for the search string.
caseSensitive optional, `false` by default Set to `true` to enable case-sensitive search.
lang optional Set the OCR (text from image) language used when extracting text from scanned PDF, PNG, and JPG input. The default is `eng`. Other languages are also supported: `deu`, `spa`, `chi_sim`, `jpn`, and many others (see the full list of supported OCR languages). You can also use two languages simultaneously, like this: `eng+deu` or `jpn+kor` (any combination).
async optional Set `async` to `true` to run long processes in the background. The API then returns a `jobId`, which you can pass to the `/job/check` endpoint to check the status of the process and retrieve the output. This lets you proceed with other tasks without waiting for the process to finish.
inline optional Must be one of: `true` to return data inline in the response, or `false` (default) to return a link to the output file.
name optional File name for the generated output. Must be a string.
expiration optional Set the expiration time for the output link in minutes (default is `60`, i.e. 1 hour). After this duration, any generated output file(s) are automatically deleted from PDF.co temporary file storage. The maximum link expiration depends on your current subscription plan. To store permanent input files (e.g. reusable images, PDF templates, documents), consider using PDF.co built-in Files Storage.
profiles optional Use this parameter to set additional configurations for fine-tuning and extra options. Must be a string. Explore the PDF.co knowledge base for profile examples.
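When `regexSearch` is `true`, the pattern in `searchString` travels inside a JSON string, so every backslash in the regular expression must be doubled (`\d` becomes `\\d`). Otherwise the body is invalid JSON, or the pattern silently changes meaning (`\b` is a JSON backspace escape). Below is a minimal sketch of a request-body builder, assuming bash and sed are available. The function name `build_split2_body` is hypothetical, and it sets only a subset of the attributes above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a JSON body for POST /v1/pdf/split2.
# Only url, searchString, regexSearch, caseSensitive and excludeKeyPages are set.
build_split2_body() {
  local url="$1"            # source file URL (may carry the cache: prefix)
  local search="$2"         # plain text or a regular expression
  local regex="${3:-false}" # value for regexSearch: true or false
  local escaped

  # JSON-escape the search string: double every backslash first,
  # then escape double quotes, so a pattern like \d+ arrives intact.
  escaped="$(printf '%s' "$search" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"

  printf '{"url": "%s", "searchString": "%s", "regexSearch": %s, "caseSensitive": false, "excludeKeyPages": false}\n' \
    "$url" "$escaped" "$regex"
}

# Example: split wherever "Invoice #" followed by digits appears,
# with URL caching enabled through the cache: prefix.
build_split2_body 'cache:https://example.com/file1.pdf' 'Invoice #\d+' true
```

The printed body can be passed straight to curl with `--data-raw "$(build_split2_body ...)"`. For arbitrary user input, a dedicated JSON tool is safer than sed-based escaping.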

Method: POST
URL: /v1/pdf/split2

Query parameters

No query parameters accepted.

Body payload

{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-split/multiple-invoices.pdf",
    "searchString": "invoice number",
    "excludeKeyPages": false,
    "regexSearch": false,
    "caseSensitive": false,
    "inline": true,
    "name": "invoice-extracted",
    "async": false
}

Example responses

/pdf/split2

{
    "urls": [
        "https://pdf-temp-files.s3.amazonaws.com/1e9a7f2c46834160903276716424382b/invoice-extracted_page1.pdf",
        "https://pdf-temp-files.s3.amazonaws.com/c976b9f89a2e460786a3d5c0deeeef67/invoice-extracted_page2.pdf",
        "https://pdf-temp-files.s3.amazonaws.com/c976b9f89a2e460786a3d5c0deeeef67/invoice-extracted_page3.pdf"
    ],
    "pageCount": 3,
    "error": false,
    "status": 200,
    "name": "invoice-extracted.pdf",
    "remainingCredits": 98441
}
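The links in `urls` are temporary and are deleted once the `expiration` period passes, so it is usually worth collecting them as soon as the response arrives. Below is a minimal sketch that checks the `error` flag and prints one output URL per line. It assumes the response layout shown above (URLs ending in `.pdf`, no escaped quotes) and uses plain grep instead of a JSON parser. The name `list_split_urls` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a /pdf/split2 JSON response on stdin and prints
# each output file URL on its own line. Returns non-zero if "error" is true.
# Assumes the response layout shown above (URLs ending in .pdf, no escaped quotes).
list_split_urls() {
  local json
  json="$(cat)"

  if printf '%s' "$json" | grep -q '"error": *true'; then
    echo "split2 returned an error response" >&2
    return 1
  fi

  # Pull every quoted https URL that ends in .pdf, then strip the quotes.
  printf '%s' "$json" | grep -o '"https://[^"]*\.pdf"' | tr -d '"'
}
```

For example, piping the curl call into `list_split_urls | while read -r u; do curl -sS -O "$u"; done` downloads each part under its remote file name.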

Code Snippet

CURL

curl --location --request POST 'https://api.pdf.co/v1/pdf/split2' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--data-raw '{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-split/multiple-invoices.pdf",
    "searchString": "invoice number",
    "excludeKeyPages": false,
    "regexSearch": false,
    "caseSensitive": false,
    "inline": true,
    "name": "invoice-extracted",
    "async": false
}'
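For large documents, the `async` attribute described above switches the call to background mode. The response then carries a `jobId`, and you poll `/job/check` until the job finishes. Below is a minimal polling sketch. Several details are assumptions to verify against the Job Check reference: the request shape (POST to `https://api.pdf.co/v1/job/check` with a `jobid` field) and the `status` values (`working` while running, `success` when done). The names `check_job_status` and `wait_for_job` and the `PDFCO_API_KEY` variable are also hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: POST /v1/job/check takes {"jobid": "..."} and its response
# contains "status": "working" | "success" | (an error state such as "failed").
check_job_status() {
  local job_id="$1"
  curl -sS -X POST 'https://api.pdf.co/v1/job/check' \
    -H 'Content-Type: application/json' \
    -H "x-api-key: ${PDFCO_API_KEY}" \
    --data "{\"jobid\": \"${job_id}\"}" |
    grep -o '"status": *"[a-z]*"' | head -n 1 | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Polls until the job succeeds (returns 0), reports any other status,
# or is still working after max_tries checks (both return 1).
wait_for_job() {
  local job_id="$1" max_tries="${2:-30}" delay="${3:-3}" status i
  for ((i = 1; i <= max_tries; i++)); do
    status="$(check_job_status "$job_id")"
    case "$status" in
      success) return 0 ;;
      working) sleep "$delay" ;;
      *)
        echo "job ${job_id} ended with status: ${status:-unknown}" >&2
        return 1
        ;;
    esac
  done
  echo "job ${job_id} still working after ${max_tries} checks" >&2
  return 1
}
```

To use it, send the request above with `"async": true`, read `jobId` from the response, and call `wait_for_job "$JOB_ID"`. Keeping the HTTP call in its own function also makes the polling loop easy to exercise with a stub.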