
PDF Split By Text Search

Splits a PDF into multiple PDF files by text search (supports regular expressions).

Available Methods

[POST] /pdf/split2 (split by text search)

  • url (required). URL of the source file. Supports links from Google Drive, Dropbox, and the built-in PDF.co Files Storage. To upload files via the API, see the Files Upload section. If you intermittently get Too Many Requests or Access Denied errors for your input URL, try prefixing it with cache: to enable built-in URL caching. You can also encrypt output files and decrypt input files with user-controlled data encryption (strong AES encryption with your own keys); see the data encryption documentation to learn more.
  • httpusername (optional). HTTP auth username, if required to access the source URL.
  • httppassword (optional). HTTP auth password, if required to access the source URL.
  • searchString (required). Text to search for on each page. Must be a string.
  • excludeKeyPages (optional). Set to true to exclude the pages where the search text was found from the output files. false by default.
  • regexSearch (optional). Set to true to treat searchString as a regular expression. false by default.
  • caseSensitive (optional). Set to true to make the search case-sensitive. false by default.
  • lang (optional). Language used for OCR (text recognition from images) when extracting text from scanned PDF, PNG, or JPG input. Default is "eng". Many other languages are supported, including deu, spa, chi_sim, and jpn (see the full list of supported OCR languages). You can also use two languages at once by joining them with +, for example eng+deu or jpn+kor (any combination).
  • async (optional). Set to true to run processing asynchronously. The response then returns a JobId that you can pass to /job/check to check the state of the background job (possible states: working, failed, aborted, success). Must be true or false.
  • inline (optional). false by default. In async mode, set to true to have the response body contain the output JSON (including the links to the output files).
  • name (optional). Name of the output file.
  • expiration (optional). Output link expiration in minutes. Default is 60 (1 hour). After this period, the generated output file(s), if any, are automatically removed from PDF.co temporary file storage. The maximum allowed expiration depends on your subscription plan. To store input files permanently (e.g. reusable images, PDFs, documents), use the PDF.co built-in Files Storage instead.
  • profiles (optional). Must be a string. Sets additional configuration for fine-tuning and extra options. See the PDF.co knowledge base for profile examples.
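
Because every parameter above is a plain JSON field, it can help to assemble and sanity-check the body before sending it. The Python sketch below does that. The helper build_split2_payload and its keyword names are hypothetical (not part of any PDF.co SDK), and the sketch assumes that cache: is prepended directly to the source URL and that OCR languages are joined with +, as described for lang.

```python
from typing import Optional


def build_split2_payload(
    url: str,
    search_string: str,
    *,
    regex_search: bool = False,
    case_sensitive: bool = False,
    exclude_key_pages: bool = False,
    langs: Optional[list[str]] = None,
    use_cache: bool = False,
    name: Optional[str] = None,
    expiration: Optional[int] = None,
    run_async: bool = False,
) -> dict:
    """Assemble a /pdf/split2 JSON body (hypothetical helper, not a PDF.co SDK call)."""
    if not url:
        raise ValueError("url is required")
    if not isinstance(search_string, str) or not search_string:
        raise ValueError("searchString is required and must be a non-empty string")

    payload = {
        # Assumption: the cache: marker is prepended directly to the source URL.
        "url": f"cache:{url}" if use_cache else url,
        "searchString": search_string,
        "regexSearch": regex_search,
        "caseSensitive": case_sensitive,
        "excludeKeyPages": exclude_key_pages,
        "async": run_async,
    }
    if langs:
        # OCR languages are combined with "+", e.g. ["eng", "deu"] -> "eng+deu".
        payload["lang"] = "+".join(langs)
    if name:
        payload["name"] = name
    if expiration is not None:
        # Minutes; the plan-specific maximum is not checked here.
        if expiration <= 0:
            raise ValueError("expiration must be a positive number of minutes")
        payload["expiration"] = expiration
    return payload


# Usage: the documented example request, plus German OCR.
example = build_split2_payload(
    "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-split/multiple-invoices.pdf",
    "invoice number",
    langs=["eng", "deu"],
    name="invoice-extracted",
)
```

Client-side validation only catches missing or malformed fields early; the API remains the authority on what it accepts.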

  • Method: POST
  • URL: /v1/pdf/split2

Query parameters

No query parameters accepted.

Body payload

{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-split/multiple-invoices.pdf",
    "searchString": "invoice number",
    "excludeKeyPages": false,
    "regexSearch": false,
    "caseSensitive": false,
    "inline": true,
    "name": "invoice-extracted",
    "async": false
}
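
The body payload above can be sent with nothing but the Python standard library. This sketch builds, but does not send, the request. The endpoint URL is taken from the curl snippet in this section; the helper name make_split2_request and the YOUR_API_KEY placeholder are hypothetical.

```python
import json
import urllib.request

# Endpoint as used in the curl snippet for this method.
SPLIT2_ENDPOINT = "https://api.pdf.co/v1/pdf/split2"


def make_split2_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (without sending) a POST request carrying the JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SPLIT2_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


payload = {
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-split/multiple-invoices.pdf",
    "searchString": "invoice number",
    "excludeKeyPages": False,
    "regexSearch": False,
    "caseSensitive": False,
    "inline": True,
    "name": "invoice-extracted",
    "async": False,
}
# Placeholder key: replace with your own PDF.co API key.
request = make_split2_request("YOUR_API_KEY", payload)
```

Passing the request to urllib.request.urlopen performs the actual call, and the JSON response can then be decoded with json.load.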

Example responses

/pdf/split2
{
    "urls": [
        "https://pdf-temp-files.s3.amazonaws.com/1e9a7f2c46834160903276716424382b/invoice-extracted_page1.pdf",
        "https://pdf-temp-files.s3.amazonaws.com/c976b9f89a2e460786a3d5c0deeeef67/invoice-extracted_page2.pdf",
        "https://pdf-temp-files.s3.amazonaws.com/c976b9f89a2e460786a3d5c0deeeef67/invoice-extracted_page3.pdf"
    ],
    "pageCount": 3,
    "error": false,
    "status": 200,
    "name": "invoice-extracted.pdf",
    "remainingCredits": 98441
}
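
A response like the one above can be turned into local file names by taking the last path segment of each output URL. A minimal Python sketch: output_links is a hypothetical helper, and it assumes that error responses set error to true and may include a message field.

```python
import json
from pathlib import PurePosixPath
from urllib.parse import urlparse


def output_links(response_text: str) -> list[tuple[str, str]]:
    """Return (filename, url) pairs from a /pdf/split2 response body."""
    data = json.loads(response_text)
    if data.get("error"):
        # Assumption: a "message" field may describe the error; fall back to the status.
        raise RuntimeError(
            f"split2 failed (status {data.get('status')}): {data.get('message', '')}"
        )
    pairs = []
    for url in data.get("urls", []):
        # The file name is the last segment of the URL path.
        filename = PurePosixPath(urlparse(url).path).name
        pairs.append((filename, url))
    return pairs
```

These links expire after the expiration period (60 minutes by default), so download the files promptly, for example with urllib.request.urlretrieve(link, filename).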

Code Snippet

CURL
# Replace YOUR_API_KEY with your PDF.co API key (placeholder).
curl --location --request POST 'https://api.pdf.co/v1/pdf/split2' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--data-raw '{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-split/multiple-invoices.pdf",
    "searchString": "invoice number",
    "excludeKeyPages": false,
    "regexSearch": false,
    "caseSensitive": false,
    "inline": true,
    "name": "invoice-extracted",
    "async": false
}'
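
The curl call above runs synchronously (async: false). When async is true, the response carries a JobId instead of output links, and the job must be polled via /job/check until it leaves the working state (possible states: working, failed, aborted, success). The Python sketch below implements only the polling loop. check_job is a hypothetical callable standing in for the /job/check request, whose exact format is not covered in this section, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_job(
    check_job: Callable[[str], str],
    job_id: str,
    *,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until the job succeeds; raise if it fails, aborts, or times out.

    check_job is hypothetical: it should query /job/check for job_id and
    return the job state as a string.
    """
    deadline = clock() + timeout
    while True:
        state = check_job(job_id)
        if state == "success":
            return
        if state in ("failed", "aborted"):
            raise RuntimeError(f"job {job_id} ended in state {state!r}")
        if state != "working":
            raise ValueError(f"unexpected job state {state!r}")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still working after {timeout} s")
        sleep(interval)
```

Injecting sleep and clock keeps the loop testable without network access or real delays.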