PDF Split By Text Search
Split a PDF into multiple PDF files by text search (regular expressions are supported).
Available Methods
[POST] /pdf/split2 (split by text search)
Attribute | Required | Description |
---|---|---|
url | required | URL of the source file. Supports links from Google Drive, Dropbox, and the built-in PDF.co file storage. To upload files via the API, see the Files Upload section. If you intermittently get a Too Many Requests or Access Denied error for your input URL, try adding cache: to the URL to enable built-in URL caching. You can also encrypt output files and decrypt input files with user-controlled data encryption. |
httpusername | optional | HTTP auth user name, if required to access the source url. |
httppassword | optional | HTTP auth password, if required to access the source url. |
searchString | required | Text to search for on pages. Must be a string. |
excludeKeyPages | optional (default: false) | Set to true to exclude the pages where the text was found. |
regexSearch | optional (default: false) | Set to true to treat the search string as a regular expression. |
caseSensitive | optional (default: false) | Set to true to enable case-sensitive search. |
lang | optional | Language for OCR (text from image), used when extracting text from scanned PDF, PNG, and JPG input documents. The default is eng. Other languages are also supported, such as deu, spa, chi_sim, jpn, and many others (see the full list of supported OCR languages). You can also use two languages at once, for example eng+deu or jpn+kor (any combination). |
async | optional | Runs processing asynchronously and returns a JobId that you can use with /job/check to check the state of the processing (possible states: working, failed, aborted, and success). Must be true or false. |
inline | optional | Set to true to return the data inline, or false to return a link to the output file (default). |
name | optional | File name for the generated output. Must be a string. |
expiration | optional | Output link expiration in minutes. The default is 60 (i.e. 60 minutes or 1 hour). After this period, generated output files (if any) are automatically removed from PDF.co temporary file storage. The maximum allowed expiration period depends on your current subscription plan. To store input files permanently (e.g. reusable images, PDFs, documents), use the PDF.co built-in Files Storage instead. |
profiles | optional | Additional configuration for fine-tuning and extra options. Must be a string. See the PDF.co knowledge base for profile examples. |
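When async is true, the call returns a JobId that is then polled through /job/check. The following is a minimal Python sketch of such a polling loop, not an official client. The `check_job` callable is a hypothetical stand-in for whatever code performs the /job/check request, and the assumption that the state is reported under a "status" key is ours; only the four state names come from the attribute list above.

```python
import time
from typing import Any, Callable, Dict

# Terminal states, taken from the description of the async attribute.
TERMINAL_STATES = {"success", "failed", "aborted"}


def wait_for_job(
    job_id: str,
    check_job: Callable[[str], Dict[str, Any]],  # hypothetical: wraps /job/check
    interval_s: float = 2.0,
    max_checks: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job leaves the 'working' state or max_checks is reached."""
    for _ in range(max_checks):
        result = check_job(job_id)
        # Assumption: the job state is reported under a "status" key.
        state = result.get("status")
        if state in TERMINAL_STATES:
            return result
        if state != "working":
            raise RuntimeError(f"unexpected job state: {state!r}")
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} still working after {max_checks} checks")
```

Passing `check_job` and `sleep` as arguments keeps the loop independent of any HTTP library and makes it easy to exercise with a fake in tests. The caller still has to inspect the returned state, since failed and aborted are also terminal.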
- Method: POST
- URL: /v1/pdf/split2
Query parameters
No query parameters accepted.
Body payload
{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-split/multiple-invoices.pdf",
"searchString": "invoice number",
"excludeKeyPages": false,
"regexSearch": false,
"caseSensitive": false,
"inline": true,
"name": "invoice-extracted",
"async": false
}
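A body like the one above can be assembled and checked on the client before sending. The sketch below is one possible approach, assuming only the attribute names and types documented on this page; `build_split_payload` is a hypothetical helper, not part of any PDF.co SDK. The type of expiration is not stated here, so it is passed through unchecked.

```python
from typing import Any, Dict, Optional

# Attribute names from the parameter list; the grouping by type is ours.
BOOL_OPTIONS = {"excludeKeyPages", "regexSearch", "caseSensitive", "async", "inline"}
STR_OPTIONS = {"httpusername", "httppassword", "lang", "name", "profiles"}
UNCHECKED_OPTIONS = {"expiration"}  # type not stated in the docs, so not validated


def build_split_payload(
    url: str,
    search_string: str,
    options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a JSON-serializable body for POST /v1/pdf/split2.

    Options are passed as a dict because "async" is a reserved word in
    Python and cannot be used as a keyword argument.
    """
    if not isinstance(url, str) or not url:
        raise ValueError("url is required and must be a non-empty string")
    if not isinstance(search_string, str) or not search_string:
        raise ValueError("searchString is required and must be a non-empty string")

    payload: Dict[str, Any] = {"url": url, "searchString": search_string}
    for key, value in (options or {}).items():
        if key in BOOL_OPTIONS:
            if not isinstance(value, bool):
                raise TypeError(f"{key} must be true or false")
        elif key in STR_OPTIONS:
            if not isinstance(value, str):
                raise TypeError(f"{key} must be a string")
        elif key not in UNCHECKED_OPTIONS:
            raise ValueError(f"unknown option: {key}")
        payload[key] = value
    return payload
```

For example, `build_split_payload(url, r"Invoice\s+No", {"regexSearch": True})` would produce a body that enables regular-expression search while leaving every other option at its server-side default.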
Example responses
/pdf/split2
{
"urls": [
"https://pdf-temp-files.s3.amazonaws.com/1e9a7f2c46834160903276716424382b/invoice-extracted_page1.pdf",
"https://pdf-temp-files.s3.amazonaws.com/c976b9f89a2e460786a3d5c0deeeef67/invoice-extracted_page2.pdf",
"https://pdf-temp-files.s3.amazonaws.com/c976b9f89a2e460786a3d5c0deeeef67/invoice-extracted_page3.pdf"
],
"pageCount": 3,
"error": false,
"status": 200,
"name": "invoice-extracted.pdf",
"remainingCredits": 98441
}
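A response shaped like the example above can be reduced to its list of output links with a few lines of Python. This is a sketch under assumptions: the field names urls, error, and status are taken from the sample, while the "message" field read on failure is a guess that the sample does not confirm. `split_result_urls` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def split_result_urls(response: Dict[str, Any]) -> List[str]:
    """Return the output file links, or raise if the response reports an error."""
    if response.get("error"):
        # Assumption: a failed call carries a human-readable "message" field.
        detail = response.get("message", "no message provided")
        raise RuntimeError(f"split failed (status {response.get('status')}): {detail}")
    # "urls" is the list of generated files in the sample response.
    return list(response.get("urls", []))
```

Checking the error flag before reading urls avoids silently treating a failed call as a split that produced no files.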
Code Snippet
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/split2' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--data-raw '{
"url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-split/multiple-invoices.pdf",
"searchString": "invoice number",
"excludeKeyPages": false,
"regexSearch": false,
"caseSensitive": false,
"inline": true,
"name": "invoice-extracted",
"async": false
}'
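The same request can be sketched in Python using only the standard library. The endpoint URL, headers, and body mirror the cURL command above; the function names and the 60-second timeout are illustrative choices, and YOUR_API_KEY is a placeholder for a real key.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://api.pdf.co/v1/pdf/split2"


def make_split_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 60.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)


if __name__ == "__main__":
    body = {
        "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-split/multiple-invoices.pdf",
        "searchString": "invoice number",
        "async": False,
    }
    print(send(make_split_request("YOUR_API_KEY", body)))
```

Building the request separately from sending it lets the headers and body be inspected or logged before any network call is made.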
Copyright © 2016 - 2023 PDF.co