PDF Split By Text Search
Split a PDF into multiple PDF files by text search (supports regular expressions).
Available Methods
[POST] /pdf/split2 (split by text search)
url
required. URL to the source file. Supports links from Google Drive, Dropbox, and the built-in PDF.co files storage.
To upload files via the API, see the Files Upload section.
If you randomly get a Too Many Requests or Access Denied error for your input url, try adding cache: to enable built-in URL caching.
You can also encrypt output files and decrypt input files with user-controlled data encryption (strong AES encryption with your own keys).
httpusername
optional. HTTP auth user name, if required to access the source url.
httppassword
optional. HTTP auth password, if required to access the source url.
searchString
required. Text to search for on pages. Must be a String.
excludeKeyPages
optional. Set to true to exclude the pages where the text was found. false by default.
regexSearch
optional. Set to true to enable regular expressions for the search string. false by default.
caseSensitive
optional. Set to true to enable case-sensitive search. false by default.
lang
optional. Sets the OCR (text from image) language to use when extracting text from scanned PDF, PNG, or JPG input documents. Default is "eng". Other languages are also supported: deu, spa, chi_sim, jpn, and many others (see the full list of supported OCR languages). You can also use two languages simultaneously, for example eng+deu or jpn+kor (any combination).
async
optional. Runs processing asynchronously. Returns a JobId that you can use with /job/check to check the state of the background job (possible states: working, failed, aborted, and success). Must be one of: true, false.
inline
optional. false by default. In async mode, makes the call return body with the content of the output JSON (with the links to the output).
name
optional. Name of the output file.
expiration
optional. Output link expiration in minutes. Default is 60 (i.e. 60 minutes or 1 hour). After this period, generated output files (if any) are automatically removed from PDF.co temporary files storage. The maximum allowed expiration period depends on your current subscription plan. To store input files permanently (e.g. reusable images, PDFs, documents), use the PDF.co built-in Files Storage instead.
profiles
optional. Must be a String. Use this parameter to set additional configuration for fine-tuning and extra options. See the PDF.co knowledge base for profile examples.
- Method: POST
- URL: /v1/pdf/split2
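As a rough illustration of how the parameters above fit together, the following Python sketch assembles a request body. The helper name build_split_payload is hypothetical and not part of any PDF.co SDK; it only encodes the required/optional rules and defaults listed above. The example URL and the regular expression are placeholders, and the regex dialect the service accepts is not stated here, so treat the pattern as an assumption.

```python
from typing import Any, Dict, Optional


def build_split_payload(
    url: str,
    search_string: str,
    *,
    exclude_key_pages: bool = False,
    regex_search: bool = False,
    case_sensitive: bool = False,
    lang: Optional[str] = None,
    name: Optional[str] = None,
    expiration: Optional[int] = None,
    run_async: bool = False,
    inline: bool = False,
) -> Dict[str, Any]:
    """Assemble a JSON-serializable body for /pdf/split2 (hypothetical helper)."""
    if not url:
        raise ValueError("url is required")
    if not isinstance(search_string, str) or not search_string:
        raise ValueError("searchString is required and must be a string")

    payload: Dict[str, Any] = {
        "url": url,
        "searchString": search_string,
        "excludeKeyPages": exclude_key_pages,
        "regexSearch": regex_search,
        "caseSensitive": case_sensitive,
        "async": run_async,
        "inline": inline,
    }
    # Optional values are only sent when set, so the documented defaults apply.
    if lang is not None:
        payload["lang"] = lang
    if name is not None:
        payload["name"] = name
    if expiration is not None:
        payload["expiration"] = expiration
    return payload


# Placeholder URL and an illustrative pattern; both are assumptions.
example = build_split_payload(
    "https://example.com/invoices.pdf",
    r"Invoice\s+No\.\s*\d+",
    regex_search=True,
    lang="eng+deu",
)
```

Leaving unset optional fields out of the body, rather than sending null values, is a design choice of this sketch; the service may accept either form.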
Query parameters
No query parameters accepted.
Body payload
{
  "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-split/multiple-invoices.pdf",
  "searchString": "invoice number",
  "excludeKeyPages": false,
  "regexSearch": false,
  "caseSensitive": false,
  "inline": true,
  "name": "invoice-extracted",
  "async": false
}
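A minimal Python sketch of sending such a body, using only the standard library. The host https://api.pdf.co and the x-api-key header follow the cURL snippet in this document; the function names are hypothetical, the API key is a placeholder, and the input URL is an example value.

```python
import json
import urllib.request

API_BASE = "https://api.pdf.co/v1"  # base URL as used in the cURL snippet


def make_split_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /pdf/split2."""
    return urllib.request.Request(
        url=f"{API_BASE}/pdf/split2",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = make_split_request(
    "YOUR_API_KEY",  # placeholder, replace with a real key
    {"url": "https://example.com/invoices.pdf", "searchString": "invoice number"},
)
# result = send(request)  # uncomment to perform the actual network call
```

Building the request separately from sending it keeps the construction step easy to inspect without touching the network.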
Example responses
/pdf/split2
{
  "urls": [
    "https://pdf-temp-files.s3.amazonaws.com/1e9a7f2c46834160903276716424382b/invoice-extracted_page1.pdf",
    "https://pdf-temp-files.s3.amazonaws.com/c976b9f89a2e460786a3d5c0deeeef67/invoice-extracted_page2.pdf",
    "https://pdf-temp-files.s3.amazonaws.com/c976b9f89a2e460786a3d5c0deeeef67/invoice-extracted_page3.pdf"
  ],
  "pageCount": 3,
  "error": false,
  "status": 200,
  "name": "invoice-extracted.pdf",
  "remainingCredits": 98441
}
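A response of this shape could be handled along the following lines. This is a sketch only: the function name is hypothetical, the sample values are placeholders, and the "message" field read in the error branch is an assumption, since the example above shows only a successful response.

```python
from typing import List


def extract_output_urls(response: dict) -> List[str]:
    """Return the output file links, or raise if the response reports an error."""
    if response.get("error"):
        # "message" is an assumed field name; fall back to a generic text if absent.
        detail = response.get("message", "no message")
        raise RuntimeError(f"split2 failed (status {response.get('status')}): {detail}")
    return list(response.get("urls", []))


# Placeholder response mirroring the fields shown in the example above.
sample = {
    "urls": [
        "https://example.com/out_page1.pdf",
        "https://example.com/out_page2.pdf",
    ],
    "pageCount": 2,
    "error": False,
    "status": 200,
    "name": "out.pdf",
}
links = extract_output_urls(sample)
```

Because output links expire (60 minutes by default), a caller would normally download the files soon after receiving them.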
Code Snippet
CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/split2' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <YOUR_API_KEY>' \
--data-raw '{
  "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-split/multiple-invoices.pdf",
  "searchString": "invoice number",
  "excludeKeyPages": false,
  "regexSearch": false,
  "caseSensitive": false,
  "inline": true,
  "name": "invoice-extracted",
  "async": false
}'
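When async is set to true, the call returns a JobId that is then checked with /job/check until the job reaches one of the states listed for the async parameter. The Python sketch below shows one possible polling loop. The function wait_for_job is hypothetical, and so is the assumption that the check result carries the state in a "status" field; the actual /job/check request is left to a caller-supplied function so that no request format is invented here.

```python
import time
from typing import Callable


def wait_for_job(
    check_job: Callable[[str], dict],
    job_id: str,
    poll_seconds: float = 3.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reports success, failed, or aborted."""
    for _ in range(max_polls):
        result = check_job(job_id)
        status = result.get("status")  # assumed field name
        if status == "success":
            return result
        if status in ("failed", "aborted"):
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        sleep(poll_seconds)  # still "working" (or unknown): wait and retry
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")


# Simulated checker standing in for a real /job/check call.
states = iter(["working", "working", "success"])
final = wait_for_job(
    lambda _job_id: {"status": next(states)},
    "job-123",  # placeholder JobId
    sleep=lambda _seconds: None,  # skip real waiting in this demo
)
```

Capping the number of polls avoids waiting forever if a job never leaves the working state.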
Copyright © 2016 - 2023 PDF.co