
PDF Search Text

Search text in PDF and get coordinates. Supports regular expressions.

Available methods

[POST] /pdf/find

  • url required. URL to the source file. Supports links from Google Drive, Dropbox, and the built-in PDF.co file storage. To upload files via the API, see the Files Upload section. If you intermittently get Too Many Requests or Access Denied errors for your input URL, prepend cache: to the URL to enable built-in URL caching.
  • searchString text to search for. Can contain a regular expression if you set the regexSearch param to true.
  • pages optional. Comma-separated list of page indices (or ranges) to process. IMPORTANT: the very first page has index 0 (zero). Use a dash - to set a range, and leave the end open to run to the last page: 2- means from the 3rd page (index 2) to the end of the document. Example: 0,2-5,7- means the first page, then the 3rd through 6th pages, then the 8th page (index 7) through the end of the document. To process ALL pages, leave this param empty. Must be a String.
  • inline optional. Must be one of: true, false.
  • wordMatchingMode optional. Must be a String.
  • password optional. Password of the PDF file. Must be a String.
  • regexSearch optional. Must be one of: true, false.
  • async optional. Runs processing asynchronously. Returns a JobId that you can use with /job/check to check the state of the processing (possible states: working, failed, aborted and success). Must be one of: true, false.
  • encrypt optional. Enable encryption for output file. Must be one of: true, false.
  • name optional. File name for generated output. Must be a String.
  • profiles optional. Lets you set additional options through a custom configuration. See profiles samples for examples. Must be a String.
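
The pages syntax described above can be checked locally before sending a request. The following is a minimal sketch, not the service's own parser: expand_pages is a hypothetical helper, and page_count is assumed to be known in advance (for example from a previous response).

```python
def expand_pages(pages: str, page_count: int) -> list[int]:
    """Expand a pages string such as "0,2-5,7-" into zero-based page indices.

    Hypothetical helper for illustration; it mirrors the documented syntax
    and is not the parser the API itself uses.
    """
    if not pages.strip():
        # An empty value means "all pages".
        return list(range(page_count))

    indices = []
    for part in pages.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            first = int(start)
            # An open-ended range such as "7-" runs to the last page.
            last = int(end) if end else page_count - 1
            indices.extend(range(first, last + 1))
        else:
            indices.append(int(part))
    return indices


print(expand_pages("0,2-5,7-", 10))  # [0, 2, 3, 4, 5, 7, 8, 9]
```

For a 10-page document, 0,2-5,7- selects the first page, pages 3 through 6, and pages 8 through 10, which matches the example in the pages description.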

Description

  • Method: POST
  • URL: /v1/pdf/find

Query parameters

No query parameters accepted.

Body payload

{
    "async": "false",
    "encrypt": "false",
    "url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
    "searchString": "Invoice Date \\d+/\\d+/\\d+",
    "regexSearch": "true",
    "name": "output",
    "pages": "0-",
    "inline": "true",
    "wordMatchingMode": "",
    "password": ""
}
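
The doubled backslashes in searchString are JSON string escaping: the regular expression itself is Invoice Date \d+/\d+/\d+. As a small sketch of how that escaping comes about, the snippet below builds part of the payload with Python's standard json module. The local re check only illustrates the pattern and assumes the service's regex dialect treats \d+ the same way, which this page does not confirm.

```python
import json
import re

# The regular expression as it should reach the service.
pattern = r"Invoice Date \d+/\d+/\d+"

# Local sanity check only; the service's regex engine may differ from Python's.
assert re.fullmatch(pattern, "Invoice Date 01/01/2016")

payload = {
    "searchString": pattern,
    "regexSearch": "true",  # sent as a string, as in the sample payload
}

# json.dumps escapes each backslash, producing \\d+ in the request body.
body = json.dumps(payload)
print(body)
```

Serializing with a JSON library avoids hand-counting backslashes, which is why the hand-written PHP and Java snippets need four backslashes where the others need two.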

Example responses

/pdf/find
{
    "body": [
        {
            "text": "Invoice Date 01/01/2016",
            "left": 436.5400085449219,
            "top": 130.4599995137751,
            "width": 122.85311957550027,
            "height": 11.040000486224898,
            "pageIndex": 0,
            "bounds": {
                "location": {
                    "isEmpty": false,
                    "x": 436.54,
                    "y": 130.46
                },
                "size": "122.853119, 11.0400009",
                "x": 436.54,
                "y": 130.46,
                "width": 122.853119,
                "height": 11.0400009,
                "left": 436.54,
                "top": 130.46,
                "right": 559.3931,
                "bottom": 141.5,
                "isEmpty": false
            },
            "elementCount": 1,
            "elements": [
                {
                    "index": 0,
                    "left": 436.5400085449219,
                    "top": 130.4599995137751,
                    "width": 122.85311957550027,
                    "height": 11.040000486224898,
                    "angle": 0,
                    "text": "Invoice Date 01/01/2016",
                    "isNewLine": true,
                    "fontIsBold": true,
                    "fontIsItalic": false,
                    "fontName": "Helvetica-Bold",
                    "fontSize": 11,
                    "fontColor": "0, 0, 0",
                    "fontColorAsOleColor": 0,
                    "fontColorAsHtmlColor": "#000000",
                    "bounds": {
                        "location": {
                            "isEmpty": false,
                            "x": 436.54,
                            "y": 130.46
                        },
                        "size": "122.853119, 11.0400009",
                        "x": 436.54,
                        "y": 130.46,
                        "width": 122.853119,
                        "height": 11.0400009,
                        "left": 436.54,
                        "top": 130.46,
                        "right": 559.3931,
                        "bottom": 141.5,
                        "isEmpty": false
                    }
                }
            ]
        }
    ],
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "output",
    "remainingCredits": 59970
}
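
To show how the fields above fit together, here is a minimal sketch that reduces a response to one bounding box per match. summarize_matches is a hypothetical helper, and sample is a trimmed copy of the example response; only fields shown in that example are used.

```python
def summarize_matches(response: dict) -> list[dict]:
    """Return text, page index and bounding box for each match in a response.

    Hypothetical helper; it relies only on fields shown in the example response.
    """
    if response.get("error"):
        raise RuntimeError(f"search failed with status {response.get('status')}")

    matches = []
    for match in response.get("body", []):
        left, top = match["left"], match["top"]
        matches.append({
            "text": match["text"],
            "pageIndex": match["pageIndex"],
            "left": left,
            "top": top,
            # right and bottom are derived from the origin plus the size.
            "right": left + match["width"],
            "bottom": top + match["height"],
        })
    return matches


# Trimmed copy of the example response above.
sample = {
    "body": [
        {
            "text": "Invoice Date 01/01/2016",
            "left": 436.5400085449219,
            "top": 130.4599995137751,
            "width": 122.85311957550027,
            "height": 11.040000486224898,
            "pageIndex": 0,
        }
    ],
    "pageCount": 1,
    "error": False,
    "status": 200,
}

for m in summarize_matches(sample):
    print(m["pageIndex"], m["text"], round(m["right"], 2), round(m["bottom"], 2))
```

The derived right and bottom values agree with the bounds object in the example response (559.3931 and 141.5) after rounding.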

Code Snippets

CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/find' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
    "async": "false",
    "encrypt": "false",
    "url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
    "searchString": "Invoice Date \\d+/\\d+/\\d+",
    "regexSearch": "true",
    "name": "output",
    "pages": "0-",
    "inline": "true",
    "wordMatchingMode": "",
    "password": ""
}'
JavaScript
var myHeaders = new Headers();
myHeaders.append("x-api-key", "");
myHeaders.append("Content-Type", "application/json");

var raw = JSON.stringify({
 "async": "false",
 "encrypt": "false",
 "url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
 "searchString": "Invoice Date \\d+/\\d+/\\d+",
 "regexSearch": "true",
 "name": "output",
 "pages": "0-",
 "inline": "true",
 "wordMatchingMode": "",
 "password": ""
});

var requestOptions = {
	method: 'POST',
	headers: myHeaders,
	body: raw,
	redirect: 'follow'
};

fetch("https://api.pdf.co/v1/pdf/find", requestOptions)
	.then(response => response.text())
	.then(result => console.log(result))
	.catch(error => console.log('error', error));
NodeJs
var request = require('request');
var options = {
	'method': 'POST',
	'url': 'https://api.pdf.co/v1/pdf/find',
	'headers': {
		'x-api-key': '',
		'Content-Type': 'application/json'
	},
	body: JSON.stringify({
	 "async": "false",
	 "encrypt": "false",
	 "url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
	 "searchString": "Invoice Date \\d+/\\d+/\\d+",
	 "regexSearch": "true",
	 "name": "output",
	 "pages": "0-",
	 "inline": "true",
	 "wordMatchingMode": "",
	 "password": ""
	})

};
request(options, function (error, response) {
	if (error) throw new Error(error);
	console.log(response.body);
});

PHP
<?php

$curl = curl_init();

curl_setopt_array($curl, array(
	CURLOPT_URL => 'https://api.pdf.co/v1/pdf/find',
	CURLOPT_RETURNTRANSFER => true,
	CURLOPT_ENCODING => '',
	CURLOPT_MAXREDIRS => 10,
	CURLOPT_TIMEOUT => 0,
	CURLOPT_FOLLOWLOCATION => true,
	CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
	CURLOPT_CUSTOMREQUEST => 'POST',
	CURLOPT_POSTFIELDS =>'{
    "async": "false",
    "encrypt": "false",
    "url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
    "searchString": "Invoice Date \\\\d+/\\\\d+/\\\\d+",
    "regexSearch": "true",
    "name": "output",
    "pages": "0-",
    "inline": "true",
    "wordMatchingMode": "",
    "password": ""
}',
	CURLOPT_HTTPHEADER => array(
		'x-api-key: ',
		'Content-Type: application/json'
	),
));

$response = curl_exec($curl);

curl_close($curl);
echo $response;

Java
import java.io.*;
import okhttp3.*;
public class main {
	public static void main(String []args) throws IOException{
		OkHttpClient client = new OkHttpClient().newBuilder()
			.build();
		MediaType mediaType = MediaType.parse("application/json");
		RequestBody body = RequestBody.create(mediaType, "{\n    \"async\": \"false\",\n    \"encrypt\": \"false\",\n    \"url\": \"https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf\",\n    \"searchString\": \"Invoice Date \\\\d+/\\\\d+/\\\\d+\",\n    \"regexSearch\": \"true\",\n    \"name\": \"output\",\n    \"pages\": \"0-\",\n    \"inline\": \"true\",\n    \"wordMatchingMode\": \"\",\n    \"password\": \"\"\n}");
		Request request = new Request.Builder()
			.url("https://api.pdf.co/v1/pdf/find")
			.method("POST", body)
			.addHeader("x-api-key", "")
			.addHeader("Content-Type", "application/json")
			.build();
		Response response = client.newCall(request).execute();
		System.out.println(response.body().string());
	}
}

C#
using System;
using RestSharp;
namespace HelloWorldApplication {
	class HelloWorld {
		static void Main(string[] args) {
			var client = new RestClient("https://api.pdf.co/v1/pdf/find");
			client.Timeout = -1;
			var request = new RestRequest(Method.POST);
			request.AddHeader("x-api-key", "");
			request.AddHeader("Content-Type", "application/json");
			var body = @"{" + "\n" +
			@"    ""async"": ""false""," + "\n" +
			@"    ""encrypt"": ""false""," + "\n" +
			@"    ""url"": ""https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf""," + "\n" +
			@"    ""searchString"": ""Invoice Date \\d+/\\d+/\\d+""," + "\n" +
			@"    ""regexSearch"": ""true""," + "\n" +
			@"    ""name"": ""output""," + "\n" +
			@"    ""pages"": ""0-""," + "\n" +
			@"    ""inline"": ""true""," + "\n" +
			@"    ""wordMatchingMode"": """"," + "\n" +
			@"    ""password"": """"" + "\n" +
			@"}";
			request.AddParameter("application/json", body,  ParameterType.RequestBody);
			IRestResponse response = client.Execute(request);
			Console.WriteLine(response.Content);
		}
	}
}

Python
import requests
import json

url = "https://api.pdf.co/v1/pdf/find"

payload = json.dumps({
 "async": "false",
 "encrypt": "false",
 "url": "https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
 "searchString": "Invoice Date \\d+/\\d+/\\d+",
 "regexSearch": "true",
 "name": "output",
 "pages": "0-",
 "inline": "true",
 "wordMatchingMode": "",
 "password": ""
})
headers = {
	'x-api-key': '',
	'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

Powershell
$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("x-api-key", "")
$headers.Add("Content-Type", "application/json")

$body = "{`n    `"async`": `"false`",`n    `"encrypt`": `"false`",`n    `"url`": `"https://bytescout-com.s3.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf`",`n    `"searchString`": `"Invoice Date `\`\d+/`\`\d+/`\`\d+`",`n    `"regexSearch`": `"true`",`n    `"name`": `"output`",`n    `"pages`": `"0-`",`n    `"inline`": `"true`",`n    `"wordMatchingMode`": `"`",`n    `"password`": `"`"`n}"

$response = Invoke-RestMethod 'https://api.pdf.co/v1/pdf/find' -Method 'POST' -Headers $headers -Body $body
$response | ConvertTo-Json