
PDF To XML


Convert PDF to XML with information about text values, tables, fonts, images, and object positions.

Available Methods

[POST] /pdf/convert/to/xml

Auto Classification of Incoming Documents

Use the /pdf/classifier (Document Classifier) endpoint to automatically detect the class of a document using keyword-based rules. For example, you can define rules that identify which vendor provided a document, and then apply the matching template.

Attributes
url required
URL to the source file. Supports links from Google Drive, Dropbox and from built-in PDF.co files storage.

To upload files via the API, see the Files Upload section.

If you intermittently get a Too Many Requests or Access Denied error for your input URL, try adding cache: to enable built-in URL caching.

You can also encrypt output files and decrypt input files with user-controlled data encryption.
httpusername optional
HTTP auth user name, if required to access the source URL.
httppassword optional
HTTP auth password, if required to access the source URL.
pages optional
Comma-separated list of page indices (or ranges) to process.
IMPORTANT: page indices start at 0 (zero). To set a range, use a dash (-), for example: 0,2-5,7-. To set a range from an index to the last page, leave the end open, for example 2- (from page #3, since the index starts at zero, until the end of the document). To process ALL pages, leave this parameter empty.
Example: 0,2-5,7- means the first page, then the 3rd through 6th pages, and then from the 8th page (index = 7) until the end of the document. The input must be in string format.
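The page syntax above can be checked locally before sending a request. The following is a minimal sketch, not part of the API: the helper name expand_pages and the page_count argument are assumptions introduced for illustration, and the API itself only needs the original string.

```python
def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a zero-based page spec such as "0,2-5,7-" into page indices."""
    if not spec.strip():
        # An empty spec means "all pages".
        return list(range(page_count))
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An open-ended range like "7-" runs to the last page.
            end = int(end_text) if end_text else page_count - 1
            indices.extend(range(start, end + 1))
        else:
            indices.append(int(part))
    return indices


print(expand_pages("0,2-5,7-", 10))  # [0, 2, 3, 4, 5, 7, 8, 9]
```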
unwrap optional
Unwrap lines into a single line within table cells when lineGrouping is enabled. Must be one of: true or false.
rect optional
Defines coordinates for extraction, e.g. 51.8, 114.8, 235.5, 204.0. Use PDF.co PDF Edit Add Helper to get or measure PDF coordinates. The input must be in string format.
lang optional
Set the language for OCR (text from image) to use for scanned PDF, PNG, and JPG documents input when extracting text.
The default is “eng”. Other languages are also supported: deu, spa, chi_sim, jpn, and many others (see the full list of supported OCR languages).
You can also use 2 languages simultaneously like this: eng+deu or jpn+kor (any combination).
inline optional
Must be one of: true to return the data inline, or false (default) to return a link to the output file.
lineGrouping optional
Line grouping within table cells. Set to 1 to enable grouping. The input must be in string format.
async optional
Runs processing asynchronously and returns a JobId that you can use with /job/check to check the state of the processing (possible states: working, failed, aborted, and success). Must be one of: true or false.
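A polling loop for an asynchronous job might look like the sketch below. It is illustrative only: wait_for_job and its check_status callable are hypothetical names, and check_status stands in for whatever code calls /job/check and returns one of the states listed above.

```python
import time
from typing import Callable

# States after which polling should stop (taken from the list above).
TERMINAL_STATES = {"success", "failed", "aborted"}


def wait_for_job(check_status: Callable[[], str],
                 interval: float = 2.0,
                 max_checks: int = 30) -> str:
    """Call check_status until the job leaves the "working" state."""
    for _ in range(max_checks):
        status = check_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval)
    raise TimeoutError("job was still working after max_checks polls")


# Simulated status sequence in place of real /job/check calls.
fake_states = iter(["working", "working", "success"])
print(wait_for_job(lambda: next(fake_states), interval=0))  # success
```

Passing the status check in as a callable keeps the loop independent of any particular HTTP client.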
name optional
File name for the generated output.
expiration optional
Output link expiration in minutes. The default is 60 (i.e. 60 minutes or 1 hour). After this delay generated output file(s) (if any) will be auto-removed from PDF.co temporary files storage.
The maximum allowed expiration period depends on your current subscription plan. To store input files permanently (e.g. re-usable images, PDFs, documents), please use PDF.co built-in Files Storage instead.
profiles optional
This parameter can be used to set additional configurations for fine-tuning and to enable more options. Visit PDF.co knowledgebase for profile examples and more. Make sure to provide the input in string format. For instance, to alter the CSV separator, you can use: { 'CSVSeparatorSymbol': ';' }.
Tip: Use the OCR Analyzer of PDF Multitool to generate and examine OCR configuration profiles.
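Several attributes above must be sent as strings (pages, lang, profiles). The sketch below shows one way to assemble such a payload. The helper build_payload is hypothetical, and serializing profiles with json.dumps (which produces double quotes rather than the single quotes shown above) is an assumption about what the service accepts.

```python
import json
from typing import Optional, Sequence


def build_payload(source_url: str,
                  pages: str = "",
                  languages: Sequence[str] = ("eng",),
                  profiles: Optional[dict] = None,
                  inline: bool = False) -> dict:
    """Assemble a request body using the attributes described above."""
    payload = {
        "url": source_url,
        "inline": inline,
        # Two or more OCR languages are joined with "+", e.g. "eng+deu".
        "lang": "+".join(languages),
    }
    if pages:
        # Leaving pages out processes the whole document.
        payload["pages"] = pages
    if profiles is not None:
        # profiles is passed as a string, not as a nested object.
        payload["profiles"] = json.dumps(profiles)
    return payload


payload = build_payload(
    "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-xml/sample.pdf",
    pages="0,2-5",
    languages=("eng", "deu"),
    profiles={"CSVSeparatorSymbol": ";"},
)
print(json.dumps(payload, indent=4))
```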
  • Method: POST
  • URL: /v1/pdf/convert/to/xml

Query parameters

No query parameters accepted.

Body payload

{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-xml/sample.pdf",
    "async": false
}

Example responses

/pdf/convert/to/xml
{
    "body": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<document>\r\n  <page index=\"0\">\r\n    <row>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"24.0\" fontStyle=\"Bold\" color=\"#538DD3\" x=\"36.00\" y=\"34.44\" width=\"242.81\" height=\"24.00\">Your Company Name</text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n    </row>\r\n    <row>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"76.94\" width=\"66.62\" height=\"11.04\">Your Address</text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n    </row>\r\n    <row>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"91.46\" width=\"69.14\" height=\"11.04\">City, State Zip</text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n    </row>\r\n    <row>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"461.02\" y=\"115.94\" width=\"98.42\" height=\"11.04\">Invoice No. 
123456</text>\r\n      </column>\r\n    </row>\r\n    <row>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"436.54\" y=\"130.46\" width=\"122.90\" height=\"11.04\">Invoice Date 01/01/2016</text>\r\n      </column>\r\n    </row>\r\n    <row>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"36.00\" y=\"154.94\" width=\"63.62\" height=\"11.04\">Client Name</text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n    </row>\r\n    <row>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"169.70\" width=\"40.34\" height=\"11.04\">Address</text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n    </row>\r\n    <row>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"184.22\" width=\"69.14\" height=\"11.04\">City, State Zip</text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n    </row>\r\n    <row>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"233.30\" width=\"28.70\" height=\"11.04\">Notes</text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        
</text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n    </row>\r\n    <row>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"36.00\" y=\"316.25\" width=\"22.58\" height=\"11.04\">Item</text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"247.61\" y=\"316.25\" width=\"44.64\" height=\"11.04\">Quantity</text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"398.95\" y=\"316.25\" width=\"26.91\" height=\"11.04\">Price</text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"533.14\" y=\"316.25\" width=\"26.30\" height=\"11.04\">Total</text>\r\n      </column>\r\n    </row>\r\n    <row>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"341.33\" width=\"30.62\" height=\"11.04\">Item 1</text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"341.33\" width=\"6.12\" height=\"11.04\">1</text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"341.33\" width=\"27.51\" height=\"11.04\">40.00</text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"341.33\" width=\"27.50\" height=\"11.04\">40.00</text>\r\n      </column>\r\n    </row>\r\n    <row>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"362.45\" width=\"30.62\" height=\"11.04\">Item 2</text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"362.45\" width=\"6.12\" height=\"11.04\">2</text>\r\n      </column>\r\n      <column>\r\n        <text 
fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"362.45\" width=\"27.51\" height=\"11.04\">30.00</text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"362.45\" width=\"27.50\" height=\"11.04\">60.00</text>\r\n      </column>\r\n    </row>\r\n    <row>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"383.57\" width=\"30.62\" height=\"11.04\">Item 3</text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"383.57\" width=\"6.12\" height=\"11.04\">3</text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"383.57\" width=\"27.51\" height=\"11.04\">20.00</text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"383.57\" width=\"27.50\" height=\"11.04\">60.00</text>\r\n      </column>\r\n    </row>\r\n    <row>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"404.93\" width=\"30.62\" height=\"11.04\">Item 4</text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"404.93\" width=\"6.12\" height=\"11.04\">4</text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"404.93\" width=\"27.51\" height=\"11.04\">10.00</text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"404.93\" width=\"27.50\" height=\"11.04\">40.00</text>\r\n      </column>\r\n    </row>\r\n    <row>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text>\r\n        </text>\r\n      </column>\r\n      <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"389.11\" y=\"425.83\" width=\"36.75\" height=\"11.04\">TOTAL</text>\r\n      </column>\r\n 
     <column>\r\n        <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"525.82\" y=\"425.83\" width=\"33.62\" height=\"11.04\">200.00</text>\r\n      </column>\r\n    </row>\r\n  </page>\r\n</document>",
    "pageCount": 1,
    "error": false,
    "status": 200,
    "name": "sample.xml",
    "remainingCredits": 60563
}
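To work with a response shaped like the example above, the XML in body can be parsed with a standard XML library. This is a minimal sketch that assumes the document, page, row, column, and text element layout shown in the example. The function name rows_from_response is hypothetical, and the short sample string is made up for illustration.

```python
import json
import xml.etree.ElementTree as ET


def rows_from_response(response_text: str) -> list[list[str]]:
    """Return the text of every cell, grouped by <row>, from a JSON response."""
    response = json.loads(response_text)
    if response.get("error"):
        raise RuntimeError(f"conversion failed with status {response.get('status')}")
    # Encode to bytes so the XML declaration's encoding is honoured by the parser.
    root = ET.fromstring(response["body"].encode("utf-8"))
    rows = []
    for row in root.iter("row"):
        # Empty cells contain only whitespace, so strip them to "".
        rows.append([(cell.text or "").strip() for cell in row.iter("text")])
    return rows


# Shortened, made-up response in the same shape as the example above.
sample = json.dumps({
    "body": '<?xml version="1.0" encoding="UTF-8"?>\r\n'
            '<document><page index="0"><row>'
            '<column><text>Item 1</text></column>'
            '<column><text>40.00</text></column>'
            '</row></page></document>',
    "error": False,
    "status": 200,
})
print(rows_from_response(sample))  # [['Item 1', '40.00']]
```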

Code Snippet

CURL
curl --location --request POST 'https://api.pdf.co/v1/pdf/convert/to/xml' \
--header 'x-api-key: ' \
--data-raw '{
    "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-xml/sample.pdf",
    "async": false
}'
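A rough Python equivalent of the curl call, using only the standard library, is sketched below. The function names build_request and convert are hypothetical, and the Content-Type header is an assumption that does not appear in the curl snippet.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/convert/to/xml"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            # Assumption: the body is declared as JSON.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def convert(api_key: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response."""
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (needs a valid API key and network access):
# result = convert("YOUR_API_KEY", {
#     "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-xml/sample.pdf",
#     "async": False,
# })
```

Keeping request construction separate from sending makes the headers and body easy to inspect without a network call.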
