PDF To XML
Converts a PDF to XML, including text values, tables, fonts, images, and object positions.
Available Methods
[POST] /pdf/convert/to/xml
Auto Classification of Incoming Documents
Use the /pdf/classifier (Document Classifier) endpoint to automatically detect the class of a document using keyword-based rules. For example, you can define rules that identify which vendor issued a document, and then apply the matching template.
Parameters
url
required. URL of the source file. Supports links from Google Drive, Dropbox, and the built-in PDF.co Files Storage. To upload files via the API, see the Files Upload section. If you randomly get a Too Many Requests or Access Denied error for your input URL, try adding the cache: prefix to the URL to enable built-in URL caching. You can also encrypt output files and decrypt input files with user-controlled data encryption (strong AES encryption with your own keys).
httpusername
optional. HTTP auth user name, if required to access the source url.
httppassword
optional. HTTP auth password, if required to access the source url.
pages
optional. Comma-separated list of page indices (or ranges) to process. IMPORTANT: the first page has index 0 (zero). Use a dash (-) to set a range, for example 0,2-5,7-. To set a range from an index to the last page, use a range like 2- (from page 3, since indices start at zero, to the end of the document). To process all pages, leave this parameter empty. Example: 0,2-5,7- means the first page, then pages 3 through 6, and then page 8 (index 7) through the end of the document. Must be a String.
unwrap
optional. When lineGrouping is enabled, unwraps lines within table cells into a single line. Must be one of: true, false.
rect
optional. Defines the coordinates of the area to extract, for example 51.8, 114.8, 235.5, 204.0. You can use the PDF.co PDF Viewer with coordinates to easily select and copy coordinates. Must be a String.
lang
optional. Sets the language for OCR (text recognition from images) used when extracting text from scanned PDF, PNG, or JPG input. Default is eng. Other supported languages include deu, spa, chi_sim, jpn, and many others (see the full list of supported OCR languages). You can also use two languages at once, for example eng+deu or jpn+kor (any combination).
inline
optional. Must be one of: true to return data inline, or false to return a link to the output file (default).
lineGrouping
optional. Line grouping within table cells. Set to 1 to enable grouping. Must be a String.
async
optional. Runs processing asynchronously. Returns a JobId that you can use with /job/check to check the state of the background job (possible states: working, failed, aborted, and success). Must be one of: true, false.
name
optional. File name for the generated output. Must be a String.
expiration
optional. Output link expiration in minutes. Default is 60 (one hour). After this period, generated output files (if any) are automatically removed from PDF.co temporary files storage. The maximum allowed expiration period depends on your current subscription plan. To store input files permanently (e.g. reusable images, PDFs, documents), use the PDF.co built-in Files Storage instead.
profiles
optional. Must be a String. Sets additional options through a custom configuration. For example, to change the CSV separator for PDF to CSV, set this property to the following string: { 'CSVSeparatorSymbol': ';' }. See the profiles samples for more examples.
- Method: POST
- URL: /v1/pdf/convert/to/xml
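The zero-based pages syntax described under Parameters can be checked locally before sending a request. The sketch below is a hypothetical client-side helper (expand_pages is not part of the PDF.co API). It follows the rules stated above, but the service's own parsing may treat edge cases differently.

```python
def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a pages string such as "0,2-5,7-" into zero-based page indices.

    Hypothetical helper mirroring the documented `pages` syntax; the
    server-side parser may differ in edge cases.
    """
    if not spec.strip():
        # An empty value means "all pages".
        return list(range(page_count))
    indices: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An open-ended range such as "7-" runs to the last page.
            end = int(end_text) if end_text else page_count - 1
            indices.extend(range(start, end + 1))
        else:
            indices.append(int(part))
    return indices


print(expand_pages("0,2-5,7-", 10))  # [0, 2, 3, 4, 5, 7, 8, 9]
```

For a 10-page document, this selects pages 1, 3 to 6, and 8 to 10 in human (one-based) numbering.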
Query parameters
No query parameters accepted.
Body payload
{
"url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-to-xml/sample.pdf",
"async": false
}
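The same request can be assembled with Python's standard library. This is a minimal sketch: build_convert_request and YOUR_API_KEY are placeholders introduced here, and the Content-Type: application/json header is an assumption based on the JSON body payload. The request is only built, not sent.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

API_URL = "https://api.pdf.co/v1/pdf/convert/to/xml"


def build_convert_request(
    api_key: str, source_url: str, options: Optional[Dict[str, Any]] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request for PDF to XML conversion."""
    payload: Dict[str, Any] = {"url": source_url}
    # Optional parameters such as "pages", "inline", "lang" or "async" are
    # passed through unchanged; a dict is used because "async" is a Python keyword.
    payload.update(options or {})
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_convert_request(
    "YOUR_API_KEY",
    "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-to-xml/sample.pdf",
    {"async": False, "pages": "0"},
)
# To send it, pass req to urllib.request.urlopen and decode the JSON reply.
```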
Example responses
/pdf/convert/to/xml
{
"body": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<document>\r\n <page index=\"0\">\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"24.0\" fontStyle=\"Bold\" color=\"#538DD3\" x=\"36.00\" y=\"34.44\" width=\"242.81\" height=\"24.00\">Your Company Name</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"76.94\" width=\"66.62\" height=\"11.04\">Your Address</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"91.46\" width=\"69.14\" height=\"11.04\">City, State Zip</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"461.02\" y=\"115.94\" width=\"98.42\" height=\"11.04\">Invoice No. 
123456</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"436.54\" y=\"130.46\" width=\"122.90\" height=\"11.04\">Invoice Date 01/01/2016</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"36.00\" y=\"154.94\" width=\"63.62\" height=\"11.04\">Client Name</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"169.70\" width=\"40.34\" height=\"11.04\">Address</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"184.22\" width=\"69.14\" height=\"11.04\">City, State Zip</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"233.30\" width=\"28.70\" height=\"11.04\">Notes</text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"36.00\" y=\"316.25\" width=\"22.58\" height=\"11.04\">Item</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"247.61\" y=\"316.25\" width=\"44.64\" 
height=\"11.04\">Quantity</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"398.95\" y=\"316.25\" width=\"26.91\" height=\"11.04\">Price</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"533.14\" y=\"316.25\" width=\"26.30\" height=\"11.04\">Total</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"341.33\" width=\"30.62\" height=\"11.04\">Item 1</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"341.33\" width=\"6.12\" height=\"11.04\">1</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"341.33\" width=\"27.51\" height=\"11.04\">40.00</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"341.33\" width=\"27.50\" height=\"11.04\">40.00</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"362.45\" width=\"30.62\" height=\"11.04\">Item 2</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"362.45\" width=\"6.12\" height=\"11.04\">2</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"362.45\" width=\"27.51\" height=\"11.04\">30.00</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"362.45\" width=\"27.50\" height=\"11.04\">60.00</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"383.57\" width=\"30.62\" height=\"11.04\">Item 3</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"383.57\" width=\"6.12\" height=\"11.04\">3</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"383.57\" width=\"27.51\" 
height=\"11.04\">20.00</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"383.57\" width=\"27.50\" height=\"11.04\">60.00</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"36.00\" y=\"404.93\" width=\"30.62\" height=\"11.04\">Item 4</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"286.13\" y=\"404.93\" width=\"6.12\" height=\"11.04\">4</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"398.35\" y=\"404.93\" width=\"27.51\" height=\"11.04\">10.00</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" x=\"531.94\" y=\"404.93\" width=\"27.50\" height=\"11.04\">40.00</text>\r\n </column>\r\n </row>\r\n <row>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text>\r\n </text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"389.11\" y=\"425.83\" width=\"36.75\" height=\"11.04\">TOTAL</text>\r\n </column>\r\n <column>\r\n <text fontName=\"Arial\" fontSize=\"11.0\" fontStyle=\"Bold\" x=\"525.82\" y=\"425.83\" width=\"33.62\" height=\"11.04\">200.00</text>\r\n </column>\r\n </row>\r\n </page>\r\n</document>",
"pageCount": 1,
"error": false,
"status": 200,
"name": "sample.xml",
"remainingCredits": 60563
}
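A response like the one above carries the XML as a string in body, with a row/column grid per page and font and position attributes on each text element. The sketch below assumes the XML is returned in body as in this example (with inline set to false the service may instead return a link to the output file). It uses xml.etree.ElementTree to recover table rows and positioned text. The function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from typing import List, Tuple


def _parse_body(response_text: str) -> ET.Element:
    """Parse the API reply and return the root <document> element."""
    response = json.loads(response_text)
    if response.get("error"):
        raise RuntimeError(f"conversion failed: {response}")
    # Encode to bytes so the XML declaration's encoding is honored.
    return ET.fromstring(response["body"].encode("utf-8"))


def table_rows(response_text: str) -> List[List[str]]:
    """Return the stripped cell text of every <row>, column by column."""
    root = _parse_body(response_text)
    return [
        [(col.findtext("text") or "").strip() for col in row.findall("column")]
        for row in root.iter("row")
    ]


def positioned_text(response_text: str) -> List[Tuple[str, float, float]]:
    """Return (text, x, y) for every <text> element that carries coordinates."""
    root = _parse_body(response_text)
    return [
        ((el.text or "").strip(), float(el.get("x")), float(el.get("y")))
        for el in root.iter("text")
        if el.get("x") is not None
    ]


# Trimmed excerpt based on the example response above (some attributes omitted).
sample = json.dumps({
    "body": '<?xml version="1.0" encoding="UTF-8"?>'
            '<document><page index="0"><row>'
            '<column><text fontName="Arial" x="36.00" y="341.33">Item 1</text></column>'
            '<column><text fontName="Arial" x="286.13" y="341.33">1</text></column>'
            '<column><text>\r\n </text></column>'
            '</row></page></document>',
    "error": False,
    "status": 200,
})
print(table_rows(sample))       # [['Item 1', '1', '']]
print(positioned_text(sample))  # [('Item 1', 36.0, 341.33), ('1', 286.13, 341.33)]
```

Empty grid cells, which appear as whitespace-only text elements in the response, become empty strings. This keeps every row the same width as the detected column layout.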
Code Snippet
CURL
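# Put your PDF.co API key after "x-api-key: " before running.
curl --location --request POST 'https://api.pdf.co/v1/pdf/convert/to/xml' \
--header 'Content-Type: application/json' \
--header 'x-api-key: ' \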
--data-raw '{
"url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-to-xml/sample.pdf",
"async": false
}'
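With async set to true, the service returns a JobId that you check via /job/check until the job reaches a final state. The request format for /job/check is not described on this page, so the sketch below takes the check as an injected callable (check_job is hypothetical). It only implements the polling loop over the documented states working, failed, aborted, and success.

```python
import time
from typing import Callable

TERMINAL_STATES = {"success", "failed", "aborted"}


def wait_for_job(
    check_job: Callable[[str], str],
    job_id: str,
    interval_s: float = 3.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a background job until it leaves the "working" state.

    `check_job` is a hypothetical callable that queries /job/check for
    `job_id` and returns the reported status string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = check_job(job_id)
        if status in TERMINAL_STATES:
            return status
        if status != "working":
            raise ValueError(f"unexpected job status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still working after {timeout_s} s")
        sleep(interval_s)


# Usage with a fake checker that reports two "working" polls, then success.
replies = iter(["working", "working", "success"])
print(wait_for_job(lambda _id: next(replies), "demo-job", sleep=lambda _s: None))  # success
```

Injecting sleep keeps the loop testable without real delays. In production, the default time.sleep applies between polls.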
Samples
- AWS Lambda - Convert PDF To XML From URL (Node.js)
- C# - Advanced Conversion Options
- C# - Advanced Conversion Options With Rotated Input
- C# - Convert PDF To XML From URL
- C# - Convert PDF To XML From URL Asynchronously
- C# - Convert PDF To XML From Uploaded File
- Java - Advanced Conversion Options
- Java - Advanced Conversion Options With Rotated Input
- Java - Convert PDF To XML From URL
- Java - Convert PDF To XML From Uploaded File
- JavaScript - Advanced Conversion Options
- JavaScript - Advanced Conversion Options With Rotated Input
- JavaScript - Convert PDF To XML From URL (Node.js)
- JavaScript - Convert PDF To XML From URL (Node.js) - Async API
- JavaScript - Convert PDF To XML From Uploaded File (Node.js)
- JavaScript - Convert PDF To XML From Uploaded File (Node.js) - Async API
- JavaScript - Convert PDF To XML in JQuery
- JavaScript - Convert PDF To XML in JQuery - Async API
- PHP - Convert PDF To XML Asynchronously
- PHP - Convert PDF To XML From Uploaded File
- PowerShell - Advanced Conversion Options
- PowerShell - Advanced Conversion Options With Rotated Input
- PowerShell - Convert PDF To XML From URL
- PowerShell - Convert PDF To XML From URL Asynchronously
- PowerShell - Convert PDF To XML From Uploaded File
- Python - Advanced Conversion Options
- Python - Advanced Conversion Options With Rotated Input
- Python - Convert PDF To XML From Uploaded File
- Python - Convert PDF To XML From Uploaded File Asynchronously
- VB.NET - Advanced Conversion Options
- VB.NET - Advanced Conversion Options With Rotated Input
- VB.NET - Convert PDF To XML From URL
- VB.NET - Convert PDF To XML From URL Asynchronously
- VB.NET - Convert PDF To XML From Uploaded File
- cURL - Convert PDF To XML
Copyright © 2016 - 2023 PDF.co