Parse From Uploaded File - Powershell
Document Parser sample in Powershell demonstrating ‘Parse From Uploaded File’
MultiPageTable-template1.yml
templateName: Multipage Table Test
templateVersion: 4
templatePriority: 0
detectionRules:
keywords:
- Sample document with multi-page table
objects:
- name: total
objectType: field
fieldProperties:
fieldType: macros
expression: TOTAL{{Spaces}}({{Number}})
regex: true
dataType: decimal
- name: table1
objectType: table
tableProperties:
start:
expression: Item{{Spaces}}Description{{Spaces}}Price
regex: true
end:
expression: TOTAL{{Spaces}}{{Number}}
regex: true
row:
expression: '{{LineStart}}{{Spaces}}(?<itemNo>{{Digits}}){{Spaces}}(?<description>{{SentenceWithSingleSpaces}}){{Spaces}}(?<price>{{Number}}){{Spaces}}(?<qty>{{Digits}}){{Spaces}}(?<extPrice>{{Number}})'
regex: true
columns:
- name: itemNo
dataType: integer
- name: description
dataType: string
- name: price
dataType: decimal
- name: qty
dataType: integer
- name: extPrice
dataType: decimal
multipage: true
ParseFromUploadedFile.ps1
# The authentication key (API Key).
# Get your own by registering at https://app.pdf.co
$API_KEY = "***********************************"
# Source PDF file
$SourceFile = ".\MultiPageTable.pdf"
# Destination JSON file name
$DestinationFile = ".\result.json"
# 1. RETRIEVE THE PRESIGNED URL TO UPLOAD THE FILE.
# * If you already have a direct file URL, skip to the step 3.
# Prepare URL for `Get Presigned URL` API call
$query = "https://api.pdf.co/v1/file/upload/get-presigned-url?contenttype=application/octet-stream&name=" + `
[System.IO.Path]::GetFileName($SourceFile)
$query = [System.Uri]::EscapeUriString($query)
try {
# Execute request
$jsonResponse = Invoke-RestMethod -Method Get -Headers @{ "x-api-key" = $API_KEY } -Uri $query
if ($jsonResponse.error -eq $false) {
# Get URL to use for the file upload
$uploadUrl = $jsonResponse.presignedUrl
# Get URL of uploaded file to use with later API calls
$uploadedFileUrl = $jsonResponse.url
# 2. UPLOAD THE FILE TO CLOUD.
$r = Invoke-WebRequest -Method Put -Headers @{ "x-api-key" = $API_KEY; "content-type" = "application/octet-stream" } -InFile $SourceFile -Uri $uploadUrl
if ($r.StatusCode -eq 200) {
# 3. Parse PDF document by template
// Template text. Use Document Parser (https://pdf.co/document-parser, https://app.pdf.co/document-parser)
# to create templates.
# Read template from file:
$templateContent = [IO.File]::ReadAllText(".\MultiPageTable-template1.yml")
# Prepare URL for `Document Parser` API call
$query = "https://api.pdf.co/v1/pdf/documentparser"
# Content
$Body = @{
"url" = $uploadedFileUrl;
"template" = $templateContent;
}
# Execute request
$jsonResponse = Invoke-RestMethod -Method 'Post' -Headers @{ "x-api-key" = $API_KEY } -Uri $query -Body ($Body|ConvertTo-Json) -ContentType "application/json"
if ($jsonResponse.error -eq $false) {
# Get URL of generated HTML file
$resultFileUrl = $jsonResponse.url;
# Download output file
Invoke-WebRequest -Headers @{ "x-api-key" = $API_KEY } -OutFile $DestinationFile -Uri $resultFileUrl
Write-Host "Generated output file saved as `"$($DestinationFile)`" file."
}
else {
# Display service reported error
Write-Host $jsonResponse.message
}
}
else {
# Display request error status
Write-Host $r.StatusCode + " " + $r.StatusDescription
}
}
else {
# Display service reported error
Write-Host $jsonResponse.message
}
}
catch {
# Display request error
Write-Host $_.Exception
}
run.bat
@echo off
powershell -NoProfile -ExecutionPolicy Bypass -Command "& .\ParseFromUploadedFile.ps1"
echo Script finished with errorlevel=%errorlevel%
pause
PDF.co Web API: the Web API with a set of tools for documents manipulation, data conversion, data extraction, splitting and merging of documents. Includes image recognition, built-in OCR, barcode generation and barcode decoders to decode bar codes from scans, pictures and pdf.
Download Source Code (.zip)
return to the previous page explore Document Parser endpoint
Copyright © 2016 - 2023 PDF.co