PDF.co Document Parser: Template Creation Guide

What is Document Parser and How It Works

Document Parser is a versatile document parsing engine for accurate and easy data extraction from PDF and scanned documents. Create and maintain extraction templates without coding!

It extracts data from invoices, statements, reports, paystubs, tables and receipts. It supports both native electronic and scanned PDF files, as well as PNG, JPG and TIFF images. Supported languages include English, German, French, Spanish and many others, including dual-language documents. It is available via Web API, Zapier, Integromat, Salesforce and as a self-hosted on-premise API server.

Visual Document Parser Template Editor

The online version of the PDF.co Document Parser template editor is here. Use it to create, test and maintain data extraction templates.

Template Objects

Template objects define what to extract from the input document. These can be:

  1. Field mapped from Virtual Grid - extracts a value from the virtual grid that the engine generates for the input document. This object is useful if you have documents with the same layout. It is similar to converting the document to a spreadsheet and then telling the engine to get the value from a virtual cell at (row, column).
  2. Field from Rectangle Selection - extracts data from a rectangle selection defined by coordinates. Use it when your documents always have the object in exactly the same place. You can optionally set an expression (with macros) to locate a matching value (for example, a date or currency) inside the rectangle selection.
  3. Field from auto Key-Value - runs a search for key-value pairs in the document and generates output objects as key-value pairs (you should define an expression with macros).
  4. Table from Rectangle - reads a table from a rectangle.
  5. Field based on Text Search - runs a text search on the whole page for a predefined macro (for example, a date, currency, SSN or phone number) and returns the match as a value.
  6. Table based on Search - finds a table using AI or using search expressions defined by a JSON config that contains text search patterns for the beginning and the end of the table. With this approach you can extract multipage tables.
  7. Field with Static Value - returns a static value. Useful if you need to generate some predefined values, like a template name, or company name and return a predefined value along with other objects.
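
The virtual grid idea from item 1 can be pictured with a small Python sketch. This is not the engine's algorithm: the `build_grid` and `cell_at` helpers are hypothetical, and splitting cells on runs of two or more spaces is an assumption made only for illustration.

```python
import re

def build_grid(page_text: str) -> list[list[str]]:
    """Turn plain page text into rows of cells.

    Assumed rule (illustration only): two or more spaces separate cells.
    """
    grid = []
    for line in page_text.splitlines():
        if line.strip():
            grid.append(re.split(r" {2,}", line.strip()))
    return grid

def cell_at(grid, row, column):
    """Return the cell at (row, column), or None when the cell does not exist."""
    if 0 <= row < len(grid) and 0 <= column < len(grid[row]):
        return grid[row][column]
    return None

page = (
    "Invoice Number:   1234567\n"
    "Date Issued:      February 1, 2016\n"
)
grid = build_grid(page)
invoice_number = cell_at(grid, 0, 1)
print(invoice_number)
```

Because the lookup depends only on the (row, column) position, it keeps working for any document that shares the same layout, which is the situation this object type targets.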

Expression parameter (for field mapped from a rectangle, text search based field, field from auto key-value and others)

Expression parameter can contain:

  1. Macros (see the list here)
  2. Regular Expressions (don’t forget to enable regex checkbox)
  3. Mixed macros and regular expressions (don’t forget to enable regex checkbox)
  4. Special Functions for AI-powered data extraction of specific values like a company name. The list of available special functions is available here.

Special Markers (for use inside expressions)

You can use special markers inside the expression parameter. A marker points to the specific part of the expression that should become the output value or the field name (otherwise the whole expression match is used):

  • ?<value> marker points to the regular expression group that must be used as the final value for the object. Example: Invoice (?<value>\d+) will extract 12345 as the final field value from the string Invoice 12345.
  • ?<key> marker points to the regular expression group that must be used as the field name for the object. Important: multiple matches for this expression will auto-generate multiple objects in the output. Example: (?<key>{{SentenceWithSingleSpaces}}): (?<value>{{SentenceWithSingleSpaces}}) will extract key1: value, key2: value as two separate objects named key1 and key2 respectively.
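
The effect of the two markers can be tried out with ordinary named groups. The Python sketch below is only an illustration: Python's `re` module spells a named group `(?P<name>...)`, and the regex used in place of `{{SentenceWithSingleSpaces}}` is an assumed approximation, not the engine's definition.

```python
import re

# Python's re module writes a named group as (?P<name>...), while the
# template expressions in this guide use (?<name>...).
value_pattern = re.compile(r"Invoice (?P<value>\d+)")
match = value_pattern.search("Invoice 12345")
invoice_value = match.group("value")

# Rough stand-in for {{SentenceWithSingleSpaces}}: words separated by
# single spaces (an assumption made for this sketch).
sentence = r"\S+(?: \S+)*"
pair_pattern = re.compile(rf"(?P<key>{sentence}): (?P<value>{sentence})")

# Every match yields one output object named after the "key" group.
text = "key1: value\nkey2: value"
fields = {m.group("key"): m.group("value") for m in pair_pattern.finditer(text)}

print(invoice_value)
print(fields)
```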

Search-based Table Object

This object is defined by a JSON-based configuration.

Table objects return the tabular data you need to extract. A table object can be defined by:

  1. rectangle coordinates (use the Table from Rectangle object type)
  2. AI-powered automated table detection, which finds tables on pages automatically (use Table from Auto Detection)
  3. a set of rules based on text or regular expression search that define the table's start, end, and rows (use Table from Search)

You can also define multiple table types inside the JSON-based Tables section of this object's configuration.

Table parameters (tableProperties object):

  • name - table name to distinguish different tables in the result.

  • autoDetection object [optional] - enables AI-based table detection. IMPORTANT: when this section is set and tableIndex is not -1, other parameters such as start, end and row are ignored because the table is detected automatically.
    • pageIndex [required] - index of the page to find the table on (starts at 0).
    • tableIndex [required] - -1 by default, which means that auto detection is disabled. Set it to 0 or higher to detect the table with that index on the given page, counting from top to bottom.
  • start object - group of parameters that define the start of the table:
    • expression - macro expression to find the start of the table, or
    • y - the top coordinate of the table. You can find PDF points coordinates in your PDF file using our Simplified PDF Viewer.
    • pageIndex - index of the page containing the y coordinate.
    • regex - indicates if the expression parameter contains regular expression.
  • end object - group of parameters that define the end of the table:
    • expression - macro expression to find the end of the table, or
    • y - the bottom coordinate of the table. You can find PDF points coordinates in your PDF file using our Simplified PDF Viewer.
    • regex - indicates if the expression parameter contains regular expression.
  • subItemStart object - [optional] parameters that define the start of the table sub-item. Sub-items are used for tables with complex multiline rows:
    • expression - macro expression to find the start of the sub-item.
    • regex - indicates if the expression parameter contains regular expression.
  • subItemEnd object - [optional] parameters that define the end of the table sub-item:
    • expression - macro expression to find the end of the sub-item.
    • regex - indicates if the expression parameter contains regular expression.
  • introduction object - Parameters to parse values from sub-headers. Values parsed from the introduction expression will be repeated in the beginning of every row.
    • expression - macro expression to parse introduction items.
    • regex - indicates if the expression parameter contains regular expression.
  • row object - [optional] group of parameters that define table rows:
    • expression - the main macro expression to find a row. Named groups in this expression will go to the result table as columns. See example below.
    • regex - indicates if expression contains regular expression.
    • subExpression1, subExpression2, subExpression3, subExpression4, subExpression5 - additional expressions to parse some remaining parts of row data which the main expression cannot parse in one pass. Sub-expressions are executed after the main expression for the text chunks between matches of the main expression. Can be used to parse hanging rows (wrapped multiline cells).
  • columns array - [optional] array that defines column properties. Names of columns should correspond to the names of the capturing groups of the row expression. Column properties:
    • name - defines column name.
    • x - [optional] X coordinate of the left column edge in PDF Points. You can find PDF points coordinates in your PDF file using our Simplified PDF Viewer.
    • type - [optional] defines column data type. Should be one of these values:
      • string
      • integer
      • date
      • decimal
      • for more see also the types descriptions in fields section.
    • dateFormat - [optional] See dateFormat description in fields section.
    • outputDateFormat - [optional] See outputDateFormat description in fields section.
    • coalesceWith - [optional] Name of column to merge the parsed value with.
  • rowMergingRule string - [optional] For the fields of rectangle type and table data type. Defines the rule to merge multiline data in table cells. Supported values:
    • none - default, no rule.
    • byBorders - combine lines within a table cell framed by border lines.
    • hangingRows - join table row that contains only a single cell up to the previous row if there is no separating line between them. Useful for tables without borders between rows.
  • multipage boolean - [optional] defines whether the table may continue on further pages.
  • horizontalSeparationOffset - offset from the table start to the beginning of the first table row.
  • horizontalSeparationStep - row height. These two parameters help the parser distinguish rows in tables without horizontal separators. This works only with tables with fixed row height.
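
The arithmetic behind horizontalSeparationOffset and horizontalSeparationStep can be sketched in Python as follows. This is a minimal illustration, not the parser's implementation: the `row_index` helper and the sample coordinates are hypothetical, and the sketch assumes that y grows downward from the top of the page.

```python
def row_index(y: float, table_top: float, offset: float, step: float):
    """Map the y coordinate of a text line to a zero-based row index.

    Assumes y grows downward and every row has the same height (step).
    Returns None for text that lies above the first row (e.g. the header).
    """
    if step <= 0:
        raise ValueError("step (row height) must be positive")
    first_row_top = table_top + offset
    if y < first_row_top:
        return None
    return int((y - first_row_top) // step)

# Hypothetical layout: the table starts at y=200, the first row begins
# 20 points below that, and each row is 15 points high.
lines = [
    (205.0, "Description  Amount"),
    (222.0, "Basic Plan Jan"),
    (238.0, "Basic Plan Feb"),
]
rows = {}
for y, text in lines:
    index = row_index(y, table_top=200.0, offset=20.0, step=15.0)
    if index is not None:
        rows.setdefault(index, []).append(text)

print(rows)
```

The mapping only holds when all rows share one height, which is why these two parameters are limited to tables with fixed row height.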

Example of table parsing:

| Description | Interval | Quantity | Amount ($) |
|---|---|---|---|
| Basic Plan | Jan 1 - Jan 31 | 1 | 25.00 |
| Basic Plan | Feb 1 - Feb 28 | 1 | 25.00 |
| | | Total in USD: | 50.00 |

The table above can be parsed with macro expressions or with explicitly defined column coordinates.

  1. Extracting the table using the AI-powered auto detector:

autoDetectTableField value:

{
  "autoDetection": {
    "pageIndex": 0,
    "tableIndex": 0
  },
  "columns": [
    {
      "name": "description",
      "type": "string"
    },
    {
      "name": "interval",
      "type": "string"
    },
    {
      "name": "quantity",
      "type": "integer"
    },
    {
      "name": "amount",
      "type": "decimal"
    }
  ]
}

Full template:


{
  "templateVersion": 4,
  "templatePriority": 0,
  "culture": "en-US",
  "templateName": "",
  "options": {
    "ocrMode": "auto",
    "ocrLanguage": "eng"
  },
  "objects": [
    {
      "name": "AutoDetectTable",
      "objectType": "table",
      "tableProperties": {
        "autoDetection": {
          "pageIndex": 0,
          "tableIndex": 0
        },
        "columns": [
          {
            "name": "description",
            "type": "string"
          },
          {
            "name": "interval",
            "type": "string"
          },
          {
            "name": "quantity",
            "type": "integer"
          },
          {
            "name": "amount",
            "type": "decimal"
          }
        ]
      }
    }
  ]
}
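
When many similar templates are needed, the JSON can be generated instead of written by hand. The Python sketch below assumes only the keys shown in the full template above; `auto_detect_table` and `build_template` are hypothetical helper names and are not part of any PDF.co SDK.

```python
import json

def auto_detect_table(name, page_index, table_index, columns):
    """Build a table object that relies on autoDetection."""
    return {
        "name": name,
        "objectType": "table",
        "tableProperties": {
            "autoDetection": {"pageIndex": page_index, "tableIndex": table_index},
            "columns": [{"name": n, "type": t} for n, t in columns],
        },
    }

def build_template(template_name, objects):
    """Wrap template objects in the top-level keys used by the full template."""
    return {
        "templateVersion": 4,
        "templatePriority": 0,
        "culture": "en-US",
        "templateName": template_name,
        "options": {"ocrMode": "auto", "ocrLanguage": "eng"},
        "objects": objects,
    }

columns = [
    ("description", "string"),
    ("interval", "string"),
    ("quantity", "integer"),
    ("amount", "decimal"),
]
template = build_template("", [auto_detect_table("AutoDetectTable", 0, 0, columns)])
template_json = json.dumps(template, indent=2)
print(template_json)
```

Building the structure as a dictionary guarantees that each key, including "objects", appears exactly once in the serialized template.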


  2. Extracting the table using markers defined by macros:

`searchBasedTable` object properties:

```JSON
{
  "start": {
    "expression": "Amount{{Space}}{{OpeningParenthesis}}{{Dollar}}{{ClosingParenthesis}}"
  },
  "end": {
    "expression": "Total in USD"
  },
  "row": {
    "expression": "{{LineStart}}{{Spaces}}(?<description>{{SentenceWithSingleSpaces}})(?<interval>{{3Letters}}{{Space}}{{Digits}}{{Space}}{{Minus}}{{Space}}{{3Letters}}{{Space}}{{Digits}}){{Spaces}}(?<quantity>{{Digits}}){{Spaces}}(?<amount>{{Number}})",
    "regex": true
  },
  "columns": [
    {
      "name": "description",
      "type": "string"
    },
    {
      "name": "interval",
      "type": "string"
    },
    {
      "name": "quantity",
      "type": "integer"
    },
    {
      "name": "amount",
      "type": "decimal"
    }
  ]
}
```

Full template:


{
  "templateVersion": 4,
  "templatePriority": 0,
  "culture": "en-US",
  "templateName": "",
  "options": {
    "ocrMode": "auto",
    "ocrLanguage": "eng"
  },
  "objects": [
    {
      "name": "searchBasedTable1",
      "objectType": "table",
      "tableProperties": {
        "start": {
          "expression": "Amount{{Space}}{{OpeningParenthesis}}{{Dollar}}{{ClosingParenthesis}}"
        },
        "end": {
          "expression": "Total in USD"
        },
        "row": {
          "expression": "{{LineStart}}{{Spaces}}(?<description>{{SentenceWithSingleSpaces}})(?<interval>{{3Letters}}{{Space}}{{Digits}}{{Space}}{{Minus}}{{Space}}{{3Letters}}{{Space}}{{Digits}}){{Spaces}}(?<quantity>{{Digits}}){{Spaces}}(?<amount>{{Number}})",
          "regex": true
        },
        "columns": [
          {
            "name": "description",
            "type": "string"
          },
          {
            "name": "interval",
            "type": "string"
          },
          {
            "name": "quantity",
            "type": "integer"
          },
          {
            "name": "amount",
            "type": "decimal"
          }
        ]
      }
    }
  ]
}
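
Since column names should correspond to the named capturing groups of the row expression, a quick check before uploading a template can catch typos. The following Python sketch is a hypothetical helper (`check_columns` is not a PDF.co function); it only inspects the `(?<name>...)` groups in the expression text.

```python
import re

# Matches the opening of a named group such as (?<amount> ... ).
# Lookbehinds like (?<= or (?<! are not matched because a word
# character must follow the "<".
GROUP_NAME = re.compile(r"\(\?<(\w+)>")

def check_columns(table_properties: dict) -> list[str]:
    """Return column names that have no matching named group in the row expression."""
    expression = table_properties["row"]["expression"]
    groups = set(GROUP_NAME.findall(expression))
    columns = table_properties.get("columns", [])
    return [c["name"] for c in columns if c["name"] not in groups]

row_expression = (
    "{{LineStart}}{{Spaces}}(?<description>{{SentenceWithSingleSpaces}})"
    "(?<interval>{{3Letters}}{{Space}}{{Digits}}{{Space}}{{Minus}}{{Space}}"
    "{{3Letters}}{{Space}}{{Digits}}){{Spaces}}(?<quantity>{{Digits}})"
    "{{Spaces}}(?<amount>{{Number}})"
)

good = {
    "row": {"expression": row_expression, "regex": True},
    "columns": [
        {"name": "description", "type": "string"},
        {"name": "quantity", "type": "integer"},
    ],
}
# "qty" is a deliberate typo to show what the check reports.
bad = {
    "row": {"expression": row_expression, "regex": True},
    "columns": [{"name": "qty", "type": "integer"}],
}

print(check_columns(good))
print(check_columns(bad))
```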

APPENDIX 1: Macros

Built-in macros:

| Macro | Description |
|---|---|
| {{SmartDate}} | Tries to detect the date in the most common formats. |
| {{Number}} | Decimal number like the following: “12.34”, “-123,456.78”, “123.456”. Decimal separator and thousands separator are automatically taken from the template culture. |
| {{Money}} | Decimal number with currency symbol like the following: “USD 12.34”, “$123,456.78”, “123.45 €”. Decimal separator and thousands separator are automatically taken from the template culture. |
| {{USPhoneNumber}} | Tries to detect US phone number. |
| {{Space}} | Single space. |
| {{Spaces}} | One or more spaces. |
| {{2Spaces}} | Two spaces. |
| {{3Spaces}} | Three spaces. |
| {{4Spaces}} | Four spaces. |
| {{5Spaces}} | Five spaces. |
| {{6Spaces}} | Six spaces. |
| {{7Spaces}} | Seven spaces. |
| {{8Spaces}} | Eight spaces. |
| {{9Spaces}} | Nine spaces. |
| {{10Spaces}} | Ten spaces. |
| {{Digit}} | One digit. |
| {{Digits}} | One or more digits. |
| {{2Digits}} | Two digits. |
| {{3Digits}} | Three digits. |
| {{4Digits}} | Four digits. |
| {{5Digits}} | Five digits. |
| {{6Digits}} | Six digits. |
| {{7Digits}} | Seven digits. |
| {{8Digits}} | Eight digits. |
| {{9Digits}} | Nine digits. |
| {{10Digits}} | Ten digits. |
| {{DigitOrSymbol}} | One digit or symbol (“_-+=/”). |
| {{DigitsOrSymbols}} | One or more digits or symbols (“_-+=/”). |
| {{2DigitsOrSymbols}} | Two digits or symbols (“_-+=/”). |
| {{3DigitsOrSymbols}} | Three digits or symbols (“_-+=/”). |
| {{4DigitsOrSymbols}} | Four digits or symbols (“_-+=/”). |
| {{5DigitsOrSymbols}} | Five digits or symbols (“_-+=/”). |
| {{6DigitsOrSymbols}} | Six digits or symbols (“_-+=/”). |
| {{7DigitsOrSymbols}} | Seven digits or symbols (“_-+=/”). |
| {{8DigitsOrSymbols}} | Eight digits or symbols (“_-+=/”). |
| {{9DigitsOrSymbols}} | Nine digits or symbols (“_-+=/”). |
| {{10DigitsOrSymbols}} | Ten digits or symbols (“_-+=/”). |
| {{Letter}} | One letter from any language. |
| {{Letters}} | One or more letters from any language. |
| {{2Letters}} | Two letters from any language. |
| {{3Letters}} | Three letters from any language. |
| {{4Letters}} | Four letters from any language. |
| {{5Letters}} | Five letters from any language. |
| {{6Letters}} | Six letters from any language. |
| {{7Letters}} | Seven letters from any language. |
| {{8Letters}} | Eight letters from any language. |
| {{9Letters}} | Nine letters from any language. |
| {{10Letters}} | Ten letters from any language. |
| {{UppercaseLetter}} | One uppercase letter from any language. |
| {{UppercaseLetters}} | One or more uppercase letters from any language. |
| {{2UppercaseLetter}} | Two uppercase letters from any language. |
| {{3UppercaseLetter}} | Three uppercase letters from any language. |
| {{4UppercaseLetter}} | Four uppercase letters from any language. |
| {{5UppercaseLetter}} | Five uppercase letters from any language. |
| {{6UppercaseLetter}} | Six uppercase letters from any language. |
| {{7UppercaseLetter}} | Seven uppercase letters from any language. |
| {{8UppercaseLetter}} | Eight uppercase letters from any language. |
| {{9UppercaseLetter}} | Nine uppercase letters from any language. |
| {{10UppercaseLetter}} | Ten uppercase letters from any language. |
| {{LetterOrDigit}} | One letter or digit. |
| {{LettersOrDigits}} | One or more letters or digits. |
| {{2LettersOrDigits}} | Two letters or digits. |
| {{3LettersOrDigits}} | Three letters or digits. |
| {{4LettersOrDigits}} | Four letters or digits. |
| {{5LettersOrDigits}} | Five letters or digits. |
| {{6LettersOrDigits}} | Six letters or digits. |
| {{7LettersOrDigits}} | Seven letters or digits. |
| {{8LettersOrDigits}} | Eight letters or digits. |
| {{9LettersOrDigits}} | Nine letters or digits. |
| {{10LettersOrDigits}} | Ten letters or digits. |
| {{UppercaseLetterOrDigit}} | One uppercase letter or digit. |
| {{UppercaseLettersOrDigits}} | One or more uppercase letters or digits. |
| {{2UppercaseLettersOrDigits}} | Two uppercase letters or digits. |
| {{3UppercaseLettersOrDigits}} | Three uppercase letters or digits. |
| {{4UppercaseLettersOrDigits}} | Four uppercase letters or digits. |
| {{5UppercaseLettersOrDigits}} | Five uppercase letters or digits. |
| {{6UppercaseLettersOrDigits}} | Six uppercase letters or digits. |
| {{7UppercaseLettersOrDigits}} | Seven uppercase letters or digits. |
| {{8UppercaseLettersOrDigits}} | Eight uppercase letters or digits. |
| {{9UppercaseLettersOrDigits}} | Nine uppercase letters or digits. |
| {{10UppercaseLettersOrDigits}} | Ten uppercase letters or digits. |
| {{LetterOrDigitOrSymbol}} | One letter, or digit, or symbol (“_-+=/”). |
| {{LettersOrDigitsOrSymbols}} | One or more letters, or digits, or symbols (“_-+=/”). |
| {{2LettersOrDigitsOrSymbols}} | Two letters, or digits, or symbols (“_-+=/”). |
| {{3LettersOrDigitsOrSymbols}} | Three letters, or digits, or symbols (“_-+=/”). |
| {{4LettersOrDigitsOrSymbols}} | Four letters, or digits, or symbols (“_-+=/”). |
| {{5LettersOrDigitsOrSymbols}} | Five letters, or digits, or symbols (“_-+=/”). |
| {{6LettersOrDigitsOrSymbols}} | Six letters, or digits, or symbols (“_-+=/”). |
| {{7LettersOrDigitsOrSymbols}} | Seven letters, or digits, or symbols (“_-+=/”). |
| {{8LettersOrDigitsOrSymbols}} | Eight letters, or digits, or symbols (“_-+=/”). |
| {{9LettersOrDigitsOrSymbols}} | Nine letters, or digits, or symbols (“_-+=/”). |
| {{10LettersOrDigitsOrSymbols}} | Ten letters, or digits, or symbols (“_-+=/”). |
| {{UppercaseLetterOrDigitOrSymbol}} | One uppercase letter, or digit, or symbol (“_-+=/”). |
| {{UppercaseLettersOrDigitsOrSymbols}} | One or more uppercase letters, or digits, or symbols (“_-+=/”). |
| {{2UppercaseLettersOrDigitsOrSymbols}} | Two uppercase letters, or digits, or symbols (“_-+=/”). |
| {{3UppercaseLettersOrDigitsOrSymbols}} | Three uppercase letters, or digits, or symbols (“_-+=/”). |
| {{4UppercaseLettersOrDigitsOrSymbols}} | Four uppercase letters, or digits, or symbols (“_-+=/”). |
| {{5UppercaseLettersOrDigitsOrSymbols}} | Five uppercase letters, or digits, or symbols (“_-+=/”). |
| {{6UppercaseLettersOrDigitsOrSymbols}} | Six uppercase letters, or digits, or symbols (“_-+=/”). |
| {{7UppercaseLettersOrDigitsOrSymbols}} | Seven uppercase letters, or digits, or symbols (“_-+=/”). |
| {{8UppercaseLettersOrDigitsOrSymbols}} | Eight uppercase letters, or digits, or symbols (“_-+=/”). |
| {{9UppercaseLettersOrDigitsOrSymbols}} | Nine uppercase letters, or digits, or symbols (“_-+=/”). |
| {{10UppercaseLettersOrDigitsOrSymbols}} | Ten uppercase letters, or digits, or symbols (“_-+=/”). |
| {{Dollar}} | Dollar sign ($). |
| {{Euro}} | Euro sign (€). |
| {{Pound}} | Pound sign (£). |
| {{Yen}} | Yen sign (¥). |
| {{Yuan}} | Yuan sign (¥). |
| {{CurrencySymbol}} | Any currency symbol ($, €, £, ¥, etc.) |
| {{Dot}} | Single dot symbol (“.”). |
| {{Comma}} | Single comma symbol (“,”). |
| {{Colon}} | Single colon symbol (“:”). |
| {{Semicolon}} | Single semicolon symbol (“;”). |
| {{Minus}} | Single minus (dash, hyphen) symbol (“-”). |
| {{Slash}} | Slash symbol (“/”). |
| {{Backslash}} | Backslash symbol (“\”). |
| {{Percent}} | Percent symbol (“%”). |
| {{LineStart}} | Start of line (virtual symbol). |
| {{LineEnd}} | End of line (virtual symbol). |
| {{SentenceWithSingleSpaces}} | Single-space-separated sequence of words and symbols. Breaks on double space. |
| {{SentenceWithDoubleSpaces}} | Extended {{SentenceWithSingleSpaces}} macro allowing two spaces between words. Breaks on triple space. |
| {{EndOfPage}} | End of page or end of document. |
| {{WordBoundary}} | Start or end of word (virtual symbol). |
| {{OpeningCurlyBrace}} | Opening curly brace symbol (“{”). |
| {{ClosingCurlyBrace}} | Closing curly brace symbol (“}”). |
| {{OpeningParenthesis}} | Opening parenthesis symbol (“(”). |
| {{ClosingParenthesis}} | Closing parenthesis symbol (“)”). |
| {{OpeningSquareBracket}} | Opening square bracket symbol (“[”). |
| {{ClosingSquareBracket}} | Closing square bracket symbol (“]”). |
| {{OpeningAngleBracket}} | Opening angle bracket symbol (“<”). |
| {{ClosingAngleBracket}} | Closing angle bracket symbol (“>”). |
| {{DateMM/DD/YY}} | Date in format “01/01/19” (with leading zero). |
| {{DateM/D/YY}} | Date in format “1/1/19” (without leading zero). |
| {{DateMM/DD/YYYY}} | Date in format “01/01/2019” (with leading zero). |
| {{DateM/D/YYYY}} | Date in format “1/1/2019” (without leading zero). |
| {{DateMM-DD-YY}} | Date in format “01-01-19” (with leading zero). |
| {{DateM-D-YY}} | Date in format “1-1-19” (without leading zero). |
| {{DateMM-DD-YYYY}} | Date in format “01-01-2019” (with leading zero). |
| {{DateM-D-YYYY}} | Date in format “1-1-2019” (without leading zero). |
| {{DateMM.DD.YY}} | Date in format “01.01.19” (with leading zero). |
| {{DateM.D.YY}} | Date in format “1.1.19” (without leading zero). |
| {{DateMM.DD.YYYY}} | Date in format “01.01.2019” (with leading zero). |
| {{DateM.D.YYYY}} | Date in format “1.1.2019” (without leading zero). |
| {{DateDD/MM/YY}} | Date in format “01/01/19” (with leading zero). |
| {{DateD/M/YY}} | Date in format “1/1/19” (without leading zero). |
| {{DateDD/MM/YYYY}} | Date in format “01/01/2019” (with leading zero). |
| {{DateD/M/YYYY}} | Date in format “1/1/2019” (without leading zero). |
| {{DateDD-MM-YY}} | Date in format “01-01-19” (with leading zero). |
| {{DateD-M-YY}} | Date in format “1-1-19” (without leading zero). |
| {{DateDD-MM-YYYY}} | Date in format “01-01-2019” (with leading zero). |
| {{DateD-M-YYYY}} | Date in format “1-1-2019” (without leading zero). |
| {{DateDD.MM.YY}} | Date in format “01.01.19” (with leading zero). |
| {{DateD.M.YY}} | Date in format “1.1.19” (without leading zero). |
| {{DateDD.MM.YYYY}} | Date in format “01.01.2019” (with leading zero). |
| {{DateD.M.YYYY}} | Date in format “1.1.2019” (without leading zero). |
| {{DateYYYYMMDD}} | Date in format “20190101”. |
| {{DateYYYY/MM/DD}} | Date in format “2019/01/01” (with leading zero). |
| {{DateYYYY/M/D}} | Date in format “2019/1/1” (without leading zero). |
| {{DateYYYY-MM-DD}} | Date in format “2019-01-01” (with leading zero). |
| {{DateYYYY-M-D}} | Date in format “2019-1-1” (without leading zero). |
| {{Anything}} | Any characters up to the next macro in the expression. |
| {{AnythingGreedy}} | Any characters up to the next macro in the expression or to the end of line. Greedy version. |
| {{ToggleSingleLineMode}} | Enables or disables single-line mode. In single-line mode, {{Anything}} and {{AnythingGreedy}} macros do not stop at the end of the line and proceed to the next line of text. |
| {{ToggleCaseInsensitiveMode}} | Enables or disables case-insensitive mode. |
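
The numbered macros follow a regular naming scheme, so they map naturally onto regex quantifiers. The Python sketch below illustrates that mapping for three macro families only. The character classes are assumptions (for example, `[^\W\d_]` as a stand-in for "letter from any language"), and the engine's actual definitions may differ.

```python
import re

# Assumed regex equivalents for three macro families (approximations only).
BASE_CLASSES = {
    "Space": " ",
    "Digit": r"\d",
    "Letter": r"[^\W\d_]",  # any Unicode letter
}

# Matches {{Digit}}, {{Digits}}, {{3Digits}} and the Space/Letter equivalents.
MACRO = re.compile(r"\{\{(\d*)(Space|Digit|Letter)(s?)\}\}")

def expand(match: re.Match) -> str:
    count, base, plural = match.groups()
    char_class = BASE_CLASSES[base]
    if count:   # {{3Digits}} -> exactly three
        return f"{char_class}{{{count}}}"
    if plural:  # {{Digits}} -> one or more
        return f"{char_class}+"
    return char_class  # {{Digit}} -> exactly one

def to_regex(expression: str) -> str:
    """Replace the supported macros; anything else is left untouched."""
    return MACRO.sub(expand, expression)

pattern = to_regex("{{3Letters}}{{Space}}{{Digits}}")
print(pattern)
print(bool(re.fullmatch(pattern, "Jan 31")))
print(bool(re.fullmatch(pattern, "January 31")))
```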

Special Functions

You can also insert a so-called special function, which looks like this: $$functionName. Special functions are designed for AI-powered value detection, such as finding a company name, the maximum number in a whole document, the maximum date, or even finding and decoding a QR code value inside a document.

All special functions are listed here.

APPENDIX 2: Sample templates

Sample 1

Sample document text:

    DigitalOcean
    101 Avenue of the Americas, 10th Floor
    New York, NY 10013
                                                        Date Issued: February 1, 2016
                                                         Period: January 1 - 31, 2016
                                                              Invoice Number: 1234567

        Description                                 Hours     Start          End            USD
        Website-Dev (1GB)                           744       01-01 00:00    01-31 23:59    $10.00
        Website-Live (1GB)                          744       01-01 00:00    01-31 23:59    $10.00
        Database-Live (2GB)                         744       01-01 00:00    01-31 23:59    $20.00
        Tasks-Dev (1GB)                             744       01-01 00:00    01-31 23:59    $10.00
                                                                                     Total: $50.00

     Bill To:
     Samee Sikka <admin@meee.org>
     meee.org
     Gouran

        If you have a credit card on file it will be automatically charged within 24 hours.

Sample template (JSON):

{
  "templateVersion": 4,
  "templatePriority": 0,
  "templateName": "DigitalOcean Invoice",
  "objects": [
    {
      "name": "companyName",
      "objectType": "field",
      "fieldProperties": {
        "fieldType": "static",
        "expression": "DigitalOcean"
      }
    },
    {
      "name": "invoiceId",
      "objectType": "field",
      "fieldProperties": {
        "fieldType": "macros",
        "expression": "Invoice Number: ({{Digits}})",
        "regex": true
      }
    },
    {
      "name": "dateIssued",
      "objectType": "field",
      "fieldProperties": {
        "fieldType": "macros",
        "expression": "Date Issued: ({{SmartDate}})",
        "dataType": "date",
        "dateFormat": "auto-mdy"
      }
    },
    {
      "name": "total",
      "objectType": "field",
      "fieldProperties": {
        "fieldType": "macros",
        "expression": "Total: {{Dollar}}({{Number}})",
        "dataType": "decimal"
      }
    },
    {
      "name": "currency",
      "objectType": "field",
      "fieldProperties": {
        "fieldType": "static",
        "expression": "USD"
      }
    },
    {
      "name": "table1",
      "objectType": "table",
      "tableProperties": {
        "start": {
          "expression": "Description{{Spaces}}Hours"
        },
        "end": {
          "expression": "Total:"
        },
        "row": {
          "expression": "{{LineStart}}{{Spaces}}(?<description>{{SentenceWithSingleSpaces}}){{Spaces}}(?<hours>{{Digits}}){{Spaces}}(?<start>{{2Digits}}{{Minus}}{{2Digits}}{{Space}}{{2Digits}}{{Colon}}{{2Digits}}){{Spaces}}(?<end>{{2Digits}}{{Minus}}{{2Digits}}{{Space}}{{2Digits}}{{Colon}}{{2Digits}}){{Spaces}}{{Dollar}}(?<unitPrice>{{Number}})",
          "regex": true
        },
        "columns": [
          {
            "name": "hours",
            "type": "integer"
          },
          {
            "name": "unitPrice",
            "type": "decimal"
          }
        ]
      }
    }
  ]
}

Result (JSON):

{
  "templateName": "DigitalOcean Invoice",
  "templateVersion": "4",
  "objects": [
    {
      "name": "companyName",
      "objectType": "field",
      "value": "DigitalOcean"
    },
    {
      "name": "invoiceId",
      "objectType": "field",
      "value": "1234567",
      "pageIndex": 0
    },
    {
      "name": "dateIssued",
      "objectType": "field",
      "value": "2016-02-01T00:00:00",
      "pageIndex": 0
    },
    {
      "name": "total",
      "objectType": "field",
      "value": 50.00,
      "pageIndex": 0
    },
    {
      "name": "currency",
      "objectType": "field",
      "value": "USD"
    },
    {
      "name": "table1",
      "objectType": "table",
      "rows": [
        {
          "description": {
            "value": "Website-Dev (1GB)",
            "pageIndex": 0
          },
          "hours": {
            "value": 744,
            "pageIndex": 0
          },
          "start": {
            "value": "01-01 00:00",
            "pageIndex": 0
          },
          "end": {
            "value": "01-31 23:59",
            "pageIndex": 0
          },
          "unitPrice": {
            "value": 10.00,
            "pageIndex": 0
          }
        },
        {
          "description": {
            "value": "Website-Live (1GB)",
            "pageIndex": 0
          },
          "hours": {
            "value": 744,
            "pageIndex": 0
          },
          "start": {
            "value": "01-01 00:00",
            "pageIndex": 0
          },
          "end": {
            "value": "01-31 23:59",
            "pageIndex": 0
          },
          "unitPrice": {
            "value": 10.00,
            "pageIndex": 0
          }
        },
        {
          "description": {
            "value": "Database-Live (2GB)",
            "pageIndex": 0
          },
          "hours": {
            "value": 744,
            "pageIndex": 0
          },
          "start": {
            "value": "01-01 00:00",
            "pageIndex": 0
          },
          "end": {
            "value": "01-31 23:59",
            "pageIndex": 0
          },
          "unitPrice": {
            "value": 20.00,
            "pageIndex": 0
          }
        },
        {
          "description": {
            "value": "Tasks-Dev (1GB)",
            "pageIndex": 0
          },
          "hours": {
            "value": 744,
            "pageIndex": 0
          },
          "start": {
            "value": "01-01 00:00",
            "pageIndex": 0
          },
          "end": {
            "value": "01-31 23:59",
            "pageIndex": 0
          },
          "unitPrice": {
            "value": 10.00,
            "pageIndex": 0
          }
        }
      ]
    }
  ]
}
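
To see how the start, end and row expressions of table1 interact, the search can be emulated locally. The Python sketch below is an approximation under stated assumptions: the `MACROS` regex equivalents, the `CONVERTERS` mapping and the helper names are hypothetical, and the real engine may behave differently (it also reports pageIndex, which is omitted here).

```python
import re
from decimal import Decimal

# Assumed regex equivalents for the macros used by table1 (approximations only).
MACROS = {
    "LineStart": r"^", "Space": " ", "Spaces": " +", "Digits": r"\d+",
    "2Digits": r"\d{2}", "Minus": "-", "Colon": ":", "Dollar": r"\$",
    "Number": r"-?\d[\d,]*(?:\.\d+)?",
    "SentenceWithSingleSpaces": r"\S+(?: \S+)*",
}
# Assumed converters for the column types declared in the template.
CONVERTERS = {"integer": int, "decimal": lambda s: Decimal(s.replace(",", ""))}

def to_python_regex(expression: str) -> str:
    """Expand {{Macro}} names and rewrite (?<name>...) as Python's (?P<name>...)."""
    expanded = re.sub(r"\{\{(\w+)\}\}", lambda m: MACROS[m.group(1)], expression)
    return re.sub(r"\(\?<(\w+)>", r"(?P<\1>", expanded)

def parse_table(text: str, props: dict) -> list[dict]:
    """Find rows between the start and end expressions and convert typed columns."""
    start = re.search(to_python_regex(props["start"]["expression"]), text)
    if start is None:
        return []
    rest = text[start.end():]
    end = re.search(to_python_regex(props["end"]["expression"]), rest)
    body = rest[:end.start()] if end else rest
    row_re = re.compile(to_python_regex(props["row"]["expression"]), re.MULTILINE)
    types = {c["name"]: c["type"] for c in props.get("columns", [])}
    rows = []
    for match in row_re.finditer(body):
        row = match.groupdict()
        for name, type_name in types.items():
            row[name] = CONVERTERS[type_name](row[name])
        rows.append(row)
    return rows

SAMPLE_TEXT = "\n".join([
    "        Description                                 Hours     Start          End            USD",
    "        Website-Dev (1GB)                           744       01-01 00:00    01-31 23:59    $10.00",
    "        Website-Live (1GB)                          744       01-01 00:00    01-31 23:59    $10.00",
    "        Database-Live (2GB)                         744       01-01 00:00    01-31 23:59    $20.00",
    "        Tasks-Dev (1GB)                             744       01-01 00:00    01-31 23:59    $10.00",
    "                                                                                     Total: $50.00",
])
TIMESTAMP = "{{2Digits}}{{Minus}}{{2Digits}}{{Space}}{{2Digits}}{{Colon}}{{2Digits}}"
TABLE1 = {
    "start": {"expression": "Description{{Spaces}}Hours"},
    "end": {"expression": "Total:"},
    "row": {"expression": (
        "{{LineStart}}{{Spaces}}(?<description>{{SentenceWithSingleSpaces}}){{Spaces}}"
        "(?<hours>{{Digits}}){{Spaces}}(?<start>" + TIMESTAMP + "){{Spaces}}"
        "(?<end>" + TIMESTAMP + "){{Spaces}}{{Dollar}}(?<unitPrice>{{Number}})"
    ), "regex": True},
    "columns": [{"name": "hours", "type": "integer"}, {"name": "unitPrice", "type": "decimal"}],
}

rows = parse_table(SAMPLE_TEXT, TABLE1)
for row in rows:
    print(row)
```

Columns without a declared type (description, start, end) stay as strings, which mirrors the result above where only hours and unitPrice are numeric.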

Copyright (c) 2018-2022 ByteScout, Inc.

PDF.co (on-demand platform) with Document Parser

ByteScout (on-prem tools)