
How to change the PDF Extractor API default settings in Profiles

We found that removing lines from the image greatly improves OCR recognition, so the following filters are preset whenever you use OCR in any of the PDF Extractor APIs. If these filters conflict with your custom filters, you can remove the preset by adding 'OCRImagePreprocessingFilters.Clear()': [] at the beginning of your custom filters.

  'OCRDetectPageRotation': true,
  'OCRMode': 'Auto',
  'OCRImagePreprocessingFilters.AddHorizontalLinesRemover()': [],
  'OCRImagePreprocessingFilters.AddVerticalLinesRemover()': [],
  'CSVSeparatorSymbol': ','

Here’s a sample snippet that removes the preset filters above and sets the OCR mode to AutoRepairFonts.

  'OCRImagePreprocessingFilters.Clear()': [],
  'OCRMode': 'AutoRepairFonts'
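
If you assemble profiles in code, the ordering rule above (Clear() first, then your own settings) can be enforced with a small helper. The following is a minimal Python sketch under stated assumptions: `build_custom_profile` and `CLEAR_KEY` are hypothetical names introduced for illustration and are not part of the PDF Extractor API. It only builds the key/value pairs; how the resulting profile is passed to the API is not covered here. Note that `json.dumps` emits double-quoted strings, unlike the single-quoted fragments shown in this article.

```python
import json

# Key that removes the preset OCR preprocessing filters (taken from this article).
CLEAR_KEY = "OCRImagePreprocessingFilters.Clear()"


def build_custom_profile(custom_settings):
    """Return a profile dict whose first entry clears the preset filters.

    Hypothetical helper for illustration only; not part of any API.
    """
    # Python dicts preserve insertion order, so Clear() stays first
    # and the custom settings follow in the order they were given.
    profile = {CLEAR_KEY: []}
    for key, value in custom_settings.items():
        if key != CLEAR_KEY:  # avoid adding the clear entry twice
            profile[key] = value
    return profile


# Same settings as the sample snippet above.
profile = build_custom_profile({"OCRMode": "AutoRepairFonts"})
print(json.dumps(profile, indent=2))
```

Putting the clear entry first in the helper, rather than relying on the caller, keeps a custom filter from being wiped out by a Clear() that accidentally appears later in the list.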