dolphinlooki.blogg.se - Pdfextractor code

#Pdfextractor code pdf#

'Remover2': pages are now made unsearchable only if they were edited.


'Remover2': fixed handling of PDF page rotation.
+ Added property 'NormalizeText' to all extractors. It replaces Unicode spaces and hyphens in the extracted text with the normal ' ' and '-' characters.
= Improved background color detection for the option 'ConsiderBackgroundColors'.
+ Added properties 'DetectUnderlineTextStyle' and 'DetectStrikeoutTextStyle' to `CSVExtractor` and `XLSExtractor`. They help prevent underlined text from affecting the line grouping in table cells.
= All extractor classes now support extraction of page ranges.
= 'JSONExtractor' and 'XMLExtractor' now output the page size for each page.
= 'DocumentMerger': Improved merging of PDF forms. It can now link fields with matching names or rename them to avoid unwanted linking. See the property 'RenameMatchingFieldsDuringMerge'.
= Improved 'LineGroupingMode.JoinOrphanedRows'.
= Improved filtering of shadow-like text ('ExtractShadowLikeText' option).
= Greatly improved table detection in 'TableDetector2'.
+ New column detection mode 'ColumnDetectionMode.ContentGroupsAI' that works better on tables without borders and on pages with multiple tables.
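For illustration, the kind of character folding that 'NormalizeText' is described as performing can be sketched in Python. This is not the SDK's code: the function name `normalize_text` and the exact set of hyphen-like characters are assumptions; spaces are detected through the Unicode "Zs" (space separator) category.

```python
import unicodedata

# Assumed set of hyphen/dash-like characters to fold into ASCII '-'.
HYPHEN_LIKE = {
    "\u2010",  # HYPHEN
    "\u2011",  # NON-BREAKING HYPHEN
    "\u2012",  # FIGURE DASH
    "\u2013",  # EN DASH
    "\u2212",  # MINUS SIGN
}


def normalize_text(text: str) -> str:
    """Map Unicode space separators to ' ' and hyphen-like characters to '-'."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Zs":  # e.g. U+00A0, U+2009, U+3000
            out.append(" ")
        elif ch in HYPHEN_LIKE:
            out.append("-")
        else:
            out.append(ch)
    return "".join(out)


print(normalize_text("non\u2011breaking\u00a0space, pages 10\u201320"))
# non-breaking space, pages 10-20
```

Using the category check avoids maintaining a hand-written list of space characters; tabs and line breaks (categories Cc and Zl) are left untouched.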

Fixed a disposing issue in 'SearchablePDFMaker'.
Fixed: line grouping was not affected by the 'ConsiderFontSizes' and 'ConsiderFontColors' properties.
The minimum required .NET Core version is now 2.1 (was 2.0).
+ Extractors and SearchablePDFMaker: Added property 'OCRDisableAutoSegmentation' to work around segmentation issues in the OCR engine.
= Improved COM/ActiveX interfaces for in-memory processing without file operations.
+ InfoExtractor: Added method 'GetFormFields()' that returns information about the form fields in a PDF document.
+ Added support for the UniKS-UCS2-H text encoding.
= JSONExtractor: The mode 'OutputStructure.Full' is renamed to 'OutputStructure.LegacyFixed' and made maximally compatible in field names with the mode 'OutputStructure.Legacy'.

+ XLSExtractor: Added property 'CustomColumnWidths' that lets you specify exact column widths in the generated Excel spreadsheet.
+ DocumentMerger: Added property 'MergedDocumentTitle' that lets you override the title of the merged document.
'SearchablePDFMaker': fixed coordinates of transparent text in the output document when the input is an image.
Fixed parsing of file attachment names.
Fixed: rotated text objects were combined with unrotated ones.
= 'DocumentRotator' can now automatically fix the rotation of PDF files using OCR.

Bytescout PDF Extractor SDK for .NET, ASP.NET, ActiveX, Visual Basic 6, Classic ASP, Delphi and others:
Extracts data from PDF files in TXT, CSV, XML, XLS, XLSX and JSON formats
Extracts embedded images, files and attachments from PDF files
Splits and merges PDF files, extracts a single page or a range of pages
Extracts data from a whole document page or from a specified rectangular region
Extracts PDF document information (author, subject, producer, etc.)
Searches text inside the document with regex support
Reads text from scanned PDF documents using OCR (Optical Character Recognition)
Provides an ActiveX interface for legacy programming languages (Visual Basic 6, Delphi) and scripting languages (VBScript, JScript and others)
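Extracting data from a specified rectangular region can be pictured as filtering positioned words by a bounding rectangle. The sketch below is a hypothetical Python model, not the SDK's API: the `Rect` and `Word` types, the `extract_region` function, the top-left origin with y growing downward, and the full-containment rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    # Assumed coordinate system: origin at top-left, y grows downward.
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies entirely inside this rectangle."""
        return (self.left <= other.left and self.top <= other.top
                and other.right <= self.right and other.bottom <= self.bottom)


@dataclass(frozen=True)
class Word:
    # Hypothetical positioned word, as a text extractor might report it.
    text: str
    box: Rect


def extract_region(words: list[Word], region: Rect) -> str:
    """Join the words fully inside `region`, ordered top-to-bottom, then left-to-right."""
    inside = [w for w in words if region.contains(w.box)]
    inside.sort(key=lambda w: (w.box.top, w.box.left))
    return " ".join(w.text for w in inside)


# Made-up sample page: only the "Total" line falls inside the region.
words = [
    Word("Invoice", Rect(10, 10, 60, 20)),
    Word("42.00", Rect(55, 100, 90, 110)),
    Word("Total:", Rect(10, 100, 50, 110)),
    Word("Footer", Rect(10, 700, 60, 710)),
]
region = Rect(0, 90, 200, 120)
print(extract_region(words, region))  # Total: 42.00
```

Full containment drops words that straddle the region's edge; a real extractor might instead use intersection or a minimum overlap ratio, and sorting on the exact `top` value only works when the words on one line share the same top coordinate.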