This package can efficiently measure the text structure recognition capabilities of pdftextsplitter.
DeltaTextsplitter package
This package is meant for evaluating the text structure recognition capabilities of the package pdftextsplitter. It is under development.
Excel-parser
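from deltatextsplitter import documentclass

# Create the document object (assumes documentclass takes no constructor arguments).
mydoc = documentclass()

# Configure the splitter and run it on the document.
mydoc.splitter.set_documentname("mydocument")
mydoc.splitter.set_documentpath("/path/to/document/")
mydoc.splitter.set_outputpath("/path/for/writing/")
mydoc.splitter.standard_params()
mydoc.splitter.process()

# Write the outcomes to an Excel file.
mydoc.outputpath = "/path/to/my/new/excel/"
mydoc.export_outcomes()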
This writes your output Excel file. You can also read an Excel file with:
mydoc.referencepath = "/path/to/my/new/excel/"
mydoc.read_references()
You can then compare the pandas DataFrames mydoc.outcomes and mydoc.references
to calculate KPIs and make other comparisons.
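As a rough illustration of such a comparison, the sketch below computes a simple accuracy KPI from two DataFrames. The column names "text" and "level" and the helper label_accuracy are hypothetical, chosen only for this example; the actual columns of mydoc.outcomes and mydoc.references may differ, so adapt the names to what your Excel files contain.

```python
import pandas as pd


def label_accuracy(outcomes: pd.DataFrame,
                   references: pd.DataFrame,
                   key: str = "text",
                   label: str = "level") -> float:
    """Fraction of reference rows whose label matches the outcome row
    sharing the same key. Column names are assumptions for illustration."""
    # Left merge keeps every reference row; a missing outcome yields NaN.
    merged = references.merge(outcomes, on=key, how="left",
                              suffixes=("_ref", "_out"))
    if merged.empty:
        return 0.0
    # NaN never compares equal, so a missing outcome counts as a mismatch.
    matches = merged[f"{label}_ref"] == merged[f"{label}_out"]
    return float(matches.mean())


# Stand-in data; in practice these would be mydoc.references and mydoc.outcomes.
references = pd.DataFrame({"text": ["Introduction", "Scope", "Methods"],
                           "level": [1, 2, 1]})
outcomes = pd.DataFrame({"text": ["Introduction", "Scope", "Methods"],
                         "level": [1, 1, 1]})

kpi = label_accuracy(outcomes, references)
print(f"Label accuracy: {kpi:.2f}")
```

The merge assumes the key column is unique within each DataFrame; duplicate keys would multiply rows and distort the score.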
Hashes for deltatextsplitter-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | efbc7ab47a5eecae4905d17ef062f3f0b9828e8ab53a59bfce49769d645bd59b
MD5 | c08585d680a74c975138d6dbb25815f7
BLAKE2b-256 | d90bb216c0a884c47a3be09a539a3501a68752d251764e254377d621d5c7044e