Read the merlin-py tutorials for use...
Project description
Merlin-py is a potential evolution of camelot ( /atlanhq/camelot ) and tabula ( /tabulapdf/tabula ) that aims to further simplify the extraction of tabular data from PDFs. It does so by letting users of the package supply a search criterion for the table they want: a table label, a set of column names, or table dimensions. An ideal use case is extracting a common data table from hundreds or thousands of separate PDFs in which the desired data sits in a different location in each document (tax documents, historical records, etc.).
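To make the idea of a search criterion concrete, here is a minimal sketch in plain Python. It is not merlin-py's API: the function name `find_table`, its parameters, and the sample data are all hypothetical, and it assumes the tables have already been extracted from a PDF as lists of rows with the header in the first row.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: a table is a list of rows, first row = header.
Table = List[List[str]]


def find_table(
    tables: Sequence[Table],
    column_names: Optional[Sequence[str]] = None,
    shape: Optional[Tuple[int, int]] = None,
) -> Optional[Table]:
    """Return the first table that satisfies every criterion supplied.

    column_names: all of these must appear in the header (case-insensitive).
    shape: (rows, columns), counting the header row.
    """
    wanted = {c.strip().lower() for c in column_names} if column_names else None
    for table in tables:
        if not table:
            continue  # skip empty extractions
        header = {cell.strip().lower() for cell in table[0]}
        if wanted is not None and not wanted <= header:
            continue  # a required column is missing
        if shape is not None and (len(table), len(table[0])) != shape:
            continue  # dimensions do not match
        return table
    return None


# Illustrative data standing in for tables pulled from one document.
extracted = [
    [["Name", "Address"], ["A. Smith", "1 Main St"]],
    [["Year", "Income", "Tax Due"], ["2019", "50000", "7000"], ["2020", "52000", "7300"]],
]

by_columns = find_table(extracted, column_names=["year", "tax due"])
by_shape = find_table(extracted, shape=(2, 2))
```

Because the match depends on the table's content rather than its page coordinates, the same call can be repeated across many documents where the table moves around.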
Current compatibility: Linux
Future compatibility: Windows, Mac
Linux Software requirements: bundled python packages, python-cv (debian)
Windows Software requirements: TBD
Mac Software requirements: TBD
Future features: Tesseract-based image -> machine-readable PDF conversion, runtime performance improvements, GUI frontend.
Big thanks to the developers and contributors of both tabula-py and camelot-py as this project is largely built atop these two other efforts.
Built Distribution
Hashes for merlin_py-1.0.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 2e9f65ff24f68095e205b58c00920d687c790baa76d47505696f3d3ff9ce8307 |
MD5 | 627d4ef02ad0aaef77e3cb87ede76633 |
BLAKE2b-256 | 2d3e9acdf7e6a350afb7fba3740167b2db25860ac32db8bdaa2b4317a958bb2e |
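A downloaded wheel can be checked against the SHA256 digest listed above using Python's standard `hashlib` module. The sketch below is one way to do it; the helper names `sha256_of_file` and `matches_expected` are illustrative, and the location of the downloaded file is an assumption you would adjust.

```python
import hashlib

# SHA256 digest published for merlin_py-1.0.0-py3-none-any.whl (from the table above).
EXPECTED_SHA256 = "2e9f65ff24f68095e205b58c00920d687c790baa76d47505696f3d3ff9ce8307"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.strip().lower()
```

Calling `matches_expected("merlin_py-1.0.0-py3-none-any.whl")`, assuming the wheel was saved under that name in the current directory, returns `True` only when the file is byte-for-byte identical to the published one.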