Read the merlin-py tutorials for use...
Project description
Merlin-py is a potential evolution of camelot ( /atlanhq/camelot ) and tabula ( /tabulapdf/tabula ) that aims to further simplify the extraction of tabular data from PDFs. It does so by giving users search logic for locating the data they want, whether by table label, set of column names, or table dimensions. An ideal use case is extracting a common data table from many hundreds or thousands of separate PDFs where the desired data sits in a different location in each document (tax documents, historical records, etc.).
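To make the idea of search-based extraction concrete, here is a minimal sketch in plain Python. It is not merlin-py's API: the `find_table` function, its `columns` and `shape` parameters, and the list-of-rows table representation are all hypothetical, chosen only to illustrate how a table might be picked out by column names or dimensions regardless of where it sits in a document.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: a table is a list of rows, each row a list of
# cell strings, with the first row acting as the header.
Table = List[List[str]]


def find_table(
    tables: Sequence[Table],
    columns: Optional[Sequence[str]] = None,
    shape: Optional[Tuple[int, int]] = None,
) -> Optional[Table]:
    """Return the first candidate table that meets every criterion given.

    This illustrates the search idea only; it is not merlin-py's implementation.
    """
    wanted = [c.strip().lower() for c in columns] if columns else None
    for table in tables:
        if not table:
            continue  # skip empty candidates
        header = [cell.strip().lower() for cell in table[0]]
        if wanted is not None and header != wanted:
            continue  # column names do not match (case-insensitive)
        if shape is not None and (len(table), len(table[0])) != shape:
            continue  # (rows, columns) do not match, header row included
        return table
    return None


# Candidate tables as they might come out of one document; the desired table
# is not first, so a positional lookup would pick the wrong one.
candidates = [
    [["Note"], ["n/a"]],
    [["Year", "Income", "Tax"], ["2019", "100", "20"]],
]
match = find_table(candidates, columns=["year", "income", "tax"])
```

Because the lookup keys on content rather than position, the same call can be applied to every document in a batch even when the table moves between pages or sits among unrelated tables.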
Current compatibility: Linux
Future compatibility: Windows, Mac
Linux Software requirements: bundled python packages, python-cv (debian)
Windows Software requirements: TBD
Mac Software requirements: TBD
Future features: Tesseract-based conversion of images to machine-readable PDFs, runtime performance improvements, GUI frontend.
Big thanks to the developers and contributors of both tabula-py and camelot-py, as this project is largely built atop these two efforts.
Download files
Built Distribution
Hashes for merlin_py-1.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3909d3ce550328f0325af117923c241c66bc096a4b67b6ebc1b3be99ccbf0363
MD5 | bcecbe31f753ae8797ac3ca09c91f247
BLAKE2b-256 | 51c9fd69159bad7e997d50231433c7cd5f31d0dcdefddceb1cfedcbf4b793b55
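The digests listed above can be used to check a downloaded wheel before installing it. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `file_digest` and `matches_published_sha256` are illustrative, the local file path is whatever you saved the wheel as, and treating "BLAKE2b-256" as BLAKE2b with a 32-byte digest is an assumption about how that value was produced.

```python
import hashlib

# SHA256 digest published for merlin_py-1.1.0-py3-none-any.whl
PUBLISHED_SHA256 = "3909d3ce550328f0325af117923c241c66bc096a4b67b6ebc1b3be99ccbf0363"


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`, read in chunks."""
    if algorithm == "blake2b-256":
        # Assumption: BLAKE2b-256 means BLAKE2b with a 32-byte (256-bit) digest.
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_sha256(path, expected=PUBLISHED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return file_digest(path, "sha256") == expected.lower()
```

For example, `matches_published_sha256("merlin_py-1.1.0-py3-none-any.whl")` should return `True` only when the local file is byte-for-byte identical to the published wheel.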