
A set of tools for data mining (OCR-processed) PDFs

Project description

This repository contains a set of tools, written in Python 3, for extracting tabular data from scanned and OCR-processed documents available as PDF files. Before these files can be processed, they must be converted to XML files in pdf2xml format using poppler-utils. Further information and examples can be found in the GitHub repository.
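
As a rough illustration of what the pdf2xml input looks like, the sketch below reads text boxes from such a file and groups them into rows using only the Python standard library. It does not use the pdftabextract API. It assumes the layout that poppler's pdftohtml writes when run with its -xml flag (a pdf2xml root, page elements, and text elements carrying top, left, width and height attributes); the sample data, the function names and the 5-pixel row tolerance are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the assumed pdf2xml layout (not taken from a real document).
SAMPLE_XML = """<pdf2xml>
  <page number="1" position="absolute" top="0" left="0" height="1262" width="892">
    <text top="100" left="50" width="80" height="12" font="0">Name</text>
    <text top="100" left="200" width="40" height="12" font="0">Age</text>
    <text top="120" left="50" width="80" height="12" font="0">Alice</text>
    <text top="121" left="200" width="40" height="12" font="0"><b>30</b></text>
  </page>
</pdf2xml>"""


def read_text_boxes(xml_string):
    """Return one dict per <text> element with its position and content."""
    root = ET.fromstring(xml_string)
    boxes = []
    for page in root.iter("page"):
        for elem in page.iter("text"):
            boxes.append({
                "page": int(page.get("number")),
                "top": int(elem.get("top")),
                "left": int(elem.get("left")),
                # itertext() also collects text nested in <b> or <i> children
                "value": "".join(elem.itertext()).strip(),
            })
    return boxes


def group_rows(boxes, tolerance=5):
    """Group boxes into rows: a new row starts when 'top' jumps by more than tolerance."""
    rows = []
    row_top = None
    for box in sorted(boxes, key=lambda b: (b["page"], b["top"], b["left"])):
        if row_top is None or box["top"] - row_top > tolerance:
            rows.append([])
            row_top = box["top"]
        rows[-1].append(box)
    # Within a row, order cells from left to right
    return [[b["value"] for b in sorted(row, key=lambda b: b["left"])] for row in rows]


if __name__ == "__main__":
    for row in group_rows(read_text_boxes(SAMPLE_XML)):
        print(row)
```

Scanned pages are usually skewed or noisy, so a fixed tolerance like this is only a starting point. For simplicity the sketch also assumes a single page.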

Download files

Download the file for your platform.

Source Distribution

pdftabextract-0.3.0.tar.gz (28.2 kB)

Uploaded Source

Built Distribution

pdftabextract-0.3.0-py3-none-any.whl (28.0 kB)

Uploaded Python 3

File details

Details for the file pdftabextract-0.3.0.tar.gz.

File hashes

Hashes for pdftabextract-0.3.0.tar.gz
Algorithm    Hash digest
SHA256       822bc899123f360bd83d32f830c7d1fc4db16240f84eedb3009ff12a2d8a97e9
MD5          4199c31bf926e7a830ac87138f0c56e1
BLAKE2b-256  cbb49c47e9a73262f7155fdc94334d44e3f8a39c54be71ce0e6525feb2494176
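
A downloaded file can be checked against a listed digest before it is installed. The following is a minimal sketch using Python's standard hashlib module; the function names and the local file path are hypothetical, and the expected value is the SHA256 digest listed for pdftabextract-0.3.0.tar.gz.

```python
import hashlib

# SHA256 digest listed on this page for pdftabextract-0.3.0.tar.gz
EXPECTED_SHA256 = "822bc899123f360bd83d32f830c7d1fc4db16240f84eedb3009ff12a2d8a97e9"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical path: assumes the archive was downloaded to the current directory.
    print(matches_expected("pdftabextract-0.3.0.tar.gz"))
```

The same check applies to the wheel file by passing its listed SHA256 digest as the expected argument.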

File details

Details for the file pdftabextract-0.3.0-py3-none-any.whl.

File hashes

Hashes for pdftabextract-0.3.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       88ec8c4481d4de2bb5f675732e751c10cc31c5908545cbad011f3e0d40654f3c
MD5          cf361171526695a4d0f45020344ec317
BLAKE2b-256  1ea9dcf92e41100ba949e33ff7dc47ac8f6e905c5ed1890e6113eb0abd263f40
