A set of tools for data mining (OCR-processed) PDFs

Project description

This repository contains a set of tools written in Python 3 for extracting tabular data from scanned, OCR-processed documents available as PDF files. Before these files can be processed, they must be converted to XML files in pdf2xml format using the poppler utilities. Further information and examples can be found in the GitHub repository.
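
As a rough illustration of the kind of input these tools work on, the sketch below reads text boxes out of a small pdf2xml-style document using only the Python standard library. The sample XML is invented for illustration. The element and attribute names (`page`, `text`, `top`, `left`, `width`, `height`) are assumed to match what poppler's `pdftohtml -xml` typically emits. The helper `text_boxes` is hypothetical and is not part of the pdftabextract API.

```python
import xml.etree.ElementTree as ET

# Assumption: the XML was produced beforehand with poppler's pdftohtml, e.g.
#   pdftohtml -c -hidden -xml input.pdf output.xml
# The sample below is hand-written to mimic that layout.
SAMPLE_XML = """<pdf2xml>
  <page number="1" position="absolute" top="0" left="0" height="1000" width="700">
    <text top="100" left="50" width="80" height="12" font="0">Name</text>
    <text top="100" left="200" width="40" height="12" font="0">Year</text>
    <text top="130" left="50" width="90" height="12" font="0"><b>Smith</b></text>
  </page>
</pdf2xml>
"""


def text_boxes(xml_string):
    """Return one dict per <text> element with its page, position, size and content."""
    root = ET.fromstring(xml_string)
    boxes = []
    for page in root.iter("page"):
        page_number = int(page.get("number"))
        for node in page.iter("text"):
            boxes.append({
                "page": page_number,
                "top": int(node.get("top")),
                "left": int(node.get("left")),
                "width": int(node.get("width")),
                "height": int(node.get("height")),
                # itertext() also collects text nested in <b> or <i> children
                "text": "".join(node.itertext()).strip(),
            })
    return boxes


if __name__ == "__main__":
    for box in text_boxes(SAMPLE_XML):
        print(box)
```

Boxes that share a similar `top` value are candidates for the same table row, and boxes with a similar `left` value for the same column. Grouping positions like this is the basic idea behind extracting tables from such data.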

Download files

Source Distribution

pdftabextract-0.1.1.tar.gz (26.4 kB)

Uploaded Source

Built Distribution

pdftabextract-0.1.1-py3-none-any.whl (26.3 kB)

Uploaded Python 3

File details

Details for the file pdftabextract-0.1.1.tar.gz.

File metadata

  • Download URL: pdftabextract-0.1.1.tar.gz
  • Size: 26.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for pdftabextract-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       331d86827b3a67ec9eca8774be1c67da168c8c9129d459972c9fce3d196523ea
MD5          89e2cd93eaeb2c07f1cf71b2fe65574e
BLAKE2b-256  620b5a1a81d8188590d04199561f3a8c792c2c81781d7fe5c95dda0d09dad2e6
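
A downloaded file can be checked against a published digest before installing it. The following is a minimal sketch using Python's standard `hashlib` module. The function names and the local path in the usage note are hypothetical, and the expected value is the SHA256 digest listed for the source distribution.

```python
import hashlib

# SHA256 digest published for pdftabextract-0.1.1.tar.gz
EXPECTED_SHA256 = "331d86827b3a67ec9eca8774be1c67da168c8c9129d459972c9fce3d196523ea"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of_file(path) == expected.lower()
```

Assuming the archive was saved in the current directory, `matches_expected("pdftabextract-0.1.1.tar.gz")` should return `True` for an intact download.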

File details

Details for the file pdftabextract-0.1.1-py3-none-any.whl.

File hashes

Hashes for pdftabextract-0.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       c4a6423d760ccfc8d5e9adf8682956f81ea657b2dd18d9cc73a3554f685db208
MD5          c8d0987c51fd24d5615b03a715c22fa6
BLAKE2b-256  a83ba419afb3d179c9e15d3c76faffac7df3103913cc2e7550b1ad8f40badcfe
