Text Mining & Classification Toolkit

Project description

PDFInsight

Extract the text from text-based PDFs and categorise each piece of text into one of the following categories:

  • table of contents
  • header
  • heading
  • tables
  • content
  • footnote
  • footer
  • page number
  • unsure (text that cannot be categorised)

Example

import pdfinsight

# Extract and categorise the text in sample.pdf
df = pdfinsight.pdf_extractor("sample.pdf")
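
The page does not document the structure of the value returned by pdf_extractor. As a rough sketch of how categorised text might be consumed downstream, the snippet below groups text by category. It deliberately does not call pdfinsight: the rows are stand-in data, and the `text` and `category` keys, the exact label strings, and the `group_by_category` helper are all assumptions made for illustration, not part of the package.

```python
from collections import defaultdict

# Category labels copied from the list in the project description.
# Whether the package emits these exact strings is an assumption.
CATEGORIES = [
    "table of contents", "header", "heading", "tables", "content",
    "footnote", "footer", "page number", "unsure",
]


def group_by_category(rows):
    """Group the text of each row under its category label.

    `rows` is a hypothetical list of dicts with "text" and "category" keys.
    Rows carrying an unknown label are filed under "unsure".
    """
    grouped = defaultdict(list)
    for row in rows:
        label = row["category"] if row["category"] in CATEGORIES else "unsure"
        grouped[label].append(row["text"])
    return dict(grouped)


# Stand-in data for illustration only; not real PDFInsight output.
sample_rows = [
    {"text": "Annual Report", "category": "header"},
    {"text": "1. Introduction", "category": "heading"},
    {"text": "This report covers the year.", "category": "content"},
    {"text": "3", "category": "page number"},
    {"text": "@@##", "category": "mystery"},
]

grouped = group_by_category(sample_rows)
for label, texts in grouped.items():
    print(f"{label}: {texts}")
```

If the real return value is a table-like object, the same grouping idea applies once its rows are mapped to text and category pairs.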

Installation

pip install PDFInsight

References

https://github.com/pymupdf/PyMuPDF

Download files

Source Distribution

pdfinsight-0.0.3.tar.gz (6.5 kB)

Uploaded Source

Built Distribution

pdfinsight-0.0.3-py3-none-any.whl (6.8 kB)

Uploaded Python 3

File details

Details for the file pdfinsight-0.0.3.tar.gz.

File metadata

  • Download URL: pdfinsight-0.0.3.tar.gz
  • Upload date:
  • Size: 6.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.5

File hashes

Hashes for pdfinsight-0.0.3.tar.gz

Algorithm    Hash digest
SHA256       5a249f6928cd05c200b21011089bffff366374eb35f4ddefca9df7ba1ad36acd
MD5          cac4ea792fa0480a74f62c8398ee918d
BLAKE2b-256  114a7016ccfe7bcb61e3676244eb822a417096c881b528964cd7b2685c33dc92
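
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_expected` are illustrative, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed on this page for pdfinsight-0.0.3.tar.gz
EXPECTED_SHA256 = "5a249f6928cd05c200b21011089bffff366374eb35f4ddefca9df7ba1ad36acd"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare the file's digest with `expected`, ignoring letter case."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("pdfinsight-0.0.3.tar.gz")` should return True only when the local copy is byte-for-byte identical to the uploaded file. pip can perform a similar check itself when a requirements file pins hashes and is installed with `--require-hashes`.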

File details

Details for the file pdfinsight-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: pdfinsight-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 6.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.5

File hashes

Hashes for pdfinsight-0.0.3-py3-none-any.whl

Algorithm    Hash digest
SHA256       00b077e2339f8e6b85bfa86189462fbb5ad54aed14865cfd7d0fa894c1a3e747
MD5          50794ea580b263f5db6b2b3d47a3978c
BLAKE2b-256  8e1355dd3560f5ea7438843b4a65c7eca76002badebd102a44d85d6119fc90b7
