
PDFInsight

Text Mining & Classification Toolkit

Extract the text from text-based PDFs and categorise it into the following categories:

  • table of contents
  • header
  • heading
  • table
  • content
  • footnote
  • footer
  • page number
  • unsure (text that cannot be categorised)
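The package does not document how it assigns these categories. As a purely illustrative sketch, assuming each text block comes with its vertical position and dominant font size, simple layout rules can separate most of them. The `TextBlock` class, the `classify_block` function, the 8% margin band and the font-size ratios below are all hypothetical choices, not PDFInsight's method, and the "table" category is omitted because it needs column or grid detection.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristics only: this is NOT PDFInsight's classification logic.


@dataclass
class TextBlock:
    """A block of text and its vertical position (PDF points, y grows downward)."""
    text: str
    y0: float         # top edge of the block
    y1: float         # bottom edge of the block
    font_size: float  # dominant font size within the block


PAGE_NUMBER = re.compile(r"(page\s+)?\d+(\s+of\s+\d+)?", re.IGNORECASE)
TOC_ENTRY = re.compile(r"\.{3,}\s*\d+$")  # dot leaders ending in a page number


def classify_block(block: TextBlock, page_height: float, body_size: float,
                   margin: float = 0.08) -> str:
    """Return one category label for a block using simple layout rules."""
    text = block.text.strip()
    if not text:
        return "unsure"
    in_top = block.y1 <= page_height * margin           # entirely inside the top band
    in_bottom = block.y0 >= page_height * (1 - margin)  # entirely inside the bottom band
    if (in_top or in_bottom) and PAGE_NUMBER.fullmatch(text):
        return "page number"
    if in_top:
        return "header"
    if in_bottom:
        return "footer"
    if TOC_ENTRY.search(text):
        return "table of contents"
    if block.font_size >= 1.2 * body_size:   # noticeably larger than body text
        return "heading"
    if block.font_size <= 0.85 * body_size:  # noticeably smaller than body text
        return "footnote"
    if abs(block.font_size - body_size) < 0.5:
        return "content"
    return "unsure"  # "table" would need column/grid detection, omitted here
```

For example, on a US Letter page (792 pt tall) with 10 pt body text, `classify_block(TextBlock("12", 750, 760, 10), page_height=792, body_size=10)` returns `"page number"`. Rules like these are cheap but brittle; a real tool would likely combine them with font names, indentation and repetition across pages.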

Example

import pdfinsight

# Extract and categorise the text of sample.pdf
df = pdfinsight.pdf_extractor("sample.pdf")

Installation

pip install PDFInsight

References

PyMuPDF: https://github.com/pymupdf/PyMuPDF



Download files


Source Distribution

pdfinsight-0.0.2.tar.gz (6.0 kB)


Built Distribution


pdfinsight-0.0.2-py3-none-any.whl (6.3 kB)


File details

Details for the file pdfinsight-0.0.2.tar.gz.

File metadata

  • Download URL: pdfinsight-0.0.2.tar.gz
  • Upload date:
  • Size: 6.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.5

File hashes

Hashes for pdfinsight-0.0.2.tar.gz
Algorithm Hash digest
SHA256 b4afb06528b225942f286d8267ebf99576a0a53d99b46294c0f2a9beae8a7348
MD5 ff381f7b72e4b731d57d2baf21787cae
BLAKE2b-256 08f128b173a957a8e18821a28b20e1f45d0e6a24315ec9266eedcbe6fb268652


File details

Details for the file pdfinsight-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: pdfinsight-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 6.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.5

File hashes

Hashes for pdfinsight-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 a0ae5c5027aa6dbc22c4d9536af4e61dc5da8220483d12d4fa76d2d0b4ea519e
MD5 05183f478b2fa66a631d8916f8d22f80
BLAKE2b-256 b19f6a03cde9626afaf6cb25f7051f855356cf47e5557ae8e7552a001d5476f8
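To check a downloaded file against the digests listed above, a minimal sketch using only Python's standard `hashlib` module is shown here. The helper names `file_digest` and `verify` are hypothetical and not part of PDFInsight; the algorithm labels mirror the ones in the hash tables, with BLAKE2b-256 meaning BLAKE2b truncated to a 32-byte digest.

```python
import hashlib
from pathlib import Path

# Hypothetical helpers for checking a download against a published digest.
_HASHERS = {
    "sha256": hashlib.sha256,
    "md5": hashlib.md5,
    "blake2b-256": lambda: hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
}


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large files never need to fit in memory."""
    h = _HASHERS[algorithm.lower()]()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex, algorithm="sha256"):
    """Compare against a published hex digest, ignoring case and surrounding spaces."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `verify("pdfinsight-0.0.2.tar.gz", "b4afb06528b225942f286d8267ebf99576a0a53d99b46294c0f2a9beae8a7348")` returns `True` only if the local source distribution matches the published SHA256 digest. Prefer SHA256 or BLAKE2b-256 over MD5, which is no longer collision-resistant.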

