PDFInsight

Text Mining & Classification Toolkit

Extract text from text-based PDFs and categorise each piece of it into one of the following categories:

  • table of contents
  • header
  • heading
  • tables
  • content
  • footnote
  • footer
  • page number
  • unsure (text that cannot be categorised)
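One way to work with these labels downstream is to model them as an enumeration. The following is a minimal sketch, not part of PDFInsight itself: the names `Category`, `parse_label`, and `count_by_category` are hypothetical, and the exact label strings the library emits are assumed to match the list above.

```python
from collections import Counter
from enum import Enum


class Category(Enum):
    """The nine labels listed above; member names are illustrative only."""
    TABLE_OF_CONTENTS = "table of contents"
    HEADER = "header"
    HEADING = "heading"
    TABLES = "tables"
    CONTENT = "content"
    FOOTNOTE = "footnote"
    FOOTER = "footer"
    PAGE_NUMBER = "page number"
    UNSURE = "unsure"


def parse_label(label: str) -> Category:
    """Map a free-form label to a Category, falling back to UNSURE."""
    try:
        return Category(label.strip().lower())
    except ValueError:
        # Anything outside the known labels is treated as uncategorised.
        return Category.UNSURE


def count_by_category(labels):
    """Count how many labelled text spans fall into each category."""
    return Counter(parse_label(label) for label in labels)


# Hypothetical labels, standing in for real extraction output.
labels = ["Header", "content", "content", "page number", "sidebar"]
counts = count_by_category(labels)
print(counts[Category.CONTENT])
print(counts[Category.UNSURE])
```

Falling back to `UNSURE` for unknown strings mirrors the catch-all category in the list, so a misspelt or unexpected label does not raise an error.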

Example

import pdfinsight

# Extract the text of sample.pdf and categorise it
df = pdfinsight.pdf_extractor("sample.pdf")

Installation

pip install PDFInsight

References

https://github.com/pymupdf/PyMuPDF
