Text Mining & Classification Toolkit

PDFInsight

Extracts text from text-based PDFs and classifies it into the following categories:

  • table of contents
  • header
  • heading
  • tables
  • content
  • footnote
  • footer
  • page number
  • unsure (text that cannot be categorised)
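
The categories above can be modelled as a small enumeration when post-processing results. This is only a sketch: the `TextCategory` and `parse_category` names are hypothetical, and the label strings are assumed to match the list above, since the exact strings PDFInsight emits are not documented here.

```python
from enum import Enum


class TextCategory(Enum):
    """Hypothetical enum; values mirror the category names listed above."""

    TABLE_OF_CONTENTS = "table of contents"
    HEADER = "header"
    HEADING = "heading"
    TABLES = "tables"
    CONTENT = "content"
    FOOTNOTE = "footnote"
    FOOTER = "footer"
    PAGE_NUMBER = "page number"
    UNSURE = "unsure"


def parse_category(label: str) -> TextCategory:
    """Map a label string to a TextCategory, falling back to UNSURE."""
    normalised = label.strip().lower()
    try:
        return TextCategory(normalised)
    except ValueError:
        # Unknown labels are treated like text that cannot be categorised.
        return TextCategory.UNSURE
```

Falling back to `UNSURE` for unknown labels keeps downstream code from failing if the real labels differ from the assumed ones.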

Example

import pdfinsight

# Extract and categorise the text in a text-based PDF
df = pdfinsight.pdf_extractor("sample.pdf")
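
The call above can be wrapped so that a wrong path fails early with a clear error. This is a minimal sketch, not part of PDFInsight: `extract_pdf` is a hypothetical helper, the `.pdf` suffix check is an assumption about what inputs are sensible, and the extractor is passed in as a parameter so that the only PDFInsight call involved is the documented `pdf_extractor`.

```python
from pathlib import Path


def extract_pdf(path, extractor):
    """Validate the path, then hand it to the given extractor callable.

    `extractor` is expected to accept a file path string, as
    pdfinsight.pdf_extractor does in the example above.
    """
    pdf_path = Path(path)
    if pdf_path.suffix.lower() != ".pdf":
        # Assumption: only files with a .pdf extension are meaningful input.
        raise ValueError(f"not a PDF file name: {pdf_path}")
    if not pdf_path.is_file():
        raise FileNotFoundError(f"no such file: {pdf_path}")
    return extractor(str(pdf_path))
```

Usage would look like `df = extract_pdf("sample.pdf", pdfinsight.pdf_extractor)` after `import pdfinsight`.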

Installation

pip install PDFInsight
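
To confirm that the installation succeeded, the installed version can be read with the standard library. This is a sketch using `importlib.metadata` (Python 3.8+); the `installed_version` helper is a hypothetical name, not something PDFInsight provides.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if PDFInsight is installed, otherwise None.
print(installed_version("PDFInsight"))
```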

References

https://github.com/pymupdf/PyMuPDF
