A package for digitizing text from PDF files.

PDF Text Digitizer

PDF Text Digitizer is a Python tool that extracts text from PDF files. It supports both text-based PDFs and image-based (scanned) PDFs, using OCR for the latter.

Features

  • Extracts text from text-based PDFs.
  • Extracts text from image-based PDFs using OCR.
  • Simple API for easy integration.
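
The two extraction modes listed above are often combined by trying the embedded text layer first and running OCR only when little or no text comes back. The following is a minimal sketch of that idea, not the package's actual implementation: `extract_with_ocr_fallback`, its two callable parameters, and the `min_chars` threshold are all hypothetical names chosen for illustration.

```python
from typing import Callable


def extract_with_ocr_fallback(
    pdf_path: str,
    extract_text_layer: Callable[[str], str],
    run_ocr: Callable[[str], str],
    min_chars: int = 20,
) -> str:
    """Return the embedded text if there is enough of it, otherwise OCR output.

    Both callables are placeholders (assumptions): plug in whatever
    text-layer reader and OCR routine you actually use.
    """
    text = extract_text_layer(pdf_path)
    # A scanned, image-based PDF usually yields little or no embedded text,
    # so a short result is treated as the signal to fall back to OCR.
    if len(text.strip()) >= min_chars:
        return text
    return run_ocr(pdf_path)
```

Passing the extractors in as arguments keeps the decision logic independent of any particular PDF or OCR library.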

Installation

Clone the repository and install the dependencies:

git clone https://github.com/yourusername/pdf-text-digitizer.git
cd pdf-text-digitizer
pip install -r requirements.txt

Usage

from pdf_text_digitizer.digitize import digitize_pdf

text = digitize_pdf('path/to/your/pdf_file.pdf')
print(text)
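
To process several files at once, the call above can be wrapped in a small loop. This is a sketch under assumptions: `digitize_directory` is a hypothetical helper, not part of the package, and because the exceptions raised by `digitize_pdf` are not documented here, it catches `Exception` broadly and skips the failing file.

```python
from pathlib import Path
from typing import Callable, Dict


def digitize_directory(directory: str, digitize: Callable[[str], str]) -> Dict[str, str]:
    """Map each PDF file name in `directory` to its extracted text.

    `digitize` is any function that takes a path and returns text,
    for example the package's digitize_pdf.
    """
    results: Dict[str, str] = {}
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        try:
            results[pdf_path.name] = digitize(str(pdf_path))
        except Exception as exc:  # assumption: failure modes are undocumented
            print(f"Skipping {pdf_path.name}: {exc}")
    return results
```

With the package installed, a call would look like digitize_directory('path/to/pdfs', digitize_pdf).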

Examples

You can find example usage in the examples directory. To run the example:

python examples/example_usage.py

License

This project is licensed under the MIT License. See the LICENSE file for details.

Source Distribution

pdf_text_digitizer-0.1.0.tar.gz (4.6 kB, source)

Built Distribution

pdf_text_digitizer-0.1.0-py3-none-any.whl (7.0 kB, Python 3)

File details

Details for the file pdf_text_digitizer-0.1.0.tar.gz.

File metadata

  • Download URL: pdf_text_digitizer-0.1.0.tar.gz
  • Size: 4.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.6

File hashes

Hashes for pdf_text_digitizer-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        c0a0f6b0f8b6e78287e79718ada2310cb084a3ea6b13da93c2f822d214845788
MD5           286bd51c3c60a5ff0f0f3310edf200c9
BLAKE2b-256   f706610b70f6c982abc9fc6ac008981e51a0e51cc45afa3ef994acba80f6b6c8
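
A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. The sketch below is illustrative: `sha256_of_file` and `matches_published_digest` are hypothetical helper names, and the file is assumed to be saved locally.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, the expected value would be the SHA256 digest shown for pdf_text_digitizer-0.1.0.tar.gz.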

File details

Details for the file pdf_text_digitizer-0.1.0-py3-none-any.whl.

File hashes

Hashes for pdf_text_digitizer-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        2c12494d331deac8718bf73ffaf4bb05dd8ba815b6f6a7819916a486ae227fd7
MD5           72a3ed8642827f0ccb02912adc3eedaf
BLAKE2b-256   4ea8fbea156ff195ad4ac55cd85f5531da382fe1eb6b2612d81c8c9d7c8c7560
