A package for digitizing text from PDF files.
PDF Text Digitizer
PDF Text Digitizer is a Python tool that extracts text from PDF files. It reads text-based PDFs directly and handles image-based (scanned) PDFs through OCR.
Features
- Extracts text from text-based PDFs.
- Extracts text from image-based PDFs using OCR.
- Simple API for easy integration.
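The two extraction modes listed above are often combined with a simple fallback rule: try the embedded text layer first and run OCR only when almost nothing comes back. The sketch below illustrates that idea only; it is not taken from this package's source. The names `extract_with_fallback`, `extract_text`, `run_ocr`, and `min_chars` are hypothetical, and the two extractors are passed in as plain callables so that no particular PDF or OCR library is assumed.

```python
from typing import Callable


def extract_with_fallback(
    pdf_path: str,
    extract_text: Callable[[str], str],  # hypothetical: reads the embedded text layer
    run_ocr: Callable[[str], str],       # hypothetical: renders pages and runs OCR
    min_chars: int = 20,                 # assumed threshold, tune for your documents
) -> str:
    """Return the embedded text if there is enough of it, otherwise the OCR result."""
    text = extract_text(pdf_path)
    # Scanned PDFs usually carry little or no embedded text, so a very short
    # result is treated as a signal that the file is image-based.
    if len(text.strip()) >= min_chars:
        return text
    return run_ocr(pdf_path)


if __name__ == "__main__":
    # Stand-in extractors so the sketch runs without any PDF library installed.
    embedded = lambda path: ""
    ocr = lambda path: "text recognized by OCR"
    print(extract_with_fallback("scan.pdf", embedded, ocr))
```

Passing the extractors as arguments keeps the decision rule testable on its own and makes it easy to swap in whichever PDF and OCR backends are available.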
Installation
Clone the repository (replace yourusername in the URL with the actual repository owner) and install the dependencies:
git clone https://github.com/yourusername/pdf-text-digitizer.git
cd pdf-text-digitizer
pip install -r requirements.txt
Usage
from pdf_text_digitizer.digitize import digitize_pdf
text = digitize_pdf('path/to/your/pdf_file.pdf')
print(text)
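A common next step after the call above is to write the extracted text to a file next to the PDF. The following is a minimal sketch, assuming that `digitize_pdf` returns a `str` (the usage above prints its result, but the return type is not documented here). The helper `save_pdf_text` and its parameters are hypothetical and not part of the package; it accepts the digitizing function as an argument so the sketch runs even without the package installed.

```python
from pathlib import Path
from typing import Callable


def save_pdf_text(
    pdf_path: str,
    digitize: Callable[[str], str],  # e.g. digitize_pdf, assumed to return a str
    out_dir: str = ".",
) -> Path:
    """Digitize one PDF and write the text to <out_dir>/<pdf name>.txt."""
    pdf = Path(pdf_path)
    # Fail early with a clear message instead of relying on the digitizer's error.
    if not pdf.is_file():
        raise FileNotFoundError(f"No such PDF: {pdf}")
    text = digitize(str(pdf))
    out_path = Path(out_dir) / (pdf.stem + ".txt")
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

With the package installed, this would be called as `save_pdf_text('path/to/your/pdf_file.pdf', digitize_pdf)`.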
Examples
You can find example usage in the examples directory. To run the example:
python examples/example_usage.py
License
This project is licensed under the MIT License. See the LICENSE file for details.
Download files
File details
Details for the file pdf_text_digitizer-0.1.0.tar.gz.
File metadata
- Download URL: pdf_text_digitizer-0.1.0.tar.gz
- Upload date:
- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | c0a0f6b0f8b6e78287e79718ada2310cb084a3ea6b13da93c2f822d214845788
MD5 | 286bd51c3c60a5ff0f0f3310edf200c9
BLAKE2b-256 | f706610b70f6c982abc9fc6ac008981e51a0e51cc45afa3ef994acba80f6b6c8
File details
Details for the file pdf_text_digitizer-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pdf_text_digitizer-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2c12494d331deac8718bf73ffaf4bb05dd8ba815b6f6a7819916a486ae227fd7
MD5 | 72a3ed8642827f0ccb02912adc3eedaf
BLAKE2b-256 | 4ea8fbea156ff195ad4ac55cd85f5531da382fe1eb6b2612d81c8c9d7c8c7560
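The published digests can be used to check a downloaded file before installing it. The sketch below uses only the standard library's `hashlib`; the helper names `sha256_of` and `matches_published` are illustrative, and the file is assumed to sit in the current directory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution listed above, the check would be `matches_published("pdf_text_digitizer-0.1.0.tar.gz", "c0a0f6b0f8b6e78287e79718ada2310cb084a3ea6b13da93c2f822d214845788")`.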