persian_pdf_converter by Mahdi Ramazani is a Python package that converts PDF files to Word documents with OCR support for Persian and English. It automatically downloads and sets up necessary tools like Tesseract and Poppler.
# persian_pdf_converter
A Python package for converting PDF files to Word documents and modifying URLs. This package utilizes Tesseract OCR for text recognition in PDF files.
## Features
- Convert PDF files to Word documents with text recognition
- Modify URLs based on directory paths
## Requirements
- Python 3.6 or higher
- Tesseract OCR installed and configured
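
Before running a conversion, it can help to confirm that the external tools are reachable. The following is a minimal sketch, not part of this package: `missing_tools` and `installed_tesseract_languages` are hypothetical helper names, and the sketch assumes the Tesseract executable is called `tesseract` and that Poppler provides `pdftoppm` on your system.

```python
import shutil
import subprocess
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the executable names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


def installed_tesseract_languages() -> List[str]:
    """Ask the Tesseract CLI which language packs it can see."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    # Some Tesseract versions print the list to stderr instead of stdout.
    lines = (result.stdout or result.stderr).splitlines()
    # The first line is a header; the remaining lines are language codes.
    return [line.strip() for line in lines[1:] if line.strip()]


# Assumed executable names; adjust them for your platform if needed.
print("Missing tools:", missing_tools(["tesseract", "pdftoppm"]))
```

If `tesseract` is found, calling `installed_tesseract_languages()` should include `fas` and `eng` when the Persian and English language data are installed.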
## Installation
To install the package, use pip:
```bash
pip install persian-pdf-converter
```

## Usage

Here is an example of how to use the functions provided by this package:

```python
from persian_pdf_converter.pdf_converter import pdf_to_word

# Path to your PDF file and output directory
pdf_path = 'path/to/example.pdf'
output_dir = 'path/to/output/dir'

# Convert PDF to Word
output_file = pdf_to_word(pdf_path, output_dir, lang="fas+eng", dpi=300)
print(f"Converted file saved as: {output_file}")
```
### `pdf_to_word` Function

This function converts a PDF file to a Word document with text recognition.

Parameters:

- `pdf_path` (str): Path to the PDF file.
- `output_dir` (str): Directory where the output Word file will be saved.
- `lang` (str): Languages to be used by Tesseract for text recognition (default is `"fas+eng"`).
- Additional keyword arguments for `convert_from_path`.

Returns:

- `str`: Name of the output Word file.
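
The signature described above lends itself to batch conversion. The sketch below is illustrative only: `build_lang` and `convert_folder` are hypothetical helpers that are not part of this package, and the converter is passed in as a callable so the sketch does not depend on how the package is installed. It assumes the converter accepts `(pdf_path, output_dir, **kwargs)` and returns a string, as documented for `pdf_to_word`.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def build_lang(codes: Iterable[str]) -> str:
    """Join Tesseract language codes with '+', e.g. ['fas', 'eng'] -> 'fas+eng'."""
    return "+".join(codes)


def convert_folder(input_dir: str, output_dir: str,
                   convert: Callable[..., str], **kwargs) -> List[str]:
    """Run `convert` on every *.pdf file in input_dir and collect its return values."""
    # Create the output directory up front so the converter can write into it.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    results = []
    # Sort the paths so the processing order is predictable.
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        results.append(convert(str(pdf), output_dir, **kwargs))
    return results
```

With the package installed, a call could look like `convert_folder("scans", "out", pdf_to_word, lang=build_lang(["fas", "eng"]), dpi=300)`, where `scans` and `out` are placeholder directory names.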
## Development

To contribute to this project, follow these steps:

- Clone the repository: `git clone https://github.com/mahdiramezanii/persian_pdf_converter.git`
- Navigate to the project directory: `cd persian_pdf_converter`
- Create a virtual environment and activate it: `python -m venv venv`, then `source venv/bin/activate` (on Windows, use `venv\Scripts\activate`)
- Install the dependencies: `pip install -r requirements.txt`
- Make your changes and run tests.
## License
This project is licensed under the MIT License. See the LICENSE file for more details.
## Contact
If you have any questions or suggestions, feel free to contact me at mahdiramazanii.official@gmail.com.
## File details

Details for the file `persian_pdf_converter-2.3.1.tar.gz`.

File metadata

- Download URL: persian_pdf_converter-2.3.1.tar.gz
- Size: 93.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.4

File hashes

Algorithm | Hash digest
---|---
SHA256 | `7cf58b36322ac6cf720f57a2367751f309c5e9618a45f16a85433cb3407a4143`
MD5 | `a4a42c525390f3eb960f727b53a6c914`
BLAKE2b-256 | `246378a841db0ae43e6f796717fe74cf875352203d96204945c89d3d8a1cc18e`
## File details

Details for the file `persian_pdf_converter-2.3.1-py3-none-any.whl`.

File metadata

- Download URL: persian_pdf_converter-2.3.1-py3-none-any.whl
- Size: 94.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.4

File hashes

Algorithm | Hash digest
---|---
SHA256 | `335f74d4eb34c328bfbba782dde21ec699aae7bf3a73c07416b129561acaaaa6`
MD5 | `770fba450ba7362f0f74aade902f722b`
BLAKE2b-256 | `944f8119806c5fe987853e821926f37424df783a2c17dd971dbbbb42aa490f02`
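
A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the Python standard library; `sha256_of` and `matches` are hypothetical helper names, and the file path in the usage note is a placeholder for wherever the download was saved.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("persian_pdf_converter-2.3.1.tar.gz", "7cf58b36322ac6cf720f57a2367751f309c5e9618a45f16a85433cb3407a4143")` should return `True` for an intact copy of the source distribution.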