persian_pdf_converter by Mahdi Ramazani is a Python package that converts PDF files to Word documents with OCR support for Persian and English. It automatically downloads and sets up necessary tools like Tesseract and Poppler.

# persian_pdf_converter

A Python package for converting PDF files to Word documents and modifying URLs. This package utilizes Tesseract OCR for text recognition in PDF files.

## Features

- Convert PDF files to Word documents with text recognition
- Modify URLs based on directory paths

## Requirements

- Python 3.6 or higher
- Tesseract OCR installed and configured
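
A quick way to check these requirements before converting anything is sketched below. It is not part of persian_pdf_converter: the helper names (`missing_tools`, `parse_tesseract_langs`, `missing_langs`) are hypothetical, and `pdftoppm` is assumed to be the Poppler executable that has to be on `PATH`.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


def parse_tesseract_langs(list_langs_output: str) -> List[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`.

    The first line is a header starting with "List of"; every other
    non-empty line is treated as one language code.
    """
    lines = [line.strip() for line in list_langs_output.splitlines()]
    return [line for line in lines if line and not line.startswith("List of")]


def missing_langs(lang_spec: str, installed: Iterable[str]) -> List[str]:
    """Return the codes in a spec such as "fas+eng" that are not installed."""
    have = set(installed)
    return [code for code in lang_spec.split("+") if code not in have]


if __name__ == "__main__":
    # Assumption: these are the two external programs the conversion relies on.
    print("Missing executables:", missing_tools(["tesseract", "pdftoppm"]))
```

The text for `parse_tesseract_langs` can be captured by running `tesseract --list-langs` with `subprocess.run`; passing the result to `missing_langs("fas+eng", ...)` then shows whether the Persian and English language data are both present.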

## Installation

To install the package, use pip:

```bash
pip install persian-pdf-converter
```

## Usage

Here is an example of how to use the functions provided by this package:

```python
from persian_pdf_converter.pdf_converter import pdf_to_word

# Path to your PDF file and output directory
pdf_path = 'path/to/example.pdf'
output_dir = 'path/to/output/dir'

# Convert PDF to Word
output_file = pdf_to_word(pdf_path, output_dir, lang="fas+eng", dpi=300)
print(f"Converted file saved as: {output_file}")
```
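
To convert several PDFs in one run, the single call above can be wrapped in a loop. The following is a minimal sketch, not part of the package: `convert_many` is a hypothetical helper that takes the converter as an argument (so it can be tried without Tesseract installed) and assumes the converter accepts the same arguments as `pdf_to_word` in the example.

```python
import os
from typing import Callable, Dict, Iterable, Tuple


def convert_many(
    pdf_paths: Iterable[str],
    output_dir: str,
    convert: Callable[..., str],
    lang: str = "fas+eng",
    **kwargs,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run `convert` on each PDF and collect results and failures.

    Returns two dicts keyed by input path: the value `convert` returned
    for each success, and the error message for each failure.
    """
    os.makedirs(output_dir, exist_ok=True)
    converted: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for pdf_path in pdf_paths:
        try:
            converted[pdf_path] = convert(pdf_path, output_dir, lang=lang, **kwargs)
        except Exception as exc:  # keep going and report the failure afterwards
            failed[pdf_path] = str(exc)
    return converted, failed
```

With the package installed, a call would look like `convert_many(["a.pdf", "b.pdf"], "out", pdf_to_word, dpi=300)`. Collecting failures instead of stopping at the first one is a design choice for long batches, where one unreadable scan should not abort the rest.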

### `pdf_to_word` Function

This function converts a PDF file to a Word document with text recognition.

Parameters:

- `pdf_path` (str): Path to the PDF file.
- `output_dir` (str): Directory where the output Word file will be saved.
- `lang` (str): Languages to be used by Tesseract for text recognition (default is `"fas+eng"`).
- `**kwargs`: Additional keyword arguments passed through to `convert_from_path`, for example `dpi=300`.

Returns:

- str: Name of the output Word file.
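
For readers curious how a conversion like this can work, here is a rough sketch of the general technique: render each page to an image, run OCR on it, and write the text to a Word file. It is not the package's actual implementation. It assumes the third-party libraries `pdf2image`, `pytesseract`, and `python-docx`; the names `ocr_pdf_to_docx` and `output_docx_path`, the output naming scheme, and returning the full path are all illustrative choices.

```python
import os


def output_docx_path(pdf_path: str, output_dir: str) -> str:
    """Build '<output_dir>/<PDF name without extension>.docx' (assumed naming)."""
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return os.path.join(output_dir, stem + ".docx")


def ocr_pdf_to_docx(pdf_path: str, output_dir: str, lang: str = "fas+eng", **kwargs) -> str:
    """Hypothetical stand-in for pdf_to_word; returns the path of the written file."""
    # Third-party imports stay local so the helper above works without them.
    from pdf2image import convert_from_path  # requires Poppler
    import pytesseract                       # requires the Tesseract executable
    from docx import Document                # provided by python-docx

    os.makedirs(output_dir, exist_ok=True)

    # Extra keyword arguments (for example dpi=300) go straight to pdf2image.
    pages = convert_from_path(pdf_path, **kwargs)

    document = Document()
    for page in pages:
        text = pytesseract.image_to_string(page, lang=lang)
        # OCR output may contain a form-feed character, which is not valid
        # in Word XML, so it is removed before the text is added.
        document.add_paragraph(text.replace("\x0c", "").strip())

    out_path = output_docx_path(pdf_path, output_dir)
    document.save(out_path)
    return out_path
```

The sketch writes one paragraph per page and does nothing to preserve layout or set right-to-left paragraph direction, both of which a real converter for Persian text would likely need to handle.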

## Development

To contribute to this project, follow these steps:

1. Clone the repository:

   ```bash
   git clone https://github.com/mahdiramezanii/persian_pdf_converter.git
   ```

2. Navigate to the project directory:

   ```bash
   cd persian_pdf_converter
   ```

3. Create a virtual environment and activate it:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows use `venv\Scripts\activate`
   ```

4. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

5. Make your changes and run tests.

## License

This project is licensed under the MIT License. See the LICENSE file for more details.

## Contact

If you have any questions or suggestions, feel free to contact me at mahdiramazanii.official@gmail.com.
