Convert Persian PDF files to .docx

# persian_pdf_converter

A Python package for converting PDF files, including Persian-language ones, to Word (.docx) documents and for modifying URLs. It uses Tesseract OCR to recognize the text on each PDF page.

## Features

- Convert PDF files to Word documents with text recognition
- Modify URLs based on directory paths

## Requirements

- Python 3.6 or higher
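- Tesseract OCR installed and configured, including the language data for every language you pass in `lang` (Persian `fas` and English `eng` by default)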
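
Before running a conversion, it can help to confirm that Tesseract is reachable and has the language data you need. The following is a minimal sketch using only the standard library. The helper names `parse_langs`, `missing_languages`, and `installed_languages` are illustrative and not part of this package; the sketch relies on the `tesseract --list-langs` command and assumes the binary is on your `PATH`.

```python
import shutil
import subprocess
from typing import Iterable, List, Set


def parse_langs(listing: str) -> Set[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = set()
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of"):
            continue
        langs.add(line)
    return langs


def missing_languages(required: Iterable[str], available: Set[str]) -> List[str]:
    """Return the required language codes that are not installed, in order."""
    return [code for code in required if code not in available]


def installed_languages() -> Set[str]:
    """Ask the local Tesseract binary which languages it has."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Some Tesseract releases may print the listing to stderr, so read both streams.
    return parse_langs(result.stdout + "\n" + result.stderr)
```

For the default `lang="fas+eng"`, `missing_languages("fas+eng".split("+"), installed_languages())` should return an empty list when both language packs are installed.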

## Installation

To install the package, use pip:

```bash
pip install persian-pdf-converter
```

## Usage

Here is an example of how to use the functions provided by this package:

```python
from persian_pdf_converter.pdf_converter import pdf_to_word

# Path to your PDF file and output directory
pdf_path = 'path/to/example.pdf'
output_dir = 'path/to/output/dir'

# Convert PDF to Word
output_file = pdf_to_word(pdf_path, output_dir, lang="fas+eng", dpi=300)
print(f"Converted file saved as: {output_file}")
```
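
To convert a whole folder rather than a single file, the same call can be wrapped in a loop. This is a sketch, not a feature of the package: `convert_all` is a hypothetical helper, and it takes the converter as an argument so that it mirrors the documented call shape `pdf_to_word(pdf_path, output_dir, **kwargs)` without depending on it.

```python
from pathlib import Path
from typing import Callable, List


def convert_all(pdf_dir: str, output_dir: str, convert: Callable[..., str], **kwargs) -> List[str]:
    """Run `convert` on every .pdf file directly inside pdf_dir, in name order."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    results = []
    # glob("*.pdf") is not recursive and is case-sensitive on most Linux systems.
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        results.append(convert(str(pdf_path), output_dir, **kwargs))
    return results


# Intended use with this package (requires persian-pdf-converter and Tesseract):
#   from persian_pdf_converter.pdf_converter import pdf_to_word
#   convert_all("path/to/pdfs", "path/to/output/dir", pdf_to_word, lang="fas+eng", dpi=300)
```

Passing the converter in as a parameter also makes it easy to try the loop with a stand-in function before running OCR on a large batch.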

### pdf_to_word Function

This function converts a PDF file to a Word document with text recognition.

Parameters:

- `pdf_path` (str): Path to the PDF file.
- `output_dir` (str): Directory where the output Word file will be saved.
- `lang` (str): Languages Tesseract uses for text recognition, joined with `+` (default is `"fas+eng"`).
- `**kwargs`: Additional keyword arguments forwarded to `convert_from_path`, for example `dpi=300`.

Returns:

- str: Name of the output Word file.
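
To make the parameters concrete, here is a rough sketch of how an OCR-based PDF-to-Word pipeline of this shape could be written. It is not the package's implementation: it assumes the commonly used combination of pdf2image (`convert_from_path`), pytesseract, and python-docx, and the names `pdf_to_word_sketch`, `clean_ocr_text`, and `output_name_for` are hypothetical, as is the convention of naming the output after the PDF.

```python
import os


def output_name_for(pdf_path: str) -> str:
    """Derive a .docx file name from the PDF's base name (an assumed convention)."""
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return stem + ".docx"


def clean_ocr_text(text: str) -> str:
    """Drop control characters (such as form feeds) that a .docx cannot store."""
    return "".join(ch for ch in text if ch in "\n\t" or ch >= " ")


def pdf_to_word_sketch(pdf_path: str, output_dir: str, lang: str = "fas+eng", **kwargs) -> str:
    """Illustrative OCR pipeline; not the package's actual code."""
    # Third-party imports are local so the helpers above work without them.
    from pdf2image import convert_from_path  # renders PDF pages to PIL images
    import pytesseract  # Python wrapper around the Tesseract CLI
    from docx import Document  # python-docx

    os.makedirs(output_dir, exist_ok=True)
    document = Document()
    # Extra keyword arguments such as dpi=300 go straight to convert_from_path.
    for index, page_image in enumerate(convert_from_path(pdf_path, **kwargs)):
        if index > 0:
            document.add_page_break()  # keep one PDF page per Word page
        text = pytesseract.image_to_string(page_image, lang=lang)
        document.add_paragraph(clean_ocr_text(text))

    output_name = output_name_for(pdf_path)
    document.save(os.path.join(output_dir, output_name))
    return output_name
```

The sketch returns only the file name, matching the documented return value, so callers would join it with `output_dir` to get a full path. It does not set right-to-left paragraph direction, which Persian text in Word would normally need.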

## Development

To contribute to this project, follow these steps:

1. Clone the repository:

       git clone https://github.com/mahdiramezanii/persian_pdf_converter.git

2. Navigate to the project directory:

       cd persian_pdf_converter

3. Create a virtual environment and activate it:

       python -m venv venv
       source venv/bin/activate  # On Windows use `venv\Scripts\activate`

4. Install the dependencies:

       pip install -r requirements.txt

5. Make your changes and run tests.

## License

This project is licensed under the MIT License. See the LICENSE file for more details.

## Contact

If you have any questions or suggestions, feel free to contact me at mahdiramazanii.official@gmail.com.
