A library to read text from images
Image Text Reader
The image-text-reader library extracts text from images with Optical Character Recognition (OCR). It uses the pytesseract library for recognition and Pillow for image preprocessing.
Prerequisites
- Python 3.x
- Tesseract-OCR
Installation
- Install the required Python libraries:
    pip install image-text-reader
- Install Tesseract-OCR:
  - Windows: Download and run a Tesseract-OCR installer for Windows.
  - macOS: Use Homebrew to install:
      brew install tesseract
  - Linux: Use your package manager, for example:
      sudo apt-get install tesseract-ocr
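To confirm that the Tesseract-OCR step above worked, a short check along these lines may help. It is only a sketch using the standard library: the helper name `tesseract_version` is made up for this example and is not part of image-text-reader, and it assumes the executable is reachable as `tesseract` on your PATH.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Return the first line of `<cmd> --version`, or None if cmd is not found.

    Hypothetical helper for illustration; not part of image-text-reader.
    """
    path = shutil.which(cmd)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        capture_output=True,
        text=True,
        check=False,
        timeout=10,
    )
    # Some Tesseract builds print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


version = tesseract_version()
if version is None:
    print("Tesseract was not found on PATH; install it or pass tesseract_cmd explicitly.")
else:
    print(f"Found {version}")
```

If the check reports that Tesseract is missing even though it is installed (common on Windows, where the installer may not update PATH), pass the full path to the executable through the `tesseract_cmd` argument instead.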
Usage
- Create a Python script (e.g., test_script.py) and import the ocr_image function from the image_text_reader library:
    from image_text_reader import ocr_image
- Set the path to your image and Tesseract-OCR executable:
    # Update these paths for your system
    image_path = 'C:/path_to_your_image.jpg'  # Replace with the path to your test image
    tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'  # Path to Tesseract executable
    extracted_text = ocr_image(image_path, tesseract_cmd=tesseract_cmd)
    print("Extracted Text:")
    print(extracted_text)
- Run your script:
    python test_script.py
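The usage script hard-codes a Windows path, which will not exist on macOS or Linux. One possible way to keep the script portable is sketched below. The helper `pick_tesseract_cmd`, the `CANDIDATES` list, and `main` are assumptions made for this example, not part of the library. The sketch relies on the behaviour shown under Code Explanation, where `ocr_image` only overrides the Tesseract location when `tesseract_cmd` is not None.

```python
from pathlib import Path
from typing import Iterable, Optional

# Install location taken from the usage example above; add your own paths as needed.
CANDIDATES = ["C:/Program Files/Tesseract-OCR/tesseract.exe"]


def pick_tesseract_cmd(candidates: Iterable[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate that exists as a file, else None.

    Hypothetical helper for illustration. Returning None leaves the
    tesseract_cmd argument unset, so pytesseract keeps its default lookup.
    """
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


def main(image_path: str) -> None:
    # Imported here so the helper above can be used without the library installed.
    from image_text_reader import ocr_image

    text = ocr_image(image_path, tesseract_cmd=pick_tesseract_cmd())
    print("Extracted Text:")
    print(text)
```

Calling `main('path_to_your_image.jpg')` from your script then behaves like the example above on Windows, and falls back to the Tesseract found on PATH elsewhere.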
Code Explanation
- Preprocessing Function:
  The preprocess_image function prepares the image for OCR by converting it to grayscale, sharpening it, and enhancing its contrast:
    from PIL import Image, ImageEnhance, ImageFilter

    def preprocess_image(image_path):
        image = Image.open(image_path).convert('L')  # 'L' is 8-bit grayscale
        image = image.filter(ImageFilter.SHARPEN)
        enhancer = ImageEnhance.Contrast(image)
        image = enhancer.enhance(2)  # a factor of 2 doubles the contrast
        return image
- OCR Function:
  The ocr_image function preprocesses the image and then extracts the text using pytesseract:
    import pytesseract

    def ocr_image(image_path, tesseract_cmd=None):
        # Override the Tesseract location only when a path is supplied
        if tesseract_cmd:
            pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
        image = preprocess_image(image_path)
        text = pytesseract.image_to_string(image, lang='eng')
        return text
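To see what the preprocessing steps described above do to an image, the following sketch applies the same three Pillow operations to a synthetic low-contrast image and compares the grayscale range before and after. It needs only Pillow, not Tesseract. The names `preprocess` and `gray_spread` and the sample image are invented for this illustration; the function mirrors the steps of preprocess_image but takes an in-memory image rather than a file path.

```python
from PIL import Image, ImageDraw, ImageEnhance, ImageFilter


def preprocess(image: Image.Image, contrast: float = 2.0) -> Image.Image:
    """Grayscale, sharpen, then boost contrast (same steps as preprocess_image)."""
    gray = image.convert("L")
    sharpened = gray.filter(ImageFilter.SHARPEN)
    return ImageEnhance.Contrast(sharpened).enhance(contrast)


def gray_spread(image: Image.Image) -> int:
    """Difference between the brightest and darkest grayscale pixel."""
    low, high = image.convert("L").getextrema()
    return high - low


# Synthetic sample: a slightly darker bar on a mid-gray background,
# standing in for faint text strokes on a dull page.
sample = Image.new("RGB", (200, 60), (140, 140, 140))
ImageDraw.Draw(sample).rectangle((20, 20, 180, 40), fill=(110, 110, 110))

result = preprocess(sample)
print("spread before:", gray_spread(sample))
print("spread after: ", gray_spread(result))
```

A wider spread after preprocessing means the dark strokes stand out more from the background, which is the effect the library relies on before handing the image to Tesseract. Whether a contrast factor of 2 suits a particular scan is something to check on your own images.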
Contributing
Contributions are welcome! Please open an issue or submit a pull request for any changes.
License
This project is licensed under the MIT License. See the LICENSE file for more details.
For more information, visit the image-text-reader library page on PyPI.
File details
Details for the file image_text_reader-1.0.0.tar.gz.
File metadata
- Download URL: image_text_reader-1.0.0.tar.gz
- Upload date:
- Size: 6.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 45e4a55f73737dd66330f3a92a03af788b3c52793e604ad265200d929d9d06f7 |
| MD5 | d2094c9664803068de332f18325acc98 |
| BLAKE2b-256 | 06e8f20b2617c273586bc009fb83ffe760c4240e50e06d66cffb45ca3cc01a58 |
File details
Details for the file image_text_reader-1.0.0-py3-none-any.whl.
File metadata
- Download URL: image_text_reader-1.0.0-py3-none-any.whl
- Upload date:
- Size: 7.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 95f7edbd567f79d28fabff26a9dc95f0752c37fb860a7132d7dcfb300eb80cc4 |
| MD5 | fe8eff287e92a729ed97a908c814bc25 |
| BLAKE2b-256 | d03a6178fe7da36a819489c1f9a871bb661982c055d0a0a43adc8d7e830f466c |