Read text from images

Project description

Simply-ocr 1.0.1

By Jeroben Guzmán

Extract text from images the simple way.

💡 Prerequisites

Python 3

🚀 Features

Advanced preprocessing: binarization, noise removal, contrast adjustment, resizing, region-of-interest (ROI) selection.
Multi-language support and detection of installed languages.
Structured text extraction (text and bounding boxes).
Saving extracted text to a file.
Display of the preprocessed image for debugging.

📦 Installing dependencies

Make sure the following are installed:

opencv-python
scikit-image
pytesseract
matplotlib

You can install everything with:

pip install opencv-python scikit-image pytesseract matplotlib
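
Before running the examples, it can help to confirm that the four packages are importable. The sketch below uses only the standard library; the helper name `check_dependencies` is hypothetical and not part of simply-ocr. Note that two import names differ from the pip names (`cv2` for opencv-python, `skimage` for scikit-image), and that pytesseract is a wrapper that also needs the Tesseract engine itself installed on the system.

```python
import importlib.util
import shutil

# pip package name -> module name used in `import` statements
REQUIRED = {
    "opencv-python": "cv2",
    "scikit-image": "skimage",
    "pytesseract": "pytesseract",
    "matplotlib": "matplotlib",
}


def check_dependencies(required=REQUIRED):
    """Return {pip_name: True/False} depending on whether the module can be found."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in required.items()
    }


if __name__ == "__main__":
    for pip_name, found in check_dependencies().items():
        print(f"{pip_name}: {'ok' if found else 'missing'}")
    # pytesseract calls the external `tesseract` program, so check for it too.
    print("tesseract binary:", shutil.which("tesseract") or "not found on PATH")
```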

📚 Examples

from simply_ocr import (
    read_image_en, read_image_es, get_available_languages,
    save_text_to_file, show_preprocessed_image
)

# Extract text in English or Spanish
read_image_en('test.jpg')
read_image_es('test.jpg')

# Extract text from a specific region and show the preprocessed image
roi = (100, 200, 300, 100)  # x, y, w, h
texto = read_image_en('test.jpg', preprocess_opts={'roi': roi, 'binarize': True, 'remove_noise': True})
show_preprocessed_image('test.jpg', preprocess_opts={'roi': roi})

# Save the extracted text to a file
if texto:
    save_text_to_file(texto, 'salida.txt')

# List the languages available in your Tesseract installation
print(get_available_languages())

🧩 Use cases

1. Digitizing scanned documents

Extract text from invoices, receipts, contracts, or any other scanned document for storage or automated analysis.

texto = read_image_es('factura.png')
print(texto)

2. Processing images from cameras or phones

Well suited to extracting text from phone photos, such as signs, whiteboards, or handwritten notes.

texto = read_image_es('foto_pizarra.jpg', preprocess_opts={'binarize': True, 'remove_noise': True})
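
This description does not say how `binarize` and `remove_noise` are implemented. As a rough illustration of what such steps typically do, the sketch below applies a fixed global threshold and a 3×3 median filter to a grayscale image stored as a list of rows. The function names and the threshold of 128 are assumptions for illustration only; real pipelines usually rely on OpenCV or scikit-image routines instead.

```python
from statistics import median_low


def binarize(gray, threshold=128):
    """Map each pixel to 0 (black) or 255 (white) using a fixed global threshold."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


def median_filter_3x3(gray):
    """Replace each pixel by the median of its 3x3 neighbourhood (edges use fewer pixels)."""
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                gray[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(median_low(window))
        out.append(row)
    return out


# A flat dark background with one bright speck of noise in the middle.
noisy = [
    [10, 10, 10],
    [10, 250, 10],
    [10, 10, 10],
]
clean = binarize(median_filter_3x3(noisy))
print(clean)  # the speck is gone: every pixel ends up 0
```

Filtering before thresholding matters here: thresholding first would turn the speck into a white pixel that could be mistaken for part of a character.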

3. OCR on specific regions (ROI)

Extract text from only part of the image, which is useful for forms or fixed layouts.

roi = (50, 100, 200, 50)  # x, y, w, h
texto = read_image_es('formulario.png', preprocess_opts={'roi': roi})
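
The `(x, y, w, h)` tuple reads as a top-left corner plus a width and a height. How simply-ocr crops internally is not documented here; the sketch below only illustrates that convention on an image stored as a list of rows, clamping the ROI so it never reaches outside the image. `crop_roi` is a hypothetical helper, not a library function.

```python
def crop_roi(image, roi):
    """Crop `image` (a list of pixel rows) to roi = (x, y, w, h), clamped to the image bounds."""
    x, y, w, h = roi
    height = len(image)
    width = len(image[0]) if height else 0

    # Clamp the rectangle's corners into [0, width] x [0, height].
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    if x0 >= x1 or y0 >= y1:
        raise ValueError(f"ROI {roi} does not overlap the {width}x{height} image")

    return [row[x0:x1] for row in image[y0:y1]]


# 4 rows x 5 columns; each pixel is labelled 10*row + column for readability.
image = [[10 * r + c for c in range(5)] for r in range(4)]
print(crop_roi(image, (1, 2, 3, 2)))  # [[21, 22, 23], [31, 32, 33]]
```

With a NumPy array, such as the ones OpenCV returns, the same crop is the slice `img[y:y+h, x:x+w]` (rows first, then columns).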

4. Workflow automation

Automatically save the extracted text for later processing or integration with other systems.

texto = read_image_es('ticket.jpg')
if texto:
    save_text_to_file(texto, 'ticket.txt')
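
For more than one image, the same read-then-save pattern can be wrapped in a loop. The sketch below is one possible shape and is not part of simply-ocr: `ocr_folder` is a hypothetical helper that takes the OCR function as an argument (for example `read_image_es`) and writes the output with the standard library. It assumes the OCR function returns a string, or a falsy value when nothing was extracted, as the `if texto:` check suggests.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def ocr_folder(folder, ocr_func, out_dir):
    """Run `ocr_func(path_str)` on every image in `folder`; save non-empty results as .txt files.

    Returns the list of text files that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in sorted(Path(folder).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        text = ocr_func(str(image_path))
        if not text:  # skip images where no text was extracted
            continue
        target = out_dir / (image_path.stem + ".txt")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

Passing the OCR function in as a parameter keeps the loop independent of the language-specific reader and makes it easy to try out with a stand-in function before pointing it at real images.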

5. Viewing and tuning the preprocessing

Adjust parameters and view the result to improve OCR accuracy.

show_preprocessed_image('documento.jpg', preprocess_opts={'contrast': 1.5, 'binarize': True})
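
What `'contrast': 1.5` means numerically is not specified in this description. One common interpretation, shown below purely as an assumption, is a linear stretch of pixel values around mid-gray, with results clipped to the 0-255 range. `adjust_contrast` is a hypothetical name used only for this sketch.

```python
def adjust_contrast(gray, factor, pivot=128):
    """Scale each pixel's distance from `pivot` by `factor`, clipping to 0..255."""
    def stretch(px):
        value = pivot + factor * (px - pivot)
        return max(0, min(255, round(value)))

    return [[stretch(px) for px in row] for row in gray]


row = [[0, 100, 128, 156, 255]]
print(adjust_contrast(row, 1.5))  # [[0, 86, 128, 170, 255]]
```

Under this interpretation a factor above 1 pushes dark pixels darker and bright pixels brighter, which tends to separate ink from paper before binarization; a factor of 1 leaves the image unchanged.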

📝 Notes

You can customize preprocessing with the preprocess_opts parameter of the functions.
To use the visualization, make sure matplotlib is installed.
OCR works best on sharp, high-contrast images.

Project details

Release history

1.0.1 (this version): Jun 11, 2025
1.0.0: Jun 11, 2025
0.0.1: Jan 5, 2023

Download files

Source Distribution

simply_ocr-1.0.1.tar.gz (4.6 kB)

Uploaded Jun 11, 2025 Source

Built Distribution

simply_ocr-1.0.1-py3-none-any.whl (4.9 kB)

Uploaded Jun 11, 2025 Python 3

File details

Details for the file simply_ocr-1.0.1.tar.gz.

File metadata

Download URL: simply_ocr-1.0.1.tar.gz
Upload date: Jun 11, 2025
Size: 4.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for simply_ocr-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`e6649363c4769e777f4e8221c0a1518363587f1592e61cc204c7df43240b1e82`
MD5	`eccb021908108f6ce63fcaf80adb8031`
BLAKE2b-256	`a4917b8a4a1b9002bc12d405b9661c305acad2ac9b453178ad1fac1064ac7078`

File details

Details for the file simply_ocr-1.0.1-py3-none-any.whl.

File metadata

Download URL: simply_ocr-1.0.1-py3-none-any.whl
Upload date: Jun 11, 2025
Size: 4.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for simply_ocr-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1ac08a5b7d8adad577000e003be751b542d85eb6a04e11d80707b075b7ac2145`
MD5	`b2c9fc35b4a82e50d5c2fdc97d0b36d1`
BLAKE2b-256	`974235b317d71d3066a4a9da5b5a486429206e5256f7fbe9fe21e7c23374caf7`
