Comic-Focused Hybrid OCR Python Library

ComiQ: Comic-Focused Hybrid OCR Library

ComiQ is an Optical Character Recognition (OCR) library designed specifically for comics. It combines traditional OCR engines such as EasyOCR and PaddleOCR with Google's Gemini 1.5 Flash model to provide accurate text detection and translation in comic images.

To see ComiQ's capabilities in action, visit examples/ReadME.md.

Features

  • Hybrid OCR approach for improved accuracy
  • Specialized in detecting text within comic bubbles and panels
  • Integration with Google's Gemini 1.5 Flash model for enhanced performance
  • Support for multiple OCR engines
  • Easy-to-use Python interface

Installation

Install ComiQ using pip:

pip install comiq

Important Notes:

  • For GPU-accelerated processing, please visit the PyTorch website to install torch and torchvision with CUDA support.
  • ComiQ uses opencv-python-headless as a dependency. If your project requires the full opencv-python package, you may need to manage these dependencies carefully to avoid conflicts. Choose the appropriate version based on your project's needs:
    • For headless environments or when GUI features are not required, ComiQ's default opencv-python-headless is sufficient.
    • If you need GUI features, you may need to uninstall opencv-python-headless and install opencv-python separately.
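Before swapping OpenCV packages, it can help to check which distributions are actually installed, since several PyPI packages provide the same cv2 module. The following is a minimal sketch using only the standard library; the helper name installed_opencv_variants is illustrative and not part of ComiQ.

```python
from importlib import metadata

# PyPI distributions that all provide the same `cv2` module.
OPENCV_DISTRIBUTIONS = (
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
)


def installed_opencv_variants(names=OPENCV_DISTRIBUTIONS):
    """Return {distribution name: version} for each OpenCV variant found."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # this variant is not installed
    return found


if __name__ == "__main__":
    variants = installed_opencv_variants()
    for name, version in variants.items():
        print(f"{name}=={version}")
    if len(variants) > 1:
        print("More than one OpenCV distribution is installed; they may conflict.")
```

If more than one variant is reported, uninstalling all of them and reinstalling only the one you need is usually the cleanest fix.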

Quick Start

import comiq

# Set up your Gemini API key
comiq.set_api_key("<GEMINI_API_KEY>")

# Process an image
image_path = "path/to/your/comic/image.jpg"
data = comiq.extract(image_path)

# 'data' now holds the extracted text and bounding box of each text region in the image

API Reference

set_api_key(api_key: str)

Sets the API key for the ComiQ module, which is required for using the Gemini AI model.

Parameters:

  • api_key (str): The API key for accessing the Gemini AI service.

Usage:

import comiq

comiq.set_api_key("your-api-key-here")

Note:

  • You must call this function and set a valid API key before using any other ComiQ functions.
  • Keep your API key confidential and do not share it publicly.
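One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named GEMINI_API_KEY; that name and the load_api_key helper are illustrative choices, not something ComiQ defines.

```python
import os


def load_api_key(env_var="GEMINI_API_KEY"):
    """Read the API key from an environment variable, failing loudly if unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Typical use (requires ComiQ to be installed):
#   import comiq
#   comiq.set_api_key(load_api_key())
```

This way the key never appears in version control, and a missing key is reported at startup instead of on the first extract() call.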

extract(image: Union[str, 'numpy.ndarray'], ocr: Union[str, List[str]] = "paddleocr")

Extracts text from the given image using specified OCR method(s) and processes it with the Gemini AI model.

Parameters:

  • image (str or numpy.ndarray):
    • If str: Path to the image file.
    • If numpy.ndarray: Numpy array representation of the image.
  • ocr (str or list of str, optional):
    • OCR engine(s) to use. Default is "paddleocr".
    • Possible values: "paddleocr", "easyocr", or a list containing both.

Returns:

  • dict: Processed data containing text extractions and their locations.

Usage:

import comiq

# Using default OCR (PaddleOCR)
result = comiq.extract("path/to/your/comic/image.jpg")

# Using a specific OCR engine
result = comiq.extract("path/to/your/comic/image.jpg", ocr="easyocr")

# Using multiple OCR engines
result = comiq.extract("path/to/your/comic/image.jpg", ocr=["paddleocr", "easyocr"])

# Using a numpy array instead of an image path
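import cv2
image_array = cv2.imread("path/to/your/comic/image.jpg")
if image_array is None:  # cv2.imread returns None when the file cannot be read
    raise FileNotFoundError("Could not read the image file")
result = comiq.extract(image_array)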

Notes:

  • Ensure you've set the API key using set_api_key() before calling this function.
  • The function automatically preprocesses the image for optimal OCR performance.
  • When using multiple OCR engines, the results are combined for improved accuracy.
  • The returned dictionary contains bounding box coordinates and extracted text for each detected text region in the image.
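Because extract() returns a dictionary, the result can be saved for later inspection. The sketch below writes it to a JSON file without assuming any particular keys. The save_result helper is hypothetical, and the fallback for NumPy values is a precaution, since the exact value types in the result are not documented here.

```python
import json
from pathlib import Path


def _to_jsonable(value):
    """Fallback for objects json cannot encode, e.g. NumPy arrays and scalars."""
    if hasattr(value, "tolist"):
        return value.tolist()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def save_result(result, path):
    """Write an extract() result to a UTF-8 JSON file and return the path."""
    path = Path(path)
    path.write_text(
        # ensure_ascii=False keeps non-Latin comic text readable in the file.
        json.dumps(result, ensure_ascii=False, indent=2, default=_to_jsonable),
        encoding="utf-8",
    )
    return path


# Typical use (requires ComiQ and a configured API key):
#   result = comiq.extract("path/to/your/comic/image.jpg")
#   save_result(result, "page_001.json")
```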

Advanced Usage

Selecting OCR Engines

ComiQ supports two OCR engines: PaddleOCR and EasyOCR. You can specify which engine(s) to use:

# Use a single OCR engine
data = comiq.extract(image_path, ocr="paddleocr")

# Use multiple OCR engines
data = comiq.extract(image_path, ocr=["paddleocr", "easyocr"])
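When the engine name comes from user input or a config file, a caller-side check can catch typos before any OCR work starts. This is a minimal sketch based only on the two engine names documented above. The normalize_ocr helper is hypothetical, and whether ComiQ performs its own validation is not stated here.

```python
SUPPORTED_ENGINES = ("paddleocr", "easyocr")


def normalize_ocr(ocr="paddleocr"):
    """Return the requested engines as a de-duplicated list, or raise ValueError."""
    engines = [ocr] if isinstance(ocr, str) else list(ocr)
    if not engines:
        raise ValueError("At least one OCR engine is required.")
    unknown = [name for name in engines if name not in SUPPORTED_ENGINES]
    if unknown:
        raise ValueError(f"Unsupported OCR engine(s): {unknown}")
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(engines))


# Typical use (requires ComiQ and a configured API key):
#   data = comiq.extract(image_path, ocr=normalize_ocr(user_choice))
```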

OCR Engine Comparison

| Feature | EasyOCR | PaddleOCR |
|---|---|---|
| Strengths | Detects styled text; handles directional text; accurate bounding box positioning | Higher true positive rate; better text quality |
| Weaknesses | Lower text quality; higher false positive rate | Struggles with styled text; limited directional text support; less accurate positioning |
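To judge these trade-offs on your own pages, you can run the same image through each engine configuration and compare the outputs side by side. The sketch below takes the extraction function as a parameter so it can be tried without ComiQ installed. The compare_engines helper is hypothetical, and since extract() processes its output with Gemini, each run will presumably use API quota.

```python
import time


def compare_engines(extract_fn, image, configs=None):
    """Run extract_fn once per OCR configuration; return {label: (result, seconds)}."""
    if configs is None:
        configs = ["paddleocr", "easyocr", ["paddleocr", "easyocr"]]
    report = {}
    for ocr in configs:
        # Label a single engine by its name and a combination as "a+b".
        label = ocr if isinstance(ocr, str) else "+".join(ocr)
        start = time.perf_counter()
        result = extract_fn(image, ocr=ocr)
        report[label] = (result, time.perf_counter() - start)
    return report


# Typical use (requires ComiQ and a configured API key):
#   report = compare_engines(comiq.extract, "path/to/your/comic/image.jpg")
#   for label, (result, seconds) in report.items():
#       print(label, f"{seconds:.1f}s")
```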

Contributing

We welcome contributions to ComiQ! Please see our Contributing Guide for more information on how to get started.

License

ComiQ is released under the MIT License.

Contact

For questions, issues, or suggestions, please open an issue on our GitHub repository.

Download files

Source Distribution

comiq-0.0.3.tar.gz (7.7 MB)

Uploaded Source

Built Distribution

comiq-0.0.3-py3-none-any.whl (10.6 kB)

Uploaded Python 3

File details

Details for the file comiq-0.0.3.tar.gz.

File metadata

  • Download URL: comiq-0.0.3.tar.gz
  • Upload date:
  • Size: 7.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.6

File hashes

Hashes for comiq-0.0.3.tar.gz
Algorithm Hash digest
SHA256 ee6dc4321b37df9136535ad467d2b6787e096bb9ba5f8dbd6843668a1c07f771
MD5 e5b1e5a85d0a6096301bf3ca2be9961a
BLAKE2b-256 c433b0eaf7d8dd722e7344ad81d3641b377aeebdf3e06c416aa70beaa916db2a

File details

Details for the file comiq-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: comiq-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 10.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.6

File hashes

Hashes for comiq-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 71306c00358e1bcbd19a008fc7df93c86c757ae31bb43dbf1873de8da3beaf7f
MD5 4d9829877cb4797356f06337a183a603
BLAKE2b-256 fce6c7cf5fb9320807d6ca007a4717134928f49a76720238e6233e10b23593be
