A decoupled OCR helper library using PaddleOCR (via remote server) and SQLite caching.

Vibe-OCR

An intelligent, decoupled OCR helper library designed for automation tasks. It leverages a remote PaddleOCR server for text recognition and includes a robust local caching system using SQLite to optimize performance and reduce repeated network requests.

Installation

pip install vibe-ocr

Features

  • Decoupled Architecture: Uses a remote OCR server (PaddleOCR) to offload heavy computation.
  • Smart Caching: Local SQLite database caches OCR results for identical images/regions, significantly speeding up repeated checks.
  • Declarative API: High-level GameActions API for chaining operations (filter, map, click).
  • Snapshot Integration: Built-in support for taking screenshots (via airtest by default) or custom snapshot functions.
  • Retry Logic: If text is not found on the first attempt, the lookup is retried automatically with the cache bypassed.
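
The retry behavior in the last bullet can be sketched roughly as follows. This illustrates the idea and is not the library's code: `find_with_cache_fallback` and the `lookup` callable are hypothetical names. The only details taken from this README are the `use_cache` flag and the `found` key in the result.

```python
from typing import Callable, Optional


def find_with_cache_fallback(
    lookup: Callable[..., Optional[dict]],
    text: str,
) -> Optional[dict]:
    """Try a cached lookup first, then retry once with the cache bypassed.

    `lookup` is a hypothetical callable that accepts `text` and a
    `use_cache` keyword and returns a dict with a "found" key (or None).
    """
    result = lookup(text, use_cache=True)
    if result and result.get("found"):
        return result
    # The cached answer may be stale, so ask again without the cache.
    return lookup(text, use_cache=False)
```

With the real helper, `lookup` could be a thin wrapper such as `lambda text, use_cache: ocr.capture_and_find_text(text, use_cache=use_cache)`.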

Usage Guide

1. Initialization

Create an OCRHelper and point output_dir at the directory where the SQLite cache and debug images should be stored.

from vibe_ocr import OCRHelper

# Basic initialization
ocr = OCRHelper(output_dir="output")

2. High-Level Declarative API (Recommended)

The GameActions class provides a powerful, fluent interface for finding and interacting with game elements. This is the preferred way to write automation scripts.

from vibe_ocr import OCRHelper, GameActions

ocr = OCRHelper(output_dir="output")
actions = GameActions(ocr)

# Find all texts, filter for "Item", and click the first one
actions.find_all() \
       .contains("Item") \
       .min_confidence(0.8) \
       .first() \
       .click()

# Find specific text with timeout (retries automatically)
actions.find("Start Game", timeout=5).click()

# Check if text exists
if actions.text_exists("Game Over"):
    print("Game ended")

# Batch operations
actions.find_all() \
       .filter(lambda e: "Coin" in e.text) \
       .click_all()
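
To make the chaining style concrete, here is a minimal sketch of how a fluent query object can work. It is a toy model, not GameActions itself: `Element`, `ElementQuery`, and their fields are hypothetical, and only the method names `filter`, `contains`, `min_confidence`, and `first` mirror the calls shown above. In this sketch `first()` returns a plain element or `None`, whereas the real API returns something that can be clicked.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Element:
    text: str
    confidence: float


class ElementQuery:
    """Collection whose filter methods each return a new query."""

    def __init__(self, elements: List[Element]):
        self._elements = list(elements)

    def filter(self, predicate: Callable[[Element], bool]) -> "ElementQuery":
        return ElementQuery([e for e in self._elements if predicate(e)])

    def contains(self, needle: str) -> "ElementQuery":
        return self.filter(lambda e: needle in e.text)

    def min_confidence(self, threshold: float) -> "ElementQuery":
        return self.filter(lambda e: e.confidence >= threshold)

    def first(self) -> Optional[Element]:
        return self._elements[0] if self._elements else None


query = ElementQuery([
    Element("Item A", 0.95),
    Element("Item B", 0.60),
    Element("Coin", 0.99),
])

# Parentheses allow multi-line chains without backslash continuations.
best = (
    query
    .contains("Item")
    .min_confidence(0.8)
    .first()
)
```

Because every filter returns a new query, intermediate results can be stored and reused without affecting later chains.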

3. Low-Level API: Finding Text

You can also use the OCRHelper directly for simple tasks.

# Capture screen and find text "Login"
result = ocr.capture_and_find_text(
    "Login",
    confidence_threshold=0.7,
    occurrence=1,   # 1st occurrence
    use_cache=True  # Use cache if screen hasn't changed
)

if result and result.get("found"):
    print(f"Found 'Login' at: {result['center']}")
else:
    print("Text not found.")

4. Low-Level API: Finding and Clicking

A convenience method that finds text and simulates a touch/click on it (requires airtest to be installed).

# Find "Confirm" and click it if found
clicked = ocr.find_and_click_text(
    "Confirm",
    confidence_threshold=0.6
)

Configuration

1. PaddleX OCR Server (Required)

This library requires a running PaddleOCR server (PaddleX 3.0+). You can easily deploy it using Docker:

docker run -d --name paddlex \
  --shm-size=8g \
  --network=host \
  ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/paddlex:paddlex3.3.11-paddlepaddle3.2.0-cpu \
  sh -lc "paddlex --install serving && paddlex --serve --pipeline OCR"

  • Port: The server listens on 8080 by default.
  • Endpoint: http://localhost:8080/ocr
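
Before running automation scripts, it can help to confirm that something is listening on the server port. The sketch below uses only the Python standard library; `host_and_port` and `is_server_reachable` are hypothetical helper names and are not part of vibe-ocr. It checks TCP connectivity only and does not verify that the OCR pipeline itself is serving requests.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(url: str) -> tuple:
    """Extract (host, port) from a URL, falling back to the scheme default."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_server_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the default deployment above, `is_server_reachable("http://localhost:8080/ocr")` should return `True` once the container has finished starting.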

2. Environment Variables

  • OCR_SERVER_URL: The URL of the PaddleOCR server. Defaults to http://localhost:8080/ocr.
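
The lookup described above can be sketched as follows. This is an assumption about the general pattern, not the library's actual code: `resolve_server_url` and `DEFAULT_OCR_SERVER_URL` are hypothetical names, and treating an empty value as unset is a choice made for this example.

```python
import os
from typing import Mapping, Optional

DEFAULT_OCR_SERVER_URL = "http://localhost:8080/ocr"


def resolve_server_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OCR_SERVER_URL from the environment, or the documented default."""
    env = os.environ if env is None else env
    # Treat an empty value the same as an unset variable.
    return env.get("OCR_SERVER_URL") or DEFAULT_OCR_SERVER_URL
```

The README does not say when the variable is read, so setting it before importing vibe_ocr is the safest option.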

Dependencies

  • Airtest (Optional but Recommended): The click() methods and default snapshot function rely on airtest. Ensure it is installed (pip install airtest) if you plan to use these features.

Constructor Parameters

  • output_dir: Directory in which to store the cache (SQLite database) and debug images.
  • snapshot_func: Callable used to take screenshots. Defaults to airtest.core.api.snapshot.
  • delete_temp_screenshots: Whether to delete temporary screenshot files after processing (Default: True).
  • resize_image: Whether to resize large images before sending them to the OCR server, which improves speed (Default: True).
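
As an example of a custom snapshot_func, the sketch below replays a pre-captured image instead of taking a live screenshot, which can be handy for offline testing. It rests on an assumption that should be checked against the library source: that the helper calls the function with a `filename` keyword, as airtest's snapshot accepts. `make_file_snapshot` is a hypothetical name.

```python
import shutil
from pathlib import Path
from typing import Callable


def make_file_snapshot(source_image: str) -> Callable[..., str]:
    """Build a snapshot callable that replays a pre-captured image.

    Assumption: the helper calls snapshot_func(filename=...) in the same
    way airtest's snapshot is called; check the library source to confirm.
    """
    source = Path(source_image)

    def snapshot(filename: str, **_ignored) -> str:
        # Copy the stored image to wherever the caller wants the screenshot.
        target = Path(filename)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
        return str(target)

    return snapshot
```

If that assumption holds, it could be wired in as `OCRHelper(output_dir="output", snapshot_func=make_file_snapshot("fixtures/login.png"))`, where the fixture path is a placeholder.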

Caching Mechanism

vibe-ocr computes a perceptual hash (dHash) of each screenshot. If a sufficiently similar image already exists in the SQLite cache, the stored OCR result is returned locally instead of calling the server. This matters most in high-frequency automation loops, where the same screen is checked repeatedly.
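
The idea behind a difference hash can be shown in a few lines of pure Python. This is a simplified illustration, not the library's implementation: real dHash code first converts the image to grayscale and downscales it (for example with Pillow), which is skipped here by starting from a tiny grayscale grid. The function names and the `max_distance` threshold are assumptions made for the example.

```python
from typing import List, Sequence


def dhash_bits(gray: Sequence[Sequence[int]]) -> List[int]:
    """Difference hash of an already-downscaled grayscale grid.

    Each row of N+1 pixels yields N bits: 1 where a pixel is brighter
    than its right-hand neighbour, else 0.
    """
    bits = []
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_similar(a: Sequence[int], b: Sequence[int], max_distance: int = 2) -> bool:
    """Treat two hashes as the same screen if only a few bits differ."""
    return hamming_distance(a, b) <= max_distance


# frame_b has slightly different brightness values than frame_a but the
# same left/right ordering, so their hashes match; frame_c does not.
frame_a = [[10, 20, 15], [200, 100, 150]]
frame_b = [[12, 25, 14], [190, 110, 160]]
frame_c = [[20, 10, 30], [100, 200, 50]]

same_screen = is_similar(dhash_bits(frame_a), dhash_bits(frame_b))
changed_screen = not is_similar(dhash_bits(frame_a), dhash_bits(frame_c))
```

Because the hash records only relative brightness between neighbouring pixels, small rendering differences leave it unchanged, which is what lets a cache treat near-identical screenshots as hits.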
