A decoupled OCR helper library using PaddleOCR (via remote server) and SQLite caching.

Vibe-OCR

A decoupled OCR helper library for automation tasks. It sends images to a remote PaddleOCR server for text recognition and caches results locally in SQLite, so repeated checks of an unchanged screen avoid another network request.

Installation

pip install vibe-ocr

Features

  • Decoupled Architecture: Uses a remote OCR server (PaddleOCR) to offload heavy computation.
  • Smart Caching: Local SQLite database caches OCR results for identical images/regions, significantly speeding up repeated checks.
  • Declarative API: High-level GameActions API for chaining operations (filter, map, click).
  • Snapshot Integration: Built-in support for taking screenshots (via airtest by default) or custom snapshot functions.
  • Retry Logic: Automatic retry without cache if text is not found initially.

Usage Guide

1. Initialization

Create an OCRHelper, passing the directory where the cache database and debug images should be stored.

from vibe_ocr import OCRHelper

# Basic initialization
ocr = OCRHelper(output_dir="output")

2. High-Level Declarative API (Recommended)

The GameActions class provides a fluent, chainable interface for finding and interacting with game elements. This is the preferred way to write automation scripts.

from vibe_ocr import OCRHelper, GameActions

ocr = OCRHelper(output_dir="output")
actions = GameActions(ocr)

# Find all texts, filter for "Item", and click the first one
actions.find_all() \
       .contains("Item") \
       .min_confidence(0.8) \
       .first() \
       .click()

# Find specific text with timeout (retries automatically)
actions.find("Start Game", timeout=5).click()

# Check if text exists
if actions.text_exists("Game Over"):
    print("Game ended")

# Batch operations
actions.find_all() \
       .filter(lambda e: "Coin" in e.text) \
       .click_all()
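When a script needs to block until some text appears, text_exists can be polled in a loop. The following is a minimal sketch, assuming only the text_exists call shown above; wait_for_text and FakeActions are hypothetical names introduced for illustration and are not part of vibe-ocr.

```python
import time


def wait_for_text(actions, text, timeout=5.0, interval=0.5):
    """Poll actions.text_exists(text) until it is true or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        if actions.text_exists(text):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeActions:
    """Hypothetical stand-in for GameActions so the sketch runs offline."""

    def __init__(self, appears_on_call):
        self.appears_on_call = appears_on_call
        self.calls = 0

    def text_exists(self, text):
        # Pretend the text shows up on the N-th screen check.
        self.calls += 1
        return self.calls >= self.appears_on_call


fake = FakeActions(appears_on_call=3)
print(wait_for_text(fake, "Game Over", timeout=2.0, interval=0.01), fake.calls)
```

With a real GameActions instance, the same helper would be called as wait_for_text(actions, "Game Over"). The check runs at least once even when the timeout is zero.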

3. Low-Level API: Finding Text

You can also use the OCRHelper directly for simple tasks.

# Capture screen and find text "Login"
result = ocr.capture_and_find_text(
    "Login",
    confidence_threshold=0.7,
    occurrence=1,   # 1st occurrence
    use_cache=True  # Use cache if screen hasn't changed
)

if result and result.get("found"):
    print(f"Found 'Login' at: {result['center']}")
else:
    print("Text not found.")
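A cached lookup can miss text that appeared after the cached result was stored. The sketch below shows one way to fall back to an uncached lookup by hand, assuming only the capture_and_find_text parameters shown above. find_with_cache_fallback and FakeOCR are hypothetical names for illustration; the library's built-in retry logic may work differently.

```python
from typing import Any, Optional


def find_with_cache_fallback(
    ocr: Any, text: str, confidence_threshold: float = 0.7
) -> Optional[dict]:
    """Try a cached lookup first, then repeat once with the cache bypassed."""
    for use_cache in (True, False):
        result = ocr.capture_and_find_text(
            text,
            confidence_threshold=confidence_threshold,
            use_cache=use_cache,
        )
        if result and result.get("found"):
            return result
    return None


class FakeOCR:
    """Hypothetical stand-in for OCRHelper so the sketch runs without a server."""

    def __init__(self):
        self.calls = []

    def capture_and_find_text(self, text, confidence_threshold=0.7, use_cache=True):
        self.calls.append(use_cache)
        # Simulate a stale cache entry: only the uncached call finds the text.
        if use_cache:
            return {"found": False}
        return {"found": True, "center": (100, 200)}


fake = FakeOCR()
hit = find_with_cache_fallback(fake, "Login")
print(hit, fake.calls)
```

The second attempt costs a full server round trip, so this pattern fits checks where a false "not found" is more expensive than the extra latency.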

4. Low-Level API: Finding and Clicking

A convenience method that finds text and simulates a touch/click on it (requires airtest to be installed).

# Find "Confirm" and click it if found
clicked = ocr.find_and_click_text(
    "Confirm",
    confidence_threshold=0.6
)

Configuration

1. PaddleX OCR Server (Required)

This library requires a running PaddleOCR server (PaddleX 3.0+). You can deploy it using Docker.

Option A: CPU Version (Lightweight)

docker run -d --name paddlex \
  --shm-size=8g \
  --network=host \
  ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/paddlex:paddlex3.3.11-paddlepaddle3.2.0-cpu \
  sh -lc "paddlex --install serving && paddlex --serve --pipeline OCR"

Option B: GPU Version (High Performance)

If you have an NVIDIA GPU, this provides significantly lower latency.

docker run -d --name paddlex \
  --gpus all \
  --shm-size=8g \
  --network=host \
  ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/paddlex:paddlex3.0.1-paddlepaddle3.0.0-gpu-cuda11.8-cudnn8.9-trt8.6 \
  sh -lc "paddlex --install serving && paddlex --serve --pipeline OCR"

Note on Mobile Models: To use the faster "Mobile" models (recommended for real-time automation), replace the final sh -lc argument of the docker run command with one that exports the pipeline config and switches it to the mobile models before serving:

... sh -lc "paddlex --install serving && paddlex --get_pipeline_config OCR --save_path . && sed -i 's/_server_/_mobile_/g' OCR.yaml && paddlex --serve --pipeline OCR.yaml"

  • Port: The server listens on 8080 by default.
  • Endpoint: http://localhost:8080/ocr
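To check that the server is reachable independently of vibe-ocr, the endpoint can be called directly. The sketch below assumes the request and response shape described in the PaddleX serving documentation for the OCR pipeline (a JSON body with a Base64 "file" and "fileType": 1 for images, and results under result.ocrResults[].prunedResult). Treat these field names as assumptions and verify them against the PaddleX version you deployed. build_ocr_request and recognize are hypothetical helper names.

```python
import base64
import json
import urllib.request

DEFAULT_URL = "http://localhost:8080/ocr"


def build_ocr_request(image_bytes, url=DEFAULT_URL):
    """Build a POST request carrying the image as Base64 JSON (assumed schema)."""
    payload = {
        "file": base64.b64encode(image_bytes).decode("ascii"),
        "fileType": 1,  # assumed: 1 means image, 0 means PDF
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recognize(image_bytes, url=DEFAULT_URL, timeout=30):
    """Send the image to the server and return the per-page pruned results."""
    request = build_ocr_request(image_bytes, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumed response layout; adjust if your server version differs.
    return [page["prunedResult"] for page in body["result"]["ocrResults"]]


# Building the request needs no running server, so it is safe to inspect.
example = build_ocr_request(b"abc")
print(example.get_method(), example.full_url)
```

Calling recognize(open("screenshot.png", "rb").read()) against a running container is a quick smoke test; screenshot.png is a placeholder file name.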

2. Environment Variables

  • OCR_SERVER_URL: The URL of the PaddleOCR server. Defaults to http://localhost:8080/ocr.
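The following sketch shows the documented lookup order (environment variable first, then the default) in isolation. How vibe-ocr reads the variable internally is not shown in this README, so resolve_ocr_server_url is a hypothetical helper that only mirrors the behaviour described above.

```python
import os

DEFAULT_OCR_SERVER_URL = "http://localhost:8080/ocr"


def resolve_ocr_server_url(environ=None):
    """Return OCR_SERVER_URL if it is set and non-empty, else the default."""
    env = os.environ if environ is None else environ
    url = env.get("OCR_SERVER_URL", "").strip()
    return url or DEFAULT_OCR_SERVER_URL


# An explicit mapping is used here so the example does not depend on the
# caller's real environment; the host below is a made-up address.
print(resolve_ocr_server_url({}))
print(resolve_ocr_server_url({"OCR_SERVER_URL": "http://192.168.1.50:8080/ocr"}))
```

In a script, setting os.environ["OCR_SERVER_URL"] before creating the OCRHelper is the safest ordering, since the point at which the library reads the variable is not documented here.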

Dependencies

  • Airtest (Optional but Recommended): The click() methods and default snapshot function rely on airtest. Ensure it is installed (pip install airtest) if you plan to use these features.

Constructor Parameters

  • output_dir: Directory in which the cache (SQLite database) and debug images are stored.
  • snapshot_func: Callable to take screenshots. Defaults to airtest.core.api.snapshot.
  • delete_temp_screenshots: Whether to delete temporary screenshot files after processing (Default: True).
  • resize_image: Whether to resize large images before sending them to the OCR server to improve speed (Default: True).

Caching Mechanism

vibe-ocr computes a perceptual difference hash (dHash) of each screenshot. If a sufficiently similar image already exists in the SQLite cache, the stored OCR result is returned locally instead of calling the server. This matters most in high-frequency automation loops, where the same screen is checked many times in a row.
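The idea can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the table layout, the max_distance threshold of 4 bits, and the names dhash, hamming, and OcrCache are all assumptions, and the hash samples pixels directly instead of resizing the image as a production dHash would.

```python
import json
import sqlite3


def dhash(gray, hash_size=8):
    """Difference hash of a grayscale image given as rows of 0-255 ints."""
    height, width = len(gray), len(gray[0])
    bits = 0
    for row in range(hash_size):
        y = row * height // hash_size
        # Sample hash_size + 1 columns so each row yields hash_size comparisons.
        samples = [gray[y][col * width // (hash_size + 1)]
                   for col in range(hash_size + 1)]
        for left, right in zip(samples, samples[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class OcrCache:
    """Hypothetical SQLite cache keyed by image hash (stored as hex text)."""

    def __init__(self, path=":memory:", max_distance=4):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS ocr_cache (hash TEXT PRIMARY KEY, result TEXT)"
        )
        self.max_distance = max_distance

    def get(self, image_hash):
        # Linear scan: fine for a sketch, slow for very large caches.
        for stored, result in self.db.execute("SELECT hash, result FROM ocr_cache"):
            if hamming(int(stored, 16), image_hash) <= self.max_distance:
                return json.loads(result)
        return None

    def put(self, image_hash, result):
        self.db.execute(
            "INSERT OR REPLACE INTO ocr_cache VALUES (?, ?)",
            (format(image_hash, "016x"), json.dumps(result)),
        )
        self.db.commit()


# Synthetic 64x64 "screens": a gradient, a near-copy, and a reversed gradient.
screen = [[3 * x + y for x in range(64)] for y in range(64)]
noisy = [row[:] for row in screen]
noisy[0][0] += 1
other = [[255 - 3 * x - y for x in range(64)] for y in range(64)]

cache = OcrCache()
cache.put(dhash(screen), [{"text": "Login"}])
print(cache.get(dhash(noisy)), cache.get(dhash(other)))
```

The hash is stored as hex text because a 64-bit unsigned value can exceed SQLite's signed 64-bit INTEGER range. A small Hamming-distance threshold lets slightly different frames of the same screen share one cache entry while clearly different screens still miss.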
