
A decoupled OCR helper library using PaddleOCR (via remote server) and SQLite caching.


Vibe-OCR

An OCR helper library for automation tasks with a decoupled design. Text recognition runs on a remote PaddleOCR server, and a local SQLite cache stores results so that repeated checks of an unchanged screen avoid new network requests.

Installation

pip install vibe-ocr

Features

  • Decoupled Architecture: Uses a remote OCR server (PaddleOCR) to offload heavy computation.
  • Smart Caching: Local SQLite database caches OCR results for identical images/regions, significantly speeding up repeated checks.
  • Declarative API: High-level GameActions API for chaining operations (filter, map, click).
  • Snapshot Integration: Built-in support for taking screenshots (via airtest by default) or custom snapshot functions.
  • Retry Logic: If text is not found on the first attempt, the lookup is automatically retried with the cache bypassed.
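The retry behaviour in the last item can be pictured as below. This is a minimal sketch under assumptions, not vibe-ocr's code: `find_text_with_retry` and the `lookup(text, use_cache)` callable it wraps are hypothetical, and only the `found` key mirrors the result dicts shown in the Usage Guide.

```python
from typing import Callable, Optional

# Hypothetical signature: lookup(text, use_cache) -> result dict or None
Lookup = Callable[[str, bool], Optional[dict]]


def find_text_with_retry(lookup: Lookup, text: str) -> Optional[dict]:
    """Try the cached path first; on a miss, retry once against the OCR server."""
    result = lookup(text, True)
    if result and result.get("found"):
        return result
    # A stale cache entry may hide text that is now on screen, so bypass it.
    return lookup(text, False)


if __name__ == "__main__":
    calls = []

    def fake_lookup(text: str, use_cache: bool) -> Optional[dict]:
        calls.append(use_cache)
        # Simulate a cached result that predates the text appearing.
        if use_cache:
            return {"found": False}
        return {"found": True, "center": (100, 200)}

    print(find_text_with_retry(fake_lookup, "Start Game"))  # {'found': True, 'center': (100, 200)}
    print(calls)  # [True, False]
```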

Usage Guide

1. Initialization

Initialize the OCRHelper.

from vibe_ocr import OCRHelper

# Basic initialization
ocr = OCRHelper(output_dir="output")

2. High-Level Declarative API (Recommended)

The GameActions class provides a fluent, chainable interface for finding and interacting with on-screen game elements. This is the recommended way to write automation scripts.

from vibe_ocr import OCRHelper, GameActions

ocr = OCRHelper(output_dir="output")
actions = GameActions(ocr)

# Find all texts, filter for "Item", and click the first one
actions.find_all() \
       .contains("Item") \
       .min_confidence(0.8) \
       .first() \
       .click()

# Find specific text with timeout (retries automatically)
actions.find("Start Game", timeout=5).click()

# Check if text exists
if actions.text_exists("Game Over"):
    print("Game ended")

# Batch operations
actions.find_all() \
       .filter(lambda e: "Coin" in e.text) \
       .click_all()
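To show how such a chain can compose, here is a self-contained stand-in for a fluent result set. It is a hedged sketch, not the library's implementation: `Element`, `Elements`, their fields, and the injectable `tap` callable (used in place of a real device click) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Point = Tuple[int, int]


@dataclass(frozen=True)
class Element:
    # Hypothetical fields for one recognized text box.
    text: str
    confidence: float
    center: Point


class Elements:
    """Chainable result set: every filtering step returns a new Elements."""

    def __init__(self, items: Iterable[Element], tap: Callable[[Point], None]):
        self._items: List[Element] = list(items)
        self._tap = tap  # hypothetical stand-in for a device click

    def filter(self, pred: Callable[[Element], bool]) -> "Elements":
        return Elements((e for e in self._items if pred(e)), self._tap)

    def contains(self, substring: str) -> "Elements":
        return self.filter(lambda e: substring in e.text)

    def min_confidence(self, threshold: float) -> "Elements":
        return self.filter(lambda e: e.confidence >= threshold)

    def first(self) -> "Elements":
        # Keep the chain going with at most one element.
        return Elements(self._items[:1], self._tap)

    def click(self) -> bool:
        """Tap the first element, if any; report whether a tap happened."""
        if not self._items:
            return False
        self._tap(self._items[0].center)
        return True

    def click_all(self) -> int:
        for e in self._items:
            self._tap(e.center)
        return len(self._items)

    def __len__(self) -> int:
        return len(self._items)


if __name__ == "__main__":
    taps: List[Point] = []
    screen = Elements(
        [
            Element("Item A", 0.95, (10, 20)),
            Element("Item B", 0.60, (30, 40)),
            Element("Coin x3", 0.90, (50, 60)),
        ],
        tap=taps.append,
    )
    screen.contains("Item").min_confidence(0.8).first().click()
    screen.filter(lambda e: "Coin" in e.text).click_all()
    print(taps)  # [(10, 20), (50, 60)]
```

Because each filtering step returns a new collection, a partially filtered result can be reused for several different follow-up chains.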

3. Low-Level API: Finding Text

You can also use the OCRHelper directly for simple tasks.

# Capture screen and find text "Login"
result = ocr.capture_and_find_text(
    "Login",
    confidence_threshold=0.7,
    occurrence=1,   # 1st occurrence
    use_cache=True  # Use cache if screen hasn't changed
)

if result and result.get("found"):
    print(f"Found 'Login' at: {result['center']}")
else:
    print("Text not found.")
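The `confidence_threshold` and `occurrence` parameters imply a selection step roughly like this one. It is a sketch under assumptions: the `(text, score, polygon)` detection layout, substring matching, and the `select_match`/`polygon_center` helpers are hypothetical; only the `found` and `center` keys come from the example above.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Assumed detection layout: (recognized text, score, 4-point polygon).
Detection = Tuple[str, float, Sequence[Point]]


def polygon_center(poly: Sequence[Point]) -> Tuple[int, int]:
    """Center of the polygon's axis-aligned bounding box, in integer pixels."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return (int((min(xs) + max(xs)) / 2), int((min(ys) + max(ys)) / 2))


def select_match(
    detections: List[Detection],
    target: str,
    confidence_threshold: float = 0.7,
    occurrence: int = 1,
) -> dict:
    """Return the N-th (1-based) confident detection containing `target`."""
    matches = [
        d for d in detections
        if target in d[0] and d[1] >= confidence_threshold
    ]
    if occurrence < 1 or occurrence > len(matches):
        return {"found": False}
    text, score, poly = matches[occurrence - 1]
    return {
        "found": True,
        "text": text,
        "confidence": score,
        "center": polygon_center(poly),
    }


if __name__ == "__main__":
    dets = [
        ("Login", 0.95, [(0, 0), (100, 0), (100, 40), (0, 40)]),
        ("Login with email", 0.50, [(0, 60), (200, 60), (200, 100), (0, 100)]),
        ("Login as guest", 0.88, [(0, 120), (180, 120), (180, 160), (0, 160)]),
    ]
    # The 0.50 match is below the threshold, so occurrence=2 is "Login as guest".
    print(select_match(dets, "Login", occurrence=2)["center"])  # (90, 140)
```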

4. Low-Level API: Finding and Clicking

A convenience method that finds text and simulates a touch/click on it (requires airtest to be installed).

# Find "Confirm" and click it if found
clicked = ocr.find_and_click_text(
    "Confirm",
    confidence_threshold=0.6
)
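Conceptually, finding and clicking combines a lookup with a tap at the match's center. The sketch below assumes that flow; `find_and_click` and its injected `find` and `tap` callables are hypothetical. In a real airtest script, `tap` could be `airtest.core.api.touch`, which accepts an `(x, y)` coordinate tuple.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]


def find_and_click(
    find: Callable[[str, float], Optional[dict]],
    tap: Callable[[Point], object],
    text: str,
    confidence_threshold: float = 0.6,
) -> bool:
    """Look up `text` and tap its center if found; True only when a tap happened."""
    result = find(text, confidence_threshold)
    if not result or not result.get("found"):
        return False
    x, y = result["center"]
    tap((int(x), int(y)))
    return True


if __name__ == "__main__":
    taps: List[Point] = []

    def fake_find(text: str, threshold: float) -> Optional[dict]:
        # Pretend only "Confirm" is on screen.
        if text == "Confirm":
            return {"found": True, "center": (320, 480)}
        return None

    print(find_and_click(fake_find, taps.append, "Confirm"))  # True
    print(find_and_click(fake_find, taps.append, "Cancel"))   # False
    print(taps)  # [(320, 480)]
```

Injecting `find` and `tap` keeps the decision logic testable without a device or an OCR server.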

Configuration

1. PaddleX OCR Server (Required)

This library requires a running PaddleOCR server (PaddleX 3.0+). You can deploy it using Docker.

Option A: CPU Version (Lightweight)

docker run -d --name paddlex \
  --shm-size=8g \
  --network=host \
  ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/paddlex:paddlex3.3.11-paddlepaddle3.2.0-cpu \
  sh -lc "paddlex --install serving && paddlex --serve --pipeline OCR"

Option B: GPU Version (High Performance)

If you have an NVIDIA GPU, this provides significantly lower latency.

docker run -d --name paddlex \
  --gpus all \
  --shm-size=8g \
  --network=host \
  ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/paddlex:paddlex3.0.1-paddlepaddle3.0.0-gpu-cuda11.8-cudnn8.9-trt8.6 \
  sh -lc "paddlex --install serving && paddlex --serve --pipeline OCR"

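Note on Mobile Models: To use the faster "Mobile" models (recommended for real-time automation), replace the final sh -lc command of either docker run invocation above with the following. It saves the OCR pipeline config to the working directory, rewrites the _server_ model names to their _mobile_ counterparts, and serves the edited config:

... sh -lc "paddlex --install serving && paddlex --get_pipeline_config OCR --save_path . && sed -i 's/_server_/_mobile_/g' OCR.yaml && paddlex --serve --pipeline OCR.yaml"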

  • Port: The server listens on 8080 by default.
  • Endpoint: http://localhost:8080/ocr

2. Environment Variables

  • OCR_SERVER_URL: The URL of the PaddleOCR server. Defaults to http://localhost:8080/ocr.
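For reference, a request to the server can be built with the standard library alone. This sketch assumes the PaddleX 3.x OCR serving format (a base64-encoded `file` with `fileType` 1 for images, and recognized lines under `result.ocrResults[].prunedResult` as `rec_texts`, `rec_scores`, and `rec_polys`); verify these field names against the PaddleX documentation for the version you deploy. The helper names are hypothetical, and vibe-ocr's own request may differ.

```python
import base64
import json
import os
import urllib.request
from typing import List, Optional, Tuple

DEFAULT_URL = "http://localhost:8080/ocr"


def server_url() -> str:
    # Same variable and default as documented above.
    return os.environ.get("OCR_SERVER_URL", DEFAULT_URL)


def build_ocr_request(image_bytes: bytes, url: Optional[str] = None) -> urllib.request.Request:
    """Assumed PaddleX serving body: base64 image in `file`, fileType 1 = image."""
    body = json.dumps({
        "file": base64.b64encode(image_bytes).decode("ascii"),
        "fileType": 1,
    }).encode("utf-8")
    return urllib.request.Request(
        url or server_url(),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_ocr_response(payload: dict) -> List[Tuple[str, float, list]]:
    """Flatten (text, score, polygon) triples; the key names are assumptions."""
    if payload.get("errorCode", 0) != 0:
        raise RuntimeError(payload.get("errorMsg", "OCR server error"))
    triples: List[Tuple[str, float, list]] = []
    for page in payload["result"]["ocrResults"]:
        pr = page["prunedResult"]
        triples.extend(zip(pr["rec_texts"], pr["rec_scores"], pr["rec_polys"]))
    return triples


def run_ocr(image_path: str, timeout: float = 10.0) -> List[Tuple[str, float, list]]:
    """Send one image file to the server and return the recognized lines."""
    with open(image_path, "rb") as f:
        req = build_ocr_request(f.read())
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_ocr_response(json.load(resp))
```

With the server from the Docker step running, `run_ocr("screen.png")` would return one `(text, score, polygon)` triple per recognized line, under the assumptions above.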

Dependencies

  • Airtest (Optional but Recommended): The click() methods and default snapshot function rely on airtest. Ensure it is installed (pip install airtest) if you plan to use these features.

Constructor Parameters

  • output_dir: Directory to store cache (sqlite db) and debug images.
  • snapshot_func: Callable to take screenshots. Defaults to airtest.core.api.snapshot.
  • delete_temp_screenshots: Whether to delete temporary screenshot files after processing (Default: True).
  • resize_image: Resize large images before sending to OCR server to improve speed (Default: True).
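When `resize_image` is enabled, any coordinates the server returns refer to the resized image, so they have to be scaled back before clicking. A minimal sketch of that bookkeeping follows; the `max_side` limit of 1280 and the `resize_plan`/`to_screen` helpers are hypothetical, not vibe-ocr's actual settings.

```python
from typing import Tuple


def resize_plan(width: int, height: int, max_side: int = 1280) -> Tuple[int, int, float]:
    """Target size that keeps the aspect ratio, plus the scale applied (<= 1.0)."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height, 1.0  # small enough: send as-is
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale)), scale


def to_screen(point: Tuple[float, float], scale: float) -> Tuple[int, int]:
    """Map a point found in the resized image back to screen pixels."""
    x, y = point
    return round(x / scale), round(y / scale)


if __name__ == "__main__":
    w, h, s = resize_plan(2560, 1440)
    print(w, h, s)                   # 1280 720 0.5
    print(to_screen((640, 360), s))  # (1280, 720)
```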

Caching Mechanism

vibe-ocr computes a perceptual hash (dHash) of each screenshot. If a sufficiently similar image is already in the SQLite cache, the stored OCR result is returned locally instead of calling the server. This matters most in high-frequency polling loops, where the screen often stays unchanged between checks.
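A rough, standard-library-only sketch of such a lookup follows. Assumptions: the screenshot has already been converted to grayscale and downscaled to 8 rows of 9 pixels (a real implementation would use an imaging library for that step), hashes are stored as hex text, and "similar" means a Hamming distance of at most 5 bits. The `OcrCache` class, its table layout, and the threshold are hypothetical, not vibe-ocr's schema.

```python
import json
import sqlite3
from typing import List, Optional

# Grayscale rows, assumed already downscaled to 8 rows x 9 columns.
Grid = List[List[int]]


def dhash(grid: Grid) -> int:
    """64-bit difference hash: one bit per 'pixel brighter than right neighbour'."""
    value = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            value = (value << 1) | (1 if left > right else 0)
    return value


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class OcrCache:
    """Hypothetical SQLite-backed cache keyed by dHash. Hashes are stored as hex
    text because SQLite integers are signed 64-bit."""

    def __init__(self, path: str = ":memory:", max_distance: int = 5):
        self.db = sqlite3.connect(path)
        self.max_distance = max_distance
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS ocr_cache ("
            "hash TEXT PRIMARY KEY, result TEXT NOT NULL)"
        )

    def put(self, h: int, result: list) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO ocr_cache (hash, result) VALUES (?, ?)",
            (format(h, "016x"), json.dumps(result)),
        )
        self.db.commit()

    def get(self, h: int) -> Optional[list]:
        """Closest stored result within max_distance bits, else None (linear scan)."""
        best, best_dist = None, self.max_distance + 1
        for key, result in self.db.execute("SELECT hash, result FROM ocr_cache"):
            dist = hamming(h, int(key, 16))
            if dist < best_dist:
                best, best_dist = result, dist
        return json.loads(best) if best is not None else None


if __name__ == "__main__":
    frame = [[(r * 9 + c) % 7 * 30 for c in range(9)] for r in range(8)]
    cache = OcrCache()
    cache.put(dhash(frame), [["Start Game", 0.97]])
    frame[0][0] += 1                   # tiny change: still a cache hit
    print(cache.get(dhash(frame)))     # [['Start Game', 0.97]]
    inverted = [[255 - v for v in row] for row in frame]
    print(cache.get(dhash(inverted)))  # None: every bit differs
```

The linear scan is fine for a small cache; a much larger one would call for a structure suited to Hamming-distance search, such as a BK-tree.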



Download files


Source Distribution

vibe_ocr-0.1.5.tar.gz (126.5 kB)

Uploaded Source

Built Distribution


vibe_ocr-0.1.5-py3-none-any.whl (18.6 kB)

Uploaded Python 3

File details

Details for the file vibe_ocr-0.1.5.tar.gz.

File metadata

  • Download URL: vibe_ocr-0.1.5.tar.gz
  • Size: 126.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vibe_ocr-0.1.5.tar.gz
Algorithm Hash digest
SHA256 158bf2dc10f7e26dd702f9793400a2953cb89b87190dea637ea85ac7cbf70422
MD5 2835f7e45a5ed5536198b95eabefcef3
BLAKE2b-256 9da01ae2aa45df325bb9bad7677373c797af351437d2b0a8afd4e3639c059809


Provenance

The following attestation bundles were made for vibe_ocr-0.1.5.tar.gz:

Publisher: pypi-vibe-ocr.yaml on jasoft/pythonlib

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vibe_ocr-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: vibe_ocr-0.1.5-py3-none-any.whl
  • Size: 18.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vibe_ocr-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 8b5db2635692fd679df5e79ac10b6237e9c28d1b4fbdaf848f15f86a17f2071e
MD5 d6548d473d0cee86d32fdc9f548cbb34
BLAKE2b-256 d0946f39561789289bb5d95693dc685fd2bc851f197637a214472b00b04eab3e


Provenance

The following attestation bundles were made for vibe_ocr-0.1.5-py3-none-any.whl:

Publisher: pypi-vibe-ocr.yaml on jasoft/pythonlib

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
