
A decoupled OCR helper library using PaddleOCR (via remote server) and SQLite caching.


Vibe-OCR

A decoupled OCR helper library for automation tasks. It sends images to a remote PaddleOCR server for text recognition and caches the results in a local SQLite database, which avoids repeated network requests for screens that have already been recognized.

Installation

pip install vibe-ocr

Features

  • Decoupled Architecture: Uses a remote OCR server (PaddleOCR) to offload heavy computation.
  • Smart Caching: Local SQLite database caches OCR results for identical images/regions, significantly speeding up repeated checks.
  • Declarative API: High-level GameActions API for chaining operations (filter, map, click).
  • Snapshot Integration: Built-in support for taking screenshots (via airtest by default) or custom snapshot functions.
  • Retry Logic: Automatic retry without cache if text is not found initially.
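
The retry behaviour can be pictured with a small wrapper. This is a minimal sketch of the idea and not the library's internal code: find_with_cache_fallback is a hypothetical helper, and it assumes only that the finder accepts a use_cache keyword and returns a dict with a "found" key, as capture_and_find_text does in the usage guide.

```python
def find_with_cache_fallback(finder, text, **kwargs):
    """Try a cached lookup first; on a miss, retry once with the cache bypassed.

    `finder` is any callable shaped like ocr.capture_and_find_text
    (hypothetical wrapper, not part of vibe-ocr).
    """
    result = finder(text, use_cache=True, **kwargs)
    if result and result.get("found"):
        return result
    # The cached OCR result may be stale, so ask for a fresh recognition.
    return finder(text, use_cache=False, **kwargs)


# Usage (requires a running OCR server):
# result = find_with_cache_fallback(ocr.capture_and_find_text, "Login",
#                                   confidence_threshold=0.7)
```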

Usage Guide

1. Initialization

Create an OCRHelper instance. The output_dir argument sets where the cache database and debug images are stored.

from vibe_ocr import OCRHelper

# Basic initialization
ocr = OCRHelper(output_dir="output")

2. High-Level Declarative API (Recommended)

The GameActions class provides a fluent interface for finding and interacting with on-screen game elements. It is the preferred way to write automation scripts.

from vibe_ocr import OCRHelper, GameActions

ocr = OCRHelper(output_dir="output")
actions = GameActions(ocr)

# Find all texts, keep those containing "Item", and click the first one
(
    actions.find_all()
    .contains("Item")
    .min_confidence(0.8)
    .first()
    .click()
)

# Find specific text with timeout (retries automatically)
actions.find("Start Game", timeout=5).click()

# Check if text exists
if actions.text_exists("Game Over"):
    print("Game ended")

# Batch operations: click every element whose text contains "Coin"
(
    actions.find_all()
    .filter(lambda e: "Coin" in e.text)
    .click_all()
)

3. Low-Level API: Finding Text

For simple tasks you can call OCRHelper methods directly.

# Capture screen and find text "Login"
result = ocr.capture_and_find_text(
    "Login",
    confidence_threshold=0.7,
    occurrence=1,   # 1st occurrence
    use_cache=True  # Use cache if screen hasn't changed
)

if result and result.get("found"):
    print(f"Found 'Login' at: {result['center']}")
else:
    print("Text not found.")

4. Low-Level API: Finding and Clicking

find_and_click_text is a convenience method that finds text and simulates a touch/click on it (requires airtest to be installed).

# Find "Confirm" and click it if found
clicked = ocr.find_and_click_text(
    "Confirm",
    confidence_threshold=0.6
)

Configuration

Environment Variables

  • OCR_SERVER_URL: The URL of the PaddleOCR server. Defaults to http://localhost:8080/ocr.
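
The lookup described above follows the usual environment-variable-with-default pattern. The sketch below only illustrates that pattern: resolve_ocr_server_url is a hypothetical helper and not a vibe-ocr function, and the example host name is made up.

```python
import os

DEFAULT_OCR_SERVER_URL = "http://localhost:8080/ocr"


def resolve_ocr_server_url(environ=None):
    """Return OCR_SERVER_URL if it is set, otherwise the documented default."""
    environ = os.environ if environ is None else environ
    return environ.get("OCR_SERVER_URL", DEFAULT_OCR_SERVER_URL)


# To point scripts at another server, set the variable before they run, e.g.
# os.environ["OCR_SERVER_URL"] = "http://ocr.example.internal:8080/ocr"
```

Whether the library reads the variable at import time or when OCRHelper is constructed is not stated here, so setting it before importing vibe_ocr is the safer choice.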

Dependencies

  • Airtest (Optional but Recommended): The click() methods and default snapshot function rely on airtest. Ensure it is installed (pip install airtest) if you plan to use these features.

Constructor Parameters

  • output_dir: Directory for the cache (SQLite database) and debug images.
  • snapshot_func: Callable to take screenshots. Defaults to airtest.core.api.snapshot.
  • delete_temp_screenshots: Whether to delete temporary screenshot files after processing (Default: True).
  • resize_image: Whether to resize large images before sending them to the OCR server, which improves speed (Default: True).
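
As an illustration of snapshot_func, the sketch below builds a snapshot function that replays a saved screenshot instead of capturing the screen, which can help when testing scripts without a device. It assumes the library calls snapshot_func with a filename keyword, as airtest's snapshot accepts; that calling convention is an assumption, and make_file_snapshot is a hypothetical helper.

```python
import shutil
from pathlib import Path


def make_file_snapshot(source_image):
    """Build a snapshot function that replays a previously saved screenshot."""
    source = Path(source_image)

    def snapshot(filename=None, **kwargs):
        # Assumed call shape: snapshot(filename="..."); extra kwargs are ignored.
        if filename is None:
            return str(source)
        target = Path(filename)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
        return str(target)

    return snapshot


# Usage (assumes the calling convention above):
# ocr = OCRHelper(output_dir="output",
#                 snapshot_func=make_file_snapshot("saved_screen.png"))
```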

Caching Mechanism

vibe-ocr calculates a perceptual hash (dhash) of each screenshot. If a similar image already exists in the SQLite cache, the stored OCR result is returned locally instead of calling the server. This matters most in high-frequency automation loops, where the same screen is checked many times.
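
The mechanism can be sketched in a few lines of standard-library Python. This is a simplified illustration, not vibe-ocr's implementation: it assumes the image has already been reduced to an 8-row by 9-column grayscale grid, and the table name, the HashCache class, and the max_distance threshold are all hypothetical.

```python
import json
import sqlite3


def dhash(pixels):
    """Difference hash: one bit per horizontally adjacent pixel pair.

    `pixels` is 8 rows of 9 grayscale values, giving a 64-bit integer.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class HashCache:
    """Tiny SQLite-backed cache keyed by image hash (illustrative only)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS ocr_cache "
            "(hash TEXT PRIMARY KEY, result TEXT)"
        )

    def put(self, image_hash, result):
        # Stored as hex text because a 64-bit unsigned hash can exceed
        # SQLite's signed INTEGER range.
        self.db.execute(
            "INSERT OR REPLACE INTO ocr_cache VALUES (?, ?)",
            (format(image_hash, "016x"), json.dumps(result)),
        )
        self.db.commit()

    def get_similar(self, image_hash, max_distance=4):
        """Return a cached result whose hash is within max_distance bits."""
        rows = self.db.execute("SELECT hash, result FROM ocr_cache")
        for stored, result in rows:
            if hamming(int(stored, 16), image_hash) <= max_distance:
                return json.loads(result)
        return None
```

A linear scan over all stored hashes is fine for a small cache; a larger one would need an index or a bounded history, and the right distance threshold depends on how much visual change should count as a new screen.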
