A decoupled OCR helper library using PaddleOCR (via remote server) and SQLite caching.
Vibe-OCR
A decoupled OCR helper library designed for automation tasks. It sends images to a remote PaddleOCR server for text recognition and caches results in a local SQLite database, so repeated requests for the same screen content skip the network round trip.
Installation
pip install vibe-ocr
Features
- Decoupled Architecture: Uses a remote OCR server (PaddleOCR) to offload heavy computation.
- Smart Caching: Local SQLite database caches OCR results for identical images/regions, significantly speeding up repeated checks.
- Declarative API: High-level GameActions API for chaining operations (filter, map, click).
- Snapshot Integration: Built-in support for taking screenshots (via airtest by default) or custom snapshot functions.
- Retry Logic: Automatic retry without cache if text is not found initially.
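The retry behavior in the last bullet can be pictured with a small sketch. This is not the library's implementation: the `lookup` callable, its `(text, use_cache)` signature, and the stand-in `fake_lookup` are hypothetical names introduced only for illustration. The result shape (`found`, `center`) mirrors the low-level API examples in this README.

```python
from typing import Callable, Optional

# Hypothetical lookup signature: (text, use_cache) -> result dict or None.
Lookup = Callable[[str, bool], Optional[dict]]


def find_with_cache_fallback(lookup: Lookup, text: str) -> Optional[dict]:
    """Try a cached lookup first; on a miss, retry once with the cache bypassed."""
    result = lookup(text, True)
    if result and result.get("found"):
        return result
    # A stale cache entry may hide text that has since appeared on screen.
    return lookup(text, False)


# Stand-in lookup for demonstration: the cached answer is stale, the fresh one is not.
calls = []


def fake_lookup(text: str, use_cache: bool) -> Optional[dict]:
    calls.append(use_cache)
    if use_cache:
        return {"found": False}
    return {"found": True, "center": (120, 340)}


result = find_with_cache_fallback(fake_lookup, "Login")
print(result, calls)
```

The second call only happens when the cached attempt fails, so the common case (text found on a cached screen) still costs no server request.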
Usage Guide
1. Initialization
Initialize the OCRHelper.
from vibe_ocr import OCRHelper
# Basic initialization
ocr = OCRHelper(output_dir="output")
2. High-Level Declarative API (Recommended)
The GameActions class provides a powerful, fluent interface for finding and interacting with game elements. This is the preferred way to write automation scripts.
from vibe_ocr import OCRHelper, GameActions
ocr = OCRHelper(output_dir="output")
actions = GameActions(ocr)
# Find all texts, filter for "Item", and click the first one
actions.find_all() \
    .contains("Item") \
    .min_confidence(0.8) \
    .first() \
    .click()
# Find specific text with timeout (retries automatically)
actions.find("Start Game", timeout=5).click()
# Check if text exists
if actions.text_exists("Game Over"):
    print("Game ended")
# Batch operations
actions.find_all() \
    .filter(lambda e: "Coin" in e.text) \
    .click_all()
3. Low-Level API: Finding Text
You can also use the OCRHelper directly for simple tasks.
# Capture screen and find text "Login"
result = ocr.capture_and_find_text(
    "Login",
    confidence_threshold=0.7,
    occurrence=1,    # 1st occurrence
    use_cache=True,  # Use cache if screen hasn't changed
)
if result and result.get("found"):
    print(f"Found 'Login' at: {result['center']}")
else:
    print("Text not found.")
4. Low-Level API: Finding and Clicking
A convenience method that finds text and simulates a touch/click action on it (requires airtest to be installed).
# Find "Confirm" and click it if found
clicked = ocr.find_and_click_text(
    "Confirm",
    confidence_threshold=0.6,
)
Configuration
1. PaddleX OCR Server (Required)
This library requires a running PaddleOCR server (PaddleX 3.0+). You can easily deploy it using Docker:
docker run -d --name paddlex \
  --shm-size=8g \
  --network=host \
  ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/paddlex:paddlex3.3.11-paddlepaddle3.2.0-cpu \
  sh -lc "paddlex --install serving && paddlex --serve --pipeline OCR"
- Port: The server listens on 8080 by default.
- Endpoint: http://localhost:8080/ocr
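To check the server independently of vibe-ocr, you can send a request to the endpoint by hand. The sketch below only builds the request. The JSON field names `file` (base64-encoded image) and `fileType` (1 for an image) are assumptions based on the PaddleX serving documentation and should be verified against the PaddleX version you deploy. `build_ocr_request` is a hypothetical helper and not part of this library.

```python
import base64
import json
import urllib.request


def build_ocr_request(
    image_bytes: bytes, url: str = "http://localhost:8080/ocr"
) -> urllib.request.Request:
    """Build a JSON POST request carrying a base64-encoded image."""
    payload = {
        "file": base64.b64encode(image_bytes).decode("ascii"),  # assumed field name
        "fileType": 1,  # assumed: 1 marks the payload as an image
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ocr_request(b"raw image bytes go here")
print(request.get_method(), request.full_url)

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(request, timeout=10) as response:
#     body = json.load(response)
```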
2. Environment Variables
- OCR_SERVER_URL: The URL of the PaddleOCR server. Defaults to http://localhost:8080/ocr.
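A minimal sketch of how such a variable is typically resolved, assuming an empty value is treated the same as an unset one (the library may behave differently). `resolve_ocr_server_url` and the example address are hypothetical and used only for illustration.

```python
import os

DEFAULT_OCR_SERVER_URL = "http://localhost:8080/ocr"


def resolve_ocr_server_url() -> str:
    """Return OCR_SERVER_URL if it is set and non-empty, else the documented default."""
    return os.environ.get("OCR_SERVER_URL") or DEFAULT_OCR_SERVER_URL


# Point the current process at an OCR server on another machine (example address).
os.environ["OCR_SERVER_URL"] = "http://192.168.1.50:8080/ocr"
print(resolve_ocr_server_url())
```

Setting the variable in the shell before launching your script (for example with `export OCR_SERVER_URL=...`) has the same effect without touching the code.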
Dependencies
- Airtest (Optional but Recommended): The click() methods and the default snapshot function rely on airtest. Ensure it is installed (pip install airtest) if you plan to use these features.
Constructor Parameters
- output_dir: Directory to store the cache (SQLite database) and debug images.
- snapshot_func: Callable used to take screenshots. Defaults to airtest.core.api.snapshot.
- delete_temp_screenshots: Whether to delete temporary screenshot files after processing (Default: True).
- resize_image: Whether to resize large images before sending them to the OCR server to improve speed (Default: True).
Caching Mechanism
vibe-ocr computes a perceptual hash (dhash) of each screenshot. If a similar image already exists in the SQLite cache, the OCR result is read locally instead of calling the server. This matters most in automation scripts that poll the screen in a tight loop, where consecutive frames are often nearly identical.
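The idea can be sketched in plain Python. This is an illustration of the general dhash-plus-SQLite technique, not the library's actual code: the `OcrCache` class, the table layout, and the `max_distance` threshold of 4 bits are all assumptions. The input is a grayscale grid already downscaled to 8 rows by 9 columns, a step that a real implementation would perform with an imaging library.

```python
import sqlite3
from typing import List, Optional


def dhash(gray: List[List[int]]) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair (8x9 grid -> 64 bits)."""
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class OcrCache:
    """Hypothetical cache: maps image hashes to OCR results, tolerating small differences."""

    def __init__(self, path: str = ":memory:", max_distance: int = 4):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS ocr_cache (hash TEXT PRIMARY KEY, result TEXT)"
        )
        self.max_distance = max_distance

    def put(self, image_hash: int, result: str) -> None:
        # Stored as hex text because a 64-bit unsigned value can overflow SQLite's INTEGER.
        self.db.execute(
            "INSERT OR REPLACE INTO ocr_cache VALUES (?, ?)",
            (format(image_hash, "016x"), result),
        )
        self.db.commit()

    def get(self, image_hash: int) -> Optional[str]:
        # Linear scan; adequate for a sketch, slow for very large caches.
        for stored, result in self.db.execute("SELECT hash, result FROM ocr_cache"):
            if hamming(int(stored, 16), image_hash) <= self.max_distance:
                return result
        return None


frame = [[(r * 37 + c * 53) % 256 for c in range(9)] for r in range(8)]
near = [row[:] for row in frame]
near[3][4] += 1  # a one-pixel change, e.g. compression noise
inverted = [[255 - v for v in row] for row in frame]

cache = OcrCache()
cache.put(dhash(frame), '[{"text": "Login"}]')
print(cache.get(dhash(near)))      # served from the cache
print(cache.get(dhash(inverted)))  # too different, so the server would be called
```

Matching on Hamming distance rather than exact equality is what lets a slightly different screenshot reuse an earlier result. The threshold trades cache hit rate against the risk of returning text from a screen that has actually changed.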