Image and text recognition module - a lightweight, standalone OCR/CV module

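Supervision - Image and Text Recognition Module

A lightweight, efficient module for image recognition and text recognition (OCR). It is fully standalone, needs no device connection, and can be integrated into any kind of project (desktop, web, mobile, and so on).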

Core modules

options - global configuration management
aircv - image recognition (template matching, keypoint matching)
orc - text recognition (OCR)

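Features

✅ Multiple matching algorithms - template matching, multi-scale template matching, KAZE, BRISK, AKAZE, ORB, SIFT, SURF, BRIEF, and others
✅ Fully standalone - no device connection required; pure image and text recognition
✅ Easy to integrate - a concise API that fits many application scenarios
✅ High performance - optimized matching strategies and algorithm selection
✅ Flexible configuration - configurable thresholds, timeouts, algorithms, and other parameters
✅ OCR support - integrates PaddleOCR for Chinese and English text recognition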

Installation

Install from PyPI (recommended)

pip install solidvision

Install from source (development mode)

# Clone the repository
git clone https://github.com/SolidQA/solidvision.git
cd solidvision

# Install with uv (recommended)
uv sync --dev

# Or use pip
pip install -e .

Quick start

Image recognition

import cv2
from solidvision import find_location, Template
from solidvision.aircv.cv import match_loop

# Read the screenshot
screenshot = cv2.imread('screenshot.png')

# Option 1: quick lookup
position = find_location(screenshot, 'button.png', threshold=0.8)
print(position)  # e.g. (100, 200)

# Option 2: use Template
template = Template('button.png')
position = match_loop(lambda: screenshot, template.filepath, threshold=0.8)

Text recognition

from solidvision import recognize_text, find_text_position
import cv2

# Read the image
image = cv2.imread('image.png')

# Recognize all text
texts = recognize_text(image)
for item in texts:
    print(f"Text: {item['text']}, position: {item['position']}")

# Find the position of a specific piece of text
position = find_text_position(image, '确定')
print(position)  # e.g. (150, 250)
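
The recognition results can be post-processed with ordinary Python. The sketch below assumes, based only on the loop above, that each result is a dict with 'text' and 'position' keys; the helper name find_items_containing and the sample data are hypothetical and not part of solidvision.

```python
from typing import Any


def find_items_containing(items: list[dict[str, Any]], keyword: str) -> list[dict[str, Any]]:
    """Return the OCR items whose text contains keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [item for item in items if needle in str(item.get("text", "")).casefold()]


# Hand-written sample data standing in for real OCR output (assumed shape).
sample = [
    {"text": "确定", "position": (150, 250)},
    {"text": "Cancel", "position": (300, 250)},
    {"text": "cancel order", "position": (300, 400)},
]

matches = find_items_containing(sample, "cancel")
print([m["position"] for m in matches])
```

Unlike a single-position lookup, this keeps every matching item, which helps when the same word appears more than once on screen.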

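Module reference

options module

Global configuration management:

from solidvision.options import Options, Config

# Set the recognition threshold
Options.CV_THRESHOLD = 0.85

# Set the timeout
Options.FIND_TIMEOUT = 15

# Configure the OCR language
Options.OCR_LANGUAGE = 'ch'  # Chinese

# Get the current configuration
config_dict = Options.get_config_dict()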
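
Because these settings are global, changing one for a single lookup affects every later call. A minimal sketch of a temporary override follows, assuming the settings are plain class attributes as the assignments above suggest. The temporary_setting helper and the DemoOptions stand-in class are hypothetical, not part of solidvision.

```python
from contextlib import contextmanager


@contextmanager
def temporary_setting(holder, name, value):
    """Set holder.<name> to value inside the with-block, then restore it."""
    previous = getattr(holder, name)
    setattr(holder, name, value)
    try:
        yield holder
    finally:
        # Restore the old value even if the block raised an exception.
        setattr(holder, name, previous)


class DemoOptions:
    """Stand-in for a settings class with plain class attributes."""
    CV_THRESHOLD = 0.8


with temporary_setting(DemoOptions, "CV_THRESHOLD", 0.95):
    inside = DemoOptions.CV_THRESHOLD  # stricter threshold only in this block

after = DemoOptions.CV_THRESHOLD
print(inside, after)
```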

aircv module

Image recognition:

import cv2
from solidvision.aircv.cv import Template, match_loop, multi_find_location
from solidvision.aircv.settings import Settings

# Configure the matching strategy
Settings.CVSTRATEGY = ('tpl', 'kaze', 'brisk')

# Zero-argument callable that returns the current screenshot
def screenshot_func():
    return cv2.imread('screenshot.png')

# Create a template
template = Template('button.png', threshold=0.8)

# Find a single match
position = match_loop(screenshot_func, template.filepath, timeout=10, threshold=0.8)

# Find all matches
positions = multi_find_location(screenshot_func, 'button.png', threshold=0.8)
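
A screenshot callable combined with a timeout suggests a capture-and-retry loop. The sketch below shows that general pattern only; it is not the implementation of match_loop, and poll_for_match, match_once, and interval are hypothetical names.

```python
import time
from typing import Callable, Optional, Tuple

Position = Tuple[int, int]


def poll_for_match(
    capture: Callable[[], object],
    match_once: Callable[[object], Optional[Position]],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> Optional[Position]:
    """Capture and match repeatedly until a position is found or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        position = match_once(capture())
        if position is not None:
            return position
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Demo with a fake matcher that only succeeds on the third attempt.
attempts = {"count": 0}


def fake_match(_frame):
    attempts["count"] += 1
    return (100, 200) if attempts["count"] >= 3 else None


result = poll_for_match(lambda: "frame", fake_match, timeout=2.0, interval=0.01)
print(result, attempts["count"])
```

Taking a fresh capture on every attempt is what lets the loop notice a button that appears a moment after the first screenshot.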

orc module

Text recognition:

from solidvision.orc import TextRecognizer, recognize_text
import cv2

# Create a recognizer
recognizer = TextRecognizer(lang='ch')

# Recognize the text in an image
image = cv2.imread('image.png')
results = recognizer.recognize_image(image)

# Find a specific piece of text
position = recognizer.find_text_position(image, '确定')

# Get all of the text
text = recognizer.get_page_text(image)

Integration examples

Desktop application

import cv2
from solidvision import find_location

def find_button_on_desktop(button_template):
    import pyautogui
    import numpy as np

    # Capture the desktop
    screenshot = pyautogui.screenshot()
    frame = cv2.cvtColor(np.array(screenshot), cv2.COLOR_RGB2BGR)

    # Locate the button
    position = find_location(frame, button_template, threshold=0.8)

    if position:
        # Click the button
        pyautogui.click(position)
        return True

    return False

# Usage
find_button_on_desktop('button.png')

Web application (Flask)

import cv2
import numpy as np
from flask import Flask, request, jsonify
from solidvision import find_location

app = Flask(__name__)

@app.route('/recognize', methods=['POST'])
def recognize():
    file = request.files['image']
    template_path = request.form.get('template')

    # Decode the uploaded image
    image_bytes = np.frombuffer(file.read(), np.uint8)
    image = cv2.imdecode(image_bytes, cv2.IMREAD_COLOR)
    if image is None:
        # cv2.imdecode returns None when the data is not a valid image
        return jsonify({'success': False, 'position': None}), 400

    # Run recognition
    position = find_location(image, template_path)

    return jsonify({
        'success': position is not None,
        'position': position
    })
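
The handler above takes the template path straight from the request form, so a client could point it at any file on the server. One possible safeguard is to accept only names that resolve inside a dedicated template directory. This is a sketch under that assumption; resolve_template and the directory layout are hypothetical, not part of solidvision or Flask.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_template(name: str, base_dir: Path) -> Optional[Path]:
    """Return the template path if it is an existing file inside base_dir, else None."""
    base = base_dir.resolve()
    candidate = (base / name).resolve()
    # Reject anything that escapes base_dir, such as "../x.png" or an absolute path.
    if base not in candidate.parents:
        return None
    return candidate if candidate.is_file() else None


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    templates = Path(tmp)
    (templates / "button.png").write_bytes(b"placeholder")
    allowed = resolve_template("button.png", templates)
    rejected = resolve_template("../button.png", templates)
    print(allowed is not None, rejected)
```

In the route, a None result would be answered with an error response before any matching is attempted.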

Mobile app automation

from solidvision import find_location, find_text_position
import cv2

class MobileAutomation:
    @staticmethod
    def click(screenshot, template_path):
        """Locate the UI element to click and return its position."""
        position = find_location(screenshot, template_path)
        return position

    @staticmethod
    def click_text(screenshot, text):
        """Locate the text to click and return its position."""
        position = find_text_position(screenshot, text)
        return position

Performance tuning

1. Choose a suitable algorithm

from solidvision.aircv.settings import Settings

# Fast matching (simple scenes)
Settings.CVSTRATEGY = ('tpl',)

# High-accuracy matching (complex scenes)
Settings.CVSTRATEGY = ('gmstpl', 'sift')

# Balanced choice
Settings.CVSTRATEGY = ('tpl', 'kaze', 'brisk')
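
One way to read an ordered strategy tuple is as a fallback chain: try the cheapest method first and move on only when it finds nothing. Whether solidvision consumes CVSTRATEGY exactly this way is an assumption. The sketch below uses toy matchers and a hypothetical match_with_strategy helper to show the idea.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Position = Tuple[int, int]
Matcher = Callable[[object, object], Optional[Position]]


def match_with_strategy(
    screen: object,
    template: object,
    strategy: Sequence[str],
    matchers: Dict[str, Matcher],
) -> Optional[Tuple[str, Position]]:
    """Try the named matchers in order; return (method, position) for the first hit."""
    for name in strategy:
        matcher = matchers.get(name)
        if matcher is None:
            continue  # unknown method name: skip it
        position = matcher(screen, template)
        if position is not None:
            return name, position
    return None


# Toy matchers standing in for real algorithms.
toy_matchers: Dict[str, Matcher] = {
    "tpl": lambda screen, template: None,        # cheap method misses
    "kaze": lambda screen, template: (120, 80),  # slower method hits
    "brisk": lambda screen, template: (121, 81), # never reached
}

print(match_with_strategy("screen", "template", ("tpl", "kaze", "brisk"), toy_matchers))
```

Under this reading, putting the fast method first costs little when it succeeds and still leaves the slower methods as a safety net.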

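2. Tune the threshold

from solidvision import find_location

# screenshot: an image loaded with cv2.imread

# High threshold: fewer false positives, but may miss real matches
position = find_location(screenshot, 'template.png', threshold=0.95)

# Low threshold: matches more easily, but may produce false positives
position = find_location(screenshot, 'template.png', threshold=0.6)

# Recommended value
position = find_location(screenshot, 'template.png', threshold=0.8)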
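
To make the threshold trade-off concrete, here is a plain-Python sketch of zero-mean normalized cross-correlation on tiny grayscale grids. It illustrates the general technique only and is not solidvision's implementation; ncc_score, best_match, and the sample grids are made up for this example.

```python
import math


def ncc_score(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat patch carries no pattern to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(image, template, threshold):
    """Slide template over image; return ((x, y), score) or None if below threshold."""
    th, tw = len(template), len(template[0])
    best = None
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = ncc_score(patch, template)
            if best is None or score > best[1]:
                best = ((x, y), score)
    if best is None or best[1] < threshold:
        return None
    return best


template = [[10, 200], [200, 10]]
clean = [[0, 0, 0, 0], [0, 10, 200, 0], [0, 200, 10, 0], [0, 0, 0, 0]]
# Same scene with one template pixel distorted (200 -> 120).
noisy = [[0, 0, 0, 0], [0, 10, 200, 0], [0, 120, 10, 0], [0, 0, 0, 0]]

print(best_match(clean, template, threshold=0.95))
print(best_match(noisy, template, threshold=0.95))  # strict threshold rejects it
print(best_match(noisy, template, threshold=0.8))   # looser threshold accepts it
```

The distorted scene still contains the button, so the strict threshold produces a miss while the looser one finds it; on a busier image the looser threshold would also let weaker look-alikes through.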

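FAQ

Q: How can I improve recognition accuracy?

Adjust the threshold parameter
Try different matching algorithms
Make sure the template is sharp and matches the real scene
Use high-quality screenshots

Q: Recognition is too slow. What can I do?

Use a faster algorithm (tpl rather than sift)
Narrow the search region
Raise the threshold
Use a smaller template image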
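
Narrowing the search region usually means cropping the screenshot before matching and then shifting the result back into full-image coordinates. A minimal sketch with plain nested lists follows; crop_region and to_full_coords are hypothetical helpers. With a NumPy image as returned by cv2.imread, the crop would be image[top:top + height, left:left + width].

```python
def crop_region(image, left, top, width, height):
    """Crop a 2D list-of-rows image to the given rectangle."""
    return [row[left:left + width] for row in image[top:top + height]]


def to_full_coords(position, left, top):
    """Translate a position found inside the crop back to full-image coordinates."""
    if position is None:
        return None
    x, y = position
    return (x + left, y + top)


# 4 rows x 6 columns; each value encodes its own (row, column) as row * 10 + column.
image = [[r * 10 + c for c in range(6)] for r in range(4)]

region = crop_region(image, left=2, top=1, width=3, height=2)
print(region)

# Suppose a matcher found the target at (1, 0) inside the crop.
print(to_full_coords((1, 0), left=2, top=1))
```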

Q: How do I handle rotated or scaled images?

from solidvision.aircv.settings import Settings

# Use multi-scale template matching together with SIFT
Settings.CVSTRATEGY = ('gmstpl', 'sift')
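
Multi-scale template matching generally works by resizing the template to several candidate sizes and matching each one. The sketch below only generates those candidate sizes. The scale range, the step count, and the candidate_sizes name are assumptions for illustration, not solidvision's actual parameters.

```python
def candidate_sizes(template_size, scale_min=0.5, scale_max=1.5, steps=5):
    """Return (scale, (width, height)) pairs spread evenly from scale_min to scale_max."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    width, height = template_size
    sizes = []
    for i in range(steps):
        scale = scale_min + (scale_max - scale_min) * i / (steps - 1)
        # Never let a dimension collapse to zero pixels.
        scaled = (max(1, round(width * scale)), max(1, round(height * scale)))
        sizes.append((round(scale, 3), scaled))
    return sizes


for scale, size in candidate_sizes((40, 20)):
    print(scale, size)
```

Each extra scale adds one more matching pass, which is why multi-scale strategies trade speed for robustness.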

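Q: What if OCR results are inaccurate?

Make sure the image quality is good
Adjust the language setting (Options.OCR_LANGUAGE)
Use high-resolution images
Try image preprocessing (contrast adjustment and so on)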
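
As one example of contrast adjustment, a linear contrast stretch rescales the darkest pixel to 0 and the brightest to 255. The plain-Python sketch below works on a small grayscale grid. stretch_contrast is a hypothetical helper; a real pipeline would more likely apply the same idea to a NumPy array.

```python
def stretch_contrast(gray):
    """Linearly rescale a 2D list of grayscale values so they span 0..255."""
    flat = [v for row in gray for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in gray]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in gray]


# Low-contrast sample: every value sits between 100 and 150.
washed_out = [[100, 110], [120, 150]]
print(stretch_contrast(washed_out))
```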

Project structure

solidvision/
├── options/              # configuration management module
│   └── __init__.py
├── aircv/               # image recognition module
│   ├── cv.py           # core matching interface
│   ├── template_matching.py
│   ├── keypoint_matching.py
│   └── ...
├── orc/                # text recognition module
│   └── __init__.py
└── utils/              # utility functions

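License

Apache License 2.0

Feedback

Bug reports: open an Issue
Suggestions: open a Discussion
Email: caishilong@exuils.com

Acknowledgements

This project builds on the following open-source projects:

Get started with Supervision now! 🚀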

This version

1.0.0

Jan 26, 2026

Download files

Source Distribution

solidvision-1.0.0.tar.gz (782.0 kB)

Uploaded Jan 26, 2026 Source

Built Distribution

solidvision-1.0.0-py3-none-any.whl (44.5 kB)

Uploaded Jan 26, 2026 Python 3