
Image and text recognition module - a lightweight, standalone OCR/CV module


Supervision - Image and Text Recognition Module


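A lightweight, efficient module for image recognition and text recognition (OCR). It is fully standalone, requires no device connection, and can be integrated into any kind of project (desktop, web, mobile, and so on).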

Core modules

  • options - global configuration management
  • aircv - image recognition (template matching, keypoint matching)
  • orc - text recognition (OCR)

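Features

  • Multiple matching algorithms - template matching, multi-scale template matching, KAZE, BRISK, AKAZE, ORB, SIFT, SURF, BRIEF, and others
  • Fully standalone - no device connection required; pure image and text recognition
  • Easy to integrate - a concise API that fits many application scenarios
  • High performance - optimized matching strategies and algorithm selection
  • Flexible configuration - configurable thresholds, timeouts, algorithms, and other parameters
  • OCR support - integrates PaddleOCR for Chinese and English text recognition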

Installation

Install from PyPI (recommended)

pip install solidvision

Install from source (development mode)

# Clone the repository
git clone https://github.com/SolidQA/solidvision.git
cd solidvision

# Install with uv (recommended)
uv sync --dev

# Or use pip
pip install -e .

Quick start

Image recognition

from solidvision import find_location, Template
import cv2

# Load the screenshot
screenshot = cv2.imread('screenshot.png')

# Option 1: quick lookup
position = find_location(screenshot, 'button.png', threshold=0.8)
print(position)  # e.g. (100, 200)

# Option 2: use a Template
from solidvision.aircv.cv import match_loop
template = Template('button.png')
position = match_loop(lambda: screenshot, template.filepath, threshold=0.8)

Text recognition

from solidvision import recognize_text, find_text_position
import cv2

# Load the image
image = cv2.imread('image.png')

# Recognize all text
texts = recognize_text(image)
for item in texts:
    print(f"Text: {item['text']}, position: {item['position']}")

# Find the position of a specific string ('确定' means 'OK')
position = find_text_position(image, '确定')
print(position)  # e.g. (150, 250)
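
For illustration, a lookup such as find_text_position can be thought of as a search over the recognition results. The sketch below is not the library's implementation. It assumes each result is a dict with 'text' and 'position' keys, as in the loop above, and that 'position' is an (x, y) tuple; the helper name find_in_results and the sample data are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[int, int]


def find_in_results(results: Sequence[dict], target: str) -> Optional[Point]:
    """Return the position of the first result whose text contains target."""
    for item in results:
        if target in item["text"]:
            return item["position"]
    return None


# Hypothetical OCR output, shaped like the items printed above.
sample_results = [
    {"text": "取消", "position": (50, 250)},
    {"text": "确定", "position": (150, 250)},
]

print(find_in_results(sample_results, "确定"))   # (150, 250)
print(find_in_results(sample_results, "Retry"))  # None
```

Returning None for a miss lets callers write `if position:` checks like the ones used in the integration examples.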

Module reference

options module

Global configuration management:

from solidvision.options import Options, Config

# Set the recognition threshold
Options.CV_THRESHOLD = 0.85

# Set the timeout
Options.FIND_TIMEOUT = 15

# Set the OCR language
Options.OCR_LANGUAGE = 'ch'  # Chinese

# Get the current configuration
config_dict = Options.get_config_dict()
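
The settings above appear to be class attributes, so assigning to them changes the value for every caller in the process. A minimal sketch of that pattern follows. DemoOptions is a hypothetical stand-in, not solidvision's Options class, and its default values are made up for the example.

```python
class DemoOptions:
    """Hypothetical stand-in for a class-level global configuration."""

    CV_THRESHOLD = 0.8
    FIND_TIMEOUT = 10
    OCR_LANGUAGE = "ch"

    @classmethod
    def get_config_dict(cls) -> dict:
        """Collect every upper-case class attribute into a plain dict."""
        return {
            name: value
            for name, value in vars(cls).items()
            if name.isupper()
        }


# Assigning on the class is visible to all code that reads DemoOptions.
DemoOptions.CV_THRESHOLD = 0.85
print(DemoOptions.get_config_dict())
```

Because the state is global, tests that change a setting should restore the previous value afterwards.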

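aircv module

Image recognition:

from solidvision.aircv.cv import Template, match_loop, multi_find_location
from solidvision.aircv.settings import Settings
import cv2

# Configure the matching strategy
Settings.CVSTRATEGY = ('tpl', 'kaze', 'brisk')

# Callable that returns the current screenshot (here, an image read from disk)
def screenshot_func():
    return cv2.imread('screenshot.png')

# Create a template
template = Template('button.png', threshold=0.8)

# Single match
position = match_loop(screenshot_func, template.filepath, timeout=10, threshold=0.8)

# Find all matches
positions = multi_find_location(screenshot_func, 'button.png', threshold=0.8)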
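
match_loop takes a callable that returns a screenshot plus a timeout, which suggests a retry loop: capture, try to match, and repeat until a match is found or time runs out. The sketch below shows that general pattern only. poll_for_match, its parameters, and the fake matcher are hypothetical; the library's actual loop, its timeout units, and its behavior on timeout may differ.

```python
import time
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


def poll_for_match(
    snapshot: Callable[[], object],
    matcher: Callable[[object], Optional[Point]],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> Optional[Point]:
    """Call matcher on fresh snapshots until it succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        position = matcher(snapshot())
        if position is not None:
            return position
        if time.monotonic() >= deadline:
            return None  # this sketch returns None; a real API might raise
        time.sleep(interval)


# Fake screen source: the "button" only appears on the third capture.
frames = iter(["loading", "loading", "button"])
found = poll_for_match(
    snapshot=lambda: next(frames),
    matcher=lambda frame: (100, 200) if frame == "button" else None,
    timeout=2.0,
    interval=0.01,
)
print(found)  # (100, 200)
```

Passing a callable rather than a fixed image is what makes retrying useful: each attempt sees a fresh capture.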

orc module

Text recognition:

from solidvision.orc import TextRecognizer, recognize_text
import cv2

# Create a recognizer
recognizer = TextRecognizer(lang='ch')

# Recognize the text in an image
image = cv2.imread('image.png')
results = recognizer.recognize_image(image)

# Find a specific string ('确定' means 'OK')
position = recognizer.find_text_position(image, '确定')

# Get all text on the page
text = recognizer.get_page_text(image)

Integration examples

Integrating into a desktop application

import cv2
from solidvision import find_location

def find_button_on_desktop(button_template):
    import pyautogui
    import numpy as np

    # Capture the desktop (pyautogui returns an RGB image; OpenCV expects BGR)
    screenshot = pyautogui.screenshot()
    frame = cv2.cvtColor(np.array(screenshot), cv2.COLOR_RGB2BGR)

    # Locate the button
    position = find_location(frame, button_template, threshold=0.8)

    if position:
        # Click the button
        pyautogui.click(position)
        return True

    return False

# Usage
find_button_on_desktop('button.png')

Integrating into a web application (Flask)

from flask import Flask, request, jsonify
from solidvision import find_location
import cv2
import numpy as np

app = Flask(__name__)

@app.route('/recognize', methods=['POST'])
def recognize():
    file = request.files['image']
    template_path = request.form.get('template')

    # Decode the uploaded bytes into an image
    image_bytes = np.frombuffer(file.read(), np.uint8)
    image = cv2.imdecode(image_bytes, cv2.IMREAD_COLOR)

    # cv2.imdecode returns None when the data is not a valid image
    if image is None:
        return jsonify({'success': False, 'position': None}), 400

    # Run recognition
    position = find_location(image, template_path)

    return jsonify({
        'success': position is not None,
        'position': position
    })
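
In the route above, the template path comes straight from the request form, so a client could point it at any file on the server. One common mitigation is to accept only a file name and resolve it inside a fixed directory. The sketch below uses only the standard library. TEMPLATE_DIR and resolve_template are hypothetical names, not part of solidvision or Flask.

```python
from pathlib import Path
from typing import Optional

# Hypothetical directory that holds the templates the service may use.
TEMPLATE_DIR = Path("templates").resolve()


def resolve_template(name: Optional[str]) -> Optional[Path]:
    """Map a client-supplied name to a path inside TEMPLATE_DIR, or None."""
    if not name:
        return None
    candidate = (TEMPLATE_DIR / name).resolve()
    # Reject anything that escapes the template directory (e.g. "../x").
    if TEMPLATE_DIR not in candidate.parents:
        return None
    return candidate


print(resolve_template("button.png"))        # a path under TEMPLATE_DIR
print(resolve_template("../../etc/passwd"))  # None
print(resolve_template("/etc/passwd"))       # None
```

A route could call such a helper on the form value and return a 400 response when it yields None.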

Integrating into mobile app automation

from solidvision import find_location, find_text_position
import cv2

class MobileAutomation:
    @staticmethod
    def click(screenshot, template_path):
        """Locate the UI element to click and return its position."""
        position = find_location(screenshot, template_path)
        return position

    @staticmethod
    def click_text(screenshot, text):
        """Locate the text to click and return its position."""
        position = find_text_position(screenshot, text)
        return position

Performance tuning

1. Choose a suitable algorithm

from solidvision.aircv.settings import Settings

# Fast matching (simple scenes)
Settings.CVSTRATEGY = ('tpl',)

# High-accuracy matching (complex scenes)
Settings.CVSTRATEGY = ('gmstpl', 'sift')

# Balanced setup
Settings.CVSTRATEGY = ('tpl', 'kaze', 'brisk')
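
CVSTRATEGY is an ordered tuple of method names. One plausible reading, sketched below, is a fallback chain: methods are tried in order and the first result that meets the threshold wins, so cheaper methods go first. This is an assumption about the semantics, not the library's code; first_confident_match and the stub matchers are hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# A matcher returns (position, confidence), or None when it finds nothing.
MatchResult = Optional[Tuple[Tuple[int, int], float]]


def first_confident_match(
    strategy: Sequence[str],
    matchers: Dict[str, Callable[[], MatchResult]],
    threshold: float,
) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Try each named matcher in order; return the first confident hit."""
    for name in strategy:
        result = matchers[name]()
        if result is not None and result[1] >= threshold:
            return name, result[0]
    return None


# Stub matchers with fixed outcomes, standing in for real algorithms.
stub_matchers = {
    "tpl": lambda: ((98, 201), 0.62),     # found, but below the threshold
    "kaze": lambda: None,                 # found nothing
    "brisk": lambda: ((100, 200), 0.91),  # confident match
}

print(first_confident_match(("tpl", "kaze", "brisk"), stub_matchers, 0.8))
# ('brisk', (100, 200))
```

Under this reading, a shorter tuple bounds the worst-case time, while a longer one gives more chances to find a match.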

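2. Tune the threshold

# High threshold: fewer false positives, but real matches may be missed
position = find_location(screenshot, 'template.png', threshold=0.95)

# Low threshold: matches more easily, but may produce false positives
position = find_location(screenshot, 'template.png', threshold=0.6)

# Recommended value
position = find_location(screenshot, 'template.png', threshold=0.8)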
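
To make the threshold concrete: template matching slides the template over the image and scores each position, and the threshold decides whether the best score counts as a match. The pure-Python sketch below uses a zero-mean normalized correlation score on tiny grayscale grids. It is a toy under stated assumptions, not solidvision's matcher (real code would use OpenCV), and it reports the top-left corner of the match. ncc_score and best_match are hypothetical helpers.

```python
import math
from typing import List, Optional, Tuple

Grid = List[List[float]]


def ncc_score(image: Grid, template: Grid, top: int, left: int) -> float:
    """Zero-mean normalized correlation of template placed at (top, left)."""
    h, w = len(template), len(template[0])
    patch = [image[top + r][left + c] for r in range(h) for c in range(w)]
    tmpl = [template[r][c] for r in range(h) for c in range(w)]
    p_mean = sum(patch) / len(patch)
    t_mean = sum(tmpl) / len(tmpl)
    num = sum((p - p_mean) * (t - t_mean) for p, t in zip(patch, tmpl))
    den = math.sqrt(
        sum((p - p_mean) ** 2 for p in patch)
        * sum((t - t_mean) ** 2 for t in tmpl)
    )
    return num / den if den else 0.0


def best_match(
    image: Grid, template: Grid, threshold: float
) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the best top-left corner, or None if below threshold."""
    h, w = len(template), len(template[0])
    best_score, best_pos = -1.0, None
    for top in range(len(image) - h + 1):
        for left in range(len(image[0]) - w + 1):
            score = ncc_score(image, template, top, left)
            if score > best_score:
                best_score, best_pos = score, (left, top)
    return best_pos if best_score >= threshold else None


image = [
    [0, 0, 0, 0],
    [0, 9, 1, 0],
    [0, 1, 9, 0],
    [0, 0, 0, 0],
]
exact = [[9, 1], [1, 9]]
noisy = [[9, 4], [1, 6]]  # a degraded version of the same pattern

print(best_match(image, exact, threshold=0.95))  # (1, 1)
print(best_match(image, noisy, threshold=0.95))  # None
print(best_match(image, noisy, threshold=0.6))   # (1, 1)
```

The degraded template scores about 0.86 at the correct position, so it is rejected at 0.95 and accepted at 0.6, which is the trade-off described in the comments above.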

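FAQ

Q: How can I improve recognition accuracy?

A:

  1. Adjust the threshold parameter
  2. Try different matching algorithms
  3. Make sure the template is sharp and matches the actual scene
  4. Use high-quality screenshots

Q: Recognition is too slow. What can I do?

A:

  1. Use a faster algorithm (tpl rather than sift)
  2. Narrow the search region
  3. Raise the threshold
  4. Use a smaller template image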
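
Narrowing the search region (item 2) means matching inside a crop and then translating the result back to full-image coordinates. The sketch below shows only that bookkeeping on a nested-list image. crop_region and to_full_coords are hypothetical helpers, and whether solidvision exposes a region parameter is not stated here. With a NumPy image as returned by cv2.imread, the crop would be the slice image[top:bottom, left:right].

```python
from typing import List, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Region = Tuple[int, int, int, int]


def crop_region(image: List[list], region: Region) -> List[list]:
    """Return the sub-image covered by region."""
    left, top, right, bottom = region
    return [row[left:right] for row in image[top:bottom]]


def to_full_coords(position: Tuple[int, int], region: Region) -> Tuple[int, int]:
    """Translate an (x, y) found inside the crop to full-image coordinates."""
    left, top, _, _ = region
    return position[0] + left, position[1] + top


# 4 rows x 6 columns; the value 7 marks the "target" pixel at x=4, y=2.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 7, 0],
    [0, 0, 0, 0, 0, 0],
]
region = (3, 1, 6, 4)  # search only the right half, below the first row
crop = crop_region(image, region)

# Stand-in for a matcher: locate the target inside the crop.
local = next(
    (x, y) for y, row in enumerate(crop) for x, v in enumerate(row) if v == 7
)
print(local)                          # (1, 1)
print(to_full_coords(local, region))  # (4, 2)
```

Forgetting the offset step is a common source of clicks that land in the wrong place.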

Q: How do I handle rotated or scaled images?

A:

from solidvision.aircv.settings import Settings

# Use multi-scale template matching
Settings.CVSTRATEGY = ('gmstpl', 'sift')

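Q: What if OCR results are inaccurate?

A:

  1. Make sure the image quality is good
  2. Adjust the language setting (Options.OCR_LANGUAGE)
  3. Use a high-resolution image
  4. Try image preprocessing (contrast adjustment and similar)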
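
As one example of the preprocessing in item 4, a linear contrast stretch rescales pixel intensities so the darkest value maps to 0 and the brightest to 255. The sketch below works on a nested list of grayscale values to stay dependency-free. stretch_contrast is a hypothetical helper, not part of solidvision, and whether it helps depends on the image.

```python
from typing import List


def stretch_contrast(gray: List[List[int]]) -> List[List[int]]:
    """Linearly rescale grayscale values so they span the full 0-255 range."""
    flat = [v for row in gray for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in gray]  # flat image: nothing to stretch
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in gray]


# Low-contrast sample: values are squeezed into 100..140.
washed_out = [
    [100, 110],
    [130, 140],
]
print(stretch_contrast(washed_out))  # [[0, 64], [191, 255]]
```

The stretched image would then be passed to the recognizer in place of the original.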

Project structure

solidvision/
├── options/              # configuration management module
│   └── __init__.py
├── aircv/               # image recognition module
│   ├── cv.py           # core matching interface
│   ├── template_matching.py
│   ├── keypoint_matching.py
│   └── ...
├── orc/                # text recognition module
│   └── __init__.py
└── utils/              # utility functions

License

Apache License 2.0

Feedback

Acknowledgements

This project builds on the following open-source projects:


Start using Supervision now! 🚀
