These details have not been verified by PyPI

Project description

🔤 TextScan

TextScan is a modular OpenFilter-based filter for extracting text from video frames using EasyOCR or Tesseract.

It supports frame-by-frame OCR, optional skipping via metadata, and flexible deployment as part of OpenFilter pipelines.

✨ Features

🧾 Extracts text using EasyOCR or Tesseract from frames
🔍 Supports per-frame metadata control (e.g. skip OCR)
⚙️ Configurable via CLI args, code, or environment variables
🧩 Plug-and-play compatibility with OpenFilter
📤 Outputs recognized text, OCR confidence scores, and bounding boxes as metadata
🔄 Multi-topic processing - processes multiple video regions simultaneously
🔀 Data forwarding - forwards non-image frames when enabled
📊 Main-first ordering - ensures consistent output structure
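
As a rough illustration of the metadata output listed above, the sketch below converts EasyOCR-style results (a list of `(bbox, text, confidence)` tuples, which is what `easyocr.Reader.readtext` returns by default) into a plain dictionary. The helper name `results_to_meta`, the `min_confidence` parameter, and the key names `ocr`, `texts`, `text`, `confidence`, and `bbox` are assumptions made for this example, not the filter's documented output schema.

```python
from typing import Any, Dict, List, Sequence, Tuple

# One EasyOCR-style detection: four corner points, the text, a confidence in [0, 1].
OcrResult = Tuple[Sequence[Sequence[float]], str, float]


def results_to_meta(results: List[OcrResult], min_confidence: float = 0.0) -> Dict[str, Any]:
    """Hypothetical helper: shape raw OCR detections into a metadata dict."""
    items = []
    for bbox, text, confidence in results:
        # Drop low-confidence detections before they reach downstream filters.
        if confidence < min_confidence:
            continue
        items.append({
            "text": text,
            "confidence": round(float(confidence), 4),
            "bbox": [[float(x), float(y)] for x, y in bbox],
        })
    # Key names are illustrative assumptions, not the filter's real schema.
    return {"ocr": items, "texts": [item["text"] for item in items]}


sample = [
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "EXIT", 0.98),
    ([[15, 60], [90, 60], [90, 85], [15, 85]], "l0bby", 0.31),
]
meta = results_to_meta(sample, min_confidence=0.5)
print(meta["texts"])
```

Check the filter's actual output (for example with `--mq_log pretty`) to see the real field names before relying on any of these keys.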

📦 Installation

Install the latest version from PyPI:

pip install filter-optical-character-recognition

Or install from source:

# Clone the repo
git clone https://github.com/PlainsightAI/filter-optical-character-recognition.git
cd filter-optical-character-recognition

# (Optional but recommended) create a virtual environment:
python -m venv venv && source venv/bin/activate

# Install the filter
make install

💡 The make install target installs openfilter[all], ensuring dependencies like VideoIn and Webvis work out of the box.

🚀 Quick Start (CLI)

Run the OCR Filter using the OpenFilter CLI:

# Most basic version, no annotation or result logging
openfilter run \
  - VideoIn --sources 'file://video_example.mp4!loop' \
  - filter_optical_character_recognition.filter.FilterOpticalCharacterRecognition \
  - Webvis

# Log results into stdout
openfilter run \
  - VideoIn --sources 'file://video_example.mp4!loop' \
  - filter_optical_character_recognition.filter.FilterOpticalCharacterRecognition \
    --mq_log pretty \
  - Webvis

# Multi-topic processing with region-based OCR
openfilter run \
  - VideoIn --sources 'file://video_example.mp4!loop' \
  - filter_optical_character_recognition.filter.FilterOpticalCharacterRecognition \
      --ocr_engine easyocr \
      --forward_ocr_texts true \
      --draw_visualization true \
      --topic_pattern "region_.*" \
      --exclude_topics "main" \
      --forward_upstream_data true \
  - Webvis

Or simply:

make run

Then open http://localhost:8000 to view the output.

📄 See the .env.example file for environment variable options.
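
The `--topic_pattern` and `--exclude_topics` options in the multi-topic command above suggest a select-then-order step over incoming topics. The sketch below shows one way such a step could work. It is not the filter's implementation: the function name `select_topics`, the use of a full regex match, and the rule that `main` is moved to the front are all assumptions made for illustration.

```python
import re
from typing import Iterable, List, Optional


def select_topics(
    topics: Iterable[str],
    topic_pattern: Optional[str] = None,
    exclude_topics: Iterable[str] = (),
) -> List[str]:
    """Hypothetical sketch: keep regex matches, drop exclusions, put 'main' first."""
    pattern = re.compile(topic_pattern) if topic_pattern else None
    excluded = set(exclude_topics)
    selected = [
        topic
        for topic in topics
        # Assumption: the pattern must match the whole topic name.
        if topic not in excluded and (pattern is None or pattern.fullmatch(topic))
    ]
    # sorted() is stable, so every topic other than 'main' keeps its input order.
    return sorted(selected, key=lambda topic: topic != "main")


incoming = ["region_b", "main", "region_a", "debug"]
print(select_topics(incoming, "region_.*", ["main"]))
print(select_topics(incoming))
```

With the options from the command above, only the `region_*` topics are kept. With no options, every topic passes through and `main` leads the list.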

🧰 Using from PyPI

After installing with:

pip install filter-optical-character-recognition

you can use the OCR Filter directly in code:

Example usage

from openfilter.filter_runtime.filter import Filter
from openfilter.filter_runtime.filters.video_in import VideoIn
from openfilter.filter_runtime.filters.webvis import Webvis
from filter_optical_character_recognition.filter import FilterOpticalCharacterRecognition

if __name__ == "__main__":
    Filter.run_multi([
      (VideoIn, dict(
          sources='file://video_example.mp4!loop',
          outputs='tcp://*:5550'
      )),
      (FilterOpticalCharacterRecognition, dict(
          sources='tcp://localhost:5550',
          outputs='tcp://*:5552',
          draw_visualization=True,
          visualization_topic="main"
      )),
      (Webvis, dict(
          sources='tcp://localhost:5552'
      )),
    ])

🧪 Testing

Run tests locally:

make test

Or run a specific test file:

pytest -v tests/test_filter_ocr.py

Tests cover:

OCR accuracy and bounding box parsing
skip_ocr handling
Frame metadata propagation
Integration in multi-filter pipelines
Multi-topic processing and main-first ordering
Configuration normalization and validation
Data forwarding behavior

🔧 Special Features

Metadata-Based Skipping

You can skip OCR on specific frames by setting this field:

"meta": {
  "skip_ocr": true
}

This allows selective processing and performance tuning.
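
A minimal sketch of the check this implies, using plain dictionaries as stand-ins for frame payloads. The helper name `should_skip_ocr` is hypothetical, and treating only a literal `true` as a skip request is an assumption. The filter's own handling may differ.

```python
from typing import Any, Mapping


def should_skip_ocr(frame_data: Mapping[str, Any]) -> bool:
    """Hypothetical helper: True when the frame's metadata asks to skip OCR."""
    meta = frame_data.get("meta") or {}
    # Assumption: only a literal boolean true counts as a skip request.
    return meta.get("skip_ocr") is True


# Plain dicts standing in for frame data; a real pipeline passes frame objects.
frames = [
    {"meta": {"id": 1}},
    {"meta": {"id": 2, "skip_ocr": True}},
    {"meta": {"id": 3, "skip_ocr": False}},
    {},
]
to_process = [frame for frame in frames if not should_skip_ocr(frame)]
print(len(to_process))
```

An upstream filter could set `skip_ocr` on frames it has already judged uninteresting (for example, frames with no detected motion) so that the OCR engine runs only where it is needed.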

🔩 Requirements

The OCR Filter depends on the following tools:

easyocr
pytesseract
Tesseract OCR binary (AppImage or system install)

Ensure the tesseract binary is available in your environment when running OCR with Tesseract.
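
One way to verify this before starting a pipeline is to look the binary up on `PATH` with the standard library. The helper name `find_tesseract` is an assumption for this sketch. Only `shutil.which` is a real API here.

```python
import shutil
from typing import Optional


def find_tesseract(binary: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract binary, or None if it is not on PATH."""
    return shutil.which(binary)


path = find_tesseract()
if path is None:
    print("tesseract not found on PATH; install it or use the EasyOCR engine instead")
else:
    print(f"tesseract found at {path}")
```

If the binary lives outside `PATH` (for example, an AppImage), `pytesseract` can be pointed at it by setting `pytesseract.pytesseract.tesseract_cmd` to the full path.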

🤝 Contributing

We welcome contributions! Please read our CONTRIBUTING.md for instructions.

Highlights:

Format code with black
Lint with ruff
Use type hints on public methods
Sign commits using DCO (git commit -s)
Include tests when relevant

📄 License

Licensed under the Apache 2.0 License.

🙏 Acknowledgements

Thanks for using TextScan! For questions or feature requests, open a GitHub issue.

Project details

Release history

0.1.10 (this version): Apr 24, 2026
0.1.5: Aug 12, 2025
0.1.4: Aug 6, 2025
0.1.3: Aug 1, 2025
0.1.2: Jul 14, 2025
0.1.1: May 21, 2025
0.1.0: May 18, 2025

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

filter_optical_character_recognition-0.1.10-py3-none-any.whl (15.2 kB view details)

Uploaded Apr 24, 2026 Python 3

File details

Details for the file filter_optical_character_recognition-0.1.10-py3-none-any.whl.

File metadata

Download URL: filter_optical_character_recognition-0.1.10-py3-none-any.whl
Upload date: Apr 24, 2026
Size: 15.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for filter_optical_character_recognition-0.1.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e3a1a10e465f86dd7a3c69e9f36ba0fdce889366b5c1a86b05f64c5b987f956b`
MD5	`32d36a68150680566f63db61f1c60314`
BLAKE2b-256	`a2e3ff7e1533a56a6c1826ebd61329885b8001c1882ef726afe2ad51760a4b19`
