🔤 Optical Character Recognition (OCR) Filter
OCR Filter is a modular OpenFilter-based filter for extracting text from video frames using EasyOCR or Tesseract.
It supports frame-by-frame OCR, optional skipping via metadata, and flexible deployment as part of OpenFilter pipelines.
✨ Features
- 🧾 Extracts text using EasyOCR or Tesseract from frames
- 🔍 Supports per-frame metadata control (e.g. skip OCR)
- ⚙️ Configurable via CLI args, code, or environment variables
- 🧩 Plug-and-play compatibility with OpenFilter
- 📤 Outputs recognized text, OCR confidence scores, and bounding boxes as metadata
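As a rough illustration of working with the output listed above, the sketch below filters detections by confidence. It assumes each detection is a (bounding box, text, confidence) triple, which is the shape EasyOCR's `readtext` returns. The metadata layout this filter actually emits is not described here, so the `Detection` type, the `keep_confident` helper, and the sample values are all hypothetical.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Detection(NamedTuple):
    """Hypothetical container for one OCR hit (not the filter's actual schema)."""
    bbox: List[Point]   # four corner points of the text region
    text: str           # recognized string
    confidence: float   # score between 0.0 and 1.0


def keep_confident(detections: List[Detection], min_confidence: float = 0.5) -> List[Detection]:
    """Return only the detections whose confidence meets the threshold."""
    return [d for d in detections if d.confidence >= min_confidence]


# Made-up sample data for illustration only.
sample = [
    Detection([(0, 0), (40, 0), (40, 12), (0, 12)], "EXIT", 0.93),
    Detection([(5, 20), (30, 20), (30, 30), (5, 30)], "3x1t", 0.21),
]

confident_texts = [d.text for d in keep_confident(sample)]
print(confident_texts)
```

A threshold like this is a common way to drop noisy reads before passing text downstream; the right cutoff depends on the OCR engine and the footage.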
📦 Installation
Install the latest version from PyPI:
pip install filter-optical-character-recognition
Or install from source:
# Clone the repo
git clone https://github.com/PlainsightAI/filter-optical-character-recognition.git
cd filter-optical-character-recognition
# (Optional but recommended) create a virtual environment:
python -m venv venv && source venv/bin/activate
# Install the filter
make install
💡 The make install target installs openfilter[all], ensuring dependencies like VideoIn and Webvis work out of the box.
🚀 Quick Start (CLI)
Run the OCR Filter using the OpenFilter CLI:
# Most basic version, no annotation or result logging
openfilter run \
- VideoIn --sources 'file://video_example.mp4!loop' \
- filter_optical_character_recognition.filter.FilterOpticalCharacterRecognition \
- Webvis
# Log results into stdout
openfilter run \
- VideoIn --sources 'file://video_example.mp4!loop' \
- filter_optical_character_recognition.filter.FilterOpticalCharacterRecognition \
--mq_log pretty \
- Webvis
# Annotation enabled, overlaying detected text on the output
openfilter run \
- VideoIn --sources 'file://video_example.mp4!loop' \
- filter_optical_character_recognition.filter.FilterOpticalCharacterRecognition \
--ocr_engine easyocr \
--forward_ocr_texts true \
--draw_visualization true \
--visualization_topic "main" \
--topic_pattern "main" \
- Webvis
Or simply:
make run
Then open http://localhost:8000 to view the output.
📄 See the .env.example file for environment variable options.
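To make the environment-variable route concrete, here is a minimal sketch of turning environment variables into a config dictionary. The option names `ocr_engine` and `draw_visualization` come from the CLI flags above, but the variable names `FILTER_OCR_ENGINE` and `FILTER_DRAW_VISUALIZATION`, the defaults, and the `load_config` helper are assumptions for illustration; the .env.example file remains the reference for the real names.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names and defaults; check .env.example for the real ones.
DEFAULTS = {
    "FILTER_OCR_ENGINE": "easyocr",
    "FILTER_DRAW_VISUALIZATION": "false",
}


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings such as 'true', '1', 'yes', 'on'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Build a config dict from the given mapping, or os.environ if None."""
    env = os.environ if environ is None else environ
    return {
        "ocr_engine": env.get("FILTER_OCR_ENGINE", DEFAULTS["FILTER_OCR_ENGINE"]),
        "draw_visualization": _as_bool(
            env.get("FILTER_DRAW_VISUALIZATION", DEFAULTS["FILTER_DRAW_VISUALIZATION"])
        ),
    }


# Passing an explicit mapping keeps the example independent of the real environment.
config = load_config({
    "FILTER_OCR_ENGINE": "tesseract",
    "FILTER_DRAW_VISUALIZATION": "true",
})
print(config)
```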
🧰 Using from PyPI
After installing with:
pip install filter-optical-character-recognition
you can use the OCR Filter directly in code:
Example usage
from openfilter.filter_runtime.filter import Filter
from openfilter.filter_runtime.filters.video_in import VideoIn
from openfilter.filter_runtime.filters.webvis import Webvis
from filter_optical_character_recognition.filter import FilterOpticalCharacterRecognition

if __name__ == "__main__":
    # Chain the three filters: video source -> OCR -> web visualization
    Filter.run_multi([
        (VideoIn, dict(
            sources='file://video_example.mp4!loop',
            outputs='tcp://*:5550',
        )),
        (FilterOpticalCharacterRecognition, dict(
            sources='tcp://localhost:5550',
            outputs='tcp://*:5552',
            draw_visualization=True,
            visualization_topic="main",
        )),
        (Webvis, dict(
            sources='tcp://localhost:5552',
        )),
    ])
🧪 Testing
Run tests locally:
make test
Or run a specific test file:
pytest -v tests/test_filter_ocr.py
Tests cover:
- OCR accuracy and bounding box parsing
- skip_ocr handling
- Frame metadata propagation
- Integration in multi-filter pipelines
🔧 Special Features
Metadata-Based Skipping
You can skip OCR on specific frames by setting this field:
"meta": {
  "skip_ocr": true
}
This allows selective processing and performance tuning.
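The skip check can be sketched in a few lines. This is not the filter's actual implementation; `should_run_ocr` is a hypothetical helper, and it assumes frame data is available as a dictionary with an optional "meta" entry as in the fragment above.

```python
def should_run_ocr(frame_data: dict) -> bool:
    """Return False when the frame's metadata asks to skip OCR.

    A missing "meta" entry or a missing "skip_ocr" flag means OCR runs as usual.
    """
    meta = frame_data.get("meta") or {}
    return not bool(meta.get("skip_ocr", False))


# Made-up frames covering the flag set, unset, and absent.
frames = [
    {"meta": {"skip_ocr": True}},
    {"meta": {"skip_ocr": False}},
    {"meta": {}},
    {},
]

decisions = [should_run_ocr(frame) for frame in frames]
print(decisions)
```

An upstream filter could set the flag on frames that are known to contain no text (for example, frames with no detected motion) so that the OCR engine only runs where it is useful.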
🔩 Requirements
The OCR Filter depends on the following tools:
- easyocr
- pytesseract
- Tesseract OCR binary (AppImage or system install)
Ensure the tesseract binary is available in your environment when running OCR with Tesseract.
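One way to verify the binary is reachable before starting a pipeline is a quick PATH lookup with the standard library. This is a minimal sketch; `find_tesseract` is a hypothetical helper and is not part of this package.

```python
import shutil
from typing import Optional


def find_tesseract(binary_name: str = "tesseract") -> Optional[str]:
    """Return the full path to the binary if it is on PATH, otherwise None."""
    return shutil.which(binary_name)


path = find_tesseract()
if path is None:
    print("tesseract not found on PATH; install it or use the EasyOCR engine")
else:
    print(f"tesseract found at {path}")
```

If the binary lives outside PATH (for example, an AppImage in a custom location), pytesseract lets you point to it explicitly by setting `pytesseract.pytesseract.tesseract_cmd` to the full path.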
🤝 Contributing
We welcome contributions! Please read our CONTRIBUTING.md for instructions.
Highlights:
- Format code with black
- Lint with ruff
- Use type hints on public methods
- Sign commits using DCO (git commit -s)
- Include tests when relevant
📄 License
Licensed under the Apache 2.0 License.
🙏 Acknowledgements
Thanks for using the OCR Filter! For questions or feature requests, open a GitHub issue.