VLMS - Video Intelligence SDK
Event-based video intelligence with 98% cost reduction
Multi-source video processing SDK with intelligent frame selection, motion tracking, and VLM-powered analysis. Built for production use with RTSP, ONVIF, UDP, and WebRTC connectors, with more on the way.
Note: `pip install vlm-sdk` installs the SDK components (connectors, preprocessors, providers). The FastAPI service depends on additional packages; install them separately if you plan to run the API.
Features

Core SDK (vlm)
- Event-based processing: only analyze frames with motion/activity (98% cost reduction vs. frame-by-frame)
- Multi-source connectors: RTSP, ONVIF, UDP, WebRTC, File
- RT-DETR + ByteTrack: real-time object detection and motion tracking
- Provider-agnostic VLM: Gemini, Qwen, and ObserveeVLM (small VLM coming soon), selected via environment config
- Advanced analysis: timestamps, object detection, bounding boxes, range queries
Production API (api)
- FastAPI REST API: industry-standard multi-stream video intelligence
- Server-Sent Events (SSE): real-time event streaming
- Authentication: API key-based auth with rate limiting
- Monitoring: health checks, metrics, stream management
- Configurable: environment-based provider selection
Quick Start

Installation

```bash
# Install from PyPI
pip install vlm-sdk

# Or install from source
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk
pip install -e .
```
SDK Usage

```python
import asyncio

from vlm.connectors import RTSPConnector
from vlm.preprocessors import DetectorPreprocessor
from vlm.providers.gemini import GeminiVideoService

# Initialize components
connector = RTSPConnector("rtsp://camera.local/stream1")
preprocessor = DetectorPreprocessor(
    confidence_threshold=0.6,
    track_objects=["person", "car"],
    min_duration=2.0,  # Only keep events longer than 2 seconds
)
gemini = GeminiVideoService(api_key="your-gemini-key")

# Process stream
async def process():
    for frame in connector.stream_frames():
        result = preprocessor.process_frame(frame.data, frame.timestamp)
        if result["status"] == "completed":
            # Event detected! Analyze the recorded clip with the VLM
            upload = await gemini.upload_file(result["clip_path"])
            analysis = await gemini.query_video_with_file(
                upload["name"],
                "Describe the activity in this video",
            )
            print(f"Analysis: {analysis['response']}")

asyncio.run(process())
```
API Server

```bash
# Set environment variables
export ADMIN_API_KEY=your-secret-key
export GEMINI_API_KEY=your-gemini-key
export VLM_PROVIDER=gemini  # or openai, anthropic

# Install the SDK (from a repo checkout)
pip install -e .

# Install API dependencies (required for running api.main)
pip install fastapi "uvicorn[standard]" pydantic python-dotenv

# ...or install everything shipped in the Docker image
pip install -r requirements.txt

# Run the server
python -m api.main
# Server starts at http://localhost:8000
```
Note: To accept WebRTC publishers, run MediaMTX alongside the API using the provided `mediamtx.yml` (see docs/apiguide.md for the commands).
Docker Image

```bash
# Pull the public image (linux/amd64)
docker pull observee/vlm-sdk:latest

# Run the API (set your API keys as needed)
docker run --rm -p 8000:8000 \
  -e ADMIN_API_KEY=your-secret-key \
  -e GEMINI_API_KEY=your-gemini-key \
  observee/vlm-sdk:latest
```
Create a stream:

```bash
curl -X POST http://localhost:8000/v1/streams/create \
  -H "X-Admin-API-Key: your-secret-key" \
  -H "X-VLM-API-Key: your-gemini-key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "rtsp",
    "source_url": "rtsp://camera.local/stream1",
    "config": {
      "username": "admin",
      "password": "password",
      "profile": "security",
      "min_duration": 2.0
    },
    "analysis": {
      "enabled": true,
      "mode": "basic",
      "prompt": "Describe any activity or movement"
    }
  }'
```
Listen to events (SSE):

```bash
curl -N http://localhost:8000/v1/streams/{stream_id}/events \
  -H "X-Admin-API-Key: your-secret-key"
```
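The same events endpoint can be consumed from Python. The sketch below is an assumption about the SSE framing (standard `data:` lines carrying JSON); the event payload shape is not documented here, so adjust field names to what the API actually emits. The parsing helper is separated from the network call so it can be reused and tested:

```python
import json

def parse_sse_data(lines):
    """Extract and decode JSON payloads from `data:` lines of an SSE stream."""
    for line in lines:
        if line and line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                yield json.loads(payload)

def listen(base_url, stream_id, admin_key):
    """Connect to the events endpoint and print each event (needs `requests`)."""
    import requests  # third-party: pip install requests
    url = f"{base_url}/v1/streams/{stream_id}/events"
    headers = {"X-Admin-API-Key": admin_key}
    with requests.get(url, headers=headers, stream=True) as resp:
        resp.raise_for_status()
        for event in parse_sse_data(resp.iter_lines(decode_unicode=True)):
            print(event)
```

For example, `listen("http://localhost:8000", "my-stream-id", "your-secret-key")` mirrors the `curl -N` command above.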
Documentation
Environment Variables

```bash
# Required
ADMIN_API_KEY=your-admin-key          # API authentication

# VLM provider (choose one)
VLM_PROVIDER=gemini                   # gemini, openai, or anthropic
GEMINI_API_KEY=your-gemini-key        # If using Gemini
OPENAI_API_KEY=your-openai-key        # If using OpenAI
ANTHROPIC_API_KEY=your-anthropic-key  # If using Claude

# Optional: rate limiting
RATE_LIMIT_REQUESTS=100               # Requests per window
RATE_LIMIT_WINDOW=60                  # Time window (seconds)
```
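As a sketch of how a service might resolve these variables, the helper below maps `VLM_PROVIDER` to the matching key. The variable names come from the list above; the fallback-to-Gemini default and the error behavior are assumptions, not the API's documented logic:

```python
import os

# Provider name -> environment variable holding its API key
PROVIDER_KEYS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def resolve_vlm_key(env=os.environ):
    """Return (provider, api_key) based on VLM_PROVIDER; raise if misconfigured."""
    provider = env.get("VLM_PROVIDER", "gemini").lower()
    key_name = PROVIDER_KEYS.get(provider)
    if key_name is None:
        raise ValueError(f"Unsupported VLM_PROVIDER: {provider!r}")
    api_key = env.get(key_name)
    if not api_key:
        raise RuntimeError(f"{key_name} must be set when VLM_PROVIDER={provider}")
    return provider, api_key
```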
Analysis Modes

Basic - simple video description:

```json
{
  "analysis": {
    "mode": "basic",
    "prompt": "Describe the activity"
  }
}
```
Timestamps - find specific moments:

```json
{
  "analysis": {
    "mode": "timestamps",
    "find_timestamps": {
      "query": "when does someone wave",
      "find_all": true,
      "confidence_threshold": 0.7
    }
  }
}
```
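Either analysis block is sent as part of the `/v1/streams/create` request body. The sketch below only assembles the payload in Python; the field names mirror the JSON examples above, and actually posting it (commented out) would use any HTTP client such as `requests`:

```python
import json

def build_create_request(source_url, query, threshold=0.7):
    """Assemble a /v1/streams/create body for timestamp analysis."""
    return {
        "source_type": "rtsp",
        "source_url": source_url,
        "analysis": {
            "enabled": True,
            "mode": "timestamps",
            "find_timestamps": {
                "query": query,
                "find_all": True,
                "confidence_threshold": threshold,
            },
        },
    }

body = build_create_request("rtsp://camera.local/stream1", "when does someone wave")
print(json.dumps(body, indent=2))
# To submit: requests.post(f"{base}/v1/streams/create", json=body,
#                          headers={"X-Admin-API-Key": ..., "X-VLM-API-Key": ...})
```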
Supported Connectors
| Connector | Description | Config |
|---|---|---|
| RTSP | IP camera streams | username, password, transport (tcp/udp) |
| ONVIF | Auto-discovery + PTZ | username, password, profile_index |
| UDP | UDP video receiver | host, port, buffer_size |
| WebRTC | Browser streams | signaling_url, ice_servers |
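If you build stream configs programmatically, a small validator derived from the table above can catch typos early. The key sets simply restate the table; treat this as a sketch, not the SDK's own validation logic:

```python
# Allowed config keys per connector type, transcribed from the table above
CONNECTOR_CONFIG_KEYS = {
    "rtsp": {"username", "password", "transport"},
    "onvif": {"username", "password", "profile_index"},
    "udp": {"host", "port", "buffer_size"},
    "webrtc": {"signaling_url", "ice_servers"},
}

def unknown_config_keys(connector, config):
    """Return config keys not listed for the given connector type."""
    allowed = CONNECTOR_CONFIG_KEYS.get(connector.lower())
    if allowed is None:
        raise ValueError(f"Unknown connector: {connector!r}")
    return sorted(set(config) - allowed)
```

For example, `unknown_config_keys("rtsp", {"username": "admin", "transprt": "tcp"})` flags the misspelled `transprt` before the stream is created.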
API Endpoints

| Method | Path | Description |
|---|---|---|
| POST | /v1/streams/create | Create stream |
| GET | /v1/streams/{id}/events | SSE event stream |
| GET | /v1/streams/{id} | Get status |
| DELETE | /v1/streams/{id} | Stop stream |
| GET | /v1/streams | List all streams |
| GET | /v1/streams/discover/onvif | Discover cameras |
| GET | /v1/streams/health | Health check |
Architecture

```
┌─────────────┐
│  Connector  │ (RTSP/ONVIF/UDP/WebRTC)
└──────┬──────┘
       │ Frames
       ▼
┌─────────────┐
│   RT-DETR   │ (Object detection + motion tracking)
└──────┬──────┘
       │ Events (only motion/activity)
       ▼
┌──────────────┐
│ Event Buffer │ (Collects frames during events)
└──────┬───────┘
       │ Complete Events
   ┌───┴────────┐
   │            │
   ▼            ▼
┌─────────┐ ┌─────────┐
│ Storage │ │   VLM   │ (Gemini/Qwen/ObserveeVLM)
└─────────┘ └────┬────┘
                 │
                 ▼
        ┌────────────────┐
        │ SSE / Webhooks │
        └────────────────┘
```
Key Innovation: Event-based processing analyzes only frames with detected motion/activity, reducing VLM API calls by 98% compared to frame-by-frame analysis.
Repository Layout

```
vlm-sdk/
├── vlm/              # Core SDK components
├── api/              # FastAPI service (routers, services, models)
├── examples/         # Sample scripts for RTSP/UDP/WebRTC usage
├── docs/             # Additional documentation
├── mediamtx/         # MediaMTX config for WebRTC/RTSP bridging
├── output/           # Example generated clips (safe to remove)
├── pyproject.toml    # SDK packaging metadata
├── requirements.txt  # Full dependency list for API/Docker
├── Dockerfile        # Reference container for the API
└── README.md
```
Development

```bash
# Clone the repository
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk

# Install with dev dependencies
pip install -e ".[dev]"

# Include the API stack if you plan to run the server locally
pip install -r requirements.txt

# Run tests
pytest tests/

# Format and lint
black vlm/ api/
ruff check vlm/ api/

# Run the API server (development)
uvicorn api.main:app --reload
```
Use Cases
- Security & Surveillance: 24/7 perimeter monitoring with motion alerts
- Retail Analytics: customer counting, queue analysis, behavior tracking
- Traffic Monitoring: vehicle counting, flow analysis, incident detection
- Smart Home: activity monitoring, intrusion detection
- Industrial: safety compliance, equipment monitoring
Cost Comparison

| Approach | Frames/Hour | VLM API Calls | Cost Reduction |
|---|---|---|---|
| Frame-by-frame | 54,000 (15 FPS) | 54,000 | Baseline |
| Event-based (VLMS) | 54,000 | ~1,000 | ~98% |

Example: a 1-hour, 15 FPS stream with 5-10 motion events.
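The headline number follows directly from the table: 15 FPS over one hour yields 15 × 3600 = 54,000 frames, and analyzing only ~1,000 event frames cuts VLM calls by roughly 98%:

```python
fps = 15
frames_per_hour = fps * 3600        # 54,000 frames in one hour
event_frames = 1_000                # approximate frames inside motion events
reduction = 1 - event_frames / frames_per_hour
print(f"{reduction:.1%}")           # prints 98.1%
```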
Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
License

Apache-2.0, a permissive license suitable for commercial and open-source use.
See LICENSE for the complete text. Commercial support is available on request.
Acknowledgments
- Ultralytics RT-DETR: object detection and tracking
- FastAPI: modern Python web framework
- Google Gemini: video understanding API
- Qwen API: alternative video understanding API
- ByteTrack: multi-object tracking algorithm
Built with ❤️ for efficient video intelligence in SF
Download files
File details

Details for the file vlm_sdk-0.0.3.tar.gz.

File metadata
- Download URL: vlm_sdk-0.0.3.tar.gz
- Size: 49.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `361d36f160f48e061f97f4a7f425ae3b749720180790836de7130c1346d08455` |
| MD5 | `31654f93a1233b10cf19c3fe15e6f659` |
| BLAKE2b-256 | `8802ebf2350ee4b080e1575de934e758590e5a0c1ec614e2e87156cd770afa6d` |
File details

Details for the file vlm_sdk-0.0.3-py3-none-any.whl.

File metadata
- Download URL: vlm_sdk-0.0.3-py3-none-any.whl
- Size: 55.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `1440fbc78e0a3265184ca516078c09bc23b38b9afba80d8eda82c0496adaf0cd` |
| MD5 | `b6054274cb756efbd83a833c8faf3c0c` |
| BLAKE2b-256 | `07b22b6942b86bf9ff44b25e7e0861fef6378bc9ef77866ce52f59d453c1e4e8` |