VLMS - Video Intelligence SDK with event-based processing

VLMS - Video Intelligence SDK

Event-based video intelligence with 98% cost reduction

Multi-source video processing SDK with intelligent frame selection, motion tracking, and VLM-powered analysis. Built for production use with RTSP, ONVIF, UDP, and WebRTC sources, with more connectors planned.

Note: pip install vlm-sdk installs the SDK components (connectors, preprocessors, providers). The FastAPI service depends on additional packages; install them separately if you plan to run the API.

Python 3.10+ | License: Apache-2.0


🌟 Features

Core SDK (vlm)

  • 🎯 Event-based processing: Only analyze frames with motion/activity (98% cost reduction vs frame-by-frame)
  • 📹 Multi-source connectors: RTSP, ONVIF, UDP, WebRTC, File
  • 🤖 RT-DETR + ByteTrack: Real-time object detection and motion tracking
  • 🧠 Provider-agnostic VLM: Gemini, Qwen, ObserveeVLM (small VLM coming soon); selected via environment config
  • 🎨 Advanced analysis: Timestamps, object detection, bounding boxes, range queries

Production API (api)

  • ⚡ FastAPI REST API: Industry-standard multi-stream video intelligence
  • 📡 Server-Sent Events (SSE): Real-time event streaming
  • 🔐 Authentication: API key-based auth with rate limiting
  • 📊 Monitoring: Health checks, metrics, stream management
  • 🔧 Configurable: Environment-based provider selection

🚀 Quick Start

Installation

# Install from PyPI
pip install vlm-sdk

# Or install from source
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk
pip install -e .

SDK Usage

from vlm.preprocessors import DetectorPreprocessor
from vlm.connectors import RTSPConnector
from vlm.providers.gemini import GeminiVideoService
import asyncio

# Initialize components
connector = RTSPConnector("rtsp://camera.local/stream1")
preprocessor = DetectorPreprocessor(
    confidence_threshold=0.6,
    track_objects=["person", "car"],
    min_duration=2.0  # Only events longer than 2 seconds
)

gemini = GeminiVideoService(api_key="your-gemini-key")

# Process stream
async def process():
    for frame in connector.stream_frames():
        result = preprocessor.process_frame(frame.data, frame.timestamp)

        if result['status'] == 'completed':
            # Event detected! Analyze with VLM
            upload = await gemini.upload_file(result['clip_path'])
            analysis = await gemini.query_video_with_file(
                upload['name'],
                "Describe the activity in this video"
            )
            print(f"Analysis: {analysis['response']}")

asyncio.run(process())

API Server

# Set environment variables
export ADMIN_API_KEY=your-secret-key
export GEMINI_API_KEY=your-gemini-key
export VLM_PROVIDER=gemini  # or openai, anthropic

# Install SDK (from repo checkout)
pip install -e .

# Install API dependencies (required for running api.main)
pip install fastapi "uvicorn[standard]" pydantic python-dotenv  # quotes keep shells like zsh from expanding the brackets
# or install everything we ship in Docker
pip install -r requirements.txt

# Run server
python -m api.main

# Server starts at http://localhost:8000

Note: To accept WebRTC publishers, run MediaMTX alongside the API using the provided mediamtx.yml (see docs/apiguide.md for commands).

Docker Image

# Pull the public image (linux/amd64)
docker pull observee/vlm-sdk:latest

# Run the API (set your API keys as needed)
docker run --rm -p 8000:8000 \
  -e ADMIN_API_KEY=your-secret-key \
  -e GEMINI_API_KEY=your-gemini-key \
  observee/vlm-sdk:latest

Create a stream:

curl -X POST http://localhost:8000/v1/streams/create \
  -H "X-Admin-API-Key: your-secret-key" \
  -H "X-VLM-API-Key: your-gemini-key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "rtsp",
    "source_url": "rtsp://camera.local/stream1",
    "config": {
      "username": "admin",
      "password": "password",
      "profile": "security",
      "min_duration": 2.0
    },
    "analysis": {
      "enabled": true,
      "mode": "basic",
      "prompt": "Describe any activity or movement"
    }
  }'

Listen to events (SSE), replacing {stream_id} with the stream's ID:

curl -N http://localhost:8000/v1/streams/{stream_id}/events \
  -H "X-Admin-API-Key: your-secret-key"
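
To consume the same event stream from Python instead of curl, the SSE lines need to be grouped into events. The sketch below does this with the standard library only. The `field: value` and blank-line framing comes from the SSE format itself; `iter_sse_events` is a hypothetical helper (not part of the SDK), and the sample payload is made up, since the event schema is not shown here.

```python
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Group raw SSE lines into events (hypothetical helper, not an SDK API).

    Basic SSE framing: "field: value" lines, with a blank line ending each
    event. Lines starting with ":" are comments or keep-alives.
    """
    event: dict = {}
    data_parts: list = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: emit the event if it carried any data.
            if data_parts:
                event["data"] = "\n".join(data_parts)
                yield event
            event, data_parts = {}, []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "data":
            data_parts.append(value)
        else:
            event[field] = value


# Canned lines for illustration; the payload text is invented.
sample = [
    ": keep-alive\n",
    "event: analysis\n",
    'data: {"summary": "person walking"}\n',
    "\n",
]
events = list(iter_sse_events(sample))
print(events)
```

Against a running server, the same function could be fed the decoded lines of an open HTTP response (for example from `urllib.request.urlopen`), assuming the endpoint emits standard SSE framing.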

📖 Documentation

Environment Variables

# Required
ADMIN_API_KEY=your-admin-key              # API authentication

# VLM Provider (choose one)
VLM_PROVIDER=gemini                        # gemini, openai, or anthropic
GEMINI_API_KEY=your-gemini-key            # If using Gemini
OPENAI_API_KEY=your-openai-key            # If using OpenAI
ANTHROPIC_API_KEY=your-anthropic-key      # If using Claude

# Optional: Rate Limiting
RATE_LIMIT_REQUESTS=100                    # Requests per window
RATE_LIMIT_WINDOW=60                       # Time window (seconds)
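
As a rough illustration of how these variables fit together, the sketch below validates them before a server starts. It is not the API's actual loader: `Settings` and `load_settings` are hypothetical names, the fallback to `gemini` when `VLM_PROVIDER` is unset is an assumption, and so is treating the example values 100 and 60 as defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Provider name -> API key variable, as listed in the variables above.
PROVIDER_KEY_VARS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; not a class from the SDK."""
    admin_api_key: str
    provider: str
    provider_api_key: str
    rate_limit_requests: int
    rate_limit_window: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    env = os.environ if env is None else env
    admin_key = env.get("ADMIN_API_KEY", "")
    if not admin_key:
        raise ValueError("ADMIN_API_KEY is required")

    # Assumption: fall back to "gemini" when VLM_PROVIDER is unset.
    provider = env.get("VLM_PROVIDER", "gemini").strip().lower()
    if provider not in PROVIDER_KEY_VARS:
        raise ValueError(f"Unknown VLM_PROVIDER: {provider!r}")

    key_var = PROVIDER_KEY_VARS[provider]
    provider_key = env.get(key_var, "")
    if not provider_key:
        raise ValueError(f"{key_var} is required when VLM_PROVIDER={provider}")

    # Assumption: the example values 100 and 60 double as defaults.
    return Settings(
        admin_api_key=admin_key,
        provider=provider,
        provider_api_key=provider_key,
        rate_limit_requests=int(env.get("RATE_LIMIT_REQUESTS", "100")),
        rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
    )


settings = load_settings({
    "ADMIN_API_KEY": "your-admin-key",
    "VLM_PROVIDER": "gemini",
    "GEMINI_API_KEY": "your-gemini-key",
})
print(settings.provider, settings.rate_limit_requests, settings.rate_limit_window)
```

Failing early on a missing provider key keeps the error close to its cause rather than surfacing it on the first analysis request.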

Analysis Modes

Basic - Simple video description

{
  "analysis": {
    "mode": "basic",
    "prompt": "Describe the activity"
  }
}

Timestamps - Find specific moments

{
  "analysis": {
    "mode": "timestamps",
    "find_timestamps": {
      "query": "when does someone wave",
      "find_all": true,
      "confidence_threshold": 0.7
    }
  }
}
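
When building these request bodies from Python, a small helper can keep the two shapes above consistent. This is only a sketch: `build_analysis` is a hypothetical function, it covers just the two modes shown here, and requiring a prompt in basic mode is an assumption rather than documented API behavior.

```python
import json
from typing import Any, Dict, Optional


def build_analysis(mode: str, prompt: Optional[str] = None,
                   query: Optional[str] = None,
                   confidence_threshold: float = 0.7) -> Dict[str, Any]:
    """Build the "analysis" object for the two modes shown above.

    Hypothetical helper, not part of the SDK or the API.
    """
    if mode == "basic":
        if not prompt:
            # Assumption: this sketch treats the prompt as required.
            raise ValueError("basic mode needs a prompt")
        return {"mode": "basic", "prompt": prompt}
    if mode == "timestamps":
        if not query:
            raise ValueError("timestamps mode needs a query")
        return {
            "mode": "timestamps",
            "find_timestamps": {
                "query": query,
                "find_all": True,
                "confidence_threshold": confidence_threshold,
            },
        }
    raise ValueError(f"mode {mode!r} is not covered by this sketch")


body = {
    "source_type": "rtsp",
    "source_url": "rtsp://camera.local/stream1",
    "analysis": build_analysis("timestamps", query="when does someone wave"),
}
print(json.dumps(body, indent=2))
```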

Supported Connectors

Connector   Description            Config
RTSP        IP camera streams      username, password, transport (tcp/udp)
ONVIF       Auto-discovery + PTZ   username, password, profile_index
UDP         UDP video receiver     host, port, buffer_size
WebRTC      Browser streams        signaling_url, ice_servers

API Endpoints

POST   /v1/streams/create              Create stream
GET    /v1/streams/{id}/events         SSE event stream
GET    /v1/streams/{id}                Get status
DELETE /v1/streams/{id}                Stop stream
GET    /v1/streams                     List all streams
GET    /v1/streams/discover/onvif      Discover cameras
GET    /v1/streams/health              Health check
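
The endpoints above can be called from Python with the standard library. The sketch below only prepares requests and does not send them. `make_request` is a hypothetical helper, the header names are taken from the curl examples, and sending the admin key on every endpoint (including health) is an assumption.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # default address from the server section


def make_request(method: str, path: str, admin_key: str,
                 body: Optional[dict] = None,
                 vlm_key: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a request for one of the endpoints above."""
    headers = {"X-Admin-API-Key": admin_key}
    if vlm_key:
        headers["X-VLM-API-Key"] = vlm_key
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


# "abc123" is a placeholder stream ID for illustration.
stop = make_request("DELETE", "/v1/streams/abc123", admin_key="your-secret-key")
print(stop.get_method(), stop.full_url)
```

Passing a prepared request to `urllib.request.urlopen` would send it; that step is left out so the example runs without a server.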

🏗️ Architecture

┌─────────────┐
│  Connector  │ (RTSP/ONVIF/UDP/WebRTC)
└──────┬──────┘
       │ Frames
       ▼
┌─────────────┐
│   RT-DETR   │ (Object detection + motion tracking)
└──────┬──────┘
       │ Events (only motion/activity)
       ▼
┌─────────────┐
│ Event Buffer│ (Collects frames during events)
└──────┬──────┘
       │ Complete Events
       ├────────────────┐
       │                │
       ▼                ▼
┌───────────┐    ┌──────────┐
│  Storage  │    │    VLM   │ (Gemini/Qwen/ObserveeVLM)
└───────────┘    └────┬─────┘
                      │
                      ▼
              ┌───────────────┐
              │ SSE / Webhooks│
              └───────────────┘

Key Innovation: Event-based processing analyzes only frames with detected motion/activity, reducing VLM API calls by 98% compared to frame-by-frame analysis.
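
To make the event-gating idea concrete, here is a minimal sketch of grouping per-frame activity flags into events and keeping only those that last long enough. It is an illustration under stated assumptions, not the SDK's implementation: `Event`, `segment_events`, and the `max_gap` tolerance are hypothetical, and the activity flag stands in for whatever the detector and tracker report.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Event:
    start: float  # timestamp of the first active frame
    end: float    # timestamp of the last active frame


def segment_events(frames: Iterable[Tuple[float, bool]],
                   min_duration: float = 2.0,
                   max_gap: float = 1.0) -> List[Event]:
    """Group (timestamp, active) frames into events.

    An event closes once no activity has been seen for more than max_gap
    seconds; events shorter than min_duration are dropped.
    """
    events: List[Event] = []
    start: Optional[float] = None
    last_active: Optional[float] = None
    for ts, active in frames:
        if active:
            if start is None:
                start = ts
            last_active = ts
        elif start is not None and ts - last_active > max_gap:
            if last_active - start >= min_duration:
                events.append(Event(start, last_active))
            start = last_active = None
    # Flush an event still open when the stream ends.
    if start is not None and last_active - start >= min_duration:
        events.append(Event(start, last_active))
    return events


# One frame per second: activity from t=2 to t=6, plus a one-frame blip at t=10.
frames = [(float(t), 2 <= t <= 6 or t == 10) for t in range(15)]
events = segment_events(frames)
print(events)
```

Only the frames inside the surviving events would be clipped and sent to the VLM, which is where the reduction in API calls comes from.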


📦 Repository Layout

vlm-sdk/
├── vlm/                        # Core SDK components
├── api/                        # FastAPI service (routers, services, models)
├── examples/                   # Sample scripts for RTSP/UDP/WebRTC usage
├── docs/                       # Additional documentation
├── mediamtx/                   # MediaMTX config for WebRTC/RTSP bridging
├── output/                     # Example generated clips (safe to remove)
├── pyproject.toml              # SDK packaging metadata
├── requirements.txt            # Full dependency list for API/Docker
├── Dockerfile                  # Reference container for the API
└── README.md

🔧 Development

# Clone repository
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk

# Install with dev dependencies
pip install -e ".[dev]"

# Include API stack if you plan to run the server locally
pip install -r requirements.txt

# Run tests
pytest tests/

# Format code
black vlm/ api/
ruff check vlm/ api/

# Run API server (development)
uvicorn api.main:app --reload

🎯 Use Cases

  • 🏢 Security & Surveillance: 24/7 perimeter monitoring with motion alerts
  • 🏪 Retail Analytics: Customer counting, queue analysis, behavior tracking
  • 🚗 Traffic Monitoring: Vehicle counting, flow analysis, incident detection
  • 🏠 Smart Home: Activity monitoring, intrusion detection
  • 🏭 Industrial: Safety compliance, equipment monitoring

📊 Cost Comparison

Approach             Frames/Hour       VLM API Calls   Cost Reduction
Frame-by-frame       54,000 (15 FPS)   54,000          Baseline
Event-based (VLMS)   54,000            ~1,000          98% ✅

Example: 1-hour 15 FPS stream with 5-10 motion events
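
The 98% figure follows directly from the table's numbers. A short worked check, using only the values above (`call_reduction` is a hypothetical helper name):

```python
def call_reduction(baseline_calls: int, event_calls: int) -> float:
    """Fraction of VLM API calls avoided relative to the baseline."""
    return 1.0 - event_calls / baseline_calls


frames_per_hour = 15 * 60 * 60  # 15 FPS for one hour
reduction = call_reduction(frames_per_hour, 1_000)
print(f"{frames_per_hour} frames/hour, reduction = {reduction:.1%}")
# prints "54000 frames/hour, reduction = 98.1%"
```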


🤝 Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

Apache-2.0: a permissive license suitable for commercial and open-source use.

See LICENSE for the complete text. Commercial support is available on request.


🙏 Acknowledgments

  • Ultralytics RT-DETR: Object detection and tracking
  • FastAPI: Modern Python web framework
  • Google Gemini: Video understanding API
  • Qwen API: Alternative video understanding API
  • ByteTrack: Multi-object tracking algorithm

Built with ❤️ in SF for efficient video intelligence
