VLMS - Video Intelligence SDK
Event-based video intelligence with 98% cost reduction
Multi-source video processing SDK with intelligent frame selection, motion tracking, and VLM-powered analysis. Built for production use with RTSP, ONVIF, UDP, and WebRTC connectors, with more on the way.
Note: `pip install vlm-sdk` installs the SDK components (connectors, preprocessors, providers). The FastAPI service depends on additional packages; install them separately if you plan to run the API.
Features

Core SDK (vlm)
- Event-based processing: only analyze frames with motion/activity (98% cost reduction vs. frame-by-frame)
- Multi-source connectors: RTSP, ONVIF, UDP, WebRTC, File
- RT-DETR + ByteTrack: real-time object detection and motion tracking
- Provider-agnostic VLM: Gemini, Qwen, and ObserveeVLM (small VLM coming soon), selected via environment config
- Advanced analysis: timestamps, object detection, bounding boxes, range queries
Production API (api)
- FastAPI REST API: industry-standard multi-stream video intelligence
- Server-Sent Events (SSE): real-time event streaming
- Authentication: API key-based auth with rate limiting
- Monitoring: health checks, metrics, stream management
- Configurable: environment-based provider selection
Quick Start

Installation

```bash
# Install from PyPI
pip install vlm-sdk

# Or install from source
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk
pip install -e .
```
SDK Usage

```python
import asyncio

from vlm.connectors import RTSPConnector
from vlm.preprocessors import DetectorPreprocessor
from vlm.providers.gemini import GeminiVideoService

# Initialize components
connector = RTSPConnector("rtsp://camera.local/stream1")
preprocessor = DetectorPreprocessor(
    confidence_threshold=0.6,
    track_objects=["person", "car"],
    min_duration=2.0,  # Only keep events longer than 2 seconds
)
gemini = GeminiVideoService(api_key="your-gemini-key")

# Process stream
async def process():
    for frame in connector.stream_frames():
        result = preprocessor.process_frame(frame.data, frame.timestamp)
        if result["status"] == "completed":
            # Event detected! Analyze the recorded clip with the VLM
            upload = await gemini.upload_file(result["clip_path"])
            analysis = await gemini.query_video_with_file(
                upload["name"],
                "Describe the activity in this video",
            )
            print(f"Analysis: {analysis['response']}")

asyncio.run(process())
```
API Server

```bash
# Set environment variables
export ADMIN_API_KEY=your-secret-key
export GEMINI_API_KEY=your-gemini-key
export VLM_PROVIDER=gemini  # or openai, anthropic

# Install the SDK (from a repo checkout)
pip install -e .

# Install API dependencies (required for running api.main)
pip install fastapi "uvicorn[standard]" pydantic python-dotenv

# ...or install everything shipped in the Docker image
pip install -r requirements.txt

# Run the server
python -m api.main
# Server starts at http://localhost:8000
```
Note: To accept WebRTC publishers, run MediaMTX alongside the API using the provided `mediamtx.yml` (see docs/apiguide.md for the commands).
Docker Image

```bash
# Pull the public image (linux/amd64)
docker pull observee/vlm-sdk:latest

# Run the API (set your API keys as needed)
docker run --rm -p 8000:8000 \
  -e ADMIN_API_KEY=your-secret-key \
  -e GEMINI_API_KEY=your-gemini-key \
  observee/vlm-sdk:latest
```
Create a stream:

```bash
curl -X POST http://localhost:8000/v1/streams/create \
  -H "X-Admin-API-Key: your-secret-key" \
  -H "X-VLM-API-Key: your-gemini-key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "rtsp",
    "source_url": "rtsp://camera.local/stream1",
    "config": {
      "username": "admin",
      "password": "password",
      "profile": "security",
      "min_duration": 2.0
    },
    "analysis": {
      "enabled": true,
      "mode": "basic",
      "prompt": "Describe any activity or movement"
    }
  }'
```
Listen to events (SSE):

```bash
curl -N http://localhost:8000/v1/streams/{stream_id}/events \
  -H "X-Admin-API-Key: your-secret-key"
```
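The same events endpoint can be consumed from Python. The sketch below is an assumption about the SSE framing (standard `data:` lines carrying JSON); the event payload shape is not documented here, so adjust field names to what the API actually emits. The parsing helper is separated from the network call so it can be reused and tested:

```python
import json

def parse_sse_data(lines):
    """Extract and decode JSON payloads from `data:` lines of an SSE stream."""
    for line in lines:
        if line and line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                yield json.loads(payload)

def listen(base_url, stream_id, admin_key):
    """Connect to the events endpoint and print each event (needs `requests`)."""
    import requests  # third-party: pip install requests
    url = f"{base_url}/v1/streams/{stream_id}/events"
    headers = {"X-Admin-API-Key": admin_key}
    with requests.get(url, headers=headers, stream=True) as resp:
        resp.raise_for_status()
        for event in parse_sse_data(resp.iter_lines(decode_unicode=True)):
            print(event)
```

For example, `listen("http://localhost:8000", "my-stream-id", "your-secret-key")` mirrors the `curl -N` command above.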
Documentation
Environment Variables

```bash
# Required
ADMIN_API_KEY=your-admin-key          # API authentication

# VLM provider (choose one)
VLM_PROVIDER=gemini                   # gemini, openai, or anthropic
GEMINI_API_KEY=your-gemini-key        # If using Gemini
OPENAI_API_KEY=your-openai-key        # If using OpenAI
ANTHROPIC_API_KEY=your-anthropic-key  # If using Claude

# Optional: rate limiting
RATE_LIMIT_REQUESTS=100               # Requests per window
RATE_LIMIT_WINDOW=60                  # Time window (seconds)
```
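As a sketch of how a service might resolve these variables, the helper below maps `VLM_PROVIDER` to the matching key. The variable names come from the list above; the fallback-to-Gemini default and the error behavior are assumptions, not the API's documented logic:

```python
import os

# Provider name -> environment variable holding its API key
PROVIDER_KEYS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def resolve_vlm_key(env=os.environ):
    """Return (provider, api_key) based on VLM_PROVIDER; raise if misconfigured."""
    provider = env.get("VLM_PROVIDER", "gemini").lower()
    key_name = PROVIDER_KEYS.get(provider)
    if key_name is None:
        raise ValueError(f"Unsupported VLM_PROVIDER: {provider!r}")
    api_key = env.get(key_name)
    if not api_key:
        raise RuntimeError(f"{key_name} must be set when VLM_PROVIDER={provider}")
    return provider, api_key
```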
Analysis Modes

Basic - simple video description:

```json
{
  "analysis": {
    "mode": "basic",
    "prompt": "Describe the activity"
  }
}
```
Timestamps - find specific moments:

```json
{
  "analysis": {
    "mode": "timestamps",
    "find_timestamps": {
      "query": "when does someone wave",
      "find_all": true,
      "confidence_threshold": 0.7
    }
  }
}
```
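Either analysis block is sent as part of the `/v1/streams/create` request body. The sketch below only assembles the payload in Python; the field names mirror the JSON examples above, and actually posting it (commented out) would use any HTTP client such as `requests`:

```python
import json

def build_create_request(source_url, query, threshold=0.7):
    """Assemble a /v1/streams/create body for timestamp analysis."""
    return {
        "source_type": "rtsp",
        "source_url": source_url,
        "analysis": {
            "enabled": True,
            "mode": "timestamps",
            "find_timestamps": {
                "query": query,
                "find_all": True,
                "confidence_threshold": threshold,
            },
        },
    }

body = build_create_request("rtsp://camera.local/stream1", "when does someone wave")
print(json.dumps(body, indent=2))
# To submit: requests.post(f"{base}/v1/streams/create", json=body,
#                          headers={"X-Admin-API-Key": ..., "X-VLM-API-Key": ...})
```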
Supported Connectors
| Connector | Description | Config |
|---|---|---|
| RTSP | IP camera streams | username, password, transport (tcp/udp) |
| ONVIF | Auto-discovery + PTZ | username, password, profile_index |
| UDP | UDP video receiver | host, port, buffer_size |
| WebRTC | Browser streams | signaling_url, ice_servers |
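If you build stream configs programmatically, a small validator derived from the table above can catch typos early. The key sets simply restate the table; treat this as a sketch, not the SDK's own validation logic:

```python
# Allowed config keys per connector type, transcribed from the table above
CONNECTOR_CONFIG_KEYS = {
    "rtsp": {"username", "password", "transport"},
    "onvif": {"username", "password", "profile_index"},
    "udp": {"host", "port", "buffer_size"},
    "webrtc": {"signaling_url", "ice_servers"},
}

def unknown_config_keys(connector, config):
    """Return config keys not listed for the given connector type."""
    allowed = CONNECTOR_CONFIG_KEYS.get(connector.lower())
    if allowed is None:
        raise ValueError(f"Unknown connector: {connector!r}")
    return sorted(set(config) - allowed)
```

For example, `unknown_config_keys("rtsp", {"username": "admin", "transprt": "tcp"})` flags the misspelled `transprt` before the stream is created.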
API Endpoints

| Method | Path | Description |
|---|---|---|
| POST | /v1/streams/create | Create stream |
| GET | /v1/streams/{id}/events | SSE event stream |
| GET | /v1/streams/{id} | Get status |
| DELETE | /v1/streams/{id} | Stop stream |
| GET | /v1/streams | List all streams |
| GET | /v1/streams/discover/onvif | Discover cameras |
| GET | /v1/streams/health | Health check |
Architecture

```
┌─────────────┐
│  Connector  │ (RTSP/ONVIF/UDP/WebRTC)
└──────┬──────┘
       │ Frames
       ▼
┌─────────────┐
│   RT-DETR   │ (Object detection + motion tracking)
└──────┬──────┘
       │ Events (only motion/activity)
       ▼
┌──────────────┐
│ Event Buffer │ (Collects frames during events)
└──────┬───────┘
       │ Complete Events
   ┌───┴────────┐
   │            │
   ▼            ▼
┌─────────┐ ┌─────────┐
│ Storage │ │   VLM   │ (Gemini/Qwen/ObserveeVLM)
└─────────┘ └────┬────┘
                 │
                 ▼
        ┌────────────────┐
        │ SSE / Webhooks │
        └────────────────┘
```
Key Innovation: Event-based processing analyzes only frames with detected motion/activity, reducing VLM API calls by 98% compared to frame-by-frame analysis.
Repository Layout

```
vlm-sdk/
├── vlm/              # Core SDK components
├── api/              # FastAPI service (routers, services, models)
├── examples/         # Sample scripts for RTSP/UDP/WebRTC usage
├── docs/             # Additional documentation
├── mediamtx/         # MediaMTX config for WebRTC/RTSP bridging
├── output/           # Example generated clips (safe to remove)
├── pyproject.toml    # SDK packaging metadata
├── requirements.txt  # Full dependency list for API/Docker
├── Dockerfile        # Reference container for the API
└── README.md
```
Development

```bash
# Clone the repository
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk

# Install with dev dependencies
pip install -e ".[dev]"

# Include the API stack if you plan to run the server locally
pip install -r requirements.txt

# Run tests
pytest tests/

# Format and lint
black vlm/ api/
ruff check vlm/ api/

# Run the API server (development)
uvicorn api.main:app --reload
```
Use Cases
- Security & Surveillance: 24/7 perimeter monitoring with motion alerts
- Retail Analytics: customer counting, queue analysis, behavior tracking
- Traffic Monitoring: vehicle counting, flow analysis, incident detection
- Smart Home: activity monitoring, intrusion detection
- Industrial: safety compliance, equipment monitoring
Cost Comparison

| Approach | Frames/Hour | VLM API Calls | Cost Reduction |
|---|---|---|---|
| Frame-by-frame | 54,000 (15 FPS) | 54,000 | Baseline |
| Event-based (VLMS) | 54,000 | ~1,000 | ~98% |

Example: a 1-hour, 15 FPS stream with 5-10 motion events.
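The headline number follows directly from the table: 15 FPS over one hour yields 15 × 3600 = 54,000 frames, and analyzing only ~1,000 event frames cuts VLM calls by roughly 98%:

```python
fps = 15
frames_per_hour = fps * 3600        # 54,000 frames in one hour
event_frames = 1_000                # approximate frames inside motion events
reduction = 1 - event_frames / frames_per_hour
print(f"{reduction:.1%}")           # prints 98.1%
```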
Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
License

Apache-2.0, a permissive license suitable for commercial and open-source use.
See LICENSE for the complete text. Commercial support is available on request.
Acknowledgments
- Ultralytics RT-DETR: object detection and tracking
- FastAPI: modern Python web framework
- Google Gemini: video understanding API
- Qwen API: alternative video understanding API
- ByteTrack: multi-object tracking algorithm
Built with ❤️ for efficient video intelligence in SF
Download files
File details

Details for the file vlm_sdk-0.0.3.tar.gz.

File metadata
- Download URL: vlm_sdk-0.0.3.tar.gz
- Size: 49.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `361d36f160f48e061f97f4a7f425ae3b749720180790836de7130c1346d08455` |
| MD5 | `31654f93a1233b10cf19c3fe15e6f659` |
| BLAKE2b-256 | `8802ebf2350ee4b080e1575de934e758590e5a0c1ec614e2e87156cd770afa6d` |
File details

Details for the file vlm_sdk-0.0.3-py3-none-any.whl.

File metadata
- Download URL: vlm_sdk-0.0.3-py3-none-any.whl
- Size: 55.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `1440fbc78e0a3265184ca516078c09bc23b38b9afba80d8eda82c0496adaf0cd` |
| MD5 | `b6054274cb756efbd83a833c8faf3c0c` |
| BLAKE2b-256 | `07b22b6942b86bf9ff44b25e7e0861fef6378bc9ef77866ce52f59d453c1e4e8` |