
VLMS - Video Intelligence SDK with event-based processing

Project description


Event-based video intelligence with 98% cost reduction

Multi-source video processing SDK with intelligent frame selection, motion tracking, and VLM-powered analysis. Built for production use with RTSP, ONVIF, UDP, WebRTC, and file sources, with more connectors on the way.

Note: pip install vlm-sdk installs the SDK components (connectors, preprocessors, providers). The FastAPI service depends on additional packages; install them separately if you plan to run the API.

Python 3.11+ | License: Apache-2.0


🌟 Features

Core SDK (vlm)

  • 🎯 Event-based processing: Only analyze frames with motion/activity (98% cost reduction vs frame-by-frame)
  • 📹 Multi-source connectors: RTSP, ONVIF, UDP, WebRTC, File
  • 🤖 RT-DETR + ByteTrack: Real-time object detection and motion tracking
  • 🧠 Provider-agnostic VLM: Gemini, OpenAI, Claude (via env config)
  • 🎨 Advanced analysis: Timestamps, object detection, bounding boxes, range queries

Production API (api)

  • ⚡ FastAPI REST API: Multi-stream video intelligence over a standard REST interface
  • 📡 Server-Sent Events (SSE): Real-time event streaming
  • 🔐 Authentication: API key-based auth with rate limiting
  • 📊 Monitoring: Health checks, metrics, stream management
  • 🔧 Configurable: Environment-based provider selection

🚀 Quick Start

Installation

# Install from PyPI
pip install vlm-sdk

# Or install from source
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk
pip install -e .

SDK Usage

from vlm.preprocessors import DetectorPreprocessor
from vlm.connectors import RTSPConnector
from vlm.providers.gemini import GeminiVideoService
import asyncio

# Initialize components
connector = RTSPConnector("rtsp://camera.local/stream1")
preprocessor = DetectorPreprocessor(
    confidence_threshold=0.6,
    track_objects=["person", "car"],
    min_duration=2.0  # Only events longer than 2 seconds
)

gemini = GeminiVideoService(api_key="your-gemini-key")

# Process stream
async def process():
    for frame in connector.stream_frames():
        result = preprocessor.process_frame(frame.data, frame.timestamp)

        if result['status'] == 'completed':
            # Event detected! Analyze with VLM
            upload = await gemini.upload_file(result['clip_path'])
            analysis = await gemini.query_video_with_file(
                upload['name'],
                "Describe the activity in this video"
            )
            print(f"Analysis: {analysis['response']}")

asyncio.run(process())
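The `min_duration=2.0` setting above means only events longer than two seconds reach the VLM. As a rough standalone illustration of that gating logic (a sketch only; the SDK's DetectorPreprocessor implements this internally and its real logic may differ):

```python
def filter_events(events, min_duration=2.0):
    """Keep only events at least min_duration seconds long.

    events: list of (start_ts, end_ts) tuples in seconds.
    """
    return [(start, end) for start, end in events if end - start >= min_duration]

# Four candidate events: only the 3.5 s and 5.0 s ones pass the 2.0 s gate
events = [(0.0, 0.5), (10.0, 13.5), (20.0, 21.0), (30.0, 35.0)]
kept = filter_events(events, min_duration=2.0)
print(kept)  # -> [(10.0, 13.5), (30.0, 35.0)]
```

Raising `min_duration` trades recall of brief events for fewer (and cheaper) VLM calls.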

API Server

# Set environment variables
export ADMIN_API_KEY=your-secret-key
export GEMINI_API_KEY=your-gemini-key
export VLM_PROVIDER=gemini  # or openai, anthropic

# Install SDK (from repo checkout)
pip install -e .

# Install API dependencies (required for running api.main)
pip install fastapi uvicorn[standard] pydantic python-dotenv
# or install everything we ship in Docker
pip install -r requirements.txt

# Run server
python -m api.main

# Server starts at http://localhost:8000

Create a stream:

curl -X POST http://localhost:8000/v1/streams/create \
  -H "X-Admin-API-Key: your-secret-key" \
  -H "X-VLM-API-Key: your-gemini-key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "rtsp",
    "source_url": "rtsp://camera.local/stream1",
    "config": {
      "username": "admin",
      "password": "password",
      "profile": "security",
      "min_duration": 2.0
    },
    "analysis": {
      "enabled": true,
      "mode": "basic",
      "prompt": "Describe any activity or movement"
    }
  }'

Listen to events (SSE):

curl -N http://localhost:8000/v1/streams/{stream_id}/events \
  -H "X-Admin-API-Key: your-secret-key"
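From Python, the same SSE endpoint can be consumed with any streaming HTTP client; the framing is just `data:`-prefixed lines terminated by a blank line. A minimal parser sketch (the endpoint and header follow the curl example above; the payload fields shown are illustrative, not the API's documented schema):

```python
import json

def parse_sse(lines):
    """Yield parsed JSON payloads from an iterable of SSE text lines."""
    data_parts = []
    for line in lines:
        if line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
        elif line == "" and data_parts:  # blank line terminates an event
            yield json.loads("".join(data_parts))
            data_parts = []

# Example lines as they might arrive over the wire (hypothetical payloads)
raw = [
    'data: {"event": "motion_detected", "stream_id": "abc123"}',
    "",
    'data: {"event": "analysis_complete", "response": "A person walks by"}',
    "",
]
for payload in parse_sse(raw):
    print(payload["event"])
```

In practice you would feed `parse_sse` the decoded line iterator of a streaming response (e.g. `requests.get(..., stream=True).iter_lines(decode_unicode=True)`).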

📖 Documentation

Environment Variables

# Required
ADMIN_API_KEY=your-admin-key              # API authentication

# VLM Provider (choose one)
VLM_PROVIDER=gemini                        # gemini, openai, or anthropic
GEMINI_API_KEY=your-gemini-key            # If using Gemini
OPENAI_API_KEY=your-openai-key            # If using OpenAI
ANTHROPIC_API_KEY=your-anthropic-key      # If using Claude

# Optional: Rate Limiting
RATE_LIMIT_REQUESTS=100                    # Requests per window
RATE_LIMIT_WINDOW=60                       # Time window (seconds)
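Provider selection at startup can be pictured as a lookup from `VLM_PROVIDER` to the matching key variable. A hedged sketch of that resolution logic (the service's actual config loading may differ):

```python
import os

# Maps VLM_PROVIDER values to the env var holding that provider's API key
PROVIDER_KEYS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def resolve_provider(env=os.environ):
    """Return (provider, api_key) from the environment, defaulting to gemini."""
    provider = env.get("VLM_PROVIDER", "gemini").lower()
    key_var = PROVIDER_KEYS.get(provider)
    if key_var is None:
        raise ValueError(f"Unknown VLM_PROVIDER: {provider!r}")
    api_key = env.get(key_var)
    if not api_key:
        raise RuntimeError(f"{key_var} must be set when VLM_PROVIDER={provider}")
    return provider, api_key
```

Note that only the key for the selected provider needs to be set; the others can stay unset.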

Analysis Modes

Basic - Simple video description

{
  "analysis": {
    "mode": "basic",
    "prompt": "Describe the activity"
  }
}

Timestamps - Find specific moments

{
  "analysis": {
    "mode": "timestamps",
    "find_timestamps": {
      "query": "when does someone wave",
      "find_all": true,
      "confidence_threshold": 0.7
    }
  }
}
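The same timestamps request can be assembled programmatically. A sketch building the request body for `POST /v1/streams/create` (field names taken from the examples above; the camera URL is a placeholder):

```python
import json

# Request body for a timestamps-mode stream, mirroring the JSON example above
payload = {
    "source_type": "rtsp",
    "source_url": "rtsp://camera.local/stream1",
    "analysis": {
        "enabled": True,
        "mode": "timestamps",
        "find_timestamps": {
            "query": "when does someone wave",
            "find_all": True,
            "confidence_threshold": 0.7,
        },
    },
}
body = json.dumps(payload)  # send as the POST body with requests/httpx
```

The serialized `body` would be posted with the `X-Admin-API-Key` and `X-VLM-API-Key` headers shown in the curl example.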

Supported Connectors

Connector   Description            Config
RTSP        IP camera streams      username, password, transport (tcp/udp)
ONVIF       Auto-discovery + PTZ   username, password, profile_index
UDP         UDP video receiver     host, port, buffer_size
WebRTC      Browser streams        signaling_url, ice_servers
File        Video files            realtime, loop

API Endpoints

POST   /v1/streams/create              Create stream
GET    /v1/streams/{id}/events         SSE event stream
GET    /v1/streams/{id}                Get status
DELETE /v1/streams/{id}                Stop stream
GET    /v1/streams                     List all streams
GET    /v1/streams/discover/onvif      Discover cameras
GET    /v1/streams/health              Health check

๐Ÿ—๏ธ Architecture

┌─────────────┐
│  Connector  │ (RTSP/ONVIF/UDP/WebRTC/File)
└──────┬──────┘
       │ Frames
       ▼
┌─────────────┐
│   RT-DETR   │ (Object detection + motion tracking)
└──────┬──────┘
       │ Events (only motion/activity)
       ▼
┌─────────────┐
│ Event Buffer│ (Collects frames during events)
└──────┬──────┘
       │ Complete Events
       ├────────────────┐
       │                │
       ▼                ▼
┌───────────┐    ┌──────────┐
│  Storage  │    │   VLM    │ (Gemini/OpenAI/Claude)
└───────────┘    └────┬─────┘
                      │
                      ▼
              ┌───────────────┐
              │ SSE / Webhooks│
              └───────────────┘

Key Innovation: Event-based processing analyzes only frames with detected motion/activity, reducing VLM API calls by 98% compared to frame-by-frame analysis.


🔧 Development

# Clone repository
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk

# Install with dev dependencies
pip install -e ".[dev]"

# Include API stack if you plan to run the server locally
pip install -r requirements.txt

# Run tests
pytest tests/

# Format code
black vlm/ api/
ruff check vlm/ api/

# Run API server (development)
uvicorn api.main:app --reload

🎯 Use Cases

  • 🏢 Security & Surveillance: 24/7 perimeter monitoring with motion alerts
  • 🏪 Retail Analytics: Customer counting, queue analysis, behavior tracking
  • 🚗 Traffic Monitoring: Vehicle counting, flow analysis, incident detection
  • 🏠 Smart Home: Activity monitoring, intrusion detection
  • 🏭 Industrial: Safety compliance, equipment monitoring

📊 Cost Comparison

Approach             Frames/Hour        VLM API Calls   Cost Reduction
Frame-by-frame       54,000 (15 FPS)    54,000          Baseline
Event-based (VLMS)   54,000             ~1,000          98% ✅

Example: 1-hour 15 FPS stream with 5-10 motion events
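The 98% figure follows directly from the table. A quick check of the arithmetic (the ~1,000-frame event count is the example's assumption):

```python
frames_per_hour = 15 * 60 * 60        # 15 FPS for one hour = 54,000 frames
frame_by_frame_calls = frames_per_hour
event_based_calls = 1_000             # ~1,000 frames across 5-10 motion events

reduction = 1 - event_based_calls / frame_by_frame_calls
print(f"{reduction:.1%}")  # -> 98.1%
```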


๐Ÿค Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

Apache-2.0 – Permissive license suitable for commercial and open-source use.

See LICENSE for the complete text. Commercial support is available on request.


๐Ÿ™ Acknowledgments

  • Ultralytics RT-DETR: Object detection and tracking
  • FastAPI: Modern Python web framework
  • Google Gemini: Video understanding API
  • ByteTrack: Multi-object tracking algorithm

Built with โค๏ธ for efficient video intelligence in SF



Download files

Download the file for your platform.

Source Distribution

vlm_sdk-0.0.1.tar.gz (9.8 kB)

Uploaded Source

Built Distribution


vlm_sdk-0.0.1-py3-none-any.whl (9.5 kB)

Uploaded Python 3

File details

Details for the file vlm_sdk-0.0.1.tar.gz.

File metadata

  • Download URL: vlm_sdk-0.0.1.tar.gz
  • Upload date:
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for vlm_sdk-0.0.1.tar.gz
Algorithm Hash digest
SHA256 df772baeed7af8c31fac723a7b2d5ff90761bc482a459b176de969cf4b272fef
MD5 e974bb90f0b7cd4d7e59d07217501c5a
BLAKE2b-256 691e88220425a6e297626d0a9f4b9e5abdbb6d12f06c8f3ad628d23069e65140


File details

Details for the file vlm_sdk-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: vlm_sdk-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 9.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for vlm_sdk-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ada5b9cfb1dee053e87b97e4f20569c54da168f6ac415652143fe8a90bf17bb0
MD5 21a9c0aa143235d4b3023e19bdb77f04
BLAKE2b-256 8b8f7823fddb6a26b97f5a3cfcaa15b987e9777fe55a9fbf97deaa5e8f4b3fbc

