Advanced Vision-Language Model Engine for content tagging

Project description

VLM Engine

A high-performance Python package for Vision-Language Model (VLM) based content tagging and analysis. This package provides an advanced implementation for automatic content detection and tagging, delivering superior accuracy compared to traditional image classification methods.

Features

  • Remote VLM Integration: Connects to any OpenAI-compatible VLM endpoint (no local model loading required)
  • Context-Aware Detection: Leverages Vision-Language Models' understanding of visual relationships for accurate content tagging
  • Flexible Architecture: Modular pipeline system with configurable models and processing stages
  • Asynchronous Processing: Built on asyncio for efficient video and image processing
  • Customizable Tag Sets: Easy configuration of detection categories
  • Production Ready: Includes retry logic, error handling, and comprehensive logging

Installation

From PyPI

CPU-only Installation (Default, Recommended)

For most use cases (including AWS batch jobs), install the CPU-only version which is ~3GB smaller:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install vlm-engine

GPU Installation (CUDA-enabled)

If you need GPU support for local model inference (not required for VLM API usage):

pip install vlm-engine

From Source

CPU-only Installation (Default, Recommended)

git clone https://github.com/Haven-hvn/haven-vlm-engine-package.git
cd haven-vlm-engine-package
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install -e .

GPU Installation (CUDA-enabled)

git clone https://github.com/Haven-hvn/haven-vlm-engine-package.git
cd haven-vlm-engine-package
pip install -e .

Installation Notes

Why CPU-only by default?

  • This package connects to REMOTE OpenAI-compatible VLM endpoints - it never loads models locally
  • PyTorch is only used for image preprocessing (tensor operations, transforms)
  • CUDA-enabled PyTorch adds ~3GB of unnecessary dependencies for CPU-only workloads
  • Perfect for Docker deployments and AWS batch jobs

Switching between CPU and GPU:

# Switch from GPU to CPU
pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

# Switch from CPU to GPU (CUDA 12.1)
pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

Docker Optimization

For Docker deployments (especially AWS batch jobs), use CPU-only PyTorch to reduce image size by ~3GB:

FROM python:3.10-slim

WORKDIR /app

# Install CPU-only PyTorch first, then vlm_engine
RUN pip install --no-cache-dir torch torchvision \
        --index-url https://download.pytorch.org/whl/cpu && \
    pip install --no-cache-dir vlm_engine

# Copy your application
COPY . .

CMD ["python", "your_batch_job.py"]

Size comparison:

  • With CUDA PyTorch: ~5GB image
  • With CPU-only PyTorch: ~2GB image (60% smaller)

Requirements

  • Python 3.8+
  • Sufficient RAM: Video preprocessing loads entire videos into memory (not GPU memory)
  • An OpenAI-compatible VLM server endpoint (see the endpoint options listed under Support below)

Quick Start

import asyncio
from vlm_engine import VLMEngine
from vlm_engine.config_models import EngineConfig, ModelConfig

# Configure the engine
config = EngineConfig(
    active_ai_models=["llm_vlm_model"],
    models={
        "llm_vlm_model": ModelConfig(
            type="vlm_model",
            model_id="HuggingFaceTB/SmolVLM-Instruct",
            api_base_url="http://localhost:7045",
            tag_list=["tag1", "tag2", "tag3"]  # Your custom tags
        )
    }
)

# Initialize and use
async def main():
    engine = VLMEngine(config)
    await engine.initialize()
    
    results = await engine.process_video(
        "path/to/video.mp4",
        frame_interval=2.0,
        threshold=0.5
    )
    print(f"Detected tags: {results}")

asyncio.run(main())

Multiplexer Configuration (Load Balancing)

For high-performance deployments, you can configure multiple VLM endpoints with automatic load balancing:

from vlm_engine.config_models import EngineConfig, ModelConfig

config = EngineConfig(
    active_ai_models=["vlm_multiplexer_model"],
    models={
        "vlm_multiplexer_model": ModelConfig(
            type="vlm_model",
            model_id="HuggingFaceTB/SmolVLM-Instruct",
            use_multiplexer=True,  # Enable multiplexer mode
            multiplexer_endpoints=[
                {
                    "base_url": "http://server1:7045/v1",
                    "api_key": "",
                    "name": "primary-server",
                    "weight": 5,  # Higher weight = more requests
                    "is_fallback": False
                },
                {
                    "base_url": "http://server2:7045/v1",
                    "api_key": "",
                    "name": "secondary-server",
                    "weight": 3,
                    "is_fallback": False
                },
                {
                    "base_url": "http://backup:7045/v1",
                    "api_key": "",
                    "name": "backup-server",
                    "weight": 1,
                    "is_fallback": True  # Used only when primaries fail
                }
            ],
            tag_list=["tag1", "tag2", "tag3"]
        )
    }
)
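With the configuration above, and assuming the weights are applied proportionally across the non-fallback endpoints (the usual convention for weighted load balancing), primary-server would handle roughly 5 of every 8 requests and secondary-server about 3 of every 8; backup-server only receives traffic when the primary endpoints fail.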

Architecture

Core Components

  1. VLMEngine: Main entry point for the package

    • Manages model initialization and pipeline execution
    • Handles asynchronous processing of videos and images
  2. VLMClient: OpenAI-compatible API client with multiplexer support

    • Supports any VLM with chat completions endpoint
    • Load balancing across multiple endpoints using multiplexer-llm
    • Automatic failover for high availability
    • Includes retry logic with exponential backoff and jitter (see the sketch after this list)
    • Handles image encoding and prompt formatting
  3. Pipeline System: Flexible processing pipeline

    • Modular design allows custom processing stages
    • Built-in support for preprocessing, analysis, and postprocessing
    • Configurable through YAML or Python objects
  4. Model Management: Dynamic model loading

    • Supports multiple model types (VLM, preprocessors, postprocessors)
    • Lazy loading for efficient resource usage
    • Thread-safe model access
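The retry behavior lives inside VLMClient; the snippet below is only an illustrative sketch of exponential backoff with jitter, and the names (with_retries, call_endpoint, max_retries, base_delay) are placeholders rather than part of the package API:

import asyncio
import random

async def with_retries(call_endpoint, max_retries=5, base_delay=1.0):
    # Retry an async API call with exponential backoff plus random jitter.
    for attempt in range(max_retries):
        try:
            return await call_endpoint()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Backoff doubles each attempt (1s, 2s, 4s, ...) with up to 1s of jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(delay)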

Configuration

Basic Configuration

from vlm_engine.config_models import EngineConfig, ModelConfig, PipelineConfig

config = EngineConfig(
    active_ai_models=["my_vlm_model"],
    models={
        "my_vlm_model": ModelConfig(
            type="vlm_model",
            model_id="model-name",
            api_base_url="http://localhost:8000",
            tag_list=["action1", "action2", "action3"],
            max_new_tokens=128,
            request_timeout=70,
            vlm_detected_tag_confidence=0.99
        )
    },
    pipelines={
        "video_pipeline": PipelineConfig(
            inputs=["video_path", "frame_interval"],
            output="results",
            models=[{"name": "my_vlm_model", "inputs": ["frame"], "outputs": "tags"}]
        )
    }
)

Multiplexer Benefits

  • Load Balancing: Distribute requests across multiple VLM endpoints based on configurable weights
  • High Availability: Automatic failover to backup endpoints when primary endpoints fail
  • Improved Performance: Parallel processing across multiple servers for higher throughput
  • Seamless Integration: Drop-in replacement for single endpoint configurations
  • Flexible Configuration: Mix of primary and fallback endpoints with custom weights

Advanced Configuration

The package supports complex configurations including:

  • Multiple models in a pipeline
  • Custom preprocessing and postprocessing stages
  • Category-specific settings (thresholds, durations, etc.)
  • Batch processing configurations

See the examples directory for detailed configuration examples.

For comprehensive multiplexer setup and configuration, see MULTIPLEXER_INTEGRATION.md.

API Reference

VLMEngine

class VLMEngine:
    def __init__(self, config: EngineConfig)
    async def initialize(self)
    async def process_video(self, video_path: str, **kwargs) -> Dict[str, Any]

Processing Parameters

  • video_path: Path to the video file
  • frame_interval: Seconds between frame samples (default: 0.5)
  • threshold: Confidence threshold for tag detection (default: 0.5)
  • return_timestamps: Include timestamp information (default: True)
  • return_confidence: Include confidence scores (default: True)
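For example, inside an async function (after engine.initialize(), as in the Quick Start), these parameters can be passed directly to process_video; the values below are illustrative:

results = await engine.process_video(
    "path/to/video.mp4",      # video_path
    frame_interval=0.5,       # sample one frame every 0.5 seconds
    threshold=0.5,            # keep tags with confidence >= 0.5
    return_timestamps=True,   # include when each tag was detected
    return_confidence=True    # include per-tag confidence scores
)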

Performance Optimization

Memory Requirements

  • Important: Video preprocessing loads the entire video into system RAM (not GPU memory)
  • Ensure sufficient RAM for your video sizes (e.g., a 1GB video may require 4-8GB of available RAM)
  • Consider processing videos in segments for very large files (a sketch follows this list)
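One way to keep memory bounded is to split a large file into shorter segments and process each segment on its own. The sketch below assumes ffmpeg is installed and uses an arbitrary 5-minute segment length; neither is provided by this package:

import glob
import subprocess

async def process_in_segments(engine, video_path: str):
    # Split the video into ~5-minute segments without re-encoding (requires ffmpeg on PATH).
    subprocess.run([
        "ffmpeg", "-i", video_path, "-c", "copy", "-map", "0",
        "-f", "segment", "-segment_time", "300", "-reset_timestamps", "1",
        "segment_%03d.mp4"
    ], check=True)
    # Process each segment separately so only one segment is held in RAM at a time.
    all_results = []
    for segment in sorted(glob.glob("segment_*.mp4")):
        all_results.append(await engine.process_video(segment, frame_interval=2.0))
    return all_results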

API Optimization

  • Configure retry settings based on your VLM server's capacity
  • Adjust max_new_tokens to balance speed vs accuracy
  • Use appropriate frame_interval to reduce processing time and API calls

Processing Speed

  • Increase frame_interval to sample fewer frames (faster but less accurate)
  • Use batch processing when your VLM endpoint supports it
  • Consider running multiple VLM instances for parallel processing

Extending the Package

Custom Models

Create custom model classes by inheriting from the base Model class:

from vlm_engine.models import Model

class CustomModel(Model):
    async def process(self, inputs):
        # Your custom processing logic: build the outputs expected by the next pipeline stage
        results = {}
        return results

Custom Pipelines

Define custom pipelines for specific use cases:

custom_pipeline = PipelineConfig(
    inputs=["image_path"],
    output="analysis",
    models=[
        {"name": "preprocessor", "inputs": ["image_path"], "outputs": "processed_image"},
        {"name": "analyzer", "inputs": ["processed_image"], "outputs": "analysis"}
    ]
)

Troubleshooting

Common Issues

  1. Connection Errors

    • Ensure your VLM server is running and accessible
    • Check the api_base_url configuration
    • Verify firewall settings
  2. GPU Memory Errors

    • This package uses CPU by default for preprocessing - GPU memory errors should not occur
    • If using GPU-enabled PyTorch, ensure proper CUDA installation
    • Check GPU memory availability
  3. Slow Processing

    • Increase frame interval for faster processing
    • The package uses CPU for preprocessing which is sufficient for VLM API usage
    • Optimize VLM server settings
  4. PyTorch Installation Issues

    • Issue: torch package is very large (~3GB for CUDA version)
    • Solution: Use CPU-only PyTorch for ~90% smaller install:
      pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
      
    • Issue: Need to switch between CPU and GPU versions
    • Solution: Uninstall first, then reinstall with correct index URL:
      pip uninstall torch torchvision
      pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu  # CPU
      # OR
      pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121  # GPU
      

Logging

Enable debug logging for troubleshooting:

import logging
logging.basicConfig(level=logging.DEBUG)
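To get verbose output from this package only, rather than from every library, you can raise the level on its logger; the logger name below assumes the conventional module-name logger and may differ:

import logging

logging.basicConfig(level=logging.INFO)
# Assumption: the package logs under its module name ("vlm_engine").
logging.getLogger("vlm_engine").setLevel(logging.DEBUG)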

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Setup

git clone https://github.com/Haven-hvn/haven-vlm-engine-package.git
cd haven-vlm-engine-package
pip install -e ".[dev]"

Running Tests

pytest tests/

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Built on top of modern Python async patterns

  • Inspired by production ML serving architectures

  • Haven's custom VLM models were trained with SmolVLM-Finetune; model downloads are available at https://havenmodels.orbiter.website/

  • Designed for integration with OpenAI-compatible VLM endpoints

Support

For issues and feature requests, please use the GitHub issue tracker.

For questions and discussions, join our community.


Note: This package requires an OpenAI-compatible VLM endpoint. Options include:

Remote Services

Local Setup

  • LM Studio - Easy local VLM hosting with OpenAI-compatible API

The package does not load VLM models directly - it communicates with external VLM services via API.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vlm_engine-0.9.2.tar.gz (67.7 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vlm_engine-0.9.2-py3-none-any.whl (74.8 kB)

Uploaded Python 3

File details

Details for the file vlm_engine-0.9.2.tar.gz.

File metadata

  • Download URL: vlm_engine-0.9.2.tar.gz
  • Upload date:
  • Size: 67.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vlm_engine-0.9.2.tar.gz
Algorithm Hash digest
SHA256 ee569a7e30098d6de84fad155687e1c9a8c22226c2deb398b349b17086d48c9e
MD5 61c5abbd023ab68b8d5467c181d895d5
BLAKE2b-256 74f84169a557b504e4dadf795849aa7241e6cef24e84067e60537b56bfd4be78

See more details on using hashes here.

Provenance

The following attestation bundles were made for vlm_engine-0.9.2.tar.gz:

Publisher: publish-pypi.yml on Haven-hvn/haven-vlm-engine-package

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vlm_engine-0.9.2-py3-none-any.whl.

File metadata

  • Download URL: vlm_engine-0.9.2-py3-none-any.whl
  • Upload date:
  • Size: 74.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vlm_engine-0.9.2-py3-none-any.whl
Algorithm Hash digest
SHA256 83c90f58397deeac1ce6d15d06d7266981b7fa46b18d110e37d25d43aa7e582e
MD5 b45df2d7d32513f2e8bdef149b3dc15b
BLAKE2b-256 540bd4d886dcbc4455c0777f03c623ebb783888e4c0559b9df897ed4e65fc05c

See more details on using hashes here.

Provenance

The following attestation bundles were made for vlm_engine-0.9.2-py3-none-any.whl:

Publisher: publish-pypi.yml on Haven-hvn/haven-vlm-engine-package

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
