
VLM Engine

A high-performance Python package for Vision-Language Model (VLM) based content tagging and analysis. This package provides an advanced implementation for automatic content detection and tagging, delivering superior accuracy compared to traditional image classification methods.

Features

  • Remote VLM Integration: Connects to any OpenAI-compatible VLM endpoint (no local model loading required)
  • Context-Aware Detection: Leverages Vision-Language Models' understanding of visual relationships for accurate content tagging
  • Flexible Architecture: Modular pipeline system with configurable models and processing stages
  • Asynchronous Processing: Built on asyncio for efficient video and image processing
  • Customizable Tag Sets: Easy configuration of detection categories
  • Production Ready: Includes retry logic, error handling, and comprehensive logging

Documentation

  • USER_GUIDE.md - Comprehensive configuration guide with detailed parameter descriptions, examples, and best practices
  • examples/ - Working code examples for various use cases
  • MULTIPLEXER_INTEGRATION.md - Detailed multiplexer setup and configuration


Installation

From PyPI

pip install vlm-engine

From Source

git clone https://github.com/Haven-hvn/haven-vlm-engine-package.git
cd haven-vlm-engine-package
pip install -e .

Requirements

  • Python 3.8+
  • Sufficient RAM: Video preprocessing loads entire videos into memory (not GPU memory)
  • Compatible VLM server endpoint:
    • Remote OpenAI-compatible API (recommended)
    • Local server using LM Studio

Quick Start

import asyncio
from vlm_engine import VLMEngine
from vlm_engine.config_models import EngineConfig, ModelConfig

# Configure the engine
config = EngineConfig(
    active_ai_models=["llm_vlm_model"],
    models={
        "llm_vlm_model": ModelConfig(
            type="vlm_model",
            model_id="HuggingFaceTB/SmolVLM-Instruct",
            api_base_url="http://localhost:7045",
            tag_list=["tag1", "tag2", "tag3"]  # Your custom tags
        )
    }
)

# Initialize and use
async def main():
    engine = VLMEngine(config)
    await engine.initialize()

    results = await engine.process_video(
        "path/to/video.mp4",
        frame_interval=2.0,
        threshold=0.5
    )
    print(f"Detected tags: {results}")

asyncio.run(main())

For more detailed configuration options, parameter descriptions, and best practices, see the USER_GUIDE.md.

Multiplexer Configuration (Load Balancing)

For high-performance deployments, you can configure multiple VLM endpoints with automatic load balancing:

from vlm_engine.config_models import EngineConfig, ModelConfig

config = EngineConfig(
    active_ai_models=["vlm_multiplexer_model"],
    models={
        "vlm_multiplexer_model": ModelConfig(
            type="vlm_model",
            model_id="HuggingFaceTB/SmolVLM-Instruct",
            use_multiplexer=True,  # Enable multiplexer mode
            multiplexer_endpoints=[
                {
                    "base_url": "http://server1:7045/v1",
                    "api_key": "",
                    "name": "primary-server",
                    "weight": 5,  # Higher weight = more requests
                    "is_fallback": False
                },
                {
                    "base_url": "http://server2:7045/v1",
                    "api_key": "",
                    "name": "secondary-server",
                    "weight": 3,
                    "is_fallback": False
                },
                {
                    "base_url": "http://backup:7045/v1",
                    "api_key": "",
                    "name": "backup-server",
                    "weight": 1,
                    "is_fallback": True  # Used only when primaries fail
                }
            ],
            tag_list=["tag1", "tag2", "tag3"]
        )
    }
)

Architecture

Core Components

  1. VLMEngine: Main entry point for the package

    • Manages model initialization and pipeline execution
    • Handles asynchronous processing of videos and images
  2. VLMClient: OpenAI-compatible API client with multiplexer support

    • Supports any VLM with chat completions endpoint
    • Load balancing across multiple endpoints using multiplexer-llm
    • Automatic failover for high availability
    • Includes retry logic with exponential backoff and jitter
    • Handles image encoding and prompt formatting
  3. Pipeline System: Flexible processing pipeline

    • Modular design allows custom processing stages
    • Built-in support for preprocessing, analysis, and postprocessing
    • Configurable through YAML or Python objects
  4. Model Management: Dynamic model loading

    • Supports multiple model types (VLM, preprocessors, postprocessors)
    • Lazy loading for efficient resource usage
    • Thread-safe model access

For detailed architecture information and component interactions, see USER_GUIDE.md.

Configuration

The VLM Engine uses four main configuration classes:

  1. EngineConfig - Global engine settings and behavior
  2. PipelineConfig - Defines processing workflows
  3. ModelConfig - Configures individual AI models and processors
  4. PipelineModelConfig - Defines how models integrate into pipelines

For detailed parameter descriptions, configuration examples, and best practices, see USER_GUIDE.md.

Basic Configuration

from vlm_engine.config_models import EngineConfig, ModelConfig, PipelineConfig, PipelineModelConfig

config = EngineConfig(
    active_ai_models=["my_vlm_model"],
    models={
        "my_vlm_model": ModelConfig(
            type="vlm_model",
            model_id="model-name",
            api_base_url="http://localhost:8000",
            tag_list=["action1", "action2", "action3"],
            max_batch_size=5,
            instance_count=3,
            model_return_confidence=True
        )
    },
    pipelines={
        "video_pipeline": PipelineConfig(
            inputs=["video_path", "frame_interval"],
            output="results",
            version=1.0,
            models=[
                PipelineModelConfig(
                    name="my_vlm_model",
                    inputs=["video_path"],
                    outputs=["results"]
                )
            ]
        )
    }
)

Multiplexer Configuration

For high-performance deployments with load balancing:

from vlm_engine.config_models import EngineConfig, ModelConfig

config = EngineConfig(
    active_ai_models=["vlm_multiplexer_model"],
    models={
        "vlm_multiplexer_model": ModelConfig(
            type="vlm_model",
            model_id="model-name",
            use_multiplexer=True,
            multiplexer_endpoints=[
                {
                    "api_base_url": "http://server1:7045/v1",
                    "model_id": "model-name",
                    "weight": 5
                },
                {
                    "api_base_url": "http://server2:7045/v1",
                    "model_id": "model-name",
                    "weight": 3
                }
            ],
            tag_list=["tag1", "tag2", "tag3"]
        )
    }
)

Advanced Configuration

The package supports complex configurations including:

  • Multiple models in a pipeline
  • Custom preprocessing and postprocessing stages
  • Category-specific settings (thresholds, durations, etc.)
  • Batch processing configurations
  • Category filtering and transformation rules

See the examples directory for detailed configuration examples.

For comprehensive configuration details, parameter descriptions, and best practices, see USER_GUIDE.md.

API Reference

VLMEngine

class VLMEngine:
    def __init__(self, config: EngineConfig)
    async def initialize(self)
    async def process_video(self, video_path: str, **kwargs) -> Dict[str, Any]

Processing Parameters

  • video_path: Path to the video file
  • frame_interval: Seconds between frame samples (default: 0.5)
  • threshold: Confidence threshold for tag detection (default: 0.5)
  • return_timestamps: Include timestamp information (default: True)
  • return_confidence: Include confidence scores (default: True)
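
Putting these parameters together, a typical call (inside an async function, as in the Quick Start) might look like this; the values are illustrative:

results = await engine.process_video(
    "path/to/video.mp4",
    frame_interval=1.0,      # sample one frame per second
    threshold=0.6,           # keep tags scored at or above 0.6
    return_timestamps=True,  # report when each tag was detected
    return_confidence=True   # report per-tag confidence scores
)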

For detailed parameter descriptions and configuration options, see USER_GUIDE.md.

Performance Optimization

Memory Requirements

  • Important: Video preprocessing loads the entire video into system RAM (not GPU memory)
  • Ensure sufficient RAM for your video sizes (e.g., a 1GB video may require 4-8GB of available RAM)
  • Consider processing videos in segments for very large files
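
One way to follow the last point is to pre-split large videos into separate files (for example with ffmpeg) and feed them to the engine one at a time, so only one segment's frames are held in RAM. A minimal sketch, assuming the segment files already exist; process_segments is a hypothetical helper:

import asyncio
from pathlib import Path
from vlm_engine import VLMEngine

async def process_segments(engine: VLMEngine, segment_dir: str) -> list:
    # Process segments sequentially so at most one video's frames
    # occupy system RAM at any time.
    results = []
    for segment in sorted(Path(segment_dir).glob("*.mp4")):
        results.append(await engine.process_video(str(segment), frame_interval=2.0))
    return results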

API Optimization

  • Configure retry settings based on your VLM server's capacity
  • Adjust max_batch_size to balance throughput vs memory usage
  • Use appropriate frame_interval to reduce processing time and API calls
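
As a concrete example, a conservative setup for a modest VLM server might look like this (field names are taken from the Basic Configuration example above; the values are illustrative, not recommendations):

from vlm_engine.config_models import ModelConfig

model = ModelConfig(
    type="vlm_model",
    model_id="model-name",
    api_base_url="http://localhost:8000",
    tag_list=["tag1", "tag2"],
    max_batch_size=2,  # smaller batches lower memory use per request
    instance_count=1   # shown in Basic Configuration; assumed to set parallel workers
)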

Processing Speed

  • Increase frame_interval to sample fewer frames (faster but less accurate)
  • Use batch processing when your VLM endpoint supports it
  • Consider running multiple VLM instances for parallel processing
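
Because process_video is a coroutine, several videos can also be processed concurrently against one engine; a minimal sketch (process_many is a hypothetical helper):

import asyncio

async def process_many(engine, paths):
    # One task per video; the configured endpoint(s) absorb the
    # concurrent API calls. Note: each in-flight video is held in
    # system RAM (see Memory Requirements above).
    tasks = [engine.process_video(p, frame_interval=2.0) for p in paths]
    return await asyncio.gather(*tasks)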

For detailed performance tuning guidelines and best practices, see USER_GUIDE.md.

Extending the Package

Custom Models

Create custom model classes by inheriting from the base Model class:

from vlm_engine.models import Model

class CustomModel(Model):
    async def process(self, inputs):
        results = {}  # your custom processing logic populates this
        return results

Custom Pipelines

Define custom pipelines for specific use cases:

from vlm_engine.config_models import PipelineConfig, PipelineModelConfig

custom_pipeline = PipelineConfig(
    inputs=["image_path"],
    output="analysis",
    models=[
        PipelineModelConfig(name="preprocessor", inputs=["image_path"], outputs=["processed_image"]),
        PipelineModelConfig(name="analyzer", inputs=["processed_image"], outputs=["analysis"])
    ]
)

For detailed information on model types, pipeline design, and best practices, see USER_GUIDE.md.

Troubleshooting

Common Issues

  1. "No valid pipelines loaded" Error

    • Cause: Configuration is missing required pipeline definitions or models
    • Solution: Ensure your EngineConfig includes:
      • At least one pipeline in the pipelines dictionary
      • Models defined in the models dictionary that are referenced by pipelines
      • Valid active_ai_models list pointing to existing model names
    • Best Practice: Use the provided haven_vlm_config.py as a reference configuration (a minimal sketch also follows this list)
  2. "Cannot import EngineConfig" Error

    • Cause: Incorrect import statement
    • Solution: Import from the correct module:
      from vlm_engine import VLMEngine  # Only VLMEngine is exposed
      from vlm_engine.config_models import EngineConfig  # Config classes are in separate module
      
  3. Connection Errors

    • Ensure your VLM server is running and accessible
    • Check the api_base_url configuration
    • Verify firewall settings
  4. GPU Memory Errors

    • Reduce batch size or frame interval
    • Ensure proper CUDA installation
    • Check GPU memory availability
  5. Slow Processing

    • Increase frame interval for faster processing
    • Use GPU acceleration if available
    • Optimize VLM server settings
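
For issue 1 above, a minimal configuration that satisfies all three checks might look like this (a sketch assembled from the Basic Configuration example; names are illustrative):

from vlm_engine.config_models import EngineConfig, ModelConfig, PipelineConfig, PipelineModelConfig

config = EngineConfig(
    active_ai_models=["my_vlm_model"],  # points at an existing model name
    models={
        "my_vlm_model": ModelConfig(    # model referenced by the pipeline
            type="vlm_model",
            model_id="model-name",
            api_base_url="http://localhost:8000",
            tag_list=["tag1"]
        )
    },
    pipelines={
        "video_pipeline": PipelineConfig(  # at least one pipeline
            inputs=["video_path", "frame_interval"],
            output="results",
            version=1.0,
            models=[
                PipelineModelConfig(
                    name="my_vlm_model",
                    inputs=["video_path"],
                    outputs=["results"]
                )
            ]
        )
    }
)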

Package Import Best Practices

What's exposed to consumers:

  • Only VLMEngine is exported via from vlm_engine import *
  • All configuration classes are in vlm_engine.config_models
  • Internal classes (Pipeline, ModelManager, etc.) are not exported

Correct usage pattern:

# ✅ CORRECT - Import what you need
from vlm_engine import VLMEngine
from vlm_engine.config_models import EngineConfig, ModelConfig, PipelineConfig

# ❌ WRONG - Don't try to import internal classes
from vlm_engine import Pipeline  # This will fail
from vlm_engine import ModelManager  # This will fail

Platform-Specific Notes

macOS Users:

  • The package uses PyAV for video processing (no decord required)
  • Video preprocessing loads entire videos into system RAM (not GPU memory)
  • Ensure sufficient RAM for your video sizes (e.g., 1GB video may require 4-8GB RAM)

Linux/Windows Users:

  • Optionally install decord for faster video decoding: pip install vlm-engine[decord]
  • PyAV is the default and works on all platforms

For detailed troubleshooting steps and validation checks, see USER_GUIDE.md.

Logging

Enable debug logging for troubleshooting:

import logging
logging.basicConfig(level=logging.DEBUG)
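
To keep third-party libraries quiet while still getting detail from the package, you can scope the debug level; this assumes the package follows the standard convention of module-named loggers:

import logging

logging.basicConfig(level=logging.INFO)
# Assumption: the package creates loggers with logging.getLogger(__name__),
# so its output lives under the "vlm_engine" namespace.
logging.getLogger("vlm_engine").setLevel(logging.DEBUG)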

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Setup

git clone https://github.com/Haven-hvn/haven-vlm-engine-package.git
cd haven-vlm-engine-package
pip install -e ".[dev]"

Running Tests

pytest tests/

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Built on top of modern Python async patterns
  • Inspired by production ML serving architectures
  • Haven's custom VLM models are trained with SmolVLM-Finetune; model downloads are available at https://havenmodels.orbiter.website/
  • Designed for integration with OpenAI-compatible VLM endpoints

Support

For issues and feature requests, please use the GitHub issue tracker.

For questions and discussions, join our community.


Note: This package requires an OpenAI-compatible VLM endpoint. Options include:

Remote Services

  • Any remote OpenAI-compatible API endpoint (recommended)

Local Setup

  • LM Studio - Easy local VLM hosting with OpenAI-compatible API

The package does not load VLM models directly; it communicates with external VLM services via API.
