Advanced Vision-Language Model Engine for content tagging

Project description

VLM Engine

A high-performance Python package for Vision-Language Model (VLM) based content tagging and analysis. This package provides an advanced implementation for automatic content detection and tagging, delivering superior accuracy compared to traditional image classification methods.

Features

  • Remote VLM Integration: Connects to any OpenAI-compatible VLM endpoint (no local model loading required)
  • Context-Aware Detection: Leverages Vision-Language Models' understanding of visual relationships for accurate content tagging
  • Flexible Architecture: Modular pipeline system with configurable models and processing stages
  • Asynchronous Processing: Built on asyncio for efficient video and image processing
  • Customizable Tag Sets: Easy configuration of detection categories
  • Production Ready: Includes retry logic, error handling, and comprehensive logging

Installation

From PyPI

CPU-only Installation (Default, Recommended)

For most use cases (including AWS batch jobs), install the CPU-only version which is ~3GB smaller:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install vlm-engine

GPU Installation (CUDA-enabled)

If you need GPU support for local model inference (not required for VLM API usage):

pip install vlm-engine

From Source

CPU-only Installation (Default, Recommended)

git clone https://github.com/Haven-hvn/haven-vlm-engine-package.git
cd haven-vlm-engine-package
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install -e .

GPU Installation (CUDA-enabled)

git clone https://github.com/Haven-hvn/haven-vlm-engine-package.git
cd haven-vlm-engine-package
pip install -e .

Installation Notes

Why CPU-only by default?

  • This package connects to REMOTE OpenAI-compatible VLM endpoints - it never loads models locally
  • PyTorch is only used for image preprocessing (tensor operations, transforms)
  • CUDA-enabled PyTorch adds ~3GB of unnecessary dependencies for CPU-only workloads
  • Perfect for Docker deployments and AWS batch jobs

Switching between CPU and GPU:

# Switch from GPU to CPU
pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

# Switch from CPU to GPU (CUDA 12.1)
pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

Docker Optimization

For Docker deployments (especially AWS batch jobs), use CPU-only PyTorch to reduce image size by ~3GB:

FROM python:3.10-slim

WORKDIR /app

# Install CPU-only PyTorch first, then vlm_engine
RUN pip install --no-cache-dir torch torchvision \
        --index-url https://download.pytorch.org/whl/cpu && \
    pip install --no-cache-dir vlm_engine

# Copy your application
COPY . .

CMD ["python", "your_batch_job.py"]

Size comparison:

  • With CUDA PyTorch: ~5GB image
  • With CPU-only PyTorch: ~2GB image (60% smaller)

Requirements

  • Python 3.8+
  • Sufficient RAM: Video preprocessing loads entire videos into memory (not GPU memory)
  • An OpenAI-compatible VLM server endpoint (see the endpoint options listed under Support below)

Quick Start

import asyncio
from vlm_engine import VLMEngine
from vlm_engine.config_models import EngineConfig, ModelConfig

# Configure the engine
config = EngineConfig(
    active_ai_models=["llm_vlm_model"],
    models={
        "llm_vlm_model": ModelConfig(
            type="vlm_model",
            model_id="HuggingFaceTB/SmolVLM-Instruct",
            api_base_url="http://localhost:7045",
            tag_list=["tag1", "tag2", "tag3"]  # Your custom tags
        )
    }
)

# Initialize and use
async def main():
    engine = VLMEngine(config)
    await engine.initialize()
    
    results = await engine.process_video(
        "path/to/video.mp4",
        frame_interval=2.0,
        threshold=0.5
    )
    print(f"Detected tags: {results}")

asyncio.run(main())

Multiplexer Configuration (Load Balancing)

For high-performance deployments, you can configure multiple VLM endpoints with automatic load balancing:

from vlm_engine.config_models import EngineConfig, ModelConfig

config = EngineConfig(
    active_ai_models=["vlm_multiplexer_model"],
    models={
        "vlm_multiplexer_model": ModelConfig(
            type="vlm_model",
            model_id="HuggingFaceTB/SmolVLM-Instruct",
            use_multiplexer=True,  # Enable multiplexer mode
            multiplexer_endpoints=[
                {
                    "base_url": "http://server1:7045/v1",
                    "api_key": "",
                    "name": "primary-server",
                    "weight": 5,  # Higher weight = more requests
                    "is_fallback": False
                },
                {
                    "base_url": "http://server2:7045/v1",
                    "api_key": "",
                    "name": "secondary-server",
                    "weight": 3,
                    "is_fallback": False
                },
                {
                    "base_url": "http://backup:7045/v1",
                    "api_key": "",
                    "name": "backup-server",
                    "weight": 1,
                    "is_fallback": True  # Used only when primaries fail
                }
            ],
            tag_list=["tag1", "tag2", "tag3"]
        )
    }
)
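With the configuration above, and assuming the weights are applied proportionally across the non-fallback endpoints (the usual convention for weighted load balancing), primary-server would handle roughly 5 of every 8 requests and secondary-server about 3 of every 8; backup-server only receives traffic when the primary endpoints fail.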

Architecture

Core Components

  1. VLMEngine: Main entry point for the package

    • Manages model initialization and pipeline execution
    • Handles asynchronous processing of videos and images
  2. VLMClient: OpenAI-compatible API client with multiplexer support

    • Supports any VLM with chat completions endpoint
    • Load balancing across multiple endpoints using multiplexer-llm
    • Automatic failover for high availability
    • Includes retry logic with exponential backoff and jitter (see the sketch after this list)
    • Handles image encoding and prompt formatting
  3. Pipeline System: Flexible processing pipeline

    • Modular design allows custom processing stages
    • Built-in support for preprocessing, analysis, and postprocessing
    • Configurable through YAML or Python objects
  4. Model Management: Dynamic model loading

    • Supports multiple model types (VLM, preprocessors, postprocessors)
    • Lazy loading for efficient resource usage
    • Thread-safe model access
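The retry behavior lives inside VLMClient; the snippet below is only an illustrative sketch of exponential backoff with jitter, and the names (with_retries, call_endpoint, max_retries, base_delay) are placeholders rather than part of the package API:

import asyncio
import random

async def with_retries(call_endpoint, max_retries=5, base_delay=1.0):
    # Retry an async API call with exponential backoff plus random jitter.
    for attempt in range(max_retries):
        try:
            return await call_endpoint()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Backoff doubles each attempt (1s, 2s, 4s, ...) with up to 1s of jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(delay)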

Configuration

Basic Configuration

from vlm_engine.config_models import EngineConfig, ModelConfig, PipelineConfig

config = EngineConfig(
    active_ai_models=["my_vlm_model"],
    models={
        "my_vlm_model": ModelConfig(
            type="vlm_model",
            model_id="model-name",
            api_base_url="http://localhost:8000",
            tag_list=["action1", "action2", "action3"],
            max_new_tokens=128,
            request_timeout=70,
            vlm_detected_tag_confidence=0.99
        )
    },
    pipelines={
        "video_pipeline": PipelineConfig(
            inputs=["video_path", "frame_interval"],
            output="results",
            models=[{"name": "my_vlm_model", "inputs": ["frame"], "outputs": "tags"}]
        )
    }
)

Multiplexer Benefits

  • Load Balancing: Distribute requests across multiple VLM endpoints based on configurable weights
  • High Availability: Automatic failover to backup endpoints when primary endpoints fail
  • Improved Performance: Parallel processing across multiple servers for higher throughput
  • Seamless Integration: Drop-in replacement for single endpoint configurations
  • Flexible Configuration: Mix of primary and fallback endpoints with custom weights

Advanced Configuration

The package supports complex configurations including:

  • Multiple models in a pipeline
  • Custom preprocessing and postprocessing stages
  • Category-specific settings (thresholds, durations, etc.)
  • Batch processing configurations

See the examples directory for detailed configuration examples.

For comprehensive multiplexer setup and configuration, see MULTIPLEXER_INTEGRATION.md.

API Reference

VLMEngine

class VLMEngine:
    def __init__(self, config: EngineConfig)
    async def initialize(self)
    async def process_video(self, video_path: str, **kwargs) -> Dict[str, Any]

Processing Parameters

  • video_path: Path to the video file
  • frame_interval: Seconds between frame samples (default: 0.5)
  • threshold: Confidence threshold for tag detection (default: 0.5)
  • return_timestamps: Include timestamp information (default: True)
  • return_confidence: Include confidence scores (default: True)
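For example, inside an async function (after engine.initialize(), as in the Quick Start), these parameters can be passed directly to process_video; the values below are illustrative:

results = await engine.process_video(
    "path/to/video.mp4",      # video_path
    frame_interval=0.5,       # sample one frame every 0.5 seconds
    threshold=0.5,            # keep tags with confidence >= 0.5
    return_timestamps=True,   # include when each tag was detected
    return_confidence=True    # include per-tag confidence scores
)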

Performance Optimization

Memory Requirements

  • Important: Video preprocessing loads the entire video into system RAM (not GPU memory)
  • Ensure sufficient RAM for your video sizes (e.g., a 1GB video may require 4-8GB of available RAM)
  • Consider processing videos in segments for very large files (a sketch follows this list)
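One way to keep memory bounded is to split a large file into shorter segments and process each segment on its own. The sketch below assumes ffmpeg is installed and uses an arbitrary 5-minute segment length; neither is provided by this package:

import glob
import subprocess

async def process_in_segments(engine, video_path: str):
    # Split the video into ~5-minute segments without re-encoding (requires ffmpeg on PATH).
    subprocess.run([
        "ffmpeg", "-i", video_path, "-c", "copy", "-map", "0",
        "-f", "segment", "-segment_time", "300", "-reset_timestamps", "1",
        "segment_%03d.mp4"
    ], check=True)
    # Process each segment separately so only one segment is held in RAM at a time.
    all_results = []
    for segment in sorted(glob.glob("segment_*.mp4")):
        all_results.append(await engine.process_video(segment, frame_interval=2.0))
    return all_results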

API Optimization

  • Configure retry settings based on your VLM server's capacity
  • Adjust max_new_tokens to balance speed vs accuracy
  • Use appropriate frame_interval to reduce processing time and API calls

Processing Speed

  • Increase frame_interval to sample fewer frames (faster but less accurate)
  • Use batch processing when your VLM endpoint supports it
  • Consider running multiple VLM instances for parallel processing

Extending the Package

Custom Models

Create custom model classes by inheriting from the base Model class:

from vlm_engine.models import Model

class CustomModel(Model):
    async def process(self, inputs):
        # Your custom processing logic: build the outputs expected by the next pipeline stage
        results = {}
        return results

Custom Pipelines

Define custom pipelines for specific use cases:

custom_pipeline = PipelineConfig(
    inputs=["image_path"],
    output="analysis",
    models=[
        {"name": "preprocessor", "inputs": ["image_path"], "outputs": "processed_image"},
        {"name": "analyzer", "inputs": ["processed_image"], "outputs": "analysis"}
    ]
)

Troubleshooting

Common Issues

  1. Connection Errors

    • Ensure your VLM server is running and accessible
    • Check the api_base_url configuration
    • Verify firewall settings
  2. GPU Memory Errors

    • This package uses CPU by default for preprocessing - GPU memory errors should not occur
    • If using GPU-enabled PyTorch, ensure proper CUDA installation
    • Check GPU memory availability
  3. Slow Processing

    • Increase frame interval for faster processing
    • The package uses CPU for preprocessing which is sufficient for VLM API usage
    • Optimize VLM server settings
  4. PyTorch Installation Issues

    • Issue: torch package is very large (~3GB for CUDA version)
    • Solution: Use CPU-only PyTorch for ~90% smaller install:
      pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
      
    • Issue: Need to switch between CPU and GPU versions
    • Solution: Uninstall first, then reinstall with correct index URL:
      pip uninstall torch torchvision
      pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu  # CPU
      # OR
      pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121  # GPU
      

Logging

Enable debug logging for troubleshooting:

import logging
logging.basicConfig(level=logging.DEBUG)
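To get verbose output from this package only, rather than from every library, you can raise the level on its logger; the logger name below assumes the conventional module-name logger and may differ:

import logging

logging.basicConfig(level=logging.INFO)
# Assumption: the package logs under its module name ("vlm_engine").
logging.getLogger("vlm_engine").setLevel(logging.DEBUG)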

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Setup

git clone https://github.com/Haven-hvn/haven-vlm-engine-package.git
cd haven-vlm-engine-package
pip install -e ".[dev]"

Running Tests

pytest tests/

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Built on top of modern Python async patterns

  • Inspired by production ML serving architectures

  • Haven's custom VLM models were trained with SmolVLM-Finetune; model downloads are available at https://havenmodels.orbiter.website/

  • Designed for integration with OpenAI-compatible VLM endpoints

Support

For issues and feature requests, please use the GitHub issue tracker.

For questions and discussions, join our community.


Note: This package requires an OpenAI-compatible VLM endpoint. Options include:

Remote Services

Local Setup

  • LM Studio - Easy local VLM hosting with OpenAI-compatible API

The package does not load VLM models directly - it communicates with external VLM services via API.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vlm_engine-0.9.2.tar.gz (67.7 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vlm_engine-0.9.2-py3-none-any.whl (74.8 kB)

Uploaded Python 3

File details

Details for the file vlm_engine-0.9.2.tar.gz.

File metadata

  • Download URL: vlm_engine-0.9.2.tar.gz
  • Upload date:
  • Size: 67.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vlm_engine-0.9.2.tar.gz
Algorithm Hash digest
SHA256 ee569a7e30098d6de84fad155687e1c9a8c22226c2deb398b349b17086d48c9e
MD5 61c5abbd023ab68b8d5467c181d895d5
BLAKE2b-256 74f84169a557b504e4dadf795849aa7241e6cef24e84067e60537b56bfd4be78

See more details on using hashes here.

Provenance

The following attestation bundles were made for vlm_engine-0.9.2.tar.gz:

Publisher: publish-pypi.yml on Haven-hvn/haven-vlm-engine-package

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vlm_engine-0.9.2-py3-none-any.whl.

File metadata

  • Download URL: vlm_engine-0.9.2-py3-none-any.whl
  • Upload date:
  • Size: 74.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vlm_engine-0.9.2-py3-none-any.whl
Algorithm Hash digest
SHA256 83c90f58397deeac1ce6d15d06d7266981b7fa46b18d110e37d25d43aa7e582e
MD5 b45df2d7d32513f2e8bdef149b3dc15b
BLAKE2b-256 540bd4d886dcbc4455c0777f03c623ebb783888e4c0559b9df897ed4e65fc05c

See more details on using hashes here.

Provenance

The following attestation bundles were made for vlm_engine-0.9.2-py3-none-any.whl:

Publisher: publish-pypi.yml on Haven-hvn/haven-vlm-engine-package

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
