Advanced Vision-Language Model Engine for content tagging
VLM Engine
A high-performance Python package for Vision-Language Model (VLM) based content tagging and analysis. This package provides an advanced implementation for automatic content detection and tagging, delivering superior accuracy compared to traditional image classification methods.
Features
- Remote VLM Integration: Connects to any OpenAI-compatible VLM endpoint (no local model loading required)
- Context-Aware Detection: Leverages Vision-Language Models' understanding of visual relationships for accurate content tagging
- Flexible Architecture: Modular pipeline system with configurable models and processing stages
- Asynchronous Processing: Built on asyncio for efficient video and image processing
- Customizable Tag Sets: Easy configuration of detection categories
- Production Ready: Includes retry logic, error handling, and comprehensive logging
Documentation
- USER_GUIDE.md - Comprehensive configuration guide with detailed parameter descriptions, examples, and best practices
- examples/ - Working code examples for various use cases
- MULTIPLEXER_INTEGRATION.md - Detailed multiplexer setup and configuration
Installation
From PyPI (when published)
pip install vlm-engine
From Source
git clone https://github.com/Haven-hvn/haven-vlm-engine-package.git
cd haven-vlm-engine-package
pip install -e .
Requirements
- Python 3.8+
- Sufficient RAM: Video preprocessing loads entire videos into memory (not GPU memory)
- Compatible VLM server endpoint:
  - Remote OpenAI-compatible API (recommended)
  - Local server using LM Studio
Quick Start
import asyncio
from vlm_engine import VLMEngine
from vlm_engine.config_models import EngineConfig, ModelConfig

# Configure the engine
config = EngineConfig(
    active_ai_models=["llm_vlm_model"],
    models={
        "llm_vlm_model": ModelConfig(
            type="vlm_model",
            model_id="HuggingFaceTB/SmolVLM-Instruct",
            api_base_url="http://localhost:7045",
            tag_list=["tag1", "tag2", "tag3"]  # Your custom tags
        )
    }
)

# Initialize and use
async def main():
    engine = VLMEngine(config)
    await engine.initialize()
    results = await engine.process_video(
        "path/to/video.mp4",
        frame_interval=2.0,
        threshold=0.5
    )
    print(f"Detected tags: {results}")

asyncio.run(main())
For more detailed configuration options, parameter descriptions, and best practices, see the USER_GUIDE.md.
Multiplexer Configuration (Load Balancing)
For high-performance deployments, you can configure multiple VLM endpoints with automatic load balancing:
from vlm_engine.config_models import EngineConfig, ModelConfig

config = EngineConfig(
    active_ai_models=["vlm_multiplexer_model"],
    models={
        "vlm_multiplexer_model": ModelConfig(
            type="vlm_model",
            model_id="HuggingFaceTB/SmolVLM-Instruct",
            use_multiplexer=True,  # Enable multiplexer mode
            multiplexer_endpoints=[
                {
                    "base_url": "http://server1:7045/v1",
                    "api_key": "",
                    "name": "primary-server",
                    "weight": 5,  # Higher weight = more requests
                    "is_fallback": False
                },
                {
                    "base_url": "http://server2:7045/v1",
                    "api_key": "",
                    "name": "secondary-server",
                    "weight": 3,
                    "is_fallback": False
                },
                {
                    "base_url": "http://backup:7045/v1",
                    "api_key": "",
                    "name": "backup-server",
                    "weight": 1,
                    "is_fallback": True  # Used only when primaries fail
                }
            ],
            tag_list=["tag1", "tag2", "tag3"]
        )
    }
)
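Conceptually, weights control the share of traffic each primary endpoint receives, and fallback endpoints are only consulted when the primaries fail. The snippet below is a minimal sketch of weighted selection for illustration only; actual routing is handled internally by multiplexer-llm and may differ in detail.

import random

# Illustration of weighted routing: higher-weight primaries are chosen
# proportionally more often; fallbacks are excluded from normal rotation.
endpoints = [
    {"name": "primary-server", "weight": 5, "is_fallback": False},
    {"name": "secondary-server", "weight": 3, "is_fallback": False},
    {"name": "backup-server", "weight": 1, "is_fallback": True},
]

def pick_endpoint(endpoints):
    primaries = [e for e in endpoints if not e["is_fallback"]]
    weights = [e["weight"] for e in primaries]
    # With weights 5 and 3, primary-server receives ~62.5% of requests
    return random.choices(primaries, weights=weights, k=1)[0]

print(pick_endpoint(endpoints)["name"])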
Architecture
Core Components
- VLMEngine: Main entry point for the package
  - Manages model initialization and pipeline execution
  - Handles asynchronous processing of videos and images
- VLMClient: OpenAI-compatible API client with multiplexer support
  - Supports any VLM with a chat completions endpoint
  - Load balancing across multiple endpoints using multiplexer-llm
  - Automatic failover for high availability
  - Includes retry logic with exponential backoff and jitter
  - Handles image encoding and prompt formatting
- Pipeline System: Flexible processing pipeline
  - Modular design allows custom processing stages
  - Built-in support for preprocessing, analysis, and postprocessing
  - Configurable through YAML or Python objects
- Model Management: Dynamic model loading
  - Supports multiple model types (VLM, preprocessors, postprocessors)
  - Lazy loading for efficient resource usage
  - Thread-safe model access
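As an illustration of the retry behavior described for VLMClient, the helper below sketches exponential backoff with jitter. The function name and parameters are hypothetical, not part of the package's public API:

import asyncio
import random

async def call_with_retry(make_request, max_retries=3, base_delay=1.0):
    """Retry an async request with exponential backoff and jitter (sketch)."""
    for attempt in range(max_retries + 1):
        try:
            return await make_request()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries, surface the error
            # Backoff doubles each attempt (1s, 2s, 4s, ...) plus random jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(delay)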
For detailed architecture information and component interactions, see USER_GUIDE.md.
Configuration
The VLM Engine uses four main configuration classes:
- EngineConfig - Global engine settings and behavior
- PipelineConfig - Defines processing workflows
- ModelConfig - Configures individual AI models and processors
- PipelineModelConfig - Defines how models integrate into pipelines
For detailed parameter descriptions, configuration examples, and best practices, see USER_GUIDE.md.
Basic Configuration
from vlm_engine.config_models import (
    EngineConfig,
    ModelConfig,
    PipelineConfig,
    PipelineModelConfig,
)

config = EngineConfig(
    active_ai_models=["my_vlm_model"],
    models={
        "my_vlm_model": ModelConfig(
            type="vlm_model",
            model_id="model-name",
            api_base_url="http://localhost:8000",
            tag_list=["action1", "action2", "action3"],
            max_batch_size=5,
            instance_count=3,
            model_return_confidence=True
        )
    },
    pipelines={
        "video_pipeline": PipelineConfig(
            inputs=["video_path", "frame_interval"],
            output="results",
            version=1.0,
            models=[
                PipelineModelConfig(
                    name="my_vlm_model",
                    inputs=["video_path"],
                    outputs=["results"]
                )
            ]
        )
    }
)
Multiplexer Configuration
For high-performance deployments with load balancing:
from vlm_engine.config_models import EngineConfig, ModelConfig

config = EngineConfig(
    active_ai_models=["vlm_multiplexer_model"],
    models={
        "vlm_multiplexer_model": ModelConfig(
            type="vlm_model",
            model_id="model-name",
            use_multiplexer=True,
            multiplexer_endpoints=[
                {
                    "api_base_url": "http://server1:7045/v1",
                    "model_id": "model-name",
                    "weight": 5
                },
                {
                    "api_base_url": "http://server2:7045/v1",
                    "model_id": "model-name",
                    "weight": 3
                }
            ],
            tag_list=["tag1", "tag2", "tag3"]
        )
    }
)
Advanced Configuration
The package supports complex configurations including:
- Multiple models in a pipeline
- Custom preprocessing and postprocessing stages
- Category-specific settings (thresholds, durations, etc.)
- Batch processing configurations
- Category filtering and transformation rules
See the examples directory for detailed configuration examples.
For comprehensive configuration details, parameter descriptions, and best practices, see USER_GUIDE.md.
API Reference
VLMEngine
class VLMEngine:
    def __init__(self, config: EngineConfig)
    async def initialize()
    async def process_video(video_path: str, **kwargs) -> Dict[str, Any]
Processing Parameters
- video_path: Path to the video file
- frame_interval: Seconds between frame samples (default: 0.5)
- threshold: Confidence threshold for tag detection (default: 0.5)
- return_timestamps: Include timestamp information (default: True)
- return_confidence: Include confidence scores (default: True)
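Putting these together, a call that sets every parameter explicitly might look like this (assuming an initialized engine as in the Quick Start):

async def analyze(engine):
    # All parameters set explicitly; defaults are listed above
    return await engine.process_video(
        "path/to/video.mp4",
        frame_interval=2.0,      # sample one frame every 2 seconds
        threshold=0.5,           # keep tags at or above 50% confidence
        return_timestamps=True,  # report when each tag was detected
        return_confidence=True   # report per-tag confidence scores
    )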
For detailed parameter descriptions and configuration options, see USER_GUIDE.md.
Performance Optimization
Memory Requirements
- Important: Video preprocessing loads the entire video into system RAM (not GPU memory)
- Ensure sufficient RAM for your video sizes (e.g., a 1GB video may require 4-8GB of available RAM)
- Consider processing videos in segments for very large files (see the sketch below)
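A minimal sketch of segment-based processing, assuming ffmpeg is installed on the system; the file paths and 5-minute segment length are illustrative:

import asyncio
import glob
import subprocess

# Split a large video into 5-minute segments without re-encoding,
# so each process_video call only loads one segment into RAM
subprocess.run([
    "ffmpeg", "-i", "large_video.mp4", "-c", "copy",
    "-f", "segment", "-segment_time", "300", "segment_%03d.mp4"
], check=True)

async def process_segments(engine):
    results = []
    for segment in sorted(glob.glob("segment_*.mp4")):
        results.append(await engine.process_video(segment, frame_interval=2.0))
    return results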
API Optimization
- Configure retry settings based on your VLM server's capacity
- Adjust max_batch_size to balance throughput vs. memory usage
- Use an appropriate frame_interval to reduce processing time and API calls
Processing Speed
- Increase frame_interval to sample fewer frames (faster but less accurate)
- Use batch processing when your VLM endpoint supports it
- Consider running multiple VLM instances for parallel processing (see the sketch below)
For detailed performance tuning guidelines and best practices, see USER_GUIDE.md.
Extending the Package
Custom Models
Create custom model classes by inheriting from the base Model class:
from vlm_engine.models import Model

class CustomModel(Model):
    async def process(self, inputs):
        # Your custom processing logic
        results = ...  # compute outputs from inputs
        return results
Custom Pipelines
Define custom pipelines for specific use cases:
custom_pipeline = PipelineConfig(
    inputs=["image_path"],
    output="analysis",
    models=[
        {"name": "preprocessor", "inputs": ["image_path"], "outputs": "processed_image"},
        {"name": "analyzer", "inputs": ["processed_image"], "outputs": "analysis"}
    ]
)
For detailed information on model types, pipeline design, and best practices, see USER_GUIDE.md.
Troubleshooting
Common Issues
- "No valid pipelines loaded" Error
  - Cause: Configuration is missing required pipeline definitions or models
  - Solution: Ensure your EngineConfig includes:
    - At least one pipeline in the pipelines dictionary
    - Models defined in the models dictionary that are referenced by pipelines
    - A valid active_ai_models list pointing to existing model names
  - Best Practice: Use the provided haven_vlm_config.py as a reference configuration; a minimal example appears after this list
- "Cannot import EngineConfig" Error
  - Cause: Incorrect import statement
  - Solution: Import from the correct modules:
    from vlm_engine import VLMEngine                   # Only VLMEngine is exposed
    from vlm_engine.config_models import EngineConfig  # Config classes are in a separate module
- Connection Errors
  - Ensure your VLM server is running and accessible
  - Check the api_base_url configuration
  - Verify firewall settings
- GPU Memory Errors
  - Reduce batch size or frame interval
  - Ensure proper CUDA installation
  - Check GPU memory availability
- Slow Processing
  - Increase frame interval for faster processing
  - Use GPU acceleration if available
  - Optimize VLM server settings
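For the "No valid pipelines loaded" error above, a minimal configuration that satisfies all three requirements, condensed from the Basic Configuration example, looks like this:

from vlm_engine.config_models import (
    EngineConfig, ModelConfig, PipelineConfig, PipelineModelConfig
)

config = EngineConfig(
    active_ai_models=["my_vlm_model"],  # must reference a defined model
    models={
        "my_vlm_model": ModelConfig(
            type="vlm_model",
            model_id="model-name",
            api_base_url="http://localhost:8000",
            tag_list=["tag1"]
        )
    },
    pipelines={  # at least one pipeline referencing the model above
        "video_pipeline": PipelineConfig(
            inputs=["video_path", "frame_interval"],
            output="results",
            version=1.0,
            models=[
                PipelineModelConfig(
                    name="my_vlm_model",
                    inputs=["video_path"],
                    outputs=["results"]
                )
            ]
        )
    }
)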
Package Import Best Practices
What's exposed to consumers:
- Only VLMEngine is exported via from vlm_engine import *
- All configuration classes are in vlm_engine.config_models
- Internal classes (Pipeline, ModelManager, etc.) are not exported
Correct usage pattern:
# ✅ CORRECT - Import what you need
from vlm_engine import VLMEngine
from vlm_engine.config_models import EngineConfig, ModelConfig, PipelineConfig
# ❌ WRONG - Don't try to import internal classes
from vlm_engine import Pipeline # This will fail
from vlm_engine import ModelManager # This will fail
Platform-Specific Notes
macOS Users:
- The package uses PyAV for video processing (no decord required)
- Video preprocessing loads entire videos into system RAM (not GPU memory)
- Ensure sufficient RAM for your video sizes (e.g., 1GB video may require 4-8GB RAM)
Linux/Windows Users:
- Optionally install decord for faster video decoding: pip install vlm-engine[decord]
- PyAV is the default and works on all platforms
For detailed troubleshooting steps and validation checks, see USER_GUIDE.md.
Logging
Enable debug logging for troubleshooting:
import logging
logging.basicConfig(level=logging.DEBUG)
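Global debug logging can be verbose. Assuming the package logs under a logger named after its module (an assumption; check your output), you can scope debug output to it:

import logging

logging.basicConfig(level=logging.INFO)  # keep third-party noise down
# Assumed logger name; adjust if the package logs under a different one
logging.getLogger("vlm_engine").setLevel(logging.DEBUG)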
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Development Setup
git clone https://github.com/Haven-hvn/haven-vlm-engine-package.git
cd haven-vlm-engine-package
pip install -e ".[dev]"
Running Tests
pytest tests/
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built on top of modern Python async patterns
- Inspired by production ML serving architectures
- Haven's custom VLM models are trained with SmolVLM-Finetune; model downloads are available at https://havenmodels.orbiter.website/
- Designed for integration with OpenAI-compatible VLM endpoints
Support
For issues and feature requests, please use the GitHub issue tracker.
For questions and discussions, join our community:
- Discord: Link to Discord
Note: This package requires an OpenAI-compatible VLM endpoint. Options include:
Remote Services
- Any OpenAI-compatible API endpoint
- Akash deployment - https://github.com/Haven-hvn/haven-inference
Local Setup
- LM Studio - Easy local VLM hosting with OpenAI-compatible API
The package does not load VLM models directly - it communicates with external VLM services via API.