Connect language models to vision models for natural language visual analysis

🧠 Langvio: Natural Language Computer Vision

✨ What is Langvio?

Langvio connects human language to computer vision. Ask a question about an image or video in plain English; Langvio parses it with a language model, runs YOLO-based detection on the media, and returns an explanation together with an annotated copy of the input.

🎯 Key Features

🗣️ Natural Language Interface: Ask questions like "Count all red cars" or "Find people wearing yellow"
🎥 Multi-Modal Support: Works with both images and videos
🚀 Powered by YOLO: Uses YOLO11 and YOLOe for fast, accurate object detection
🤖 LLM Integration: Supports OpenAI GPT and Google Gemini for intelligent explanations
📊 Advanced Analytics: Object counting, speed estimation, spatial relationships
🎨 Visual Output: Generates annotated images/videos with detection highlights
🌐 Web Interface: Includes a Flask web app for easy interaction
🔧 Extensible: Easy to add new models and capabilities

🎬 See It In Action

import langvio

# Create a pipeline
pipeline = langvio.create_pipeline()

# Analyze an image
result = pipeline.process(
    query="Count how many people are wearing red shirts",
    media_path="street_scene.jpg"
)

print(result['explanation'])
# Example output: "I found 3 people wearing red shirts in the image.
#                  Two are located in the center-left area, and one is on the right side."

# View the annotated result
print(f"Annotated image saved to: {result['output_path']}")

🔧 Installation

Basic Installation

pip install langvio

With LLM Provider Support

Choose your preferred language model provider:

# Quote the extras so shells such as zsh do not try to expand the brackets

# For OpenAI models (GPT-3.5, GPT-4)
pip install "langvio[openai]"

# For Google Gemini models
pip install "langvio[google]"

# For all supported providers
pip install "langvio[all-llm]"

# For development
pip install "langvio[dev]"

Environment Setup

Create a .env file for your API keys:

# Copy the template
cp .env.template .env

Add your API keys to .env:

# For OpenAI
OPENAI_API_KEY=your_openai_api_key_here

# For Google Gemini  
GOOGLE_API_KEY=your_google_api_key_here

Langvio automatically loads these environment variables!
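
Before creating a pipeline, it can help to confirm which keys are actually set. The snippet below is a minimal sketch using only the standard library; the variable names come from the .env example above, while the provider labels and the `configured_providers` helper are illustrative assumptions, not part of Langvio's API.

```python
import os

# Variable names are taken from the .env example; the labels on the left are
# illustrative and may not match the names Langvio uses internally.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def configured_providers(env=None):
    """Return the provider labels whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [
        name
        for name, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]


# Pass a mapping to check hypothetical settings without touching the real environment.
print(configured_providers({"OPENAI_API_KEY": "sk-example"}))  # → ['openai']
```

Calling `configured_providers()` with no argument inspects the current process environment instead.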

🚀 Quick Start

Basic Usage

import langvio

# Create a pipeline (automatically detects available LLM providers)
pipeline = langvio.create_pipeline()

# Process an image
result = pipeline.process(
    query="What objects are in this image?",
    media_path="path/to/your/image.jpg"
)

print(result['explanation'])
print(f"Output: {result['output_path']}")
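
For scripts, a thin wrapper around `pipeline.process` can catch a wrong path early and read the result defensively. This is a sketch under assumptions: `ask` is a hypothetical helper, and it only relies on the `process(query=..., media_path=...)` call and the `explanation` and `output_path` keys shown above. Whether those keys are always present is not documented here, so the helper falls back to defaults.

```python
from pathlib import Path


def ask(pipeline, query, media_path):
    """Run one query and return (explanation, output_path).

    `pipeline` is any object with a process(query=..., media_path=...) method
    that returns a dict, as in the basic usage example.
    """
    path = Path(media_path)
    if not path.is_file():
        # Fail before any model is invoked if the file is missing.
        raise FileNotFoundError(f"Media file not found: {path}")

    result = pipeline.process(query=query, media_path=str(path))

    # Assumption: keys might be missing on failure, so use .get with defaults.
    explanation = result.get("explanation", "")
    output_path = result.get("output_path")
    return explanation, output_path
```

With a real pipeline this would be called as `ask(pipeline, "What objects are in this image?", "path/to/your/image.jpg")`.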

Video Analysis

# Analyze videos with temporal understanding
result = pipeline.process(
    query="Count vehicles crossing the intersection",
    media_path="traffic_video.mp4"
)

# Get detailed analysis including speed and movement patterns
print(result['explanation'])

Web Interface

# Launch the web interface (run from the repository root, assuming a local clone)
cd webapp
python app.py

# Visit http://localhost:5000 in your browser

🎯 Examples

Object Detection & Counting

# Count specific objects
pipeline.process("How many cars are in this parking lot?", "parking.jpg")

# Find objects by attributes  
pipeline.process("Find all red objects in this image", "scene.jpg")

# Spatial relationships
pipeline.process("What objects are on the table?", "kitchen.jpg")

Video Analysis

# Track movement patterns
pipeline.process("Track people walking through the scene", "crowd.mp4")

# Speed analysis
pipeline.process("What's the average speed of vehicles?", "highway.mp4")

# Activity detection
pipeline.process("Detect any unusual activities", "security_footage.mp4")

Advanced Queries

# Complex multi-part analysis
pipeline.process(
    "Count people and vehicles, identify their locations, and note distinctive colors",
    "street_scene.jpg"
)

# Verification tasks
pipeline.process("Is there a dog in this image?", "park_scene.jpg")

# Temporal analysis
pipeline.process("How many people entered vs exited the building?", "entrance.mp4")
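
When running many of these queries in one go, a small loop that records failures keeps one bad file from stopping the rest. The following is a sketch only: `run_batch` is a hypothetical helper, and it assumes nothing about the pipeline beyond the positional `process(query, media_path)` call used in the examples above.

```python
def run_batch(pipeline, jobs):
    """Run (query, media_path) pairs and return (results, errors).

    results: list of (query, media_path, result_dict)
    errors:  list of (query, media_path, error_message)
    """
    results, errors = [], []
    for query, media_path in jobs:
        try:
            result = pipeline.process(query, media_path)
        except Exception as exc:
            # Record the failure and continue with the remaining jobs.
            errors.append((query, media_path, str(exc)))
        else:
            results.append((query, media_path, result))
    return results, errors
```

A typical call would pass a list such as `[("Is there a dog in this image?", "park_scene.jpg"), ("Count the cars", "parking.jpg")]`.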

🏗️ Architecture

graph TD
    A[User Query] --> B[LLM Processor]
    B --> C[Query Parser]
    C --> D[Vision Processor]
    D --> E[YOLO Detection]
    E --> F[Attribute Analysis]
    F --> G[Spatial Relationships]
    G --> H[Temporal Tracking]
    H --> I[LLM Explanation]
    I --> J[Visualization]
    J --> K[Output]

Core Components

🧠 LLM Processor: Parses queries and generates explanations (OpenAI, Google Gemini)
👁️ Vision Processor: Detects objects and attributes (YOLO, YOLOe)
🎨 Media Processor: Creates visualizations and handles I/O
⚙️ Pipeline: Orchestrates the entire workflow
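
The flow in the diagram can be pictured as a chain of stages that each enrich a shared context. The toy below is not Langvio's implementation: every function, the `KNOWN_LABELS` tuple, and the fixed detections are stand-ins invented for illustration, with the LLM and YOLO steps replaced by trivial placeholders.

```python
from functools import reduce

# Hypothetical label vocabulary for the toy parser.
KNOWN_LABELS = ("car", "person", "dog")


def parse_query(ctx):
    # Stand-in for the LLM query parser: find the first known label in the text.
    text = ctx["query"].lower()
    ctx["target"] = next((label for label in KNOWN_LABELS if label in text), None)
    return ctx


def detect(ctx):
    # Stand-in for the vision processor: fixed fake detections.
    ctx["detections"] = [{"label": "car"}, {"label": "person"}, {"label": "car"}]
    return ctx


def filter_matches(ctx):
    ctx["matches"] = [d for d in ctx["detections"] if d["label"] == ctx["target"]]
    return ctx


def explain(ctx):
    # Stand-in for the LLM explanation step.
    count = len(ctx["matches"])
    ctx["explanation"] = f"Found {count} object(s) labelled '{ctx['target']}'."
    return ctx


STAGES = [parse_query, detect, filter_matches, explain]


def run(query):
    """Pass a context dict through each stage in order."""
    return reduce(lambda ctx, stage: stage(ctx), STAGES, {"query": query})


print(run("Count the cars")["explanation"])  # → Found 2 object(s) labelled 'car'.
```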

📊 Supported Models

Vision Models

YOLO11 (nano, small, medium, large, extra-large)
YOLOe (open-vocabulary detection and segmentation variants)
Automatic model selection based on performance needs

Language Models

OpenAI: GPT-3.5 Turbo, GPT-4 Turbo
Google: Gemini Pro, Gemini Flash
Extensible architecture for adding more providers

🛠️ Configuration

Custom Configuration

# config.yaml
llm:
  default: "gemini"
  models:
    gemini:
      model_name: "gemini-2.0-flash"
      model_kwargs:
        temperature: 0.2

vision:
  default: "yoloe_large"
  models:
    yoloe_large:
      model_path: "yoloe-11l-seg-pf.pt"
      confidence: 0.5

media:
  output_dir: "./results"
  visualization:
    box_color: [0, 255, 0]
    line_thickness: 2

# Use custom configuration
pipeline = langvio.create_pipeline(config_path="config.yaml")
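
A few sanity checks on a configuration can catch typos before any model loads. The sketch below mirrors the YAML above as a plain Python dict and applies three rules that are assumptions on our part (each `default` must name an entry under `models`, `confidence` must lie in 0..1, `box_color` must be three values in 0..255); `validate_config` is a hypothetical helper and Langvio's own validation may differ.

```python
config = {
    "llm": {
        "default": "gemini",
        "models": {
            "gemini": {
                "model_name": "gemini-2.0-flash",
                "model_kwargs": {"temperature": 0.2},
            }
        },
    },
    "vision": {
        "default": "yoloe_large",
        "models": {
            "yoloe_large": {"model_path": "yoloe-11l-seg-pf.pt", "confidence": 0.5}
        },
    },
    "media": {
        "output_dir": "./results",
        "visualization": {"box_color": [0, 255, 0], "line_thickness": 2},
    },
}


def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    # Each section's default must refer to a model defined in that section.
    for section in ("llm", "vision"):
        block = cfg.get(section, {})
        default = block.get("default")
        if default not in block.get("models", {}):
            problems.append(f"{section}.default {default!r} is not defined under {section}.models")

    # Detection confidence is a probability-like threshold.
    for name, model in cfg.get("vision", {}).get("models", {}).items():
        confidence = model.get("confidence")
        if confidence is not None and not 0.0 <= confidence <= 1.0:
            problems.append(f"vision.models.{name}.confidence must be between 0 and 1")

    # Box colour is a triple of 8-bit channel values.
    color = cfg.get("media", {}).get("visualization", {}).get("box_color")
    if color is not None and (len(color) != 3 or any(not 0 <= c <= 255 for c in color)):
        problems.append("media.visualization.box_color must be three values in 0..255")

    return problems


print(validate_config(config))  # → []
```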

Command Line Interface

# Basic usage
langvio --query "Count the cars" --media image.jpg

# With custom configuration
langvio --query "Find red objects" --media scene.jpg --config custom.yaml

# List available models
langvio --list-models

🌟 Advanced Features

YOLO11 Solutions Integration

Object Counting: Automatic boundary-crossing detection
Speed Estimation: Real-time speed analysis for video
Advanced Tracking: Multi-object tracking across frames
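
To make boundary-crossing counting concrete, here is a simplified sketch. It is not the Ultralytics implementation: it assumes a horizontal counting line at `line_y`, tracks given as per-frame centroid lists, and image coordinates where y grows downward. The `count_line_crossings` helper and its input format are hypothetical.

```python
def count_line_crossings(tracks, line_y):
    """Count how many times tracked centroids cross a horizontal line.

    tracks: {track_id: [(x, y), ...]} with one centroid per processed frame.
    Returns (down, up): crossings toward larger y and toward smaller y.
    """
    down = up = 0
    for points in tracks.values():
        # Compare each centroid with the next one on the same track.
        for (_, y_prev), (_, y_next) in zip(points, points[1:]):
            if y_prev < line_y <= y_next:
                down += 1
            elif y_prev >= line_y > y_next:
                up += 1
    return down, up


tracks = {
    1: [(0, 10), (0, 40), (0, 60)],  # moves down across y=50
    2: [(5, 80), (5, 45)],           # moves up across y=50
    3: [(9, 10), (9, 20)],           # never reaches the line
}
print(count_line_crossings(tracks, line_y=50))  # → (1, 1)
```

The same idea answers "entered vs exited" questions once the two directions are mapped to entry and exit for a given camera view.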

Spatial Relationship Analysis

Positional Understanding: "objects on the table", "cars in the parking lot"
Relative Positioning: left/right, above/below, near/far relationships
Containment Detection: objects inside other objects
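
These relationships can be derived from bounding boxes alone. The sketch below is one simple way to do it, not necessarily how Langvio does: boxes are assumed to be `(x1, y1, x2, y2)` in pixels with the origin at the top-left, relative position is judged by box centres, and the helper names are hypothetical.

```python
def center(box):
    """Return the centre point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def contains(outer, inner):
    """True if `inner` lies entirely within `outer`."""
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and outer[2] >= inner[2]
        and outer[3] >= inner[3]
    )


def relations(a, b):
    """Describe box `a` relative to box `b` as a set of relation names."""
    (ax, ay), (bx, by) = center(a), center(b)
    found = set()
    if ax < bx:
        found.add("left_of")
    elif ax > bx:
        found.add("right_of")
    # y grows downward in image coordinates, so a smaller y means higher up.
    if ay < by:
        found.add("above")
    elif ay > by:
        found.add("below")
    if contains(b, a):
        found.add("inside")
    return found


cup = (40, 20, 60, 40)
table = (0, 30, 100, 100)
print(relations(cup, table))  # → {'above'}
```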

Temporal Analysis (Video)

Movement Patterns: Track object trajectories and behaviors
Activity Recognition: Detect activities and interactions
Temporal Relationships: Understand object co-occurrence

Color & Attribute Detection

Advanced Color Recognition: 50+ color categories with confidence scoring
Size Classification: Automatic small/medium/large categorization
Multi-attribute Analysis: Combined color, size, and position analysis
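
As a rough illustration of attribute detection, the sketch below classifies size by the share of the image a box covers and names a colour by nearest RGB distance. Both are simplifications chosen for brevity: the six-colour palette, the 5% and 25% thresholds, and the function names are assumptions, and a production system would more likely work in a perceptual colour space with a far larger palette.

```python
# Tiny illustrative palette; the real category list is much larger.
NAMED_COLORS = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}


def nearest_color(rgb):
    """Return the palette name with the smallest squared RGB distance."""
    return min(
        NAMED_COLORS,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, NAMED_COLORS[name])),
    )


def size_category(box, image_w, image_h, small=0.05, large=0.25):
    """Classify an (x1, y1, x2, y2) box by the fraction of the image it covers."""
    x1, y1, x2, y2 = box
    ratio = ((x2 - x1) * (y2 - y1)) / (image_w * image_h)
    if ratio < small:
        return "small"
    if ratio < large:
        return "medium"
    return "large"


print(nearest_color((200, 30, 40)))             # → red
print(size_category((0, 0, 40, 40), 100, 100))  # → medium
```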

🚀 Performance & Optimization

Model Selection Strategy

# Automatic model selection based on use case
pipeline = langvio.create_pipeline()  # Uses best available model

# Manual model selection for specific needs
pipeline = langvio.create_pipeline(
    vision_name="yoloe_large",  # High accuracy
    llm_name="gpt-4"           # Advanced reasoning
)

Optimization Tips

YOLOe models: Better accuracy for complex scenes
YOLO11 models: Faster processing for real-time applications
Confidence thresholds: Adjust based on precision/recall needs
Frame sampling: Control video processing speed vs accuracy
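
The last two tips come down to simple filters. The sketch below shows the trade-off in isolation; the detection dict format and both helper names are hypothetical and do not reflect Langvio's internal data structures.

```python
def filter_by_confidence(detections, threshold):
    """Keep detections at or above the threshold.

    A higher threshold favours precision; a lower one favours recall.
    """
    return [d for d in detections if d["confidence"] >= threshold]


def sampled_frame_indices(total_frames, every_n):
    """Return the indices of frames to process when taking every n-th frame."""
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    return list(range(0, total_frames, every_n))


detections = [
    {"label": "car", "confidence": 0.90},
    {"label": "car", "confidence": 0.55},
    {"label": "person", "confidence": 0.30},
]
print(len(filter_by_confidence(detections, 0.5)))  # → 2
print(sampled_frame_indices(10, 3))                # → [0, 3, 6, 9]
```

Processing every third frame cuts the work to roughly a third, at the cost of possibly missing short-lived events between samples.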

🤝 Contributing

We welcome contributions! Here's how to get started:

Development Setup

# Clone the repository
git clone https://github.com/yourusername/langvio.git
cd langvio

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black langvio/
isort langvio/

Contributing Guidelines

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

📚 Documentation

📖 Full Documentation: Comprehensive guides and API reference
🎯 Examples: Ready-to-run example scripts
🌐 Web App: Flask web interface for easy testing
⚙️ Configuration: Sample configuration files

🔗 Links & Resources

🐙 GitHub Repository
📦 PyPI Package
📖 Documentation
🐛 Issue Tracker
💬 Discussions

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Ultralytics for the amazing YOLO models
LangChain for LLM integration framework
OpenAI and Google for language model APIs
OpenCV for computer vision utilities

⭐ Star us on GitHub if Langvio helps you!

Release history

0.0.5: Dec 22, 2025
0.0.4: Dec 18, 2025
0.0.1 (this version): Aug 31, 2025

Download files

Source Distribution

langvio-0.0.1.tar.gz (87.8 kB), uploaded Aug 31, 2025 (Source)

Built Distribution

langvio-0.0.1-py3-none-any.whl (74.4 kB), uploaded Aug 31, 2025 (Python 3)

File details

Details for the file langvio-0.0.1.tar.gz.

File metadata

Download URL: langvio-0.0.1.tar.gz
Upload date: Aug 31, 2025
Size: 87.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for langvio-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`a4784c436ae744af8fa4ae7177abb193292b6a8fc0ec6244176b460f6f68eaa8`
MD5	`b8c8519367cabb8929d2de1b805fb83b`
BLAKE2b-256	`58bfd88c2c9f994cfc836ba7ce67ef6e5db107566f97e9d83cce6651f493ee73`

File details

Details for the file langvio-0.0.1-py3-none-any.whl.

File metadata

Download URL: langvio-0.0.1-py3-none-any.whl
Upload date: Aug 31, 2025
Size: 74.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for langvio-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5c791d6f5c7e33404eb7bd70ccc785a9913a228a24bc8aca06fc52eb472029b5`
MD5	`fb6a86e3446d2ba718f53ca16e3158f2`
BLAKE2b-256	`db8f9431dea15bf1f5d35d02f3d77d3774e73d5e20ce0f6ec733ed6a466f2a93`
