
Visual segmentation and bounding box detection using Google Gemini AI


vsegments


vsegments is a Python library and CLI tool that uses Google's Gemini models to detect objects in images. It returns labeled bounding boxes and, with supported models, segmentation masks, and it can draw both onto the image.


Features

  • 🎯 Bounding Box Detection: Automatically detect and label objects in images
  • 🎨 Segmentation Masks: Generate precise segmentation masks for identified objects
  • 🖼️ Visualization: Draw boxes, labels, and masks onto the image with customizable colors, fonts, and transparency
  • 🛠️ CLI Tool: Command-line interface for one-off runs and shell-scripted batch processing
  • 📦 Library: Clean Python API for integration into your projects
  • 🚀 Multiple Models: Support for various Gemini models (Flash, Pro, etc.)
  • ⚙️ Customizable: Fine-tune prompts, system instructions, and output settings
  • 📊 JSON Export: Export detection results in structured JSON format

Installation

From PyPI (Recommended)

pip install vsegments

From Source

git clone https://github.com/yourusername/vsegments.git
cd vsegments
pip install -e .

Development Installation

pip install -e ".[dev]"

Quick Start

Prerequisites

You need a Google API key to use this library. Get one from Google AI Studio.

Set your API key as an environment variable:

export GOOGLE_API_KEY="your-api-key-here"
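When no key is passed explicitly, the CLI falls back to the GOOGLE_API_KEY environment variable (see --api-key in the CLI reference). A minimal sketch of that lookup order, using a hypothetical helper `resolve_api_key` that is not part of the documented vsegments API:

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None) -> str:
    """Return the explicit key if given, else GOOGLE_API_KEY (hypothetical helper)."""
    key = explicit or os.environ.get("GOOGLE_API_KEY")
    if not key:
        # Fail early with a clear message instead of a later authentication error
        raise RuntimeError("No API key: pass one explicitly or set GOOGLE_API_KEY")
    return key
```

An explicit argument always wins, so a key passed on the command line overrides whatever is in the environment.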

CLI Usage

Basic Bounding Box Detection

vsegments -f image.jpg

Save Output Image

vsegments -f image.jpg -o output.jpg

Perform Segmentation

vsegments -f image.jpg --segment -o segmented.jpg

Custom Prompt

vsegments -f image.jpg -p "Find all people wearing red shirts"

Export JSON Results

vsegments -f image.jpg --json results.json
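The exact schema written by --json is not documented here. As an illustration only, this sketch serializes boxes shaped like the documented BoundingBox data model (label plus y1, x1, y2, x2 normalized to 0-1000) and reads them back; the stand-in dataclass and the flat JSON layout are assumptions, so inspect a real results.json before relying on field names.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class BoundingBox:
    # Stand-in mirroring the documented fields (assumed); coordinates are 0-1000
    label: str
    y1: int
    x1: int
    y2: int
    x2: int


def boxes_to_json(boxes: List[BoundingBox]) -> str:
    """Serialize boxes to a JSON array of flat objects (assumed schema)."""
    return json.dumps([asdict(b) for b in boxes], indent=2)


def boxes_from_json(text: str) -> List[BoundingBox]:
    """Rebuild BoundingBox objects from text produced by boxes_to_json."""
    return [BoundingBox(**item) for item in json.loads(text)]
```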

Add Custom Instructions (Grounding)

vsegments -f image.jpg --instructions "Focus only on objects larger than 100 pixels"

Use a Different Model

vsegments -f image.jpg -m gemini-2.5-pro

Customize Visualization

vsegments -f image.jpg --line-width 6 --font-size 16 --alpha 0.5
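The --alpha value controls how strongly masks are drawn over the image. Assuming the tool uses standard alpha compositing (an assumption; the reference only calls the value "mask transparency", so the direction may be inverted in practice), each masked pixel is mixed as out = alpha * mask + (1 - alpha) * pixel. A per-pixel sketch with a hypothetical `blend` helper:

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def blend(pixel: RGB, mask_color: RGB, alpha: float) -> RGB:
    """Alpha-composite a mask color over one pixel (standard 'over' blend)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0.0, 1.0]")
    # alpha = 0 keeps the original pixel; alpha = 1 paints the mask color fully
    return tuple(
        round(alpha * m + (1.0 - alpha) * p) for m, p in zip(mask_color, pixel)
    )
```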

Library Usage

Basic Detection

from vsegments import VSegments

# Initialize
vs = VSegments(api_key="your-api-key")

# Detect bounding boxes
result = vs.detect_boxes("image.jpg")

# Print results
print(f"Found {len(result.boxes)} objects")
for box in result.boxes:
    print(f"  - {box.label}")

# Visualize
vs.visualize("image.jpg", result, output_path="output.jpg")

Advanced Detection with Custom Settings

from vsegments import VSegments

# Initialize with custom settings
vs = VSegments(
    api_key="your-api-key",
    model="gemini-2.5-pro",
    temperature=0.7,
    max_objects=50
)

# Detect with custom prompt and instructions
result = vs.detect_boxes(
    "image.jpg",
    prompt="Find all vehicles in the image",
    custom_instructions="Focus on cars, trucks, and motorcycles. Ignore bicycles."
)

# Access individual boxes
for box in result.boxes:
    print(f"{box.label}: [{box.x1}, {box.y1}] -> [{box.x2}, {box.y2}]")

Segmentation

from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

# Perform segmentation
result = vs.segment("image.jpg")

# Visualize with custom settings
vs.visualize(
    "image.jpg",
    result,
    output_path="segmented.jpg",
    line_width=6,
    font_size=18,
    alpha=0.6
)

Working with Results Programmatically

from vsegments import VSegments
from PIL import Image

vs = VSegments(api_key="your-api-key")
result = vs.detect_boxes("image.jpg")

# Load original image
img = Image.open("image.jpg")
width, height = img.size

# Process each detected object
for i, box in enumerate(result.boxes):
    # Convert normalized (0-1000) coordinates to pixel coordinates
    abs_x1, abs_y1, abs_x2, abs_y2 = box.to_absolute(width, height)

    # Crop the object; the index prefix keeps repeated labels from overwriting each other
    cropped = img.crop((abs_x1, abs_y1, abs_x2, abs_y2))
    cropped.save(f"{i}_{box.label}.jpg")
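If you use labels in output filenames, as the crop example does, keep in mind that labels are free text generated by the model: they can contain spaces or slashes and can repeat across objects. A small hypothetical helper, `safe_filename` (not part of vsegments), sanitizes the label and prefixes an index:

```python
import re


def safe_filename(label: str, index: int, ext: str = ".jpg") -> str:
    """Build a filesystem-safe, collision-free name from a free-text label."""
    # Collapse each run of characters other than letters, digits, '-' and '_' into '_'
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", label).strip("_") or "object"
    return f"{index:03d}_{slug}{ext}"
```

Inside a loop over enumerate(result.boxes), you would call cropped.save(safe_filename(box.label, i)).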

CLI Reference

Required Arguments

  • -f, --file IMAGE: Path to input image file

Mode Options

  • --segment: Perform segmentation instead of bounding box detection

API Options

  • --api-key KEY: Google API key (default: GOOGLE_API_KEY env var)
  • -m, --model MODEL: Model name (default: gemini-flash-latest)
  • --temperature TEMP: Sampling temperature 0.0-1.0 (default: 0.5)
  • --max-objects N: Maximum objects to detect (default: 25)

Prompt Options

  • -p, --prompt TEXT: Custom detection prompt
  • --instructions TEXT: Additional system instructions for grounding

Output Options

  • -o, --output FILE: Save visualized output to file
  • --json FILE: Export results as JSON
  • --no-show: Don't display the output image
  • --raw: Print raw API response
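The --raw option prints the model's response text without parsing it. Gemini models sometimes wrap JSON output in a Markdown code fence, so if you post-process raw responses yourself, strip the fence before calling json.loads. A minimal sketch; the helper name `parse_raw_json` is hypothetical, and whether vsegments' raw output is fenced is not documented here:

```python
import json
from typing import Any

FENCE = "`" * 3  # Markdown code-fence marker, built here to avoid a literal fence


def parse_raw_json(text: str) -> Any:
    """Parse JSON that may be wrapped in a Markdown code fence."""
    text = text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag such as json)
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # Drop everything from the closing fence onward
        text = text.split(FENCE, 1)[0]
    return json.loads(text)
```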

Visualization Options

  • --line-width N: Bounding box line width (default: 4)
  • --font-size N: Label font size (default: 14)
  • --alpha A: Mask transparency 0.0-1.0 (default: 0.7)
  • --max-size N: Maximum image dimension for processing (default: 1024)
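The --max-size option caps the longest image side before processing. Assuming the downscale preserves aspect ratio (as Pillow's Image.thumbnail does), the target size can be computed as below; `fit_within` is a hypothetical helper, and Pillow's own rounding may differ by a pixel.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_size: int = 1024) -> Tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_size."""
    longest = max(width, height)
    if longest <= max_size:
        return width, height  # already small enough; never upscale
    scale = max_size / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Because box coordinates are normalized to 0-1000, they describe positions relative to the image. An aspect-preserving downscale therefore does not invalidate them for the original full-size image.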

Other Options

  • -v, --version: Show version information
  • -q, --quiet: Suppress informational output
  • -h, --help: Show help message

API Reference

VSegments Class

Constructor

VSegments(
    api_key: Optional[str] = None,
    model: str = "gemini-flash-latest",
    temperature: float = 0.5,
    max_objects: int = 25
)

Methods

detect_boxes()

Detect bounding boxes in an image.

detect_boxes(
    image_path: Union[str, Path],
    prompt: Optional[str] = None,
    custom_instructions: Optional[str] = None,
    max_size: int = 1024
) -> SegmentationResult

segment()

Perform segmentation on an image.

segment(
    image_path: Union[str, Path],
    prompt: Optional[str] = None,
    max_size: int = 1024
) -> SegmentationResult

visualize()

Visualize detection/segmentation results.

visualize(
    image_path: Union[str, Path],
    result: SegmentationResult,
    output_path: Optional[Union[str, Path]] = None,
    show: bool = True,
    line_width: int = 4,
    font_size: int = 14,
    alpha: float = 0.7
) -> Image.Image

Data Models

BoundingBox

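@dataclass
class BoundingBox:
    label: str
    y1: int  # All four coordinates are normalized to 0-1000
    x1: int
    y2: int
    x2: int

    def to_absolute(self, img_width: int, img_height: int) -> tuple:
        """Return the box in absolute (pixel) coordinates for the given image size."""
        ...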
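Box coordinates live on a 0-1000 grid, so converting to pixels means scaling x by img_width / 1000 and y by img_height / 1000. The standalone sketch below shows what to_absolute presumably computes. The integer flooring and the (x1, y1, x2, y2) output order are assumptions, the order being taken from how the crop example unpacks the result.

```python
from typing import Tuple


def to_absolute(
    y1: int, x1: int, y2: int, x2: int, img_width: int, img_height: int
) -> Tuple[int, int, int, int]:
    """Map 0-1000 normalized coords to pixels, returned as (x1, y1, x2, y2)."""
    # Integer arithmetic avoids float rounding surprises at the image edges
    return (
        x1 * img_width // 1000,
        y1 * img_height // 1000,
        x2 * img_width // 1000,
        y2 * img_height // 1000,
    )
```

Note that the dataclass stores y before x, while image libraries such as Pillow expect (left, upper, right, lower), that is, x first.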

SegmentationResult

@dataclass
class SegmentationResult:
    boxes: List[BoundingBox]
    masks: Optional[List[SegmentationMask]] = None
    raw_response: Optional[str] = None

Examples

Batch Processing

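import os
from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

# Make sure the output folder exists before saving into it
os.makedirs("output", exist_ok=True)

# Process all images in a folder (extension match is case-insensitive)
for filename in os.listdir("images"):
    if filename.lower().endswith((".jpg", ".jpeg", ".png")):
        print(f"Processing {filename}...")
        input_path = os.path.join("images", filename)
        result = vs.detect_boxes(input_path)
        vs.visualize(
            input_path,
            result,
            output_path=os.path.join("output", filename),
            show=False
        )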
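The same folder loop can also drive the CLI instead of the library, using only the documented flags -f, -o, --no-show, and -q. This sketch builds one argument list per image and runs each with subprocess. `build_commands` and `run_all` are hypothetical helpers, and the sketch assumes the vsegments executable is on PATH.

```python
import subprocess
from pathlib import Path
from typing import List

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def build_commands(src: Path, dst: Path) -> List[List[str]]:
    """One vsegments invocation per image in src, writing results into dst."""
    return [
        ["vsegments", "-f", str(path), "-o", str(dst / path.name), "--no-show", "-q"]
        for path in sorted(src.iterdir())
        if path.suffix.lower() in IMAGE_EXTS
    ]


def run_all(src: Path, dst: Path) -> None:
    """Create the output folder, then run each command, stopping on the first failure."""
    dst.mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(src, dst):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Passing argument lists rather than a shell string avoids quoting problems with filenames that contain spaces.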

Custom Object Detection

from vsegments import VSegments

vs = VSegments(api_key="your-api-key")

# Detect specific objects
result = vs.detect_boxes(
    "street.jpg",
    prompt="Detect all traffic signs and signals",
    custom_instructions="Include stop signs, traffic lights, and speed limit signs"
)

# Filter results
traffic_signs = [box for box in result.boxes if "sign" in box.label.lower()]
print(f"Found {len(traffic_signs)} traffic signs")
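Beyond filtering, a common next step is tallying detections per label. Since labels are free text from the model, this sketch normalizes case and surrounding whitespace before counting. `count_labels` is a hypothetical helper, not a vsegments function.

```python
from collections import Counter
from typing import Iterable


def count_labels(labels: Iterable[str]) -> Counter:
    """Count detections per label, ignoring case and surrounding whitespace."""
    return Counter(label.strip().lower() for label in labels)
```

With a result in hand, count_labels(box.label for box in result.boxes).most_common() lists labels from most to least frequent.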

Deployment to PyPI

1. Prepare Your Package

Update version in vsegments/__version__.py and ensure all tests pass:

pytest tests/

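2. Build Distribution

Install the build and upload tools if they are not already available, then build:

pip install build twine
python -m build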

This creates files in dist/:

  • vsegments-0.1.0-py3-none-any.whl (wheel)
  • vsegments-0.1.0.tar.gz (source)

3. Test on TestPyPI (Optional)

python -m twine upload --repository testpypi dist/*

4. Upload to PyPI

python -m twine upload dist/*

5. Verify Installation

pip install vsegments
vsegments --version

Supported Models

  • gemini-flash-latest (default, fastest)
  • gemini-2.0-flash
  • gemini-2.5-flash-lite
  • gemini-2.5-flash
  • gemini-2.5-pro (best quality, slower)

Note: Segmentation requires Gemini 2.5 or later models.
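Because segmentation needs a 2.5-or-later model, a script can check the model name before requesting segmentation. This sketch parses the version from names of the form gemini-X.Y-...; it returns None for aliases such as gemini-flash-latest, whose underlying version cannot be read from the name. `supports_segmentation` is a hypothetical helper.

```python
import re
from typing import Optional


def supports_segmentation(model: str) -> Optional[bool]:
    """True/False from the version in a model name; None for unversioned aliases."""
    match = re.match(r"gemini-(\d+)\.(\d+)", model)
    if match is None:
        return None  # e.g. "gemini-flash-latest": version not encoded in the name
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 5)
```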

Requirements

  • Python 3.8+
  • google-genai >= 1.16.0
  • pillow >= 9.0.0
  • numpy >= 1.20.0

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Support

Changelog

See CHANGELOG.md for version history.


Made with ❤️ by Marco Kotrotsos


Download files


Source Distribution

vsegments-0.1.0.tar.gz (18.4 kB)

Uploaded Source

Built Distribution


vsegments-0.1.0-py3-none-any.whl (16.3 kB)

Uploaded Python 3

File details

Details for the file vsegments-0.1.0.tar.gz.

File metadata

  • Download URL: vsegments-0.1.0.tar.gz
  • Size: 18.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for vsegments-0.1.0.tar.gz
  • SHA256: 6f0d72a511a61bc8aa67f97c8fb82c1f3854bc02d066067f3a79f72d6486608c
  • MD5: 8e36ea06837d559eadbc1f10c0a798a8
  • BLAKE2b-256: b9b3acea426fb9dd25c31fcb30240fdb288914e16361ffb9eb857dec0f2b44c4
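To check a downloaded file against the SHA256 digest listed for it, hash the file locally and compare. A minimal sketch using Python's hashlib; `sha256_of` and `verify` are illustrative names, not part of any packaging tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare against a published digest (case-insensitive hex)."""
    return sha256_of(path) == expected.lower()
```

For example, verify(Path("vsegments-0.1.0.tar.gz"), "6f0d72a511a61bc8aa67f97c8fb82c1f3854bc02d066067f3a79f72d6486608c") should return True for an intact download.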


File details

Details for the file vsegments-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: vsegments-0.1.0-py3-none-any.whl
  • Size: 16.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for vsegments-0.1.0-py3-none-any.whl
  • SHA256: ff35cbdf38d7569b9deb63a0572d28b61fd55ed73fdb1cc375c802a480e3079a
  • MD5: 779f76361c0fceca8813a345d615533f
  • BLAKE2b-256: 8f0ba14af13536fbb6ae649eeabb97e30ade8fbc0c7b16369711d7cf4189bfc5

