Visual segmentation and bounding box detection using Google Gemini AI
vsegments
vsegments is a Python library and CLI tool that uses Google's Gemini models to detect objects in images and segment them. It returns labeled bounding boxes and segmentation masks, and can draw both onto the source image.
Features
- 🎯 Bounding Box Detection: Automatically detect and label objects in images
- 🎨 Segmentation Masks: Generate precise segmentation masks for identified objects
- 🖼️ Visualization: Beautiful visualization with customizable colors, fonts, and transparency
- 🛠️ CLI Tool: Powerful command-line interface for batch processing
- 📦 Library: Clean Python API for integration into your projects
- 🚀 Multiple Models: Support for various Gemini models (Flash, Pro, etc.)
- ⚙️ Customizable: Fine-tune prompts, system instructions, and output settings
- 📊 JSON Export: Export detection results in structured JSON format
Installation
From PyPI (Recommended)
pip install vsegments
From Source
git clone https://github.com/yourusername/vsegments.git
cd vsegments
pip install -e .
Development Installation
pip install -e ".[dev]"
Quick Start
Prerequisites
You need a Google API key to use this library. Get one from Google AI Studio.
Set your API key as an environment variable:
export GOOGLE_API_KEY="your-api-key-here"
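In Python code, the same variable can be read at startup so that a missing key fails early with a readable message. The helper below is a minimal sketch; `load_api_key` is a hypothetical name and is not part of vsegments.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper (not part of vsegments): it raises early with a
    readable message instead of failing later inside an API call.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running")
    return key
```

The returned string can then be passed as the `api_key` argument shown in the Library Usage section.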
CLI Usage
Basic Bounding Box Detection
vsegments -f image.jpg
Save Output Image
vsegments -f image.jpg -o output.jpg
Perform Segmentation
vsegments -f image.jpg --segment -o segmented.jpg
Custom Prompt
vsegments -f image.jpg -p "Find all people wearing red shirts"
Export JSON Results
vsegments -f image.jpg --json results.json
Add Custom Instructions (Grounding)
vsegments -f image.jpg --instructions "Focus only on objects larger than 100 pixels"
Use a Different Model
vsegments -f image.jpg -m gemini-2.5-pro
Customize Visualization
vsegments -f image.jpg --line-width 6 --font-size 16 --alpha 0.5
Library Usage
Basic Detection
from vsegments import VSegments
# Initialize
vs = VSegments(api_key="your-api-key")
# Detect bounding boxes
result = vs.detect_boxes("image.jpg")
# Print results
print(f"Found {len(result.boxes)} objects")
for box in result.boxes:
    print(f" - {box.label}")
# Visualize
vs.visualize("image.jpg", result, output_path="output.jpg")
Advanced Detection with Custom Settings
from vsegments import VSegments
# Initialize with custom settings
vs = VSegments(
    api_key="your-api-key",
    model="gemini-2.5-pro",
    temperature=0.7,
    max_objects=50
)
# Detect with custom prompt and instructions
result = vs.detect_boxes(
    "image.jpg",
    prompt="Find all vehicles in the image",
    custom_instructions="Focus on cars, trucks, and motorcycles. Ignore bicycles."
)
# Access individual boxes
for box in result.boxes:
    print(f"{box.label}: [{box.x1}, {box.y1}] -> [{box.x2}, {box.y2}]")
Segmentation
from vsegments import VSegments
vs = VSegments(api_key="your-api-key")
# Perform segmentation
result = vs.segment("image.jpg")
# Visualize with custom settings
vs.visualize(
    "image.jpg",
    result,
    output_path="segmented.jpg",
    line_width=6,
    font_size=18,
    alpha=0.6
)
Working with Results Programmatically
from vsegments import VSegments
from PIL import Image
vs = VSegments(api_key="your-api-key")
result = vs.detect_boxes("image.jpg")
# Load original image
img = Image.open("image.jpg")
width, height = img.size
# Process each detected object
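for index, box in enumerate(result.boxes):
    # Convert normalized coordinates to absolute pixel coordinates
    abs_x1, abs_y1, abs_x2, abs_y2 = box.to_absolute(width, height)
    # Crop the object; the index keeps objects that share a label
    # from overwriting each other's files
    cropped = img.crop((abs_x1, abs_y1, abs_x2, abs_y2))
    cropped.save(f"{index}_{box.label}.jpg")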
CLI Reference
Required Arguments
- -f, --file IMAGE: Path to input image file
Mode Options
- --segment: Perform segmentation instead of bounding box detection
API Options
- --api-key KEY: Google API key (default: GOOGLE_API_KEY env var)
- -m, --model MODEL: Model name (default: gemini-flash-latest)
- --temperature TEMP: Sampling temperature 0.0-1.0 (default: 0.5)
- --max-objects N: Maximum objects to detect (default: 25)
Prompt Options
- -p, --prompt TEXT: Custom detection prompt
- --instructions TEXT: Additional system instructions for grounding
Output Options
- -o, --output FILE: Save visualized output to file
- --json FILE: Export results as JSON
- --no-show: Don't display the output image
- --raw: Print raw API response
Visualization Options
- --line-width N: Bounding box line width (default: 4)
- --font-size N: Label font size (default: 14)
- --alpha A: Mask transparency 0.0-1.0 (default: 0.7)
- --max-size N: Maximum image dimension for processing (default: 1024)
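To make the effect of a maximum image dimension concrete, the sketch below computes a target size whose longer side does not exceed the limit while keeping the aspect ratio. This is an assumed interpretation for illustration only; how vsegments actually resizes images is not described here, and `fit_within` is a hypothetical helper.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_size: int = 1024) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_size.

    Assumed behavior for illustration: images already within the limit
    are returned unchanged and the aspect ratio is preserved.
    """
    longest = max(width, height)
    if longest <= max_size:
        return width, height
    scale = max_size / longest
    # Round to the nearest pixel, never collapsing a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With Pillow, `Image.thumbnail((max_size, max_size))` performs a comparable in-place downscale.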
Other Options
- -v, --version: Show version information
- -q, --quiet: Suppress informational output
- -h, --help: Show help message
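When driving the CLI from a script, the flags listed above can be assembled into an argument list. The following is a minimal sketch using only flags documented in this reference; `build_command` is a hypothetical helper, and it only builds the list without running anything.

```python
from typing import List, Optional


def build_command(
    image: str,
    output: Optional[str] = None,
    json_path: Optional[str] = None,
    segment: bool = False,
    prompt: Optional[str] = None,
    show: bool = True,
) -> List[str]:
    """Assemble a vsegments command line from the documented flags.

    Hypothetical helper: returns the argument list and does not
    execute the command.
    """
    cmd = ["vsegments", "-f", image]
    if segment:
        cmd.append("--segment")
    if prompt is not None:
        cmd += ["-p", prompt]
    if output is not None:
        cmd += ["-o", output]
    if json_path is not None:
        cmd += ["--json", json_path]
    if not show:
        cmd.append("--no-show")
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting problems with prompts that contain spaces.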
API Reference
VSegments Class
Constructor
VSegments(
    api_key: Optional[str] = None,
    model: str = "gemini-flash-latest",
    temperature: float = 0.5,
    max_objects: int = 25
)
Methods
detect_boxes()
Detect bounding boxes in an image.
detect_boxes(
    image_path: Union[str, Path],
    prompt: Optional[str] = None,
    custom_instructions: Optional[str] = None,
    max_size: int = 1024
) -> SegmentationResult
segment()
Perform segmentation on an image.
segment(
    image_path: Union[str, Path],
    prompt: Optional[str] = None,
    max_size: int = 1024
) -> SegmentationResult
visualize()
Visualize detection/segmentation results.
visualize(
    image_path: Union[str, Path],
    result: SegmentationResult,
    output_path: Optional[Union[str, Path]] = None,
    show: bool = True,
    line_width: int = 4,
    font_size: int = 14,
    alpha: float = 0.7
) -> Image.Image
Data Models
BoundingBox
@dataclass
class BoundingBox:
    label: str
    y1: int  # Normalized 0-1000
    x1: int
    y2: int
    x2: int

    def to_absolute(self, img_width: int, img_height: int) -> tuple: ...
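The conversion from normalized to pixel coordinates can be sketched as follows. This is an assumption about what `to_absolute` does, not the library's actual implementation: it treats 0 as the top/left edge and 1000 as the bottom/right edge, and returns values in `(x1, y1, x2, y2)` order to match the crop example earlier. `Box` is a local stand-in for `BoundingBox`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Box:
    """Stand-in for BoundingBox with coordinates normalized to 0-1000."""
    label: str
    y1: int
    x1: int
    y2: int
    x2: int

    def to_absolute(self, img_width: int, img_height: int) -> Tuple[int, int, int, int]:
        # Assumed mapping: scale each coordinate by dimension / 1000.
        # Integer arithmetic avoids floating-point truncation surprises.
        return (
            self.x1 * img_width // 1000,
            self.y1 * img_height // 1000,
            self.x2 * img_width // 1000,
            self.y2 * img_height // 1000,
        )


box = Box(label="cat", y1=100, x1=250, y2=500, x2=750)
print(box.to_absolute(img_width=2000, img_height=1000))  # (500, 100, 1500, 500)
```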
SegmentationResult
@dataclass
class SegmentationResult:
    boxes: List[BoundingBox]
    masks: Optional[List[SegmentationMask]] = None
    raw_response: Optional[str] = None
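Because the result types are dataclasses, the boxes can be turned into JSON with the standard library. The sketch below uses local stand-ins (`Box`, `Result`) that mirror the fields above; the `{"boxes": [...]}` layout is an assumption for illustration and may differ from what `--json` writes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Box:
    """Stand-in mirroring the BoundingBox fields."""
    label: str
    y1: int
    x1: int
    y2: int
    x2: int


@dataclass
class Result:
    """Stand-in mirroring part of SegmentationResult."""
    boxes: List[Box] = field(default_factory=list)
    raw_response: Optional[str] = None


def result_to_json(result: Result) -> str:
    """Serialize the boxes of a result to a JSON string (assumed layout)."""
    payload = {"boxes": [asdict(box) for box in result.boxes]}
    return json.dumps(payload, indent=2)
```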
Examples
Batch Processing
import os
from vsegments import VSegments
vs = VSegments(api_key="your-api-key")
# Make sure the output folder exists before writing to it
os.makedirs("output", exist_ok=True)
# Process all images in a folder (extension match is case-insensitive)
for filename in os.listdir("images"):
    if filename.lower().endswith((".jpg", ".png")):
        print(f"Processing {filename}...")
        result = vs.detect_boxes(f"images/{filename}")
        vs.visualize(
            f"images/{filename}",
            result,
            output_path=f"output/{filename}",
            show=False
        )
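The file-selection step of the batch loop can also be written with `pathlib`, which makes it easy to skip subdirectories and process files in a stable order. This is a minimal sketch; `iter_images` and `IMAGE_SUFFIXES` are hypothetical names, not part of vsegments.

```python
from pathlib import Path
from typing import Iterator

# Suffixes treated as images in this sketch (compared in lower case).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def iter_images(folder: str) -> Iterator[Path]:
    """Yield image files directly inside folder, sorted by path."""
    for path in sorted(Path(folder).iterdir()):
        # Skip directories and anything without an image suffix.
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path
```

Each yielded path can be passed to `detect_boxes`, whose `image_path` parameter accepts `Path` objects according to the API reference.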
Custom Object Detection
from vsegments import VSegments
vs = VSegments(api_key="your-api-key")
# Detect specific objects
result = vs.detect_boxes(
    "street.jpg",
    prompt="Detect all traffic signs and signals",
    custom_instructions="Include stop signs, traffic lights, and speed limit signs"
)
# Filter results
traffic_signs = [box for box in result.boxes if "sign" in box.label.lower()]
print(f"Found {len(traffic_signs)} traffic signs")
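Beyond substring filtering, labels can be tallied to see what was detected. The sketch below works on plain label strings, so it runs without an API call; `count_labels` is a hypothetical helper, and normalizing case and whitespace is a choice made here because model-generated labels may vary in spelling.

```python
from collections import Counter
from typing import Iterable


def count_labels(labels: Iterable[str]) -> Counter:
    """Count detections per label, ignoring case and surrounding whitespace."""
    return Counter(label.strip().lower() for label in labels)


counts = count_labels(["Stop sign", "stop sign", "Traffic light"])
print(counts.most_common())  # [('stop sign', 2), ('traffic light', 1)]
```

With a detection result, the labels would come from `box.label for box in result.boxes`.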
Deployment to PyPI
1. Prepare Your Package
Update version in vsegments/__version__.py and ensure all tests pass:
pytest tests/
2. Build Distribution
python -m build
This creates files in dist/:
- vsegments-0.1.0-py3-none-any.whl (wheel)
- vsegments-0.1.0.tar.gz (source)
3. Test on TestPyPI (Optional)
python -m twine upload --repository testpypi dist/*
4. Upload to PyPI
python -m twine upload dist/*
5. Verify Installation
pip install vsegments
vsegments --version
Supported Models
- gemini-flash-latest (default, fastest)
- gemini-2.0-flash
- gemini-2.5-flash-lite
- gemini-2.5-flash
- gemini-2.5-pro (best quality, slower)
Note: Segmentation features require Gemini 2.5 models or later.
Requirements
- Python 3.8+
- google-genai >= 1.16.0
- pillow >= 9.0.0
- numpy >= 1.20.0
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built using Google Gemini AI
- Inspired by the Google AI Cookbook
Support
- Issues: GitHub Issues
- Documentation: GitHub README
Changelog
See CHANGELOG.md for version history.
Made with ❤️ by Marco Kotrotsos
File details
Details for the file vsegments-0.1.0.tar.gz.
File metadata
- Download URL: vsegments-0.1.0.tar.gz
- Upload date:
- Size: 18.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f0d72a511a61bc8aa67f97c8fb82c1f3854bc02d066067f3a79f72d6486608c |
| MD5 | 8e36ea06837d559eadbc1f10c0a798a8 |
| BLAKE2b-256 | b9b3acea426fb9dd25c31fcb30240fdb288914e16361ffb9eb857dec0f2b44c4 |
File details
Details for the file vsegments-0.1.0-py3-none-any.whl.
File metadata
- Download URL: vsegments-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ff35cbdf38d7569b9deb63a0572d28b61fd55ed73fdb1cc375c802a480e3079a |
| MD5 | 779f76361c0fceca8813a345d615533f |
| BLAKE2b-256 | 8f0ba14af13536fbb6ae649eeabb97e30ade8fbc0c7b16369711d7cf4189bfc5 |
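A downloaded file can be checked against the SHA256 digests listed above using Python's standard `hashlib` module. The sketch below is one way to do that; `sha256_of` and `matches_digest` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_digest("vsegments-0.1.0.tar.gz", "<SHA256 from the table>")` returns `True` only when the local file is byte-for-byte identical to the published one.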