A comprehensive Python wrapper for Google Gemini's image generation and analysis capabilities with S3 and LangSmith integration
gemini-imagen
A comprehensive Python library and CLI for Google Gemini's image generation and analysis capabilities.
📚 For Python library usage, see LIBRARY.md
🚀 For advanced features, see ADVANCED_USAGE.md
🤝 For contributing, see CONTRIBUTING.md
Features
- 🎨 Text-to-Image Generation - Create images from text prompts
- 📐 Aspect Ratio Control - Custom aspect ratios (16:9, 1:1, 9:16, etc.)
- 🏷️ Labeled Input Images - Reference images by name in prompts
- 📸 Multiple Output Images - Save the same image to multiple locations
- 💬 Image Analysis - Get detailed text descriptions of images
- ☁️ S3 Integration - Seamless AWS S3 upload/download with URL logging
- 📈 LangSmith Tracing - Full observability for debugging and monitoring
- 🔒 Safety Settings - Configurable content filtering thresholds
- 🖥️ CLI Tool - Powerful command-line interface for all operations
- 🔄 Type-Safe - Full type hints with Pydantic validation
Installation
Quick Install (No Python Required)
Install the imagen CLI without manually installing Python or managing dependencies:
Linux / macOS:
curl -sSL https://raw.githubusercontent.com/aviadr1/gemini-imagen/main/scripts/install.sh | sh
Windows (PowerShell):
irm https://raw.githubusercontent.com/aviadr1/gemini-imagen/main/scripts/install.ps1 | iex
The installer will:
- Create an isolated environment for gemini-imagen
- Install all dependencies automatically
- Add the `imagen` command to your PATH
- Support self-updates with `imagen self-update`
Note: gemini-imagen still requires Python 3.12+, but the installer handles that automatically, so you do not need to install Python yourself.
Traditional Installation (with pip)
Basic Installation:
pip install gemini-imagen
With S3 Support:
pip install gemini-imagen[s3]
From Source:
git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen
pip install -e ".[dev,s3]"
For detailed installation instructions, see docs/INSTALLATION.md.
Quick Start
CLI Usage
# Set up your API key
export GOOGLE_API_KEY="your-api-key-here"
# Or save it in config
imagen keys set google YOUR_API_KEY
# Generate an image
imagen generate "a serene Japanese garden with cherry blossoms" -o garden.png
# Analyze an image
imagen analyze photo.jpg
# Edit an image
imagen edit "make it sunset" -i original.jpg -o edited.png
# Upload to S3
imagen upload local.png s3://my-bucket/remote.png
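The `s3://bucket/key` form used by `imagen upload` above names a bucket and an object key. As a rough illustration of how such a URI splits apart, here is a minimal sketch using only the standard library; `split_s3_uri` is a hypothetical helper written for this example and is not part of gemini-imagen.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key).

    Hypothetical helper for illustration; not part of gemini-imagen.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The path keeps its leading slash, which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://my-bucket/remote.png")
print(bucket, key)
```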
Python Library
For detailed Python API documentation, see LIBRARY.md.
Quick example:
import asyncio
from gemini_imagen import GeminiImageGenerator

async def main():
    generator = GeminiImageGenerator()
    # Generate an image (generate() is a coroutine, so it must be awaited)
    result = await generator.generate(
        prompt="A serene Japanese garden with cherry blossoms",
        output_images=["garden.png"],
    )
    print(f"Image saved to: {result.image_location}")

asyncio.run(main())
CLI Commands
The CLI provides comprehensive image generation and management capabilities:
| Command | Description | Example |
|---|---|---|
| `generate` | Generate images from text prompts | `imagen generate "a cat" -o cat.png` |
| `analyze` | Analyze and describe images | `imagen analyze image.jpg` |
| `edit` | Edit images using reference images | `imagen edit "make it brighter" -i photo.jpg -o out.png` |
| `upload` | Upload images to S3 | `imagen upload local.png s3://bucket/remote.png` |
| `download` | Download images from S3 | `imagen download s3://bucket/image.png local.png` |
| `keys` | Manage API keys | `imagen keys set google YOUR_KEY` |
| `config` | Manage configuration | `imagen config set default_model gemini-2.0-flash-exp` |
| `models` | List and manage models | `imagen models list` |
| `self-update` | Update to latest version | `imagen self-update` |
Common CLI Options
# Generate with options
imagen generate "prompt" -o output.png \
    --temperature 0.8 \
    --aspect-ratio 16:9 \
    --safety-setting preset:relaxed \
    --trace \
    --json
# Use input images
imagen generate "blend these styles" \
    -i style.jpg --label "Style:" \
    -i composition.jpg --label "Composition:" \
    -o result.png
# Pipe input
echo "a sunset" | imagen generate -o sunset.png
cat prompt.txt | imagen generate -o output.png
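When scripting the CLI from Python, it can help to assemble the argument list in one place. The sketch below builds an `imagen generate` command using only the flags shown in the examples above; `build_generate_command` is a hypothetical helper, not something gemini-imagen ships, and the flag order simply mirrors those examples.

```python
import shlex


def build_generate_command(prompt, output, inputs=(), aspect_ratio=None,
                           temperature=None, as_json=False):
    """Assemble an `imagen generate` argv list (hypothetical helper)."""
    cmd = ["imagen", "generate", prompt]
    # Each input image is paired with its label, as in the example above.
    for path, label in inputs:
        cmd += ["-i", path, "--label", label]
    cmd += ["-o", output]
    if aspect_ratio is not None:
        cmd += ["--aspect-ratio", aspect_ratio]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if as_json:
        cmd.append("--json")
    return cmd


cmd = build_generate_command(
    "blend these styles",
    "result.png",
    inputs=[("style.jpg", "Style:"), ("composition.jpg", "Composition:")],
    aspect_ratio="16:9",
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` rather than a single string avoids shell-quoting problems with prompts that contain spaces or quotes.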
Python Library Examples
For comprehensive Python API documentation, examples, and integration patterns, see LIBRARY.md.
Here are a few quick examples:
Text-to-Image Generation
result = await generator.generate(
    prompt="A futuristic cityscape at sunset with flying cars",
    output_images=["cityscape.png"],
    aspect_ratio="16:9",
    temperature=0.8
)
Image Analysis
result = await generator.generate(
    prompt="Describe this image in detail",
    input_images=["photo.jpg"],
    output_text=True
)
print(result.text)
With Safety Settings
from gemini_imagen import SafetySetting, HarmCategory, HarmBlockThreshold
result = await generator.generate(
    prompt="A tasteful artistic photo",
    output_images=["output.png"],
    safety_settings=[
        SafetySetting(
            category=HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
            threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH
        )
    ]
)
For more examples including S3 integration, LangSmith tracing, batch processing, and web framework integration, see LIBRARY.md.
Configuration
Environment Variables
# Required
export GOOGLE_API_KEY=your_google_api_key
# Optional - for S3 features
export GV_AWS_ACCESS_KEY_ID=your_aws_access_key
export GV_AWS_SECRET_ACCESS_KEY=your_aws_secret_key
export GV_AWS_STORAGE_BUCKET_NAME=your-bucket-name
# Optional - for LangSmith tracing
export LANGSMITH_API_KEY=your_langsmith_api_key
export LANGSMITH_TRACING=true
export LANGSMITH_PROJECT=your-project-name
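Before running anything, a script can check which of these variables are present. The following is a minimal sketch using the variable names listed above. Treating a feature as configured only when all of its variables are set is an assumption made for this example; the library itself may apply different rules. `check_environment` is a hypothetical helper.

```python
import os

REQUIRED = ["GOOGLE_API_KEY"]
OPTIONAL_GROUPS = {
    "s3": [
        "GV_AWS_ACCESS_KEY_ID",
        "GV_AWS_SECRET_ACCESS_KEY",
        "GV_AWS_STORAGE_BUCKET_NAME",
    ],
    "langsmith": ["LANGSMITH_API_KEY", "LANGSMITH_TRACING", "LANGSMITH_PROJECT"],
}


def check_environment(env=None):
    """Return (missing_required, enabled_features) for an environment mapping."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    # Assumption: a feature counts as enabled only if every variable is set.
    enabled = [
        feature
        for feature, names in OPTIONAL_GROUPS.items()
        if all(env.get(name) for name in names)
    ]
    return missing, enabled


missing, enabled = check_environment()
if missing:
    print("Missing required variables:", ", ".join(missing))
print("Optional features fully configured:", enabled or "none")
```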
CLI Configuration
# Set default values
imagen config set default_model gemini-2.0-flash-exp
imagen config set temperature 0.8
imagen config set aspect_ratio 16:9
imagen config set safety_settings relaxed
# View configuration
imagen config list
# Configuration location
imagen config path # Shows: ~/.config/imagen/config.yaml
Configuration Precedence
Values are resolved in order (highest to lowest priority):
- Command-line flags
- Environment variables
- Config file (`~/.config/imagen/config.yaml`)
- Default values
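The precedence order above amounts to a first-match lookup across four sources. The sketch below illustrates that idea only; it is not the CLI's actual implementation, and `resolve_setting` and all the sample values are hypothetical.

```python
def resolve_setting(name, cli_flags, env_vars, config_file, defaults):
    """Return the first value found for `name`, highest priority first."""
    for source in (cli_flags, env_vars, config_file, defaults):
        value = source.get(name)
        if value is not None:
            return value
    return None


# Hypothetical values for illustration only.
defaults = {"temperature": 0.7, "aspect_ratio": "1:1"}
config_file = {"temperature": 0.8, "aspect_ratio": "16:9"}
env_vars = {}
cli_flags = {"aspect_ratio": "9:16"}

# The command-line flag wins over the config file and the default.
print(resolve_setting("aspect_ratio", cli_flags, env_vars, config_file, defaults))  # 9:16
# No flag or environment variable is set, so the config file value is used.
print(resolve_setting("temperature", cli_flags, env_vars, config_file, defaults))   # 0.8
```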
Python API Reference
For complete API documentation with detailed examples, see LIBRARY.md.
Quick reference:
GeminiImageGenerator
generator = GeminiImageGenerator(
    model_name="gemini-2.5-flash-image",  # Image generation model (default)
    api_key=None,                         # Auto-loads from GOOGLE_API_KEY env var
    log_images=True                       # Enable LangSmith logging
)
generate() Method
async def generate(
    self,
    prompt: str,                            # Main prompt (required)
    system_prompt: Optional[str] = None,    # System instructions
    input_images: Optional[List] = None,    # Input images
    temperature: Optional[float] = None,    # Sampling temperature (0.0-1.0)
    aspect_ratio: Optional[str] = None,     # e.g., "16:9"
    safety_settings: Optional[List] = None, # Safety filtering
    output_images: Optional[List] = None,   # Generate images
    output_text: bool = False,              # Generate text
    metadata: Optional[Dict] = None,        # LangSmith metadata
    tags: Optional[List] = None             # LangSmith tags
) -> GenerationResult: ...
See LIBRARY.md for full type definitions, parameter details, and usage examples.
Examples
See the examples/ directory for complete working examples:
- `basic_generation.py` - Simple text-to-image
- `image_analysis.py` - Analyze images
- `labeled_inputs.py` - Use labeled images
- `s3_integration.py` - S3 upload/download
- `langsmith_tracing.py` - Enable tracing
Documentation
- LIBRARY.md - Python library documentation, API reference, integration examples
- ADVANCED_USAGE.md - Advanced features, S3, LangSmith, scripting, automation
- docs/SAFETY_FILTERING.md - Safety filtering configuration and details
- CONTRIBUTING.md - Development setup, testing, contributing guidelines
Pricing
Image Generation (gemini-2.5-flash-image)
- Cost: $30/1M output tokens
- Per Image: ~$0.039 (1290 tokens at 1024x1024)
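The per-image figure follows directly from the token price: 1290 tokens × $30 / 1,000,000 tokens ≈ $0.0387. A small sketch for estimating batch costs from the same two numbers (the function name is illustrative, and the result is only as current as the prices quoted above):

```python
PRICE_PER_MILLION_OUTPUT_TOKENS = 30.00  # USD, figure quoted above
TOKENS_PER_IMAGE = 1290                  # one 1024x1024 image, figure quoted above


def image_cost(num_images: int) -> float:
    """Estimated USD cost of generating `num_images` images."""
    return num_images * TOKENS_PER_IMAGE * PRICE_PER_MILLION_OUTPUT_TOKENS / 1_000_000


print(f"1 image:    ${image_cost(1):.4f}")    # 1 image:    $0.0387
print(f"100 images: ${image_cost(100):.2f}")  # 100 images: $3.87
```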
Text Model (gemini-2.5-flash)
- Input: $0.30/1M tokens
- Output: $1.20/1M tokens
Limitations
- Multiple images: Gemini may not always generate the exact number requested
- Structured output: Only available with text model (separate call required)
- Rate limits (free tier): 10 requests/minute, 1500/day
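To stay under the per-minute limit in a batch script, requests can be paced on the client side. The following is a minimal sliding-window sketch, not something gemini-imagen provides; `SlidingWindowLimiter` is a hypothetical class, and it covers only the per-minute figure, not the daily cap.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `period` seconds (client-side sketch)."""

    def __init__(self, max_calls=10, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self):
        """Note that a request was just sent."""
        self.calls.append(self.clock())


limiter = SlidingWindowLimiter()  # 10 calls per 60 s, the free-tier figure above
for i in range(3):
    delay = limiter.wait_time()
    if delay > 0:
        time.sleep(delay)
    limiter.record()
    # ... send one request here ...
```

The clock is injectable so the limiter can be exercised in tests without real waiting.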
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.
Acknowledgments
- Built on `google-genai` - Google's unified GenAI SDK
- Uses `langsmith` for tracing
- S3 integration via `boto3`
- Type validation with `pydantic` v2
- CLI framework with `click`
Support
- Issues: GitHub Issues
- Documentation: This README and linked documentation files
- Examples: `examples/` directory
Made with ❤️ by Aviad Rozenhek