A comprehensive Python wrapper for Google Gemini's image generation and analysis capabilities with S3 and LangSmith integration

gemini-imagen

A comprehensive Python library and CLI for Google Gemini's image generation and analysis capabilities.

  • 📚 For Python library usage, see LIBRARY.md
  • 🚀 For advanced features, see ADVANCED_USAGE.md
  • 🤝 For contributing, see CONTRIBUTING.md

Features

  • 🎨 Text-to-Image Generation - Create images from text prompts
  • 📐 Aspect Ratio Control - Custom aspect ratios (16:9, 1:1, 9:16, etc.)
  • 🏷️ Labeled Input Images - Reference images by name in prompts
  • 📸 Multiple Output Images - Save the same image to multiple locations
  • 💬 Image Analysis - Get detailed text descriptions of images
  • ☁️ S3 Integration - Seamless AWS S3 upload/download with URL logging
  • 📈 LangSmith Tracing - Full observability for debugging and monitoring
  • 🔒 Safety Settings - Configurable content filtering thresholds
  • 🖥️ CLI Tool - Powerful command-line interface for all operations
  • 🔄 Type-Safe - Full type hints with Pydantic validation

Installation

Quick Install (No Python Required)

Install the imagen CLI without manually installing Python or managing dependencies:

Linux / macOS:

curl -sSL https://raw.githubusercontent.com/aviadr1/gemini-imagen/main/scripts/install.sh | sh

Windows (PowerShell):

irm https://raw.githubusercontent.com/aviadr1/gemini-imagen/main/scripts/install.ps1 | iex

The installer will:

  • Create an isolated environment for gemini-imagen
  • Install all dependencies automatically
  • Add the imagen command to your PATH
  • Support self-updates with imagen self-update

Note: gemini-imagen still runs on Python 3.12+, but the installer handles everything automatically, so you do not need to install Python yourself.

Traditional Installation (with pip)

Basic Installation:

pip install gemini-imagen

With S3 Support:

pip install gemini-imagen[s3]

From Source:

git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen
pip install -e ".[dev,s3]"

For detailed installation instructions, see docs/INSTALLATION.md.

Quick Start

CLI Usage

# Set up your API key
export GOOGLE_API_KEY="your-api-key-here"

# Or save it in config
imagen keys set google YOUR_API_KEY

# Generate an image
imagen generate "a serene Japanese garden with cherry blossoms" -o garden.png

# Analyze an image
imagen analyze photo.jpg

# Edit an image
imagen edit "make it sunset" -i original.jpg -o edited.png

# Upload to S3
imagen upload local.png s3://my-bucket/remote.png
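
The upload command addresses remote objects with `s3://bucket/key` URIs. As a minimal sketch of how such a URI breaks down into a bucket name and an object key, the snippet below uses only the standard library. `split_s3_uri` is a hypothetical helper written for illustration; it is not part of gemini-imagen.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The parsed path keeps its leading slash; the object key does not.
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_uri("s3://my-bucket/remote.png"))
```

A plain local path such as `local.png` has no `s3` scheme, so the helper rejects it with a `ValueError`.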

Python Library

For detailed Python API documentation, see LIBRARY.md.

Quick example:

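import asyncio

from gemini_imagen import GeminiImageGenerator


async def main() -> None:
    generator = GeminiImageGenerator()

    # Generate an image and save it to garden.png
    result = await generator.generate(
        prompt="A serene Japanese garden with cherry blossoms",
        output_images=["garden.png"],
    )

    print(f"Image saved to: {result.image_location}")


# generate() is a coroutine, so it must run inside an event loop
asyncio.run(main())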

CLI Commands

The CLI provides comprehensive image generation and management capabilities:

| Command | Description | Example |
| --- | --- | --- |
| generate | Generate images from text prompts | imagen generate "a cat" -o cat.png |
| analyze | Analyze and describe images | imagen analyze image.jpg |
| edit | Edit images using reference images | imagen edit "make it brighter" -i photo.jpg -o out.png |
| upload | Upload images to S3 | imagen upload local.png s3://bucket/remote.png |
| download | Download images from S3 | imagen download s3://bucket/image.png local.png |
| keys | Manage API keys | imagen keys set google YOUR_KEY |
| config | Manage configuration | imagen config set default_model gemini-2.0-flash-exp |
| models | List and manage models | imagen models list |
| self-update | Update to latest version | imagen self-update |

Common CLI Options

# Generate with options
imagen generate "prompt" -o output.png \
  --temperature 0.8 \
  --aspect-ratio 16:9 \
  --safety-setting preset:relaxed \
  --trace \
  --json

# Use input images
imagen generate "blend these styles" \
  -i style.jpg --label "Style:" \
  -i composition.jpg --label "Composition:" \
  -o result.png

# Pipe input
echo "a sunset" | imagen generate -o sunset.png
cat prompt.txt | imagen generate -o output.png
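
When the CLI is driven from a script, it can help to assemble the argument list programmatically instead of concatenating shell strings. The sketch below builds an argv list using only flags shown in the examples above (`-i`, `--label`, `-o`, `--temperature`, `--aspect-ratio`). `build_generate_command` is a hypothetical helper, not something gemini-imagen ships, and the flag ordering is an assumption based on those examples.

```python
import shlex
from typing import List, Optional, Tuple


def build_generate_command(
    prompt: str,
    output: str,
    *,
    temperature: Optional[float] = None,
    aspect_ratio: Optional[str] = None,
    labeled_inputs: Optional[List[Tuple[str, str]]] = None,
) -> List[str]:
    """Build an argv list for `imagen generate` (hypothetical helper)."""
    argv = ["imagen", "generate", prompt]
    # Each input image is followed by its label, mirroring the example above.
    for path, label in labeled_inputs or []:
        argv += ["-i", path, "--label", label]
    argv += ["-o", output]
    if temperature is not None:
        argv += ["--temperature", str(temperature)]
    if aspect_ratio is not None:
        argv += ["--aspect-ratio", aspect_ratio]
    return argv


cmd = build_generate_command(
    "blend these styles",
    "result.png",
    labeled_inputs=[("style.jpg", "Style:"), ("composition.jpg", "Composition:")],
    aspect_ratio="16:9",
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with prompts that contain spaces or quotes.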

Python Library Examples

For comprehensive Python API documentation, examples, and integration patterns, see LIBRARY.md.

Here are a few quick examples:

Text-to-Image Generation

result = await generator.generate(
    prompt="A futuristic cityscape at sunset with flying cars",
    output_images=["cityscape.png"],
    aspect_ratio="16:9",
    temperature=0.8
)

Image Analysis

result = await generator.generate(
    prompt="Describe this image in detail",
    input_images=["photo.jpg"],
    output_text=True
)
print(result.text)

With Safety Settings

from gemini_imagen import SafetySetting, HarmCategory, HarmBlockThreshold

result = await generator.generate(
    prompt="A tasteful artistic photo",
    output_images=["output.png"],
    safety_settings=[
        SafetySetting(
            category=HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
            threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH
        )
    ]
)

For more examples including S3 integration, LangSmith tracing, batch processing, and web framework integration, see LIBRARY.md.

Configuration

Environment Variables

# Required
export GOOGLE_API_KEY=your_google_api_key

# Optional - for S3 features
export GV_AWS_ACCESS_KEY_ID=your_aws_access_key
export GV_AWS_SECRET_ACCESS_KEY=your_aws_secret_key
export GV_AWS_STORAGE_BUCKET_NAME=your-bucket-name

# Optional - for LangSmith tracing
export LANGSMITH_API_KEY=your_langsmith_api_key
export LANGSMITH_TRACING=true
export LANGSMITH_PROJECT=your-project-name
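
Before running anything that calls the API, a script can check which of these variables are actually set. The following is a minimal sketch using only the variable names listed above; `missing_vars` is a hypothetical helper and not part of the library.

```python
import os
from typing import List, Mapping, Optional

REQUIRED = ["GOOGLE_API_KEY"]
OPTIONAL_S3 = [
    "GV_AWS_ACCESS_KEY_ID",
    "GV_AWS_SECRET_ACCESS_KEY",
    "GV_AWS_STORAGE_BUCKET_NAME",
]


def missing_vars(names: List[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if missing_vars(REQUIRED):
    print("Missing required variables:", ", ".join(missing_vars(REQUIRED)))
if missing_vars(OPTIONAL_S3):
    print("S3 features unavailable; missing:", ", ".join(missing_vars(OPTIONAL_S3)))
```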

CLI Configuration

# Set default values
imagen config set default_model gemini-2.0-flash-exp
imagen config set temperature 0.8
imagen config set aspect_ratio 16:9
imagen config set safety_settings relaxed

# View configuration
imagen config list

# Configuration location
imagen config path  # Shows: ~/.config/imagen/config.yaml

Configuration Precedence

Values are resolved in order (highest to lowest priority):

  1. Command-line flags
  2. Environment variables
  3. Config file (~/.config/imagen/config.yaml)
  4. Default values
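
The precedence order above amounts to a first-match lookup across four layers. Here is a minimal sketch of that idea, assuming each layer has already been loaded into a dictionary. `resolve_setting` and all the sample values (including the default temperature) are made up for illustration and do not reflect the library's internals.

```python
from typing import Any, Mapping, Optional


def resolve_setting(name: str, *layers: Mapping[str, Any]) -> Optional[Any]:
    """Return the value from the first layer that defines `name` (hypothetical helper)."""
    for layer in layers:
        value = layer.get(name)
        if value is not None:
            return value
    return None


# Illustrative values only; layers are listed from highest to lowest priority.
cli_flags = {"temperature": None, "aspect_ratio": "1:1"}  # omitted flag -> None
env_vars = {}
config_file = {"temperature": 0.8, "aspect_ratio": "16:9"}
defaults = {"temperature": 1.0, "aspect_ratio": "1:1"}

layers = (cli_flags, env_vars, config_file, defaults)
print(resolve_setting("temperature", *layers))   # 0.8, taken from the config file
print(resolve_setting("aspect_ratio", *layers))  # 1:1, taken from the command line
```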

Python API Reference

For complete API documentation with detailed examples, see LIBRARY.md.

Quick reference:

GeminiImageGenerator

generator = GeminiImageGenerator(
    model_name="gemini-2.5-flash-image",  # Image generation model (default)
    api_key=None,                         # Auto-loads from GOOGLE_API_KEY env var
    log_images=True                       # Enable LangSmith logging
)

generate() Method

# Signature sketch; call it as: result = await generator.generate(...)
async def generate(
    self,
    prompt: str,                            # Main prompt (required)
    system_prompt: Optional[str] = None,    # System instructions
    input_images: Optional[List] = None,    # Input images
    temperature: Optional[float] = None,    # Sampling temperature (0.0-1.0)
    aspect_ratio: Optional[str] = None,     # e.g., "16:9"
    safety_settings: Optional[List] = None, # Safety filtering
    output_images: Optional[List] = None,   # Generate images
    output_text: bool = False,              # Generate text
    metadata: Optional[Dict] = None,        # LangSmith metadata
    tags: Optional[List] = None,            # LangSmith tags
) -> GenerationResult: ...

See LIBRARY.md for full type definitions, parameter details, and usage examples.

Examples

See the examples/ directory for complete working examples.

Documentation

Pricing

Image Generation (gemini-2.5-flash-image)

  • Cost: $30/1M output tokens
  • Per Image: ~$0.039 (1290 tokens at 1024x1024)
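
The per-image figure follows directly from the token price: 1290 tokens × $30 / 1,000,000 tokens ≈ $0.039. A small sketch of that arithmetic, using only the two numbers quoted above (the function name is illustrative):

```python
PRICE_PER_MILLION_OUTPUT_TOKENS = 30.00  # USD, as quoted above
TOKENS_PER_IMAGE = 1290                  # at 1024x1024, as quoted above


def image_cost(n_images: int) -> float:
    """Estimated cost in USD for generating `n_images` images."""
    return n_images * TOKENS_PER_IMAGE * PRICE_PER_MILLION_OUTPUT_TOKENS / 1_000_000


print(f"1 image:    ${image_cost(1):.4f}")
print(f"100 images: ${image_cost(100):.2f}")
```

This is an estimate only; it ignores input tokens and any text output produced alongside the image.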

Text Model (gemini-2.5-flash)

  • Input: $0.30/1M tokens
  • Output: $1.20/1M tokens

Limitations

  • Multiple images: Gemini may not always generate the exact number requested
  • Structured output: Only available with text model (separate call required)
  • Rate limits (free tier): 10 requests/minute, 1500/day
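
To stay under the free-tier limit of 10 requests per minute, a client can throttle itself before each call. The sketch below is one possible approach, a sliding-window limiter built on asyncio. `SlidingWindowLimiter` is a hypothetical class, not part of gemini-imagen, and it does not track the daily quota.

```python
import asyncio
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions per `period` seconds."""

    def __init__(self, max_calls: int = 10, period: float = 60.0) -> None:
        self.max_calls = max_calls
        self.period = period
        self._stamps: deque = deque()  # monotonic timestamps of recent calls
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:
            while True:
                now = time.monotonic()
                # Drop timestamps that have left the window.
                while self._stamps and now - self._stamps[0] >= self.period:
                    self._stamps.popleft()
                if len(self._stamps) < self.max_calls:
                    self._stamps.append(now)
                    return
                # Wait until the oldest call leaves the window, then re-check.
                await asyncio.sleep(self.period - (now - self._stamps[0]))


async def main() -> None:
    # Tiny window so the demo finishes quickly; use the defaults for the real limit.
    limiter = SlidingWindowLimiter(max_calls=2, period=0.2)
    start = time.monotonic()
    for i in range(3):
        await limiter.acquire()
        print(f"call {i} allowed after {time.monotonic() - start:.2f}s")


asyncio.run(main())
```

In real use, `await limiter.acquire()` would go immediately before each `await generator.generate(...)` call.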

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Acknowledgments

Support

  • Issues: GitHub Issues
  • Documentation: This README and linked documentation files
  • Examples: examples/ directory

Made with ❤️ by Aviad Rozenhek
