gemini-imagen

A comprehensive Python library and CLI for Google Gemini's image generation and analysis capabilities.

  • 📚 For Python library usage, see LIBRARY.md
  • 🚀 For advanced features, see ADVANCED_USAGE.md
  • 🤝 For contributing, see CONTRIBUTING.md

Features

  • 🎨 Text-to-Image Generation - Create images from text prompts
  • 📐 Aspect Ratio Control - Custom aspect ratios (16:9, 1:1, 9:16, etc.)
  • 🏷️ Labeled Input Images - Reference images by name in prompts
  • 📸 Multiple Output Images - Save the same image to multiple locations
  • 💬 Image Analysis - Get detailed text descriptions of images
  • ☁️ S3 Integration - Seamless AWS S3 upload/download with URL logging
  • 📈 LangSmith Tracing - Full observability for debugging and monitoring
  • 🔒 Safety Settings - Configurable content filtering thresholds
  • 🖥️ CLI Tool - Powerful command-line interface for all operations
  • 🔄 Type-Safe - Full type hints with Pydantic validation

Installation

Quick Install (No Python Required)

Install the imagen CLI without manually installing Python or managing dependencies:

Linux / macOS:

curl -sSL https://raw.githubusercontent.com/aviadr1/gemini-imagen/main/scripts/install.sh | sh

Windows (PowerShell):

irm https://raw.githubusercontent.com/aviadr1/gemini-imagen/main/scripts/install.ps1 | iex

The installer will:

  • Create an isolated environment for gemini-imagen
  • Install all dependencies automatically
  • Add the imagen command to your PATH
  • Support self-updates with imagen self-update

Note: gemini-imagen still runs on Python 3.12+, but the installer handles this automatically, so you do not need to install Python yourself.

Traditional Installation (with pip)

Basic Installation:

pip install gemini-imagen

With S3 Support:

pip install gemini-imagen[s3]

From Source:

git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen
pip install -e ".[dev,s3]"

For detailed installation instructions, see docs/INSTALLATION.md.

Quick Start

CLI Usage

# Set up your API key
export GOOGLE_API_KEY="your-api-key-here"

# Or save it in config
imagen keys set google YOUR_API_KEY

# Generate an image
imagen generate "a serene Japanese garden with cherry blossoms" -o garden.png

# Analyze an image
imagen analyze photo.jpg

# Edit an image
imagen edit "make it sunset" -i original.jpg -o edited.png

# Upload to S3
imagen upload local.png s3://my-bucket/remote.png

Python Library

For detailed Python API documentation, see LIBRARY.md.

Quick example:

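import asyncio

from gemini_imagen import GeminiImageGenerator


async def main():
    generator = GeminiImageGenerator()

    # Generate an image (generate() is a coroutine, so it must be awaited)
    result = await generator.generate(
        prompt="A serene Japanese garden with cherry blossoms",
        output_images=["garden.png"]
    )

    print(f"Image saved to: {result.image_location}")


# Run the coroutine from regular (synchronous) code
asyncio.run(main())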

CLI Commands

The CLI provides comprehensive image generation and management capabilities:

| Command | Description | Example |
|---|---|---|
| generate | Generate images from text prompts | imagen generate "a cat" -o cat.png |
| analyze | Analyze and describe images | imagen analyze image.jpg |
| edit | Edit images using reference images | imagen edit "make it brighter" -i photo.jpg -o out.png |
| upload | Upload images to S3 | imagen upload local.png s3://bucket/remote.png |
| download | Download images from S3 | imagen download s3://bucket/image.png local.png |
| keys | Manage API keys | imagen keys set google YOUR_KEY |
| config | Manage configuration | imagen config set default_model gemini-2.0-flash-exp |
| models | List and manage models | imagen models list |
| self-update | Update to latest version | imagen self-update |

Common CLI Options

# Generate with options
imagen generate "prompt" -o output.png \
  --temperature 0.8 \
  --aspect-ratio 16:9 \
  --safety-setting preset:relaxed \
  --trace \
  --json

# Use input images
imagen generate "blend these styles" \
  -i style.jpg --label "Style:" \
  -i composition.jpg --label "Composition:" \
  -o result.png

# Pipe input
echo "a sunset" | imagen generate -o sunset.png
cat prompt.txt | imagen generate -o output.png

Python Library Examples

For comprehensive Python API documentation, examples, and integration patterns, see LIBRARY.md.

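Here are a few quick examples. Each snippet assumes it runs inside an async function and that generator was created as in the Quick Start: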

Text-to-Image Generation

result = await generator.generate(
    prompt="A futuristic cityscape at sunset with flying cars",
    output_images=["cityscape.png"],
    aspect_ratio="16:9",
    temperature=0.8
)

Image Analysis

result = await generator.generate(
    prompt="Describe this image in detail",
    input_images=["photo.jpg"],
    output_text=True
)
print(result.text)

With Safety Settings

from gemini_imagen import SafetySetting, HarmCategory, HarmBlockThreshold

result = await generator.generate(
    prompt="A tasteful artistic photo",
    output_images=["output.png"],
    safety_settings=[
        SafetySetting(
            category=HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
            threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH
        )
    ]
)

For more examples including S3 integration, LangSmith tracing, batch processing, and web framework integration, see LIBRARY.md.
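
Batch processing is covered in LIBRARY.md; as a rough illustration of the idea, the sketch below runs several generate() calls with a cap on how many are in flight at once. The helper name generate_batch, the jobs list of (prompt, output path) pairs, and the max_concurrency parameter are hypothetical and not part of gemini-imagen. It also assumes that generate() can safely be awaited concurrently on one generator, which this README does not confirm.

```python
import asyncio


async def generate_batch(generator, jobs, max_concurrency=3):
    """Run several generate() calls with a cap on concurrency.

    `generator` is any object with an async
    generate(prompt=..., output_images=...) method, such as the
    GeminiImageGenerator shown above. `jobs` is a list of
    (prompt, output_path) pairs. This helper is illustrative only.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(prompt, output_path):
        # At most max_concurrency calls hold the semaphore at once.
        async with semaphore:
            return await generator.generate(
                prompt=prompt,
                output_images=[output_path],
            )

    # gather() returns results in the same order as `jobs`.
    return await asyncio.gather(
        *(run_one(prompt, path) for prompt, path in jobs)
    )
```

Inside an async function this could be called as `results = await generate_batch(generator, [("a cat", "cat.png"), ("a dog", "dog.png")])`. A low max_concurrency is a reasonable starting point given the free-tier rate limits listed under Limitations.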

Configuration

Environment Variables

# Required
export GOOGLE_API_KEY=your_google_api_key

# Optional - for S3 features
export GV_AWS_ACCESS_KEY_ID=your_aws_access_key
export GV_AWS_SECRET_ACCESS_KEY=your_aws_secret_key
export GV_AWS_STORAGE_BUCKET_NAME=your-bucket-name

# Optional - for LangSmith tracing
export LANGSMITH_API_KEY=your_langsmith_api_key
export LANGSMITH_TRACING=true
export LANGSMITH_PROJECT=your-project-name
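
Before running anything, it can help to check which of these variables are actually set. The sketch below uses only the variable names listed above; the grouping into features and the missing_vars helper are illustrative and not part of gemini-imagen. A missing GOOGLE_API_KEY is not necessarily an error, since the key can also be stored with imagen keys set.

```python
import os

# Variable names come from the list above. The grouping into
# "features" is only an illustration, not part of gemini-imagen.
FEATURE_VARS = {
    "gemini": ["GOOGLE_API_KEY"],
    "s3": [
        "GV_AWS_ACCESS_KEY_ID",
        "GV_AWS_SECRET_ACCESS_KEY",
        "GV_AWS_STORAGE_BUCKET_NAME",
    ],
    "langsmith": ["LANGSMITH_API_KEY"],
}


def missing_vars(environ=None):
    """Return {feature: [names of unset or empty variables]}."""
    if environ is None:
        environ = os.environ
    return {
        feature: [name for name in names if not environ.get(name)]
        for feature, names in FEATURE_VARS.items()
    }


if __name__ == "__main__":
    for feature, missing in missing_vars().items():
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{feature}: {status}")
```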

CLI Configuration

# Set default values
imagen config set default_model gemini-2.0-flash-exp
imagen config set temperature 0.8
imagen config set aspect_ratio 16:9
imagen config set safety_settings relaxed

# View configuration
imagen config list

# Configuration location
imagen config path  # Shows: ~/.config/imagen/config.yaml

Configuration Precedence

Values are resolved in order (highest to lowest priority):

  1. Command-line flags
  2. Environment variables
  3. Config file (~/.config/imagen/config.yaml)
  4. Default values
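
The precedence order above can be pictured as a layered lookup. The following is a minimal sketch of that idea, not the library's actual implementation: resolve_settings and its four dictionary arguments are hypothetical names, and it assumes an unset value is represented as None.

```python
from collections import ChainMap


def resolve_settings(cli_flags, env_vars, config_file, defaults):
    """Merge four setting layers, highest priority first."""
    # Drop unset (None) entries so they do not shadow lower layers.
    layers = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in (cli_flags, env_vars, config_file, defaults)
    ]
    # ChainMap returns a key's value from the first layer that has it.
    return dict(ChainMap(*layers))


settings = resolve_settings(
    cli_flags={"temperature": 0.2, "aspect_ratio": None},
    env_vars={"aspect_ratio": "1:1"},
    config_file={"temperature": 0.8, "aspect_ratio": "16:9"},
    defaults={"temperature": 1.0},
)
print(settings["temperature"])   # 0.2, from the command-line flag
print(settings["aspect_ratio"])  # 1:1, from the environment
```

Here the flag wins for temperature, while aspect_ratio falls through to the environment because no flag was given.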

Python API Reference

For complete API documentation with detailed examples, see LIBRARY.md.

Quick reference:

GeminiImageGenerator

generator = GeminiImageGenerator(
    model_name="gemini-2.5-flash-image",  # Image generation model (default)
    api_key=None,                         # Auto-loads from GOOGLE_API_KEY env var
    log_images=True                       # Enable LangSmith logging
)

generate() Method

# Signature sketch (call it as: result = await generator.generate(...))
async def generate(
    self,
    prompt: str,                           # Main prompt (required)
    system_prompt: Optional[str] = None,   # System instructions
    input_images: Optional[List] = None,   # Input images
    temperature: Optional[float] = None,   # Sampling temperature (0.0-1.0)
    aspect_ratio: Optional[str] = None,    # e.g., "16:9"
    safety_settings: Optional[List] = None,# Safety filtering
    output_images: Optional[List] = None,  # Generate images
    output_text: bool = False,             # Generate text
    metadata: Optional[Dict] = None,       # LangSmith metadata
    tags: Optional[List] = None            # LangSmith tags
) -> GenerationResult: ...

See LIBRARY.md for full type definitions, parameter details, and usage examples.

Examples

See the examples/ directory for complete working examples.

Documentation

Pricing

Image Generation (gemini-2.5-flash-image)

  • Cost: $30/1M output tokens
  • Per Image: ~$0.039 (1290 tokens at 1024x1024)
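
The per-image figure follows directly from the token price: 1290 tokens × $30 / 1,000,000 ≈ $0.039. The small sketch below reproduces that arithmetic for estimating a batch; the function name image_cost is illustrative, and the prices are the ones quoted above, which may change.

```python
PRICE_PER_MILLION_OUTPUT_TOKENS = 30.00  # USD, figure quoted above
TOKENS_PER_IMAGE = 1290                  # one 1024x1024 image, figure quoted above


def image_cost(num_images,
               tokens_per_image=TOKENS_PER_IMAGE,
               price_per_million=PRICE_PER_MILLION_OUTPUT_TOKENS):
    """Estimated cost in USD of generating `num_images` images."""
    return num_images * tokens_per_image * price_per_million / 1_000_000


print(f"{image_cost(1):.4f}")    # 0.0387
print(f"{image_cost(100):.2f}")  # 3.87
```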

Text Model (gemini-2.5-flash)

  • Input: $0.30/1M tokens
  • Output: $1.20/1M tokens

Limitations

  • Multiple images: Gemini may not always generate the exact number requested
  • Structured output: Only available with text model (separate call required)
  • Rate limits (free tier): 10 requests/minute, 1500/day
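
One way to stay under the per-minute limit is to throttle on the client side. The sketch below is a generic sliding-window limiter, not something gemini-imagen provides: the class name SlidingWindowLimiter and its parameters are hypothetical, and it handles only the per-minute limit, not the daily one.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls=10, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        # Forget calls that have already left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)
```

Calling limiter.wait() before each request keeps a synchronous loop within the limit. In async code the blocking sleep would need to be replaced with an awaitable one.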

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Acknowledgments

Support

  • Issues: GitHub Issues
  • Documentation: This README and linked documentation files
  • Examples: examples/ directory

Made with ❤️ by Aviad Rozenhek
