A comprehensive Python wrapper for Google Gemini's image generation and analysis capabilities with S3 and LangSmith integration

gemini-imagen

PyPI version | Python 3.12+ | License: MIT | CI | codecov

A comprehensive Python library and CLI for Google Gemini's image generation and analysis capabilities.

  • 📚 For Python library usage, see LIBRARY.md
  • 🚀 For advanced features, see ADVANCED_USAGE.md
  • 🤝 For contributing, see CONTRIBUTING.md

Features

  • 🎨 Text-to-Image Generation - Create images from text prompts
  • 📐 Aspect Ratio Control - Custom aspect ratios (16:9, 1:1, 9:16, etc.)
  • 🏷️ Labeled Input Images - Reference images by name in prompts
  • 📸 Multiple Output Images - Save the same image to multiple locations
  • 💬 Image Analysis - Get detailed text descriptions of images
  • ☁️ S3 Integration - Seamless AWS S3 upload/download with URL logging
  • 📈 LangSmith Tracing - Full observability for debugging and monitoring
  • 🔒 Safety Settings - Configurable content filtering thresholds
  • 🖥️ CLI Tool - Powerful command-line interface for all operations
  • 🔄 Type-Safe - Full type hints with Pydantic validation

Installation

Basic Installation

pip install gemini-imagen

With S3 Support

# Quote the extra so shells such as zsh do not expand the brackets
pip install "gemini-imagen[s3]"

From Source

git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen
pip install -e ".[dev,s3]"

Quick Start

CLI Usage

# Set up your API key
export GOOGLE_API_KEY="your-api-key-here"

# Or save it in config
imagen keys set google YOUR_API_KEY

# Generate an image
imagen generate "a serene Japanese garden with cherry blossoms" -o garden.png

# Analyze an image
imagen analyze photo.jpg

# Edit an image
imagen edit "make it sunset" -i original.jpg -o edited.png

# Upload to S3
imagen upload local.png s3://my-bucket/remote.png

Python Library

For detailed Python API documentation, see LIBRARY.md.

Quick example:

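import asyncio

from gemini_imagen import GeminiImageGenerator


async def main():
    generator = GeminiImageGenerator()

    # Generate an image and save it to garden.png
    result = await generator.generate(
        prompt="A serene Japanese garden with cherry blossoms",
        output_images=["garden.png"],
    )
    print(f"Image saved to: {result.image_location}")


# generate() is a coroutine, so it must run inside an event loop
asyncio.run(main())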

CLI Commands

The CLI provides comprehensive image generation and management capabilities:

Command  | Description                         | Example
generate | Generate images from text prompts   | imagen generate "a cat" -o cat.png
analyze  | Analyze and describe images         | imagen analyze image.jpg
edit     | Edit images using reference images  | imagen edit "make it brighter" -i photo.jpg -o out.png
upload   | Upload images to S3                 | imagen upload local.png s3://bucket/remote.png
download | Download images from S3             | imagen download s3://bucket/image.png local.png
keys     | Manage API keys                     | imagen keys set google YOUR_KEY
config   | Manage configuration                | imagen config set default_model gemini-2.0-flash-exp
models   | List and manage models              | imagen models list

Common CLI Options

# Generate with options
imagen generate "prompt" -o output.png \
  --temperature 0.8 \
  --aspect-ratio 16:9 \
  --safety-setting preset:relaxed \
  --trace \
  --json

# Use input images
imagen generate "blend these styles" \
  -i style.jpg --label "Style:" \
  -i composition.jpg --label "Composition:" \
  -o result.png

# Pipe input
echo "a sunset" | imagen generate -o sunset.png
cat prompt.txt | imagen generate -o output.png
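
When driving the CLI from a Python script, the flags shown above can be assembled into an argument list. The sketch below is illustrative only: build_generate_command is a hypothetical helper, not part of gemini-imagen, and it uses only flags that appear in the examples above.

```python
import shlex


def build_generate_command(prompt, output, aspect_ratio=None, temperature=None,
                           inputs=None):
    """Assemble an `imagen generate` argument list (hypothetical helper).

    `inputs` is a list of (path, label) pairs, mirroring `-i PATH --label LABEL`.
    """
    cmd = ["imagen", "generate", prompt]
    for path, label in inputs or []:
        cmd += ["-i", path, "--label", label]
    cmd += ["-o", output]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if aspect_ratio is not None:
        cmd += ["--aspect-ratio", aspect_ratio]
    return cmd


cmd = build_generate_command("a sunset", "sunset.png", aspect_ratio="16:9")
# Print a shell-quoted form that can be pasted into a terminal.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it, provided the CLI is installed and an API key is configured. Building a list instead of a single string avoids shell-quoting problems in prompts.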

Python Library Examples

For comprehensive Python API documentation, examples, and integration patterns, see LIBRARY.md.

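Here are a few quick examples. Each snippet assumes a generator created as in the Quick Start and must run inside an async function, because generate() is awaited: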

Text-to-Image Generation

result = await generator.generate(
    prompt="A futuristic cityscape at sunset with flying cars",
    output_images=["cityscape.png"],
    aspect_ratio="16:9",
    temperature=0.8
)

Image Analysis

result = await generator.generate(
    prompt="Describe this image in detail",
    input_images=["photo.jpg"],
    output_text=True
)
print(result.text)

With Safety Settings

from gemini_imagen import SafetySetting, HarmCategory, HarmBlockThreshold

result = await generator.generate(
    prompt="A tasteful artistic photo",
    output_images=["output.png"],
    safety_settings=[
        SafetySetting(
            category=HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
            threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH
        )
    ]
)

For more examples including S3 integration, LangSmith tracing, batch processing, and web framework integration, see LIBRARY.md.

Configuration

Environment Variables

# Required
export GOOGLE_API_KEY=your_google_api_key

# Optional - for S3 features
export GV_AWS_ACCESS_KEY_ID=your_aws_access_key
export GV_AWS_SECRET_ACCESS_KEY=your_aws_secret_key
export GV_AWS_STORAGE_BUCKET_NAME=your-bucket-name

# Optional - for LangSmith tracing
export LANGSMITH_API_KEY=your_langsmith_api_key
export LANGSMITH_TRACING=true
export LANGSMITH_PROJECT=your-project-name
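
A quick pre-flight check can catch a missing variable before the first API call. The following is a minimal sketch, not part of the library: check_environment is a hypothetical helper, and treating an optional feature as configured only when all of its variables are set is a simplifying assumption.

```python
import os

REQUIRED = ["GOOGLE_API_KEY"]
OPTIONAL_GROUPS = {
    "s3": [
        "GV_AWS_ACCESS_KEY_ID",
        "GV_AWS_SECRET_ACCESS_KEY",
        "GV_AWS_STORAGE_BUCKET_NAME",
    ],
    "langsmith": [
        "LANGSMITH_API_KEY",
        "LANGSMITH_TRACING",
        "LANGSMITH_PROJECT",
    ],
}


def check_environment(env=None):
    """Return (missing_required, configured_features) for the given mapping."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    # Assumption: a feature counts as configured only if every variable is set.
    configured = [
        feature
        for feature, names in OPTIONAL_GROUPS.items()
        if all(env.get(name) for name in names)
    ]
    return missing, configured


missing, configured = check_environment()
if missing:
    print("Missing required variables:", ", ".join(missing))
print("Optional features fully configured:", configured or "none")
```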

CLI Configuration

# Set default values
imagen config set default_model gemini-2.0-flash-exp
imagen config set temperature 0.8
imagen config set aspect_ratio 16:9
imagen config set safety_settings relaxed

# View configuration
imagen config list

# Configuration location
imagen config path  # Shows: ~/.config/imagen/config.yaml

Configuration Precedence

Values are resolved in order (highest to lowest priority):

  1. Command-line flags
  2. Environment variables
  3. Config file (~/.config/imagen/config.yaml)
  4. Default values
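
The precedence order above can be pictured as a first-match lookup across four sources. This is a sketch of the idea rather than the CLI's actual implementation: resolve_setting is a hypothetical function, and modelling environment variables as a dict keyed by setting name is an assumption, since the mapping from variable names to settings is not described here.

```python
def resolve_setting(name, cli_flags, env, config_file, defaults=None):
    """Return the first non-None value, checking sources from highest to lowest priority."""
    for source in (cli_flags, env, config_file, defaults or {}):
        value = source.get(name)
        if value is not None:
            return value
    return None


cli_flags = {"aspect_ratio": "1:1"}                         # e.g. --aspect-ratio 1:1
env = {}                                                    # nothing set in the environment
config_file = {"aspect_ratio": "16:9", "temperature": 0.8}  # values from config.yaml

# The command-line flag wins over the config file.
print(resolve_setting("aspect_ratio", cli_flags, env, config_file))
# No flag or environment value exists, so the config file value is used.
print(resolve_setting("temperature", cli_flags, env, config_file))
```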

Python API Reference

For complete API documentation with detailed examples, see LIBRARY.md.

Quick reference:

GeminiImageGenerator

generator = GeminiImageGenerator(
    model_name="gemini-2.5-flash-image",  # Image generation model (default)
    api_key=None,                         # Auto-loads from GOOGLE_API_KEY env var
    log_images=True                       # Enable LangSmith logging
)

generate() Method

async def generate(
    self,
    prompt: str,                              # Main prompt (required)
    system_prompt: Optional[str] = None,      # System instructions
    input_images: Optional[List] = None,      # Input images
    temperature: Optional[float] = None,      # Sampling temperature (0.0-1.0)
    aspect_ratio: Optional[str] = None,       # e.g., "16:9"
    safety_settings: Optional[List] = None,   # Safety filtering
    output_images: Optional[List] = None,     # Generate images
    output_text: bool = False,                # Generate text
    metadata: Optional[Dict] = None,          # LangSmith metadata
    tags: Optional[List] = None,              # LangSmith tags
) -> GenerationResult: ...

# Call it with: result = await generator.generate(prompt="...", ...)

See LIBRARY.md for full type definitions, parameter details, and usage examples.

Examples

See the examples/ directory for complete working examples.

Documentation

Pricing

Image Generation (gemini-2.5-flash-image)

  • Cost: $30/1M output tokens
  • Per Image: ~$0.039 (1290 tokens at 1024x1024)
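
The per-image figure follows directly from the token price: 1290 tokens × $30 / 1,000,000 tokens ≈ $0.039. A small helper (hypothetical, for budgeting only) makes the arithmetic reusable:

```python
IMAGE_OUTPUT_USD_PER_MILLION_TOKENS = 30.00  # gemini-2.5-flash-image, as listed above
TOKENS_PER_IMAGE = 1290                      # at 1024x1024, as listed above


def image_cost_usd(num_images, tokens_per_image=TOKENS_PER_IMAGE):
    """Estimated output cost in USD for generating `num_images` images."""
    total_tokens = num_images * tokens_per_image
    return total_tokens * IMAGE_OUTPUT_USD_PER_MILLION_TOKENS / 1_000_000


print(f"1 image:    ${image_cost_usd(1):.4f}")    # $0.0387
print(f"100 images: ${image_cost_usd(100):.2f}")  # $3.87
```

This covers output tokens for the image only; input tokens for the prompt and any input images are billed separately and are not included in the estimate.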

Text Model (gemini-2.5-flash)

  • Input: $0.30/1M tokens
  • Output: $1.20/1M tokens

Limitations

  • Multiple images: Gemini may not always generate the exact number requested
  • Structured output: Only available with text model (separate call required)
  • Rate limits (free tier): 10 requests/minute, 1,500 requests/day
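
To stay under the per-minute limit when issuing many requests, calls can be throttled on the client side. The sketch below is one possible approach, not something the library is documented to provide: SlidingWindowLimiter and throttled_generate are hypothetical names, and the daily quota is not tracked.

```python
import asyncio
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions per `period` seconds (client-side sketch)."""

    def __init__(self, max_calls=10, period=60.0,
                 clock=time.monotonic, sleep=asyncio.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._calls = deque()    # timestamps of recent acquisitions

    async def acquire(self):
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest call leaves the window, then re-check.
            await self._sleep(self.period - (now - self._calls[0]))


async def throttled_generate(generator, limiter, **kwargs):
    """Wait for a free slot, then forward the call to generator.generate()."""
    await limiter.acquire()
    return await generator.generate(**kwargs)
```

A single limiter instance should be shared by all tasks that use the same API key, for example limiter = SlidingWindowLimiter(max_calls=10, period=60.0) followed by await throttled_generate(generator, limiter, prompt="a cat", output_images=["cat.png"]).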

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Acknowledgments

Support

  • Issues: GitHub Issues
  • Documentation: This README and linked documentation files
  • Examples: examples/ directory

Made with ❤️ by Aviad Rozenhek
