gemini-imagen

Python 3.12+ | License: MIT

A comprehensive Python wrapper for Google Gemini's image generation and analysis capabilities, featuring:

  • 🎨 Text-to-Image Generation - Create images from text prompts
  • 🏷️ Labeled Input Images - Reference images by name in prompts for better control
  • 📸 Multiple Output Images - Generate multiple variations in one request
  • 💬 Image Analysis - Get detailed text descriptions of images
  • ☁️ S3 Integration - Seamless AWS S3 upload/download with URL logging
  • 📈 LangSmith Tracing - Full observability for debugging and monitoring
  • 🔄 Type-Safe - Full type hints with Pydantic validation

Installation

Basic Installation

Using pip:

pip install gemini-imagen

Using uv (recommended - faster):

uv pip install gemini-imagen

With S3 Support

Using pip:

pip install "gemini-imagen[s3]"

Using uv:

uv pip install "gemini-imagen[s3]"

From Source

Using uv (recommended):

git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen
uv sync --all-extras

Or using pip:

git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen
pip install -e ".[dev,s3]"

Quick Start

1. Set Up API Key

export GOOGLE_API_KEY="your-api-key-here"

Or create a .env file:

GOOGLE_API_KEY=your-api-key-here

2. Generate Your First Image

from gemini_imagen import GeminiImageGenerator

generator = GeminiImageGenerator()

result = generator.generate(
    prompt="A serene Japanese garden with cherry blossoms",
    output_images=["garden.png"]
)

print(f"Image saved to: {result.image_location}")

Features

Text-to-Image Generation

Generate images from text descriptions:

result = generator.generate(
    prompt="A futuristic cityscape at sunset with flying cars",
    output_images=["cityscape.png"]
)

Image Analysis

Analyze existing images and get text descriptions:

result = generator.generate(
    prompt="Describe this image in detail, including colors, objects, and mood",
    input_images=["photo.jpg"],
    output_text=True
)

print(result.text)

Labeled Input Images

Reference multiple images by name in your prompts:

result = generator.generate(
    prompt="Blend the artistic style from Photo A with the composition from Photo B",
    input_images=[
        ("Photo A (style):", "style_reference.jpg"),
        ("Photo B (composition):", "composition_reference.jpg")
    ],
    output_images=["blended_result.png"]
)
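
How labels are passed to the model is not documented here. The following is a minimal sketch of one plausible approach, assuming each label is sent as a text part immediately before its image and the prompt comes last. `build_contents` and the simplified type aliases are hypothetical and not part of gemini-imagen's API; file paths stand in for loaded images.

```python
from pathlib import Path
from typing import Union

# Simplified stand-ins for the ImageSource types (PIL images omitted).
RawImageSource = Union[str, Path]
LabeledImage = tuple[str, RawImageSource]
ImageSource = Union[RawImageSource, LabeledImage]


def build_contents(prompt: str, input_images: list[ImageSource]) -> list[object]:
    """Interleave labels and images, then append the prompt (assumed ordering)."""
    contents: list[object] = []
    for item in input_images:
        if isinstance(item, tuple):
            label, source = item
            contents.append(label)   # label text goes right before its image
            contents.append(source)
        else:
            contents.append(item)    # unlabeled image: no text part
    contents.append(prompt)
    return contents


parts = build_contents(
    "Blend the style from Photo A with the composition from Photo B",
    [("Photo A (style):", "style_reference.jpg"), "composition_reference.jpg"],
)
print(parts)
```

Keeping each label adjacent to its image is what would let a prompt refer to "Photo A" unambiguously.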

Multiple Output Images

Request multiple variations:

result = generator.generate(
    prompt="Create 3 variations of a mountain landscape",
    output_images=[
        ("Sunrise version", "mountain_sunrise.png"),
        ("Sunset version", "mountain_sunset.png"),
        ("Night version", "mountain_night.png")
    ]
)

# Note: Gemini may return fewer images than requested
for label, uri in zip(result.image_labels, result.image_locations):
    print(f"{label}: {uri}")
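
Because fewer images may come back than were requested, it can help to check which outputs are missing before using them. The sketch below assumes the labels in `result.image_labels` match the labels passed in `output_images`; `missing_outputs` is a hypothetical helper, and a plain list stands in for the real result.

```python
from typing import Optional


def missing_outputs(
    requested_labels: list[str], returned_labels: list[Optional[str]]
) -> list[str]:
    """Return the requested labels that have no matching returned image."""
    returned = {label for label in returned_labels if label is not None}
    return [label for label in requested_labels if label not in returned]


requested = ["Sunrise version", "Sunset version", "Night version"]
# Stand-in for result.image_labels when only two images came back.
returned = ["Sunrise version", "Sunset version"]

missing = missing_outputs(requested, returned)
if missing:
    print(f"Not generated: {', '.join(missing)}")
```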

S3 Integration

Upload/download images directly to/from AWS S3:

# Configure AWS credentials in .env:
# GV_AWS_ACCESS_KEY_ID=your_key
# GV_AWS_SECRET_ACCESS_KEY=your_secret
# GV_AWS_STORAGE_BUCKET_NAME=your_bucket

result = generator.generate(
    prompt="A magical forest scene",
    input_images=["s3://my-bucket/reference.jpg"],
    output_images=["s3://my-bucket/output.png"]
)

# Access S3 URLs
print(result.image_s3_uri)    # s3://my-bucket/output.png
print(result.image_http_url)  # https://my-bucket.s3.region.amazonaws.com/...
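
Input and output locations can be either local paths or `s3://` URIs. As an illustration of how the two can be told apart, here is a small sketch using only the standard library. `split_s3_uri` is a hypothetical helper and does not reflect how gemini-imagen parses locations internally.

```python
from typing import Optional
from urllib.parse import urlparse


def split_s3_uri(location: str) -> Optional[tuple[str, str]]:
    """Return (bucket, key) for an s3:// URI, or None for anything else."""
    parsed = urlparse(location)
    if parsed.scheme != "s3" or not parsed.netloc:
        return None
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_uri("s3://my-bucket/output.png"))  # ('my-bucket', 'output.png')
print(split_s3_uri("garden.png"))                 # None
```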

LangSmith Tracing

Enable observability with LangSmith:

import os
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your-key"

generator = GeminiImageGenerator(log_images=True)

result = generator.generate(
    prompt="A robot reading in a cozy library",
    output_images=["robot_library.png"],
    metadata={"user_id": "demo", "session": "example"},
    tags=["demo", "robot"]
)

# View traces at https://smith.langchain.com/

Image + Text Output

Get both an image and explanation:

result = generator.generate(
    prompt="Generate a futuristic city and explain its key architectural features",
    output_images=["city.png"],
    output_text=True
)

print(f"Image: {result.image_location}")
print(f"Explanation: {result.text}")

Architecture

The package uses gemini-2.5-flash-image for all operations:

graph TB
    A[User Request] --> B[Load Input Images<br/>with Labels]
    B --> C[Build Content<br/>Prompt + Images]
    C --> D[gemini-2.5-flash-image<br/>Generate Content]
    D --> E{Extract Response}
    E -->|Has Images| F[PIL Images]
    E -->|Has Text| G[Plain Text]
    F --> H{Save to S3/Local?}
    G --> I[Return Result]
    H -->|Yes| J[Upload & Get URLs]
    H -->|No| I
    J --> I
    I --> K{LangSmith<br/>Enabled?}
    K -->|Yes| L[Log to LangSmith<br/>- Images as S3 URLs<br/>- Text response]
    K -->|No| M[GenerationResult]
    L --> M

API Reference

GeminiImageGenerator

generator = GeminiImageGenerator(
    model_name="gemini-2.5-flash-image",  # Image generation model
    api_key=None,                          # Auto-loads from env
    log_images=True                        # Enable LangSmith logging
)

generate() Method

def generate(
    self,
    prompt: str,                                      # Main prompt (required)
    system_prompt: Optional[str] = None,              # System instructions
    input_images: Optional[List[ImageSource]] = None, # Input images
    temperature: Optional[float] = None,              # Sampling temperature

    # Output configuration
    output_images: Optional[List[OutputImageSpec]] = None,  # Generate images
    output_text: bool = False,                              # Generate text

    # LangSmith
    metadata: Optional[Dict[str, str]] = None,
    tags: Optional[List[str]] = None,
) -> GenerationResult: ...

Type Definitions:

  • ImageSource = RawImageSource | LabeledImage

    • RawImageSource = Image.Image | str | Path
    • LabeledImage = Tuple[str, RawImageSource]
  • OutputImageSpec = OutputLocation | LabeledOutput

    • OutputLocation = str | Path
    • LabeledOutput = Tuple[str, OutputLocation]
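
For readers who prefer to see the output types as code, the sketch below writes them as Python aliases and normalizes a spec into a `(label, location)` pair. The alias names come from the list above; `split_output_spec` is a hypothetical helper for illustration only.

```python
from pathlib import Path
from typing import Optional, Tuple, Union

OutputLocation = Union[str, Path]
LabeledOutput = Tuple[str, OutputLocation]
OutputImageSpec = Union[OutputLocation, LabeledOutput]


def split_output_spec(spec: OutputImageSpec) -> Tuple[Optional[str], str]:
    """Normalize a spec to (label, location); label is None when unlabeled."""
    if isinstance(spec, tuple):
        label, location = spec
        return label, str(location)
    return None, str(spec)


print(split_output_spec("garden.png"))
print(split_output_spec(("Sunrise version", Path("mountain_sunrise.png"))))
```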

GenerationResult

class GenerationResult:
    text: Optional[str]                      # Generated text
    images: List[Image.Image]                # PIL Image objects
    image_labels: List[Optional[str]]        # Image labels
    image_locations: List[str]               # Local file paths
    image_s3_uris: List[Optional[str]]       # S3 URIs
    image_http_urls: List[Optional[str]]     # HTTP URLs

    # Convenience properties (first image)
    @property
    def image(self) -> Optional[Image.Image]: ...
    @property
    def image_location(self) -> Optional[str]: ...
    @property
    def image_s3_uri(self) -> Optional[str]: ...
    @property
    def image_http_url(self) -> Optional[str]: ...
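
The `Optional` return types suggest that the convenience properties yield `None` when no image was produced, for example in an analysis-only call. The sketch below illustrates that pattern with a hypothetical stand-in class, `ResultSketch`; it is not the library's implementation, and the `None` behavior is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResultSketch:
    """Hypothetical stand-in for GenerationResult (PIL images omitted)."""

    text: Optional[str] = None
    image_locations: List[str] = field(default_factory=list)
    image_s3_uris: List[Optional[str]] = field(default_factory=list)

    @property
    def image_location(self) -> Optional[str]:
        # First entry, or None when no image was produced.
        return self.image_locations[0] if self.image_locations else None

    @property
    def image_s3_uri(self) -> Optional[str]:
        # First entry, or None when the list is empty.
        return self.image_s3_uris[0] if self.image_s3_uris else None


empty = ResultSketch(text="analysis only")
single = ResultSketch(image_locations=["garden.png"], image_s3_uris=[None])
print(empty.image_location, single.image_location)
```

Under that assumption, code that uses `result.image_location` after a text-only call should check for `None` first.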

Structured Output

⚠️ The image model (gemini-2.5-flash-image) does not support JSON schemas or structured output.

For structured output, use a two-step approach:

# Step 1: Generate or analyze image
from gemini_imagen import GeminiImageGenerator

generator = GeminiImageGenerator()
result = generator.generate(
    prompt="Analyze this image in detail",
    input_images=["image.png"],
    output_text=True
)

# Step 2: Get structured output with gemini-2.5-flash
from google import generativeai as genai
from pydantic import BaseModel

class ImageAnalysis(BaseModel):
    objects: list[str]
    colors: list[str]
    mood: str

text_model = genai.GenerativeModel("gemini-2.5-flash")
response = text_model.generate_content(
    f"{result.text}\n\nFormat as JSON with fields: objects, colors, mood",
    generation_config={
        "response_mime_type": "application/json",
        "response_schema": ImageAnalysis.model_json_schema()
    }
)

analysis = ImageAnalysis.model_validate_json(response.text)

Configuration

Environment Variables

# Required
GOOGLE_API_KEY=your_google_api_key

# Optional - for S3 features
GV_AWS_ACCESS_KEY_ID=your_aws_access_key
GV_AWS_SECRET_ACCESS_KEY=your_aws_secret_key
GV_AWS_STORAGE_BUCKET_NAME=your-bucket-name

# Optional - for LangSmith tracing
LANGSMITH_API_KEY=your_langsmith_api_key
LANGSMITH_TRACING=true
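
A quick way to see which features are configured is to inspect the environment before creating a generator. The sketch below is a hypothetical helper, not part of gemini-imagen. It assumes tracing counts as enabled only when `LANGSMITH_TRACING` equals `true` (compared case-insensitively), and it only sees variables already present in the process environment, so values in a `.env` file must be loaded first.

```python
import os
from typing import Mapping

S3_VARS = (
    "GV_AWS_ACCESS_KEY_ID",
    "GV_AWS_SECRET_ACCESS_KEY",
    "GV_AWS_STORAGE_BUCKET_NAME",
)


def check_environment(env: Mapping[str, str]) -> dict[str, bool]:
    """Report which feature groups have their variables set."""
    return {
        "gemini": bool(env.get("GOOGLE_API_KEY")),
        "s3": all(env.get(name) for name in S3_VARS),
        "langsmith": bool(env.get("LANGSMITH_API_KEY"))
        and env.get("LANGSMITH_TRACING", "").lower() == "true",
    }


status = check_environment(os.environ)
if not status["gemini"]:
    print("GOOGLE_API_KEY is required but not set.")
for feature in ("s3", "langsmith"):
    print(f"{feature}: {'enabled' if status[feature] else 'not configured'}")
```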

Examples

See the examples/ directory for complete working examples.

Pricing

Image Generation (gemini-2.5-flash-image)

  • Cost: $30/1M output tokens
  • Per Image: ~$0.039 (1290 tokens at 1024x1024)

Text Model (gemini-2.5-flash)

  • Input: $0.30/1M tokens
  • Output: $1.20/1M tokens
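
The per-image figure follows directly from the token price: 1290 output tokens at $30 per million tokens. A short worked example, using only the image-generation numbers listed above (the helper name `image_cost_usd` is illustrative):

```python
OUTPUT_PRICE_PER_MILLION_USD = 30.0  # image model output price listed above
TOKENS_PER_IMAGE = 1290              # one 1024x1024 image, as listed above


def image_cost_usd(num_images: int) -> float:
    """Estimate output-token cost for a number of 1024x1024 images."""
    tokens = num_images * TOKENS_PER_IMAGE
    return tokens * OUTPUT_PRICE_PER_MILLION_USD / 1_000_000


print(f"1 image:    ${image_cost_usd(1):.4f}")   # $0.0387
print(f"100 images: ${image_cost_usd(100):.2f}")  # $3.87
```

This covers image output tokens only; input tokens and any text output are billed separately.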

Limitations

  • Multiple images: Gemini may not always generate the exact number requested
  • Structured output: Only available with text model (separate call required)
  • Rate limits (free tier): 10 requests/minute, 1500/day
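
To stay under a per-minute limit such as the free-tier figure above, calls can be spaced out on the client side. The following is a minimal sketch of a fixed-interval throttle. `Throttle` is a hypothetical helper, not something gemini-imagen provides, and it does not track the daily quota.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Keep consecutive calls at least 60 / per_minute seconds apart."""

    def __init__(
        self,
        per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # start time of the previous call

    def wait(self) -> float:
        """Sleep if needed; return the number of seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


throttle = Throttle(per_minute=10)
print(throttle.min_interval)  # 6.0
# Call throttle.wait() before each generator.generate(...) request.
```

The clock and sleep functions are injectable so the spacing logic can be tested without real delays.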

Development

Setup Development Environment

Using uv (recommended):

# Clone the repository
git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen

# Lock dependencies and sync (installs everything)
uv lock
uv sync --all-extras

# Install pre-commit hooks
uv run pre-commit install

Or using pip:

# Clone the repository
git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen

# Install with development dependencies
pip install -e ".[dev,s3]"

# Install pre-commit hooks
pre-commit install

Running Tests

Using uv:

# Run unit tests only (no API keys required)
uv run pytest tests/ -v -m "not integration"

# Run all tests including integration (requires API keys)
uv run pytest tests/ -v

# Run with coverage
uv run pytest tests/ -v -m "not integration" --cov=gemini_imagen --cov-report=html

# Run specific test file
uv run pytest tests/test_gemini_image_wrapper.py -v

Using make (with uv):

make test    # Runs: uv run pytest (unit tests only)

Test Categories:

  • Unit tests: Mocked tests, no API keys required
  • Integration tests: Require real API keys (-m integration)
    • GOOGLE_API_KEY - for Gemini API tests
    • GV_AWS_* - for S3 integration tests
    • LANGSMITH_API_KEY - for LangSmith tracing tests

Integration tests are automatically skipped if credentials are missing.

Code Quality

# Run linter
make lint

# Format code
make format

# Run pre-commit hooks
make pre-commit

Building and Publishing

# Build package
make build

# Publish to PyPI (requires credentials)
make publish

CI/CD

This project uses GitHub Actions for continuous integration:

  • CI Pipeline: Runs on every push and pull request

    • Linting with ruff
    • Type checking with mypy
    • Tests on Python 3.12 and 3.13
    • Code coverage reporting
  • Release Pipeline: Automatically publishes to PyPI on version tags

    • Triggered by pushing tags like v1.0.0
    • Creates GitHub releases with artifacts
  • Dependabot: Automatically updates dependencies weekly

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.


Made with ❤️ by Aviad Rozenhek
