gemini-imagen

Requires Python 3.12+. License: MIT.

A comprehensive Python wrapper for Google Gemini's image generation and analysis capabilities, featuring:

  • 🎨 Text-to-Image Generation - Create images from text prompts
  • 🏷️ Labeled Input Images - Reference images by name in prompts for better control
  • 📸 Multiple Output Images - Generate multiple variations in one request
  • 💬 Image Analysis - Get detailed text descriptions of images
  • ☁️ S3 Integration - Seamless AWS S3 upload/download with URL logging
  • 📈 LangSmith Tracing - Full observability for debugging and monitoring
  • 🔄 Type-Safe - Full type hints with Pydantic validation

Installation

Basic Installation

Using pip:

pip install gemini-imagen

Using uv (recommended - faster):

uv pip install gemini-imagen

With S3 Support

Using pip:

pip install gemini-imagen[s3]

Using uv:

uv pip install gemini-imagen[s3]

From Source

Using uv (recommended):

git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen
uv sync --all-extras

Or using pip:

git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen
pip install -e ".[dev,s3]"

Quick Start

1. Set Up API Key

export GOOGLE_API_KEY="your-api-key-here"

Or create a .env file:

GOOGLE_API_KEY=your-api-key-here

2. Generate Your First Image

from gemini_imagen import GeminiImageGenerator

generator = GeminiImageGenerator()

result = generator.generate(
    prompt="A serene Japanese garden with cherry blossoms",
    output_images=["garden.png"]
)

print(f"Image saved to: {result.image_location}")

Features

Text-to-Image Generation

Generate images from text descriptions:

result = generator.generate(
    prompt="A futuristic cityscape at sunset with flying cars",
    output_images=["cityscape.png"]
)

Image Analysis

Analyze existing images and get text descriptions:

result = generator.generate(
    prompt="Describe this image in detail, including colors, objects, and mood",
    input_images=["photo.jpg"],
    output_text=True
)

print(result.text)

Labeled Input Images

Reference multiple images by name in your prompts:

result = generator.generate(
    prompt="Blend the artistic style from Photo A with the composition from Photo B",
    input_images=[
        ("Photo A (style):", "style_reference.jpg"),
        ("Photo B (composition):", "composition_reference.jpg")
    ],
    output_images=["blended_result.png"]
)

Multiple Output Images

Request multiple variations:

result = generator.generate(
    prompt="Create 3 variations of a mountain landscape",
    output_images=[
        ("Sunrise version", "mountain_sunrise.png"),
        ("Sunset version", "mountain_sunset.png"),
        ("Night version", "mountain_night.png")
    ]
)

# Note: Gemini may return fewer images than requested
for label, uri in zip(result.image_labels, result.image_locations):
    print(f"{label}: {uri}")
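
Because fewer images may come back than were requested, it can help to check which requested outputs were actually fulfilled. The sketch below is not part of gemini-imagen: split_fulfilled is a hypothetical helper, and it assumes that returned images arrive in the same order as the requested specs, which this README does not guarantee.

```python
def split_fulfilled(requested_labels, returned_locations):
    """Separate requested labels into those that got an image and those that did not.

    Assumption: the i-th returned image corresponds to the i-th requested label.
    """
    count = len(returned_locations)
    fulfilled = list(zip(requested_labels[:count], returned_locations))
    missing = list(requested_labels[count:])
    return fulfilled, missing


requested = ["Sunrise version", "Sunset version", "Night version"]
returned = ["mountain_sunrise.png", "mountain_sunset.png"]  # one image short

fulfilled, missing = split_fulfilled(requested, returned)
for label, location in fulfilled:
    print(f"{label}: {location}")
for label in missing:
    print(f"{label}: not generated, consider retrying this one on its own")
```

With a real result, the two arguments would be the labels you passed in output_images and result.image_locations.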

S3 Integration

Upload/download images directly to/from AWS S3:

# Configure AWS credentials in .env:
# GV_AWS_ACCESS_KEY_ID=your_key
# GV_AWS_SECRET_ACCESS_KEY=your_secret
# GV_AWS_STORAGE_BUCKET_NAME=your_bucket

result = generator.generate(
    prompt="A magical forest scene",
    input_images=["s3://my-bucket/reference.jpg"],
    output_images=["s3://my-bucket/output.png"]
)

# Access S3 URLs
print(result.image_s3_uri)    # s3://my-bucket/output.png
print(result.image_http_url)  # https://my-bucket.s3.region.amazonaws.com/...
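
How the two URL forms relate can be sketched as follows. These helpers are hypothetical and are not the library's implementation; they assume the virtual-hosted-style S3 URL layout shown in the comment above and a region name that you supply yourself.

```python
from urllib.parse import quote


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs a bucket and a key: {uri!r}")
    return bucket, key


def s3_http_url(uri: str, region: str) -> str:
    """Build a virtual-hosted-style HTTPS URL for an S3 URI."""
    bucket, key = split_s3_uri(uri)
    # quote() percent-encodes unsafe characters but leaves "/" intact.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


print(s3_http_url("s3://my-bucket/output.png", "us-east-1"))
# https://my-bucket.s3.us-east-1.amazonaws.com/output.png
```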

LangSmith Tracing

Enable observability with LangSmith:

import os
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your-key"

generator = GeminiImageGenerator(log_images=True)

result = generator.generate(
    prompt="A robot reading in a cozy library",
    output_images=["robot_library.png"],
    metadata={"user_id": "demo", "session": "example"},
    tags=["demo", "robot"]
)

# View traces at https://smith.langchain.com/

Image + Text Output

Get both an image and explanation:

result = generator.generate(
    prompt="Generate a futuristic city and explain its key architectural features",
    output_images=["city.png"],
    output_text=True
)

print(f"Image: {result.image_location}")
print(f"Explanation: {result.text}")

Architecture

The package uses gemini-2.5-flash-image for all operations:

graph TB
    A[User Request] --> B[Load Input Images<br/>with Labels]
    B --> C[Build Content<br/>Prompt + Images]
    C --> D[gemini-2.5-flash-image<br/>Generate Content]
    D --> E{Extract Response}
    E -->|Has Images| F[PIL Images]
    E -->|Has Text| G[Plain Text]
    F --> H{Save to S3/Local?}
    G --> I[Return Result]
    H -->|Yes| J[Upload & Get URLs]
    H -->|No| I
    J --> I
    I --> K{LangSmith<br/>Enabled?}
    K -->|Yes| L[Log to LangSmith<br/>- Images as S3 URLs<br/>- Text response]
    K -->|No| M[GenerationResult]
    L --> M

API Reference

GeminiImageGenerator

generator = GeminiImageGenerator(
    model_name="gemini-2.5-flash-image",  # Image generation model
    api_key=None,                          # Auto-loads from env
    log_images=True                        # Enable LangSmith logging
)

generate() Method

result = generator.generate(
    prompt: str,                                      # Main prompt (required)
    system_prompt: Optional[str] = None,              # System instructions
    input_images: Optional[List[ImageSource]] = None, # Input images
    temperature: Optional[float] = None,              # Sampling temperature

    # Output configuration
    output_images: Optional[List[OutputImageSpec]] = None,  # Generate images
    output_text: bool = False,                              # Generate text

    # LangSmith
    metadata: Optional[Dict[str, str]] = None,
    tags: Optional[List[str]] = None
) -> GenerationResult

Type Definitions:

  • ImageSource = RawImageSource | LabeledImage

    • RawImageSource = Image.Image | str | Path
    • LabeledImage = Tuple[str, RawImageSource]
  • OutputImageSpec = OutputLocation | LabeledOutput

    • OutputLocation = str | Path
    • LabeledOutput = Tuple[str, OutputLocation]
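
To make the labeled and unlabeled forms concrete, here is a minimal sketch of how a value of either union could be split into a label and a raw source. split_label is a hypothetical name and this is not the library's own code; it only relies on the rule stated above that a labeled entry is a (label, source) tuple.

```python
from pathlib import Path
from typing import Any, Optional, Tuple


def split_label(source: Any) -> Tuple[Optional[str], Any]:
    """Return (label, raw_source); the label is None for unlabeled entries."""
    if isinstance(source, tuple):
        if len(source) != 2 or not isinstance(source[0], str):
            raise TypeError("a labeled entry must be a (label, source) tuple")
        return source[0], source[1]
    return None, source


entries = [
    "photo.jpg",                                   # RawImageSource as str
    Path("reference.png"),                         # RawImageSource as Path
    ("Photo A (style):", "style_reference.jpg"),   # LabeledImage
]
for entry in entries:
    label, raw = split_label(entry)
    print(f"label={label!r} source={raw}")
```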

GenerationResult

class GenerationResult:
    text: Optional[str]                      # Generated text
    images: List[Image.Image]                # PIL Image objects
    image_labels: List[Optional[str]]        # Image labels
    image_locations: List[str]               # Local file paths
    image_s3_uris: List[Optional[str]]       # S3 URIs
    image_http_urls: List[Optional[str]]     # HTTP URLs

    # Convenience properties (first image, or None if there are no images)
    @property
    def image(self) -> Optional[Image.Image]: ...
    @property
    def image_location(self) -> Optional[str]: ...
    @property
    def image_s3_uri(self) -> Optional[str]: ...
    @property
    def image_http_url(self) -> Optional[str]: ...

Structured Output

⚠️ The image model (gemini-2.5-flash-image) does not support JSON schemas or structured output.

For structured output, use a two-step approach:

# Step 1: Generate or analyze image
from gemini_imagen import GeminiImageGenerator

generator = GeminiImageGenerator()
result = generator.generate(
    prompt="Analyze this image in detail",
    input_images=["image.png"],
    output_text=True
)

# Step 2: Get structured output with gemini-2.5-flash
import os

from google import genai
from google.genai import types
from pydantic import BaseModel

class ImageAnalysis(BaseModel):
    objects: list[str]
    colors: list[str]
    mood: str

client = genai.Client(api_key=os.getenv("GOOGLE_API_KEY"))
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=f"{result.text}\n\nFormat as JSON with fields: objects, colors, mood",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=ImageAnalysis.model_json_schema()
    )
)

analysis = ImageAnalysis.model_validate_json(response.text)

Configuration

Environment Variables

# Required
GOOGLE_API_KEY=your_google_api_key

# Optional - for S3 features
GV_AWS_ACCESS_KEY_ID=your_aws_access_key
GV_AWS_SECRET_ACCESS_KEY=your_aws_secret_key
GV_AWS_STORAGE_BUCKET_NAME=your-bucket-name

# Optional - for LangSmith tracing
LANGSMITH_API_KEY=your_langsmith_api_key
LANGSMITH_TRACING=true
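
Before running anything, it can be useful to see which features the current environment can support. The following is a sketch only: the variable names come from the list above, but the grouping into features and the missing_by_feature helper are assumptions made for illustration, not something the package provides.

```python
import os
from typing import Mapping

# Variable names are taken from the list above; the grouping is an assumption.
FEATURE_VARS = {
    "gemini": ("GOOGLE_API_KEY",),
    "s3": (
        "GV_AWS_ACCESS_KEY_ID",
        "GV_AWS_SECRET_ACCESS_KEY",
        "GV_AWS_STORAGE_BUCKET_NAME",
    ),
    "langsmith": ("LANGSMITH_API_KEY", "LANGSMITH_TRACING"),
}


def missing_by_feature(env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """Map each feature to the variables that are unset or empty."""
    return {
        feature: [name for name in names if not env.get(name)]
        for feature, names in FEATURE_VARS.items()
    }


if __name__ == "__main__":
    for feature, missing in missing_by_feature().items():
        state = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{feature}: {state}")
```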

Examples

See the examples/ directory for complete working examples.

Pricing

Image Generation (gemini-2.5-flash-image)

  • Cost: $30/1M output tokens
  • Per Image: ~$0.039 (1290 tokens at 1024x1024)

Text Model (gemini-2.5-flash)

  • Input: $0.30/1M tokens
  • Output: $1.20/1M tokens
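
The per-image figure follows directly from the token rate: 1290 tokens at $30 per million tokens. The sketch below reproduces that arithmetic using the rates listed above; the function names are made up for this example, and the rates should be checked against current Google pricing before they are relied on.

```python
# Rates copied from the pricing list above (USD per 1M tokens).
IMAGE_OUTPUT_RATE = 30.00
TEXT_INPUT_RATE = 0.30
TEXT_OUTPUT_RATE = 1.20
TOKENS_PER_IMAGE = 1290  # one 1024x1024 image, per the list above


def image_cost_usd(num_images: int) -> float:
    """Estimated cost of generating num_images images."""
    return num_images * TOKENS_PER_IMAGE * IMAGE_OUTPUT_RATE / 1_000_000


def text_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one text-model call."""
    return (input_tokens * TEXT_INPUT_RATE + output_tokens * TEXT_OUTPUT_RATE) / 1_000_000


# 1290 * 30 / 1_000_000 = 0.0387, which rounds to the ~$0.039 quoted above.
print(f"1 image:    ${image_cost_usd(1):.4f}")
print(f"100 images: ${image_cost_usd(100):.2f}")
```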

Limitations

  • Multiple images: Gemini may not always generate the exact number requested
  • Structured output: Only available with text model (separate call required)
  • Rate limits (free tier): 10 requests/minute, 1500/day
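
One way to stay under a per-minute limit is to throttle calls on the client side. The class below is a sketch of a sliding-window limiter and is not part of gemini-imagen; the default of 10 calls per 60 seconds mirrors the free-tier figure above, and the clock and sleep parameters are injectable only so that the behavior can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of period seconds."""

    def __init__(
        self,
        max_calls: int = 10,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: deque = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Forget calls that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while (delay := self.wait_time()) > 0:
            self._sleep(delay)
        self._calls.append(self._clock())
```

In use, you would create one limiter and call limiter.acquire() immediately before each generator.generate(...) call. This does not address the daily limit, which would need a persistent counter.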

Development

Setup Development Environment

Using uv (recommended):

# Clone the repository
git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen

# Lock dependencies and sync (installs everything)
uv lock
uv sync --all-extras

# Install pre-commit hooks
uv run pre-commit install

Or using pip:

# Clone the repository
git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen

# Install with development dependencies
pip install -e ".[dev,s3]"

# Install pre-commit hooks
pre-commit install

Running Tests

Using uv:

# Run unit tests only (no API keys required)
uv run pytest tests/ -v -m "not integration"

# Run all tests including integration (requires API keys)
uv run pytest tests/ -v

# Run with coverage
uv run pytest tests/ -v -m "not integration" --cov=gemini_imagen --cov-report=html

# Run specific test file
uv run pytest tests/test_gemini_image_wrapper.py -v

Using make (with uv):

make test    # Runs: uv run pytest (unit tests only)

Test Categories:

  • Unit tests: Mocked tests, no API keys required
  • Integration tests: Require real API keys (-m integration)
    • GOOGLE_API_KEY - for Gemini API tests
    • GV_AWS_* - for S3 integration tests
    • LANGSMITH_API_KEY - for LangSmith tracing tests

Integration tests are automatically skipped if credentials are missing.

Code Quality

# Run linter
make lint

# Format code
make format

# Run pre-commit hooks
make pre-commit

Building and Publishing

Quick Release Process

One command to release:

# Patch release (0.1.0 -> 0.1.1) - default
./scripts/release.sh

# Minor release (0.1.0 -> 0.2.0)
./scripts/release.sh minor

# Major release (0.1.0 -> 1.0.0)
./scripts/release.sh major

# Test on TestPyPI first
./scripts/release.sh patch --test

The release script automatically:

  1. Bumps the version (patch/minor/major)
  2. Commits the version change
  3. Creates and pushes a git tag
  4. Installs dependencies
  5. Runs linters (ruff + mypy)
  6. Runs tests
  7. Builds the package
  8. Verifies with twine
  9. Uploads to PyPI (with confirmation)

Manual version bump (if needed):

# Bump version manually
uv run python scripts/bump_version.py patch  # 0.1.0 -> 0.1.1
uv run python scripts/bump_version.py minor  # 0.1.0 -> 0.2.0
uv run python scripts/bump_version.py major  # 0.1.0 -> 1.0.0
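
The three bump types follow the usual semantic-versioning rule: bumping a component resets every component to its right to zero. The function below illustrates that rule and reproduces the examples in the comments above; it is a sketch, not the contents of scripts/bump_version.py.

```python
def bump(version: str, part: str = "patch") -> str:
    """Return version with the given part incremented (MAJOR.MINOR.PATCH only)."""
    pieces = version.split(".")
    if len(pieces) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(piece) for piece in pieces)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


for part in ("patch", "minor", "major"):
    print(f"0.1.0 -> {bump('0.1.0', part)} ({part})")
```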

Manual Build/Publish

# Build package
make build

# Publish to PyPI (requires credentials)
make publish

CI/CD

This project uses GitHub Actions for continuous integration:

  • CI Pipeline: Runs on every push and pull request

    • Linting with ruff
    • Type checking with mypy
    • Tests on Python 3.12 and 3.13
    • Code coverage reporting
  • Release Pipeline: Automatically publishes to PyPI on version tags

    • Triggered by pushing tags like v1.0.0
    • Creates GitHub releases with artifacts
  • Dependabot: Automatically updates dependencies weekly

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Acknowledgments

  • Built on google-genai - Google's unified GenAI SDK (replaces deprecated google-generativeai)
  • Uses langsmith for tracing
  • S3 integration via boto3
  • Type validation with pydantic v2

Made with ❤️ by Aviad Rozenhek
