gemini-imagen

A comprehensive Python wrapper for Google Gemini's image generation and analysis capabilities, featuring:

🎨 Text-to-Image Generation - Create images from text prompts
🏷️ Labeled Input Images - Reference images by name in prompts for better control
📸 Multiple Output Images - Generate multiple variations in one request
💬 Image Analysis - Get detailed text descriptions of images
☁️ S3 Integration - Seamless AWS S3 upload/download with URL logging
📈 LangSmith Tracing - Full observability for debugging and monitoring
🔄 Type-Safe - Full type hints with Pydantic validation

Installation

Basic Installation

Using pip:

pip install gemini-imagen

Using uv (recommended - faster):

uv pip install gemini-imagen

With S3 Support

Using pip:

pip install gemini-imagen[s3]

Using uv:

uv pip install gemini-imagen[s3]

From Source

Using uv (recommended):

git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen
uv sync --all-extras

Or using pip:

git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen
pip install -e ".[dev,s3]"

Quick Start

1. Set Up API Key

export GOOGLE_API_KEY="your-api-key-here"

Or create a .env file:

GOOGLE_API_KEY=your-api-key-here
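
The following is a minimal sketch of how a .env file like the one above could be read with only the standard library. The helper names load_env_file and apply_env are hypothetical and are not part of gemini-imagen; how the package itself loads the file is not described here.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def apply_env(values: dict[str, str]) -> None:
    """Copy parsed values into os.environ without overwriting existing ones."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

Calling apply_env(load_env_file()) before constructing the generator would make GOOGLE_API_KEY visible to anything that reads the environment, while leaving an already exported key untouched.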

2. Generate Your First Image

from gemini_imagen import GeminiImageGenerator

generator = GeminiImageGenerator()

result = generator.generate(
    prompt="A serene Japanese garden with cherry blossoms",
    output_images=["garden.png"]
)

print(f"Image saved to: {result.image_location}")

Features

Text-to-Image Generation

Generate images from text descriptions:

result = generator.generate(
    prompt="A futuristic cityscape at sunset with flying cars",
    output_images=["cityscape.png"]
)

Image Analysis

Analyze existing images and get text descriptions:

result = generator.generate(
    prompt="Describe this image in detail, including colors, objects, and mood",
    input_images=["photo.jpg"],
    output_text=True
)

print(result.text)

Labeled Input Images

Reference multiple images by name in your prompts:

result = generator.generate(
    prompt="Blend the artistic style from Photo A with the composition from Photo B",
    input_images=[
        ("Photo A (style):", "style_reference.jpg"),
        ("Photo B (composition):", "composition_reference.jpg")
    ],
    output_images=["blended_result.png"]
)

Multiple Output Images

Request multiple variations:

result = generator.generate(
    prompt="Create 3 variations of a mountain landscape",
    output_images=[
        ("Sunrise version", "mountain_sunrise.png"),
        ("Sunset version", "mountain_sunset.png"),
        ("Night version", "mountain_night.png")
    ]
)

# Note: Gemini may return fewer images than requested
for label, uri in zip(result.image_labels, result.image_locations):
    print(f"{label}: {uri}")
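
Because fewer images may come back than were requested, it can help to check which ones are missing. The sketch below uses plain lists as stand-ins for the requested labels and for result.image_labels; the helper find_missing_outputs is hypothetical, and it assumes returned labels match the requested labels exactly.

```python
def find_missing_outputs(requested_labels, returned_labels):
    """Return requested labels that have no corresponding returned image."""
    returned = set(returned_labels)
    return [label for label in requested_labels if label not in returned]


requested = ["Sunrise version", "Sunset version", "Night version"]
# Stand-in for result.image_labels when only two images came back.
returned = ["Sunrise version", "Sunset version"]

missing = find_missing_outputs(requested, returned)
if missing:
    print(f"Missing {len(missing)} image(s): {missing}")
```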

S3 Integration

Upload/download images directly to/from AWS S3:

# Configure AWS credentials in .env:
# GV_AWS_ACCESS_KEY_ID=your_key
# GV_AWS_SECRET_ACCESS_KEY=your_secret
# GV_AWS_STORAGE_BUCKET_NAME=your_bucket

result = generator.generate(
    prompt="A magical forest scene",
    input_images=["s3://my-bucket/reference.jpg"],
    output_images=["s3://my-bucket/output.png"]
)

# Access S3 URLs
print(result.image_s3_uri)    # s3://my-bucket/output.png
print(result.image_http_url)  # https://my-bucket.s3.region.amazonaws.com/...
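
To illustrate how the two URL forms above relate, here is a small sketch that splits an s3:// URI into bucket and key and builds a virtual-hosted-style HTTPS URL. The helpers parse_s3_uri and s3_http_url are hypothetical, the region is passed in explicitly as an assumption, and this is not necessarily how the package derives its URLs.

```python
from urllib.parse import quote, urlparse


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # Keys containing '?' or '#' are not handled by this simple sketch.
    return parsed.netloc, parsed.path.lstrip("/")


def s3_http_url(uri: str, region: str) -> str:
    """Build a virtual-hosted-style HTTPS URL for an S3 object."""
    bucket, key = parse_s3_uri(uri)
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


print(s3_http_url("s3://my-bucket/output.png", "us-east-1"))
```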

LangSmith Tracing

Enable observability with LangSmith:

import os
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your-key"

generator = GeminiImageGenerator(log_images=True)

result = generator.generate(
    prompt="A robot reading in a cozy library",
    output_images=["robot_library.png"],
    metadata={"user_id": "demo", "session": "example"},
    tags=["demo", "robot"]
)

# View traces at https://smith.langchain.com/

Image + Text Output

Get both an image and explanation:

result = generator.generate(
    prompt="Generate a futuristic city and explain its key architectural features",
    output_images=["city.png"],
    output_text=True
)

print(f"Image: {result.image_location}")
print(f"Explanation: {result.text}")

Architecture

The package uses gemini-2.5-flash-image for all operations:

graph TB
    A[User Request] --> B[Load Input Images<br/>with Labels]
    B --> C[Build Content<br/>Prompt + Images]
    C --> D[gemini-2.5-flash-image<br/>Generate Content]
    D --> E{Extract Response}
    E -->|Has Images| F[PIL Images]
    E -->|Has Text| G[Plain Text]
    F --> H{Save to S3/Local?}
    G --> I[Return Result]
    H -->|Yes| J[Upload & Get URLs]
    H -->|No| I
    J --> I
    I --> K{LangSmith<br/>Enabled?}
    K -->|Yes| L[Log to LangSmith<br/>- Images as S3 URLs<br/>- Text response]
    K -->|No| M[GenerationResult]
    L --> M

API Reference

GeminiImageGenerator

generator = GeminiImageGenerator(
    model_name="gemini-2.5-flash-image",  # Image generation model
    api_key=None,                          # Auto-loads from env
    log_images=True                        # Enable LangSmith logging
)

generate() Method

def generate(
    self,
    prompt: str,                                      # Main prompt (required)
    system_prompt: Optional[str] = None,              # System instructions
    input_images: Optional[List[ImageSource]] = None, # Input images
    temperature: Optional[float] = None,              # Sampling temperature

    # Output configuration
    output_images: Optional[List[OutputImageSpec]] = None,  # Generate images
    output_text: bool = False,                              # Generate text

    # LangSmith
    metadata: Optional[Dict[str, str]] = None,
    tags: Optional[List[str]] = None,
) -> GenerationResult: ...

Type Definitions:

ImageSource = RawImageSource | LabeledImage
- RawImageSource = Image.Image | str | Path
- LabeledImage = Tuple[str, RawImageSource]
OutputImageSpec = OutputLocation | LabeledOutput
- OutputLocation = str | Path
- LabeledOutput = Tuple[str, OutputLocation]
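
Since an ImageSource may be either a bare source or a (label, source) tuple, code that consumes it has to tell the two apart. The sketch below shows one way to do that; normalize_source is a hypothetical helper, and typing.Any stands in for the PIL image type so the example runs without Pillow.

```python
from pathlib import Path
from typing import Any, Optional


def normalize_source(source: Any) -> tuple[Optional[str], Any]:
    """Split an ImageSource into (label, raw_source); label is None if unlabeled."""
    # A labeled image is a 2-tuple whose first item is the label string.
    if isinstance(source, tuple) and len(source) == 2 and isinstance(source[0], str):
        return source[0], source[1]
    return None, source


inputs = [
    "photo.jpg",
    Path("reference.png"),
    ("Photo A (style):", "style_reference.jpg"),
]
for item in inputs:
    label, raw = normalize_source(item)
    print(label, raw)
```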

GenerationResult

class GenerationResult:
    text: Optional[str]                      # Generated text
    images: List[Image.Image]                # PIL Image objects
    image_labels: List[Optional[str]]        # Image labels
    image_locations: List[str]               # Local file paths
    image_s3_uris: List[Optional[str]]       # S3 URIs
    image_http_urls: List[Optional[str]]     # HTTP URLs

    # Convenience properties (first image, or None if there are no images)
    @property
    def image(self) -> Optional[Image.Image]: ...
    @property
    def image_location(self) -> Optional[str]: ...
    @property
    def image_s3_uri(self) -> Optional[str]: ...
    @property
    def image_http_url(self) -> Optional[str]: ...
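
The convenience properties can be understood as "first element or None". The following is a simplified stand-in, not the package's actual class: ResultSketch is a hypothetical dataclass covering two of the list fields, written only to show the pattern.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResultSketch:
    """Hypothetical stand-in for GenerationResult with two list fields."""

    image_locations: list[str] = field(default_factory=list)
    image_s3_uris: list[Optional[str]] = field(default_factory=list)

    @property
    def image_location(self) -> Optional[str]:
        # First saved location, or None when no image was produced.
        return self.image_locations[0] if self.image_locations else None

    @property
    def image_s3_uri(self) -> Optional[str]:
        # First S3 URI, or None when the list is empty.
        return self.image_s3_uris[0] if self.image_s3_uris else None


empty = ResultSketch()
two = ResultSketch(image_locations=["a.png", "b.png"])
print(empty.image_location, two.image_location)
```

Under this pattern, code that expects exactly one image can use the singular property and check for None, while code that requested several images iterates over the lists.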

Structured Output

⚠️ The image model (gemini-2.5-flash-image) does not support JSON schemas or structured output.

For structured output, use a two-step approach:

# Step 1: Generate or analyze image
from gemini_imagen import GeminiImageGenerator

generator = GeminiImageGenerator()
result = generator.generate(
    prompt="Analyze this image in detail",
    input_images=["image.png"],
    output_text=True
)

# Step 2: Get structured output with gemini-2.5-flash
import os

from google import genai
from google.genai import types
from pydantic import BaseModel

class ImageAnalysis(BaseModel):
    objects: list[str]
    colors: list[str]
    mood: str

client = genai.Client(api_key=os.getenv("GOOGLE_API_KEY"))
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=f"{result.text}\n\nFormat as JSON with fields: objects, colors, mood",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=ImageAnalysis.model_json_schema()
    )
)

analysis = ImageAnalysis.model_validate_json(response.text)

Configuration

Environment Variables

# Required
GOOGLE_API_KEY=your_google_api_key

# Optional - for S3 features
GV_AWS_ACCESS_KEY_ID=your_aws_access_key
GV_AWS_SECRET_ACCESS_KEY=your_aws_secret_key
GV_AWS_STORAGE_BUCKET_NAME=your-bucket-name

# Optional - for LangSmith tracing
LANGSMITH_API_KEY=your_langsmith_api_key
LANGSMITH_TRACING=true

Examples

See the examples/ directory for complete working examples:

basic_generation.py - Simple text-to-image
image_analysis.py - Analyze images
labeled_inputs.py - Use labeled images
s3_integration.py - S3 upload/download
langsmith_tracing.py - Enable tracing

Pricing

Image Generation (gemini-2.5-flash-image)

Cost: $30/1M output tokens
Per Image: ~$0.039 (1290 tokens at 1024x1024)

Text Model (gemini-2.5-flash)

Input: $0.30/1M tokens
Output: $1.20/1M tokens
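
The per-image figure follows directly from the token price. The sketch below reproduces that arithmetic using only the numbers quoted above; the function name estimate_image_cost is hypothetical, and actual billing may differ.

```python
IMAGE_OUTPUT_USD_PER_MILLION_TOKENS = 30.0
TOKENS_PER_IMAGE = 1290  # at 1024x1024, per the figures above


def estimate_image_cost(num_images: int) -> float:
    """Estimate output-token cost in USD for a number of generated images."""
    tokens = num_images * TOKENS_PER_IMAGE
    return tokens * IMAGE_OUTPUT_USD_PER_MILLION_TOKENS / 1_000_000


print(f"1 image:    ${estimate_image_cost(1):.4f}")
print(f"100 images: ${estimate_image_cost(100):.2f}")
```

One image works out to 1290 × 30 / 1,000,000 = 0.0387 USD, which rounds to the ~$0.039 quoted above.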

Limitations

Multiple images: Gemini may not always generate the exact number requested
Structured output: Only available with text model (separate call required)
Rate limits (free tier): 10 requests/minute, 1,500 requests/day
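
To stay under a per-minute limit like the one above, a caller can throttle itself on the client side. This is a minimal sliding-window sketch under that assumption; SlidingWindowLimiter and throttle are hypothetical names, not part of gemini-imagen, and the daily quota is not handled.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of `period` seconds."""

    def __init__(self, max_calls: int = 10, period: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])

    def record(self) -> None:
        """Record that a call is being made now."""
        self._calls.append(self.clock())


def throttle(limiter: SlidingWindowLimiter) -> None:
    """Sleep until a call is allowed, then record it."""
    delay = limiter.wait_time()
    if delay > 0:
        time.sleep(delay)
    limiter.record()
```

Calling throttle(limiter) immediately before each generator.generate(...) call would space requests so that no more than ten fall within any 60-second window.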

Development

Setup Development Environment

Using uv (recommended):

# Clone the repository
git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen

# Lock dependencies and sync (installs everything)
uv lock
uv sync --all-extras

# Install pre-commit hooks
uv run pre-commit install

Or using pip:

# Clone the repository
git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen

# Install with development dependencies
pip install -e ".[dev,s3]"

# Install pre-commit hooks
pre-commit install

Running Tests

Using uv:

# Run unit tests only (no API keys required)
uv run pytest tests/ -v -m "not integration"

# Run all tests including integration (requires API keys)
uv run pytest tests/ -v

# Run with coverage
uv run pytest tests/ -v -m "not integration" --cov=gemini_imagen --cov-report=html

# Run specific test file
uv run pytest tests/test_gemini_image_wrapper.py -v

Using make (with uv):

make test    # Runs: uv run pytest (unit tests only)

Test Categories:

Unit tests: Mocked tests, no API keys required
Integration tests: Require real API keys (-m integration)
- GOOGLE_API_KEY - for Gemini API tests
- GV_AWS_* - for S3 integration tests
- LANGSMITH_API_KEY - for LangSmith tracing tests

Integration tests are automatically skipped if credentials are missing.
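
Skipping on missing credentials could be implemented roughly as follows. This is a sketch, not the project's actual test code: missing_env_vars is a hypothetical helper, and the variable list assumes the three GV_AWS_* names from the Configuration section.

```python
import os

REQUIRED_FOR_S3 = (
    "GV_AWS_ACCESS_KEY_ID",
    "GV_AWS_SECRET_ACCESS_KEY",
    "GV_AWS_STORAGE_BUCKET_NAME",
)


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# In a test module this could drive a skip marker, for example:
#   pytestmark = pytest.mark.skipif(
#       bool(missing_env_vars(REQUIRED_FOR_S3)),
#       reason="S3 credentials not configured",
#   )
```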

Code Quality

# Run linter
make lint

# Format code
make format

# Run pre-commit hooks
make pre-commit

Building and Publishing

Quick Release Process

One command to release:

# Patch release (0.1.0 -> 0.1.1) - default
./scripts/release.sh

# Minor release (0.1.0 -> 0.2.0)
./scripts/release.sh minor

# Major release (0.1.0 -> 1.0.0)
./scripts/release.sh major

# Test on TestPyPI first
./scripts/release.sh patch --test

The release script automatically:

Bumps the version (patch/minor/major)
Commits the version change
Creates and pushes a git tag
Installs dependencies
Runs linters (ruff + mypy)
Runs tests
Builds the package
Verifies with twine
Uploads to PyPI (with confirmation)

Manual version bump (if needed):

# Bump version manually
uv run python scripts/bump_version.py patch  # 0.1.0 -> 0.1.1
uv run python scripts/bump_version.py minor  # 0.1.0 -> 0.2.0
uv run python scripts/bump_version.py major  # 0.1.0 -> 1.0.0
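
The bump rules shown in the comments above can be sketched in a few lines. This is an illustration of the semantics only; bump_version here is a hypothetical function and is not the contents of scripts/bump_version.py.

```python
def bump_version(version: str, part: str = "patch") -> str:
    """Bump a MAJOR.MINOR.PATCH version string, resetting lower parts to 0."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown part: {part!r}")


for part in ("patch", "minor", "major"):
    print(part, bump_version("0.1.0", part))
```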

Manual Build/Publish

# Build package
make build

# Publish to PyPI (requires credentials)
make publish

CI/CD

This project uses GitHub Actions for continuous integration:

CI Pipeline: Runs on every push and pull request
- Linting with ruff
- Type checking with mypy
- Tests on Python 3.12 and 3.13
- Code coverage reporting
Release Pipeline: Automatically publishes to PyPI on version tags
- Triggered by pushing tags like v1.0.0
- Creates GitHub releases with artifacts
Dependabot: Automatically updates dependencies weekly

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Acknowledgments

Built on google-genai - Google's unified GenAI SDK (replaces deprecated google-generativeai)
Uses langsmith for tracing
S3 integration via boto3
Type validation with pydantic v2

Support

Issues: GitHub Issues
Documentation: README
Examples: examples/

Made with ❤️ by Aviad Rozenhek
