A comprehensive Python wrapper for Google Gemini's image generation and analysis capabilities with S3 and LangSmith integration
gemini-imagen
A comprehensive Python wrapper for Google Gemini's image generation and analysis capabilities, featuring:
- 🎨 Text-to-Image Generation - Create images from text prompts
- 🏷️ Labeled Input Images - Reference images by name in prompts for better control
- 📸 Multiple Output Images - Generate multiple variations in one request
- 💬 Image Analysis - Get detailed text descriptions of images
- ☁️ S3 Integration - Seamless AWS S3 upload/download with URL logging
- 📈 LangSmith Tracing - Full observability for debugging and monitoring
- 🔄 Type-Safe - Full type hints with Pydantic validation
Installation
Basic Installation
Using pip:
pip install gemini-imagen
Using uv (recommended - faster):
uv pip install gemini-imagen
With S3 Support
Using pip:
pip install gemini-imagen[s3]
Using uv:
uv pip install gemini-imagen[s3]
From Source
Using uv (recommended):
git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen
uv sync --all-extras
Or using pip:
git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen
pip install -e ".[dev,s3]"
Quick Start
1. Set Up API Key
export GOOGLE_API_KEY="your-api-key-here"
Or create a .env file:
GOOGLE_API_KEY=your-api-key-here
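The source does not say how the package reads the .env file. As a rough, standard-library-only illustration of the KEY=VALUE format and of failing early when the key is missing, a sketch might look like this (parse_env_text and require_api_key are hypothetical helpers, not part of gemini-imagen):

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def require_api_key(env: dict[str, str]) -> str:
    """Return GOOGLE_API_KEY from a mapping, or raise a clear error."""
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; export it or add it to .env")
    return key
```

In practice a dedicated loader such as python-dotenv handles more edge cases (multi-line values, variable expansion) than this sketch does.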
2. Generate Your First Image
from gemini_imagen import GeminiImageGenerator
generator = GeminiImageGenerator()
result = generator.generate(
    prompt="A serene Japanese garden with cherry blossoms",
    output_images=["garden.png"]
)
print(f"Image saved to: {result.image_location}")
Features
Text-to-Image Generation
Generate images from text descriptions:
result = generator.generate(
    prompt="A futuristic cityscape at sunset with flying cars",
    output_images=["cityscape.png"]
)
Image Analysis
Analyze existing images and get text descriptions:
result = generator.generate(
    prompt="Describe this image in detail, including colors, objects, and mood",
    input_images=["photo.jpg"],
    output_text=True
)
print(result.text)
Labeled Input Images
Reference multiple images by name in your prompts:
result = generator.generate(
    prompt="Blend the artistic style from Photo A with the composition from Photo B",
    input_images=[
        ("Photo A (style):", "style_reference.jpg"),
        ("Photo B (composition):", "composition_reference.jpg")
    ],
    output_images=["blended_result.png"]
)
Multiple Output Images
Request multiple variations:
result = generator.generate(
    prompt="Create 3 variations of a mountain landscape",
    output_images=[
        ("Sunrise version", "mountain_sunrise.png"),
        ("Sunset version", "mountain_sunset.png"),
        ("Night version", "mountain_night.png")
    ]
)
# Note: Gemini may return fewer images than requested
for label, uri in zip(result.image_labels, result.image_locations):
    print(f"{label}: {uri}")
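Because Gemini may return fewer images than requested, it can help to check which requested outputs are missing. The sketch below is a hypothetical helper, not part of the package, and it assumes the labels in result.image_labels match the requested labels exactly, which the source does not confirm:

```python
from typing import Optional, Sequence


def missing_labels(
    requested: Sequence[tuple[str, str]],
    returned_labels: Sequence[Optional[str]],
) -> list[str]:
    """Return requested labels that do not appear among the returned labels."""
    returned = {label for label in returned_labels if label is not None}
    return [label for label, _path in requested if label not in returned]
```

A caller could pass the same (label, path) list it gave to output_images together with result.image_labels, then retry only the missing variations.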
S3 Integration
Upload/download images directly to/from AWS S3:
# Configure AWS credentials in .env:
# GV_AWS_ACCESS_KEY_ID=your_key
# GV_AWS_SECRET_ACCESS_KEY=your_secret
# GV_AWS_STORAGE_BUCKET_NAME=your_bucket
result = generator.generate(
    prompt="A magical forest scene",
    input_images=["s3://my-bucket/reference.jpg"],
    output_images=["s3://my-bucket/output.png"]
)
# Access S3 URLs
print(result.image_s3_uri) # s3://my-bucket/output.png
print(result.image_http_url) # https://my-bucket.s3.region.amazonaws.com/...
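The relationship between the two URL forms printed above can be sketched with the standard library. This is an illustration under assumptions: split_s3_uri and virtual_hosted_url are hypothetical names, the region is supplied by the caller, and the source does not state that the package builds its HTTP URLs this way.

```python
from urllib.parse import quote, urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def virtual_hosted_url(uri: str, region: str) -> str:
    """Build a virtual-hosted-style HTTPS URL for an S3 object."""
    bucket, key = split_s3_uri(uri)
    # quote() percent-encodes unsafe characters but leaves '/' intact.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"
```

Such a URL only resolves for a reader who has access to the object; private buckets typically need a presigned URL instead.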
LangSmith Tracing
Enable observability with LangSmith:
import os
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your-key"
generator = GeminiImageGenerator(log_images=True)
result = generator.generate(
    prompt="A robot reading in a cozy library",
    output_images=["robot_library.png"],
    metadata={"user_id": "demo", "session": "example"},
    tags=["demo", "robot"]
)
# View traces at https://smith.langchain.com/
Image + Text Output
Get both an image and explanation:
result = generator.generate(
    prompt="Generate a futuristic city and explain its key architectural features",
    output_images=["city.png"],
    output_text=True
)
print(f"Image: {result.image_location}")
print(f"Explanation: {result.text}")
Architecture
The package uses gemini-2.5-flash-image for all operations:
graph TB
    A[User Request] --> B[Load Input Images<br/>with Labels]
    B --> C[Build Content<br/>Prompt + Images]
    C --> D[gemini-2.5-flash-image<br/>Generate Content]
    D --> E{Extract Response}
    E -->|Has Images| F[PIL Images]
    E -->|Has Text| G[Plain Text]
    F --> H{Save to S3/Local?}
    G --> I[Return Result]
    H -->|Yes| J[Upload & Get URLs]
    H -->|No| I
    J --> I
    I --> K{LangSmith<br/>Enabled?}
    K -->|Yes| L[Log to LangSmith<br/>- Images as S3 URLs<br/>- Text response]
    K -->|No| M[GenerationResult]
    L --> M
API Reference
GeminiImageGenerator
generator = GeminiImageGenerator(
    model_name="gemini-2.5-flash-image",  # Image generation model
    api_key=None,                         # Auto-loads from env
    log_images=True                       # Enable LangSmith logging
)
generate() Method
def generate(
    self,
    prompt: str,                                            # Main prompt (required)
    system_prompt: Optional[str] = None,                    # System instructions
    input_images: Optional[List[ImageSource]] = None,       # Input images
    temperature: Optional[float] = None,                    # Sampling temperature
    # Output configuration
    output_images: Optional[List[OutputImageSpec]] = None,  # Generate images
    output_text: bool = False,                              # Generate text
    # LangSmith
    metadata: Optional[Dict[str, str]] = None,
    tags: Optional[List[str]] = None
) -> GenerationResult: ...
Type Definitions:
- ImageSource = RawImageSource | LabeledImage
  - RawImageSource = Image.Image | str | Path
  - LabeledImage = Tuple[str, RawImageSource]
- OutputImageSpec = OutputLocation | LabeledOutput
  - OutputLocation = str | Path
  - LabeledOutput = Tuple[str, OutputLocation]
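The type aliases above mean each list entry is either a bare source or a (label, source) tuple. One way such mixed input could be normalized into uniform pairs is sketched below. This is not the package's implementation: normalize_sources is a hypothetical helper, and PIL images are left out so the example needs only the standard library.

```python
from pathlib import Path
from typing import Optional, Union

# Simplified stand-ins for the aliases above (PIL images omitted).
RawSource = Union[str, Path]
Labeled = tuple[str, RawSource]


def normalize_sources(
    items: list[Union[RawSource, Labeled]],
) -> list[tuple[Optional[str], RawSource]]:
    """Turn a mix of bare and labeled sources into (label, source) pairs."""
    normalized: list[tuple[Optional[str], RawSource]] = []
    for item in items:
        if isinstance(item, tuple):
            label, source = item
            normalized.append((label, source))
        else:
            # Bare sources carry no label.
            normalized.append((None, item))
    return normalized
```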
GenerationResult
class GenerationResult:
    text: Optional[str]                   # Generated text
    images: List[Image.Image]             # PIL Image objects
    image_labels: List[Optional[str]]     # Image labels
    image_locations: List[str]            # Local file paths
    image_s3_uris: List[Optional[str]]    # S3 URIs
    image_http_urls: List[Optional[str]]  # HTTP URLs

    # Convenience properties (first image)
    @property
    def image(self) -> Optional[Image.Image]: ...
    @property
    def image_location(self) -> Optional[str]: ...
    @property
    def image_s3_uri(self) -> Optional[str]: ...
    @property
    def image_http_url(self) -> Optional[str]: ...
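The convenience properties are described as returning the first image's value. A minimal stand-in showing that pattern, assuming they return None when the list is empty (suggested by the Optional return types but not confirmed by the source), might look like this. ResultSketch is a hypothetical class, not the real GenerationResult:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResultSketch:
    """Simplified stand-in with two of the list fields."""

    image_locations: list[str] = field(default_factory=list)
    image_s3_uris: list[Optional[str]] = field(default_factory=list)

    @property
    def image_location(self) -> Optional[str]:
        # First saved location, or None when nothing was generated.
        return self.image_locations[0] if self.image_locations else None

    @property
    def image_s3_uri(self) -> Optional[str]:
        # First S3 URI, or None when nothing was generated.
        return self.image_s3_uris[0] if self.image_s3_uris else None
```

Under this assumption, code that requests several images should iterate over the list fields, while single-image code can rely on the singular properties and check for None.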
Structured Output
⚠️ The image model (gemini-2.5-flash-image) does not support JSON schemas or structured output.
For structured output, use a two-step approach:
# Step 1: Generate or analyze image
from gemini_imagen import GeminiImageGenerator
generator = GeminiImageGenerator()
result = generator.generate(
    prompt="Analyze this image in detail",
    input_images=["image.png"],
    output_text=True
)
# Step 2: Get structured output with gemini-2.5-flash
import os
from google import genai
from google.genai import types
from pydantic import BaseModel
class ImageAnalysis(BaseModel):
    objects: list[str]
    colors: list[str]
    mood: str
client = genai.Client(api_key=os.getenv("GOOGLE_API_KEY"))
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=f"{result.text}\n\nFormat as JSON with fields: objects, colors, mood",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=ImageAnalysis.model_json_schema()
    )
)
analysis = ImageAnalysis.model_validate_json(response.text)
Configuration
Environment Variables
# Required
GOOGLE_API_KEY=your_google_api_key
# Optional - for S3 features
GV_AWS_ACCESS_KEY_ID=your_aws_access_key
GV_AWS_SECRET_ACCESS_KEY=your_aws_secret_key
GV_AWS_STORAGE_BUCKET_NAME=your-bucket-name
# Optional - for LangSmith tracing
LANGSMITH_API_KEY=your_langsmith_api_key
LANGSMITH_TRACING=true
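To see at a glance which optional features a given environment enables, a small check over the variables listed above can be sketched as follows. configured_features is a hypothetical helper, and the source does not describe how the package itself decides, so treat the rules here as assumptions:

```python
from typing import Mapping

S3_VARS = (
    "GV_AWS_ACCESS_KEY_ID",
    "GV_AWS_SECRET_ACCESS_KEY",
    "GV_AWS_STORAGE_BUCKET_NAME",
)


def configured_features(env: Mapping[str, str]) -> dict[str, bool]:
    """Report which feature groups have all of their variables set."""

    def present(name: str) -> bool:
        return bool(env.get(name, "").strip())

    return {
        "gemini": present("GOOGLE_API_KEY"),
        # Assumption: S3 needs all three GV_AWS_* variables.
        "s3": all(present(name) for name in S3_VARS),
        # Assumption: tracing needs the key and LANGSMITH_TRACING=true.
        "langsmith": present("LANGSMITH_API_KEY")
        and env.get("LANGSMITH_TRACING", "").strip().lower() == "true",
    }
```

Calling configured_features(os.environ) at startup gives a quick summary before any request is made.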
Examples
See the examples/ directory for complete working examples:
- basic_generation.py - Simple text-to-image
- image_analysis.py - Analyze images
- labeled_inputs.py - Use labeled images
- s3_integration.py - S3 upload/download
- langsmith_tracing.py - Enable tracing
Pricing
Image Generation (gemini-2.5-flash-image)
- Cost: $30/1M output tokens
- Per Image: ~$0.039 (1290 tokens at 1024x1024)
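The per-image figure follows directly from the token price: 1290 tokens at $30 per million tokens is about $0.0387. The arithmetic, using only the numbers quoted above (the constant and function names are illustrative), can be sketched as:

```python
PRICE_PER_MILLION_OUTPUT_TOKENS_USD = 30.00  # quoted above
TOKENS_PER_IMAGE = 1290                      # quoted above, at 1024x1024


def image_cost_usd(n_images: int) -> float:
    """Estimated output-token cost in USD for n_images generated images."""
    tokens = n_images * TOKENS_PER_IMAGE
    return tokens * PRICE_PER_MILLION_OUTPUT_TOKENS_USD / 1_000_000
```

This covers output tokens for images only; prompt tokens and any text output are billed separately.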
Text Model (gemini-2.5-flash)
- Input: $0.30/1M tokens
- Output: $1.20/1M tokens
Limitations
- Multiple images: Gemini may not always generate the exact number requested
- Structured output: Only available with text model (separate call required)
- Rate limits (free tier): 10 requests/minute, 1500/day
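To stay under the per-minute limit, calls can be spaced out on the client side. The sketch below is one simple approach, not something the package is stated to provide: MinIntervalThrottle is a hypothetical class that enforces even spacing (6 seconds between calls for 10 requests/minute), which is stricter than the limit requires and does not track the daily cap.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Space calls at least 60 / max_per_minute seconds apart."""

    def __init__(
        self,
        max_per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the last call was allowed

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

A caller would create one throttle, for example MinIntervalThrottle(10), and call wait() before each generate() request. The injectable clock and sleep arguments exist so the behavior can be tested without real delays.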
Development
Setup Development Environment
Using uv (recommended):
# Clone the repository
git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen
# Lock dependencies and sync (installs everything)
uv lock
uv sync --all-extras
# Install pre-commit hooks
uv run pre-commit install
Or using pip:
# Clone the repository
git clone https://github.com/aviadr1/gemini-imagen.git
cd gemini-imagen
# Install with development dependencies
pip install -e ".[dev,s3]"
# Install pre-commit hooks
pre-commit install
Running Tests
Using uv:
# Run unit tests only (no API keys required)
uv run pytest tests/ -v -m "not integration"
# Run all tests including integration (requires API keys)
uv run pytest tests/ -v
# Run with coverage
uv run pytest tests/ -v -m "not integration" --cov=gemini_imagen --cov-report=html
# Run specific test file
uv run pytest tests/test_gemini_image_wrapper.py -v
Using make (with uv):
make test # Runs: uv run pytest (unit tests only)
Test Categories:
- Unit tests: Mocked tests, no API keys required
- Integration tests: Require real API keys (-m integration)
  - GOOGLE_API_KEY - for Gemini API tests
  - GV_AWS_* - for S3 integration tests
  - LANGSMITH_API_KEY - for LangSmith tracing tests
Integration tests are automatically skipped if credentials are missing.
Code Quality
# Run linter
make lint
# Format code
make format
# Run pre-commit hooks
make pre-commit
Building and Publishing
Quick Release Process
One command to release:
# Patch release (0.1.0 -> 0.1.1) - default
./scripts/release.sh
# Minor release (0.1.0 -> 0.2.0)
./scripts/release.sh minor
# Major release (0.1.0 -> 1.0.0)
./scripts/release.sh major
# Test on TestPyPI first
./scripts/release.sh patch --test
The release script automatically:
- Bumps the version (patch/minor/major)
- Commits the version change
- Creates and pushes a git tag
- Installs dependencies
- Runs linters (ruff + mypy)
- Runs tests
- Builds the package
- Verifies with twine
- Uploads to PyPI (with confirmation)
Manual version bump (if needed):
# Bump version manually
uv run python scripts/bump_version.py patch # 0.1.0 -> 0.1.1
uv run python scripts/bump_version.py minor # 0.1.0 -> 0.2.0
uv run python scripts/bump_version.py major # 0.1.0 -> 1.0.0
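The bump rules shown in the comments above follow the usual semantic-versioning pattern: bumping a part increments it and resets the parts to its right. The actual scripts/bump_version.py is not shown in the source, so the following is only an illustrative sketch of that rule with a hypothetical function name:

```python
def bump_version(version: str, part: str = "patch") -> str:
    """Bump a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```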
Manual Build/Publish
# Build package
make build
# Publish to PyPI (requires credentials)
make publish
CI/CD
This project uses GitHub Actions for continuous integration:
- CI Pipeline: Runs on every push and pull request
  - Linting with ruff
  - Type checking with mypy
  - Tests on Python 3.12 and 3.13
  - Code coverage reporting
- Release Pipeline: Automatically publishes to PyPI on version tags
  - Triggered by pushing tags like v1.0.0
  - Creates GitHub releases with artifacts
- Dependabot: Automatically updates dependencies weekly
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.
Acknowledgments
- Built on google-genai - Google's unified GenAI SDK (replaces deprecated google-generativeai)
- Uses langsmith for tracing
- S3 integration via boto3
- Type validation with pydantic v2
Support
- Issues: GitHub Issues
- Documentation: README
- Examples: examples/
Made with ❤️ by Aviad Rozenhek