Example of Production-ready Google Gemini API wrapper with SRE features
Building an Enterprise-Ready Gemini Client
This repository provides a production-ready Python client for the Google Gemini API and serves as a worked example of building robust, enterprise-grade LLM services. It applies key SRE practices: automatic retries with exponential backoff, multi-region failover, a circuit breaker, and deep integration with Google Cloud's observability suite (Cloud Monitoring and Cloud Logging).
✨ Features
- ✅ Automatic Retry with Exponential Backoff - Resilient API calls with configurable retry logic
- ✅ Multi-Region Failover - Automatic region switching on failures for high availability
- ✅ Circuit Breaker Pattern - Intelligent region health tracking to avoid wasting time/quota on failing regions
- ✅ Cloud Monitoring Integration - Custom metrics for latency, retry count, success/failure rates
- ✅ Structured Output - Type-safe responses with Pydantic schema validation
- ✅ Structured Logging - Integration with Google Cloud Logging
- ✅ Async Support - Full async/await support with AsyncGeminiSREClient
- ✅ Production Ready - Comprehensive error handling and observability
- ✅ File Operations - Upload and manage files with automatic deduplication
📁 Project Structure
gemini-sre-client/
├── gemini_sre/                # Main package
│   ├── client.py              # Synchronous client
│   ├── async_client.py        # Asynchronous client
│   ├── core/                  # Core functionality
│   │   ├── circuit_breaker.py
│   │   ├── retry.py
│   │   ├── monitoring.py
│   │   ├── logging.py
│   │   ├── deduplication.py
│   │   └── streaming.py
│   └── proxies/               # SDK proxies
│       ├── models.py
│       ├── chats.py
│       ├── files.py
│       └── async_*.py         # Async versions
├── examples/                  # 16 working examples
│   ├── basic/                 # 4 basic examples
│   ├── advanced/              # 5 advanced examples
│   ├── async/                 # 4 async examples
│   └── production/            # 3 production examples
├── tests/                     # Test suite
│   ├── unit/                  # Unit tests
│   └── integration/           # Integration tests
├── docs/                      # Documentation
│   ├── api/                   # API reference
│   ├── architecture/          # Technical docs
│   └── development/           # Dev docs
├── README.md                  # This file
├── SETUP.md                   # Setup instructions
├── setup.py                   # Package configuration
└── requirements.txt           # Dependencies
🚀 Quick Start
1. Installation
# Clone the repository
git clone <repository-url>
cd gemini-sre-client
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Configure authentication
gcloud auth application-default login
2. Configuration
Copy the environment template and configure your project:
# Copy template
cp .env.example .env.local
# Set your project ID. Note that `>` overwrites the copied template;
# edit .env.local by hand instead if you want to keep its other values.
echo "GOOGLE_CLOUD_PROJECT=your-project-id" > .env.local
See SETUP.md for detailed setup instructions.
3. Run Your First Example
# Run basic generation example
.venv/bin/python examples/basic/01_simple_generation.py
# Try streaming
.venv/bin/python examples/basic/02_streaming.py
# Explore all examples
ls examples/
📖 Usage
Basic Content Generation
import os
from dotenv import load_dotenv
from gemini_sre import GeminiSREClient
# Load environment variables
load_dotenv('.env.local', override=True)
# Initialize client
client = GeminiSREClient(
    project_id=os.getenv("GOOGLE_CLOUD_PROJECT"),
    locations=["us-central1", "europe-west1"],
    enable_monitoring=False,  # Set to True if you have IAM role
    enable_logging=False,  # Set to True if you have IAM role
)
# Generate content
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain quantum computing in simple terms",
    request_id="example-001",
)
print(response.text)
Streaming Responses
# Stream content for real-time display
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Write a story about a robot learning to paint",
    request_id="stream-001",
):
    print(chunk.text, end="", flush=True)
Chat Operations
# Create chat session with context preservation
chat = client.chats.create(
    model="gemini-2.5-flash",
    request_id="chat-001",
)
# Send messages
response1 = chat.send_message("Hello! My name is Alice.")
response2 = chat.send_message("What's my name?") # Model remembers!
print(response2.text) # "Your name is Alice"
Structured Output with Pydantic
from pydantic import BaseModel, Field
from typing import List
# Define your schema
class Recipe(BaseModel):
    name: str = Field(description="Recipe name")
    ingredients: List[str] = Field(description="List of ingredients")
    steps: List[str] = Field(description="Cooking steps")
    cooking_time: int = Field(description="Time in minutes")
# Generate structured output
recipe = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Give me a recipe for chocolate chip cookies",
    config={
        "response_mime_type": "application/json",
        "response_schema": Recipe,
    },
    request_id="recipe-001",
)
# Access typed fields
print(recipe.parsed.name)
print(recipe.parsed.ingredients)
Async Operations
import asyncio
import os
from gemini_sre import AsyncGeminiSREClient
async def main():
    # Create async client
    client = AsyncGeminiSREClient(
        project_id=os.getenv("GOOGLE_CLOUD_PROJECT"),
        locations=["us-central1"],
    )
    # Async generation
    response = await client.models.generate_content(
        model="gemini-2.5-flash",
        contents="Explain async programming",
        request_id="async-001",
    )
    print(response.text)
asyncio.run(main())
Concurrent Requests (4.47x Faster!)
import asyncio
import os
from gemini_sre import AsyncGeminiSREClient
async def make_requests():
    client = AsyncGeminiSREClient(
        project_id=os.getenv("GOOGLE_CLOUD_PROJECT"),
        locations=["us-central1"],
    )
    # Start 5 requests; they run concurrently once gathered
    tasks = [
        client.models.generate_content(
            model="gemini-2.5-flash",
            contents=f"What is {topic}?",
            request_id=f"req-{i}",
        )
        for i, topic in enumerate(["Python", "JavaScript", "Go", "Rust", "TypeScript"])
    ]
    # Wait for all to complete
    results = await asyncio.gather(*tasks)
    return results
# Sequential: ~65s, Concurrent: ~15s (4.47x faster!)
asyncio.run(make_requests())
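To see where the speedup comes from without calling the API, here is a minimal, self-contained sketch that replaces each request with a short `asyncio.sleep`. The `fake_request` helper and its 50 ms latency are illustrative assumptions, not part of `gemini_sre`; real timings depend on the model, region, and quota.

```python
import asyncio
import time


async def fake_request(topic: str, latency: float = 0.05) -> str:
    """Hypothetical stand-in for an API call: sleeps instead of calling Gemini."""
    await asyncio.sleep(latency)
    return f"answer about {topic}"


async def run_sequential(topics):
    # Each request waits for the previous one to finish.
    return [await fake_request(topic) for topic in topics]


async def run_concurrent(topics):
    # All requests are in flight at once; gather preserves input order.
    return await asyncio.gather(*(fake_request(topic) for topic in topics))


def timed(coro_fn, topics):
    """Run a coroutine function to completion and return (results, seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(topics))
    return results, time.perf_counter() - start


TOPICS = ["Python", "JavaScript", "Go", "Rust", "TypeScript"]
seq_results, seq_time = timed(run_sequential, TOPICS)
con_results, con_time = timed(run_concurrent, TOPICS)
print(f"sequential: {seq_time:.3f}s, concurrent: {con_time:.3f}s")
```

Sequential time grows with the sum of the individual latencies, while concurrent time is bounded by roughly the slowest single request.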
📚 Examples
We provide 16 comprehensive examples organized by complexity:
Basic Examples (4)
- 01 - Simple Generation - Basic content generation
- 02 - Streaming - Real-time streaming responses
- 03 - Chat Operations - Stateful conversations
- 04 - Structured Output - Pydantic schema validation
Advanced Examples (5)
- 01 - Multi-Region - Multi-region failover
- 02 - Circuit Breaker - Circuit breaker pattern
- 03 - Custom Retry - Custom retry strategies
- 04 - Monitoring - Cloud Monitoring integration
- 05 - File Operations - File upload/management
Async Examples (4)
- 01 - Async Basic - Basic async operations
- 02 - Async Streaming - Async streaming
- 03 - Concurrent Requests - Parallel requests (4.47x speedup!)
- 04 - Async Chat - Async chat sessions
Production Examples (3)
- 01 - Error Handling - Comprehensive error patterns
- 02 - Logging Setup - Logging configuration
- 03 - Best Practices - Production deployment guide
See examples/README.md for detailed descriptions.
🔧 Configuration
Environment Variables
The client uses standard Google Cloud environment variables for consistency with the genai SDK:
- GOOGLE_CLOUD_PROJECT - Your GCP project ID (required)
- GOOGLE_CLOUD_LOCATION - Default region (optional)
- GEMINI_ENABLE_MONITORING - Enable metrics (optional)
- GEMINI_ENABLE_LOGGING - Enable logging (optional)
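As a rough illustration of how an application might read these variables before constructing a client, the sketch below uses only the standard library. The `EnvSettings` dataclass, the `load_settings` helper, and the set of accepted truthy strings are assumptions made for this example; the client's own parsing rules may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: which strings count as "enabled" is not specified by the README.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class EnvSettings:
    project_id: str
    location: Optional[str]
    enable_monitoring: bool
    enable_logging: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> EnvSettings:
    """Build settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    project_id = env.get("GOOGLE_CLOUD_PROJECT")
    if not project_id:
        raise ValueError("GOOGLE_CLOUD_PROJECT is required")

    def flag(name: str) -> bool:
        return env.get(name, "").strip().lower() in _TRUTHY

    return EnvSettings(
        project_id=project_id,
        location=env.get("GOOGLE_CLOUD_LOCATION"),
        enable_monitoring=flag("GEMINI_ENABLE_MONITORING"),
        enable_logging=flag("GEMINI_ENABLE_LOGGING"),
    )


# Demo with an explicit mapping so the example runs without any setup.
settings = load_settings(
    {"GOOGLE_CLOUD_PROJECT": "your-project-id", "GEMINI_ENABLE_LOGGING": "1"}
)
print(settings)
```

Failing fast on a missing project ID keeps the error close to its cause instead of surfacing later as an authentication failure.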
Client Configuration
import os
from gemini_sre import GeminiSREClient
from gemini_sre.core import RetryConfig
# Full configuration example
client = GeminiSREClient(
    project_id=os.getenv("GOOGLE_CLOUD_PROJECT"),
    locations=["us-central1", "europe-west1", "asia-northeast1"],
    # Monitoring & Logging
    enable_monitoring=True,  # Requires roles/monitoring.metricWriter
    enable_logging=True,  # Requires roles/logging.logWriter
    # Retry Configuration
    retry_config=RetryConfig(
        max_attempts=5,
        initial_delay=1.0,
        max_delay=16.0,
        multiplier=2.0,
    ),
    # Circuit Breaker
    enable_circuit_breaker=True,
    circuit_breaker_config={
        "failure_threshold": 5,
        "success_threshold": 2,
        "timeout": 60,
    },
)
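To make the retry settings concrete, the following sketch computes the wait before each retry under plain exponential backoff with a cap. The `backoff_delays` function is hypothetical, and it assumes no jitter and one wait between each pair of consecutive attempts; the library's actual schedule may differ.

```python
from typing import List


def backoff_delays(
    max_attempts: int,
    initial_delay: float,
    max_delay: float,
    multiplier: float,
) -> List[float]:
    """Return the waits (in seconds) between attempts.

    Assumes one wait between consecutive attempts, so there are
    max_attempts - 1 waits, each capped at max_delay.
    """
    delays = []
    delay = initial_delay
    for _ in range(max_attempts - 1):
        delays.append(min(delay, max_delay))
        delay *= multiplier
    return delays


schedule = backoff_delays(
    max_attempts=5, initial_delay=1.0, max_delay=16.0, multiplier=2.0
)
print(schedule)  # [1.0, 2.0, 4.0, 8.0]
```

Under these assumptions, the values from the configuration above never reach the 16-second cap; it would only apply from the fifth wait onward, for example with `max_attempts=7`.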
📊 Monitoring & Observability
Cloud Monitoring Metrics
The client automatically sends custom metrics to Cloud Monitoring (when enabled):
| Metric | Type | Description |
|---|---|---|
| gemini_sre/request/success | COUNTER | Successful requests |
| gemini_sre/request/error | COUNTER | Failed requests |
| gemini_sre/request/latency | DISTRIBUTION | Request latency (p50, p95, p99) |
| gemini_sre/request/retry_count | GAUGE | Retry attempts per request |
| gemini_sre/circuit_breaker/state | GAUGE | Circuit breaker state |
All metrics include labels: location, model, operation_type
Cloud Logging
Structured logs include:
- Request ID for correlation
- Latency and retry counts
- Region information
- Error details
- Success/failure status
View logs in Cloud Console.
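For readers who want to see what such an entry could look like, here is a standard-library sketch that emits one JSON object per log line. The field names (`request_id`, `region`, `latency_ms`, `retry_count`, `status`, `error`) and the `JsonFormatter` class are assumptions chosen to mirror the list above; the client's real log schema is not documented here.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format records as single-line JSON with optional request fields."""

    # Hypothetical field names mirroring the list in this section.
    FIELDS = ("request_id", "region", "latency_ms", "retry_count", "status", "error")

    def format(self, record: logging.LogRecord) -> str:
        entry = {"severity": record.levelname, "message": record.getMessage()}
        for field in self.FIELDS:
            # Values passed via `extra=` become attributes on the record.
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)


def build_logger(name: str = "gemini_sre_demo") -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = build_logger()
logger.info(
    "request completed",
    extra={
        "request_id": "example-001",
        "region": "us-central1",
        "latency_ms": 812,
        "retry_count": 1,
        "status": "success",
    },
)
```

Carrying the request ID on every entry is what makes it possible to correlate retries and region switches for a single call.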
🔐 IAM Permissions
Minimum Required:
- roles/aiplatform.user - Vertex AI API access
Optional (for full features):
- roles/monitoring.metricWriter - Cloud Monitoring metrics
- roles/logging.logWriter - Cloud Logging integration
Set up IAM roles:
# Grant minimum role
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="user:your-email@example.com" \
--role="roles/aiplatform.user"
📖 Documentation
- Setup Guide - Detailed installation and configuration
- API Reference - Complete API documentation
- Architecture Docs - Technical design and decisions
- Development Guide - For contributors
- Examples - 16 working code examples
🧪 Testing
# Run unit tests
pytest tests/unit/ -v
# Run integration tests (requires GCP credentials)
pytest tests/integration/ -v
# Run specific test
pytest tests/unit/test_circuit_breaker.py -v
🏗️ Multi-Region Failover
The client automatically switches regions on failures:
- Primary Region - Tries configured primary (e.g., us-central1)
- Failover - Switches to next region on error
- Circuit Breaker - Opens circuit for failing regions (skips them)
- Recovery - Tests region health after timeout
- Auto-Close - Closes circuit when region recovers
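The first steps of this flow (try the primary region, then move to the next one on error, skipping regions marked unhealthy) can be sketched in a few lines. The `call_with_failover` helper, the `AllRegionsFailed` exception, and the `skip` parameter are hypothetical names for illustration and are not the client's API.

```python
from typing import Callable, Collection, Dict, Iterable, TypeVar

T = TypeVar("T")


class AllRegionsFailed(Exception):
    """Raised when every candidate region was skipped or returned an error."""


def call_with_failover(
    regions: Iterable[str],
    call: Callable[[str], T],
    skip: Collection[str] = (),
) -> T:
    """Try each region in order and return the first successful result."""
    errors: Dict[str, Exception] = {}
    for region in regions:
        if region in skip:  # e.g. regions whose circuit is currently open
            continue
        try:
            return call(region)
        except Exception as exc:  # a real client would catch narrower errors
            errors[region] = exc
    raise AllRegionsFailed(errors)


def flaky(region: str) -> str:
    """Simulated backend: the primary region is down."""
    if region == "us-central1":
        raise RuntimeError("503 from us-central1")
    return f"served from {region}"


result = call_with_failover(["us-central1", "europe-west1"], flaky)
print(result)  # served from europe-west1
```

Keeping the per-region errors in the final exception makes it easier to tell a single bad region from a global outage.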
Circuit Breaker States:
- CLOSED (✅) - Normal operation, region healthy
- OPEN (🔴) - Region failing, automatically skipped
- HALF_OPEN (🟡) - Testing if region recovered
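The three states can be read as a small state machine. The sketch below is one possible per-region implementation, using the `failure_threshold`, `success_threshold`, and `timeout` names from the configuration example; the `CircuitBreaker` class, its method names, and the injectable `clock` are assumptions for illustration and are not taken from `gemini_sre/core/circuit_breaker.py`.

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Minimal single-region breaker; not thread-safe."""

    def __init__(
        self,
        failure_threshold: int = 5,
        success_threshold: int = 2,
        timeout: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout = timeout
        self._clock = clock
        self.state = State.CLOSED
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        # After the timeout, let trial requests through (HALF_OPEN).
        if self.state is State.OPEN and self._clock() - self._opened_at >= self.timeout:
            self.state = State.HALF_OPEN
            self._successes = 0
        return self.state is not State.OPEN

    def record_success(self) -> None:
        if self.state is State.HALF_OPEN:
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = State.CLOSED
                self._failures = 0
        else:
            self._failures = 0  # a success resets the failure streak

    def record_failure(self) -> None:
        if self.state is State.HALF_OPEN:
            self._open()  # a failed trial reopens the circuit immediately
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = State.OPEN
        self._opened_at = self._clock()
        self._failures = 0
```

A caller would check `allow_request()` before using a region and report the outcome with `record_success()` or `record_failure()`. Passing a fake `clock` makes the timeout behavior easy to unit test without sleeping.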
🔍 Troubleshooting
Common Issues
Permission Denied:
# Verify authentication
gcloud auth application-default login
# Check project
gcloud config get-value project
# Verify IAM roles
gcloud projects get-iam-policy YOUR_PROJECT_ID
Module Not Found:
# Install in development mode
pip install -e .
Import Errors:
# Correct import
from gemini_sre import GeminiSREClient # ✅ Correct
# Incorrect import
from gemini_client import GeminiClient # ❌ Old name
See SETUP.md for more troubleshooting tips.
📦 Dependencies
Core dependencies:
google-genai>=1.42.0 # Gemini API SDK
google-cloud-monitoring>=2.27.0 # Custom metrics
google-cloud-logging>=3.12.0 # Structured logging
pydantic>=2.12.0,<3.0.0 # Schema validation
python-dotenv>=1.0.0 # Environment management
See requirements.txt for full list.
🔗 Useful Links
Google Gemini
Pydantic
Google Cloud
🤝 Contributing
Contributions are welcome! To contribute:
- Fork the repository
- Create a feature branch
- Add tests for new features
- Ensure all tests pass
- Submit a pull request
See docs/development/ for contributor guidelines.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
TL;DR: You can freely use, modify, and distribute this code in commercial or non-commercial projects, provided you keep the copyright and license notice. The author provides no warranty and accepts no liability.
For more information about the license, including what you can and cannot do, see LICENSE_GUIDE.md.
Ready to get started? Check out SETUP.md for detailed setup instructions, or dive into examples/ to see working code!
File details
Details for the file gemini_sre-0.1.0.tar.gz.
File metadata
- Download URL: gemini_sre-0.1.0.tar.gz
- Upload date:
- Size: 30.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ed2d8d3e498e1f2dd0b1e1b3d22ffc8c8d69ec3efc4c8eb26259c4b0b351787 |
| MD5 | 3a9dc7de08301a6f42e5ceefd9127d31 |
| BLAKE2b-256 | a4e52d006bd342087ad5aabc8ea48f95e40b81c9334951762dcd0934a32b7343 |
File details
Details for the file gemini_sre-0.1.0-py3-none-any.whl.
File metadata
- Download URL: gemini_sre-0.1.0-py3-none-any.whl
- Upload date:
- Size: 37.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea3fc487db768342fb38ffb46430fb63ca933378dd4c67fe0776b8bb69cad64e |
| MD5 | cd3ad8394d2ff0381f0a9b0e819a49f5 |
| BLAKE2b-256 | 80f4944fc3cf81effec97dab9b44b34563cf5ae7ed622f7295f2a927bd76db7a |