
Example of a production-ready Google Gemini API wrapper with SRE features


Building an Enterprise-Ready Gemini Client

License: MIT | Python 3.8+

This repository provides a production-ready Python client for the Google Gemini API and serves as a worked example of building robust, enterprise-grade LLM services. It applies SRE practices such as automatic retries, multi-region failover, a circuit breaker, and integration with Google Cloud's observability suite (Cloud Monitoring and Cloud Logging).

✨ Features

  • Automatic Retry with Exponential Backoff - Resilient API calls with configurable retry logic
  • Multi-Region Failover - Automatic region switching on failures for high availability
  • Circuit Breaker Pattern - Intelligent region health tracking to avoid wasting time/quota on failing regions
  • Cloud Monitoring Integration - Custom metrics for latency, retry count, success/failure rates
  • Structured Output - Type-safe responses with Pydantic schema validation
  • Structured Logging - Integration with Google Cloud Logging
  • Async Support - Full async/await support with AsyncGeminiSREClient
  • Production Ready - Comprehensive error handling and observability
  • File Operations - Upload and manage files with automatic deduplication

📁 Project Structure

gemini-sre-client/
├── gemini_sre/              # Main package
│   ├── client.py            # Synchronous client
│   ├── async_client.py      # Asynchronous client
│   ├── core/                # Core functionality
│   │   ├── circuit_breaker.py
│   │   ├── retry.py
│   │   ├── monitoring.py
│   │   ├── logging.py
│   │   ├── deduplication.py
│   │   └── streaming.py
│   └── proxies/             # SDK proxies
│       ├── models.py
│       ├── chats.py
│       ├── files.py
│       └── async_*.py       # Async versions
├── examples/                # 16 working examples
│   ├── basic/               # 4 basic examples
│   ├── advanced/            # 5 advanced examples
│   ├── async/               # 4 async examples
│   └── production/          # 3 production examples
├── tests/                   # Test suite
│   ├── unit/                # Unit tests
│   └── integration/         # Integration tests
├── docs/                    # Documentation
│   ├── api/                 # API reference
│   ├── architecture/        # Technical docs
│   └── development/         # Dev docs
├── README.md                # This file
├── SETUP.md                 # Setup instructions
├── setup.py                 # Package configuration
└── requirements.txt         # Dependencies

🚀 Quick Start

1. Installation

# Clone the repository
git clone <repository-url>
cd gemini-computer-use

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # macOS/Linux
# .venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

# Configure authentication
gcloud auth application-default login

2. Configuration

Copy the environment template and configure your project:

# Copy template
cp .env.example .env.local

# Set your project ID in .env.local (>> appends, so the copied template is kept)
echo "GOOGLE_CLOUD_PROJECT=your-project-id" >> .env.local

See SETUP.md for detailed setup instructions.

3. Run Your First Example

# Run basic generation example
.venv/bin/python examples/basic/01_simple_generation.py

# Try streaming
.venv/bin/python examples/basic/02_streaming.py

# Explore all examples
ls examples/

📖 Usage

Basic Content Generation

import os
from dotenv import load_dotenv
from gemini_sre import GeminiSREClient

# Load environment variables
load_dotenv('.env.local', override=True)

# Initialize client
client = GeminiSREClient(
    project_id=os.getenv("GOOGLE_CLOUD_PROJECT"),
    locations=["us-central1", "europe-west1"],
    enable_monitoring=False,  # Set to True once you have roles/monitoring.metricWriter
    enable_logging=False,     # Set to True once you have roles/logging.logWriter
)

# Generate content
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain quantum computing in simple terms",
    request_id="example-001",
)

print(response.text)

Streaming Responses

# Stream content for real-time display
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Write a story about a robot learning to paint",
    request_id="stream-001",
):
    print(chunk.text, end="", flush=True)

Chat Operations

# Create chat session with context preservation
chat = client.chats.create(
    model="gemini-2.5-flash",
    request_id="chat-001",
)

# Send messages
response1 = chat.send_message("Hello! My name is Alice.")
response2 = chat.send_message("What's my name?")  # Model remembers!
print(response2.text)  # e.g. "Your name is Alice." (exact wording varies)

Structured Output with Pydantic

from pydantic import BaseModel, Field
from typing import List

# Define your schema
class Recipe(BaseModel):
    name: str = Field(description="Recipe name")
    ingredients: List[str] = Field(description="List of ingredients")
    steps: List[str] = Field(description="Cooking steps")
    cooking_time: int = Field(description="Time in minutes")

# Generate structured output
recipe = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Give me a recipe for chocolate chip cookies",
    config={
        "response_mime_type": "application/json",
        "response_schema": Recipe,
    },
    request_id="recipe-001",
)

# Access typed fields
print(recipe.parsed.name)
print(recipe.parsed.ingredients)

Async Operations

import asyncio
import os
from gemini_sre import AsyncGeminiSREClient

async def main():
    # Create async client
    client = AsyncGeminiSREClient(
        project_id=os.getenv("GOOGLE_CLOUD_PROJECT"),
        locations=["us-central1"],
    )

    # Async generation
    response = await client.models.generate_content(
        model="gemini-2.5-flash",
        contents="Explain async programming",
        request_id="async-001",
    )

    print(response.text)

asyncio.run(main())

Concurrent Requests (4.47x Faster!)

import asyncio
import os

from gemini_sre import AsyncGeminiSREClient

async def make_requests():
    client = AsyncGeminiSREClient(
        project_id=os.getenv("GOOGLE_CLOUD_PROJECT"),
        locations=["us-central1"],
    )

    # Run 5 requests concurrently
    tasks = [
        client.models.generate_content(
            model="gemini-2.5-flash",
            contents=f"What is {topic}?",
            request_id=f"req-{i}",
        )
        for i, topic in enumerate(["Python", "JavaScript", "Go", "Rust", "TypeScript"])
    ]

    # Wait for all to complete
    results = await asyncio.gather(*tasks)
    return results

# Reported timings: sequential ~65s, concurrent ~15s (4.47x faster)
asyncio.run(make_requests())

📚 Examples

We provide 16 comprehensive examples organized by complexity:

  • Basic Examples (4) - examples/basic/
  • Advanced Examples (5) - examples/advanced/
  • Async Examples (4) - examples/async/
  • Production Examples (3) - examples/production/

See examples/README.md for detailed descriptions.

🔧 Configuration

Environment Variables

The client uses standard Google Cloud environment variables for consistency with the genai SDK:

  • GOOGLE_CLOUD_PROJECT - Your GCP project ID (required)
  • GOOGLE_CLOUD_LOCATION - Default region (optional)
  • GEMINI_ENABLE_MONITORING - Enable metrics (optional)
  • GEMINI_ENABLE_LOGGING - Enable logging (optional)
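
As a rough sketch of how these variables could be mapped onto client arguments: the helper names `env_flag` and `client_kwargs_from_env`, the accepted truthy strings, and the `us-central1` fallback are all assumptions made for illustration, not behavior documented by the package.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean (assumed convention)."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def client_kwargs_from_env() -> dict:
    """Collect keyword arguments for GeminiSREClient from the environment."""
    project_id = os.environ.get("GOOGLE_CLOUD_PROJECT")
    if not project_id:
        raise RuntimeError("GOOGLE_CLOUD_PROJECT is required")
    # Fall back to us-central1 when no default region is set (assumption).
    location = os.environ.get("GOOGLE_CLOUD_LOCATION", "us-central1")
    return {
        "project_id": project_id,
        "locations": [location],
        "enable_monitoring": env_flag("GEMINI_ENABLE_MONITORING"),
        "enable_logging": env_flag("GEMINI_ENABLE_LOGGING"),
    }
```

With the package installed, the result could be passed on as `GeminiSREClient(**client_kwargs_from_env())`.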

Client Configuration

import os

from gemini_sre import GeminiSREClient
from gemini_sre.core import RetryConfig

# Full configuration example
client = GeminiSREClient(
    project_id=os.getenv("GOOGLE_CLOUD_PROJECT"),
    locations=["us-central1", "europe-west1", "asia-northeast1"],

    # Monitoring & Logging
    enable_monitoring=True,   # Requires roles/monitoring.metricWriter
    enable_logging=True,      # Requires roles/logging.logWriter

    # Retry Configuration
    retry_config=RetryConfig(
        max_attempts=5,
        initial_delay=1.0,
        max_delay=16.0,
        multiplier=2.0,
    ),

    # Circuit Breaker
    enable_circuit_breaker=True,
    circuit_breaker_config={
        "failure_threshold": 5,
        "success_threshold": 2,
        "timeout": 60,
    },
)
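
To make the retry settings concrete, here is a minimal sketch of the wait schedule that plain exponential backoff would produce for those values. The function `backoff_delays` is hypothetical. Whether the package adds jitter, or counts the first call as one of `max_attempts`, is not stated here, so treat both as assumptions.

```python
def backoff_delays(max_attempts: int, initial_delay: float,
                   max_delay: float, multiplier: float) -> list:
    """Return the wait in seconds before each retry after the first attempt."""
    delays = []
    delay = initial_delay
    for _ in range(max_attempts - 1):  # no wait after the final attempt
        delays.append(min(delay, max_delay))  # never exceed the cap
        delay *= multiplier
    return delays


# Same values as the RetryConfig example above.
print(backoff_delays(5, 1.0, 16.0, 2.0))  # [1.0, 2.0, 4.0, 8.0]
```

Under these assumptions the `max_delay` cap of 16 seconds only comes into play from the sixth attempt onward.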

📊 Monitoring & Observability

Cloud Monitoring Metrics

The client automatically sends custom metrics to Cloud Monitoring (when enabled):

Metric                             Type          Description
gemini_sre/request/success         COUNTER       Successful requests
gemini_sre/request/error           COUNTER       Failed requests
gemini_sre/request/latency         DISTRIBUTION  Request latency (p50, p95, p99)
gemini_sre/request/retry_count     GAUGE         Retry attempts per request
gemini_sre/circuit_breaker/state   GAUGE         Circuit breaker state

All metrics include labels: location, model, operation_type

Cloud Logging

Structured logs include:

  • Request ID for correlation
  • Latency and retry counts
  • Region information
  • Error details
  • Success/failure status

View logs in Cloud Console.
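
For illustration, a structured entry carrying the fields listed above might be assembled as follows. The field names, the `build_log_entry` helper, and the use of the standard `logging` module in place of the Cloud Logging client are assumptions. The package's actual log schema may differ.

```python
import json
import logging
from typing import Optional


def build_log_entry(request_id: str, region: str, latency_ms: float,
                    retry_count: int, error: Optional[str] = None) -> dict:
    """Assemble one structured log record (field names are illustrative)."""
    return {
        "request_id": request_id,           # correlation ID
        "region": region,
        "latency_ms": round(latency_ms, 1),
        "retry_count": retry_count,
        "status": "failure" if error else "success",
        "error": error,                     # None on success
    }


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("gemini_sre_demo")
entry = build_log_entry("example-001", "us-central1", 812.34, 1)
logger.info(json.dumps(entry))  # one JSON object per log line
```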

🔐 IAM Permissions

Minimum Required:

  • roles/aiplatform.user - Vertex AI API access

Optional (for full features):

  • roles/monitoring.metricWriter - Cloud Monitoring metrics
  • roles/logging.logWriter - Cloud Logging integration

Set up IAM roles:

# Grant minimum role
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="user:your-email@example.com" \
    --role="roles/aiplatform.user"

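📖 Documentation

See docs/ for the API reference (docs/api/), technical docs (docs/architecture/), and development docs (docs/development/).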

🧪 Testing

# Run unit tests
pytest tests/unit/ -v

# Run integration tests (requires GCP credentials)
pytest tests/integration/ -v

# Run specific test
pytest tests/unit/test_circuit_breaker.py -v

🏗️ Multi-Region Failover

The client automatically switches regions on failures:

  1. Primary Region - Tries configured primary (e.g., us-central1)
  2. Failover - Switches to next region on error
  3. Circuit Breaker - Opens circuit for failing regions (skips them)
  4. Recovery - Tests region health after timeout
  5. Auto-Close - Closes circuit when region recovers

Circuit Breaker States:

  • CLOSED (✅) - Normal operation, region healthy
  • OPEN (🔴) - Region failing, automatically skipped
  • HALF_OPEN (🟡) - Testing if region recovered
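
The failover steps and breaker states above can be sketched as a small state machine. This is a simplified illustration, not the package's implementation. The `CircuitBreaker` class, the `call_with_failover` helper, and the injectable `clock` are hypothetical names, while the threshold parameters mirror the `circuit_breaker_config` keys.

```python
import time


class CircuitBreaker:
    """Per-region breaker with CLOSED, OPEN and HALF_OPEN states."""

    def __init__(self, failure_threshold=5, success_threshold=2,
                 timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout = timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        # After the timeout, let a probe request through to test recovery.
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.timeout:
            self.state = "HALF_OPEN"
            self.successes = 0
        return self.state != "OPEN"

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "CLOSED"  # region recovered
                self.failures = 0
        else:
            self.failures = 0  # only consecutive failures count

    def record_failure(self) -> None:
        self.failures += 1
        # A failed probe, or too many consecutive failures, opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
            self.failures = 0


def call_with_failover(regions, breakers, call):
    """Try each region in order, skipping those whose circuit is open."""
    last_error = None
    for region in regions:
        breaker = breakers[region]
        if not breaker.allow_request():
            continue
        try:
            result = call(region)
        except Exception as exc:
            breaker.record_failure()
            last_error = exc
            continue
        breaker.record_success()
        return region, result
    raise RuntimeError("all regions failed or are open") from last_error
```

Here `call` is any function that takes a region name and performs the request. Keeping one breaker per region lets a failing primary be skipped without affecting healthy fallbacks.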

🔍 Troubleshooting

Common Issues

Permission Denied:

# Verify authentication
gcloud auth application-default login

# Check project
gcloud config get-value project

# Verify IAM roles
gcloud projects get-iam-policy YOUR_PROJECT_ID

Module Not Found:

# Install in development mode
pip install -e .

Import Errors:

# Correct import
from gemini_sre import GeminiSREClient  # ✅ Correct

# Incorrect import
from gemini_client import GeminiClient  # ❌ Old name

See SETUP.md for more troubleshooting tips.

📦 Dependencies

Core dependencies:

google-genai>=1.42.0              # Gemini API SDK
google-cloud-monitoring>=2.27.0   # Custom metrics
google-cloud-logging>=3.12.0      # Structured logging
pydantic>=2.12.0,<3.0.0          # Schema validation
python-dotenv>=1.0.0             # Environment management

See requirements.txt for full list.

🔗 Useful Links

Google Gemini

Pydantic

Google Cloud

🤝 Contributing

Contributions are welcome! To contribute:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new features
  4. Ensure all tests pass
  5. Submit a pull request

See docs/development/ for contributor guidelines.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

TL;DR: You can freely use, modify, and distribute this code in commercial or non-commercial projects, provided you keep the copyright and license notice with any copies. The author provides no warranty and accepts no liability.

For more information about the license, including what you can and cannot do, see LICENSE_GUIDE.md.


Ready to get started? Check out SETUP.md for detailed setup instructions, or dive into examples/ to see working code!
