Socrates Nexus

Universal LLM client for production - Works with Claude, GPT-4, Gemini, Llama, and any LLM.

Extracted from 18 months of production use in the Socrates AI platform.

Why Socrates Nexus?

Most LLM clients handle the happy path. Socrates Nexus handles production:

✅ Automatic retry logic with exponential backoff (timeouts, rate limits, temporary errors)
✅ Token usage tracking - Know exactly what you're spending across providers
✅ Streaming support with helpers (not fighting with raw streams)
✅ Async + sync APIs - Choose what works for you
✅ Multi-model fallback - If Claude is down, try GPT-4
✅ Type hints throughout - Better IDE experience
✅ Universal API - Same code works with Claude, GPT-4, Gemini, Llama

Quick Start

Installation

# Install with Claude support (quotes keep shells such as zsh from expanding the brackets)
pip install "socrates-nexus[anthropic]"

# Or with all providers
pip install "socrates-nexus[all]"

Basic Usage

from socrates_nexus import LLMClient

# Create client for any LLM
client = LLMClient(
    provider="anthropic",
    model="claude-opus",
    api_key="your-api-key"
)

# Chat - automatic retries, token tracking included
response = client.chat("What is machine learning?")
print(response.content)
print(f"Cost: ${response.usage.cost_usd}")

Multiple Providers (Same API)

from socrates_nexus import LLMClient

# Claude
claude = LLMClient(provider="anthropic", model="claude-opus", api_key="sk-ant-...")

# GPT-4
gpt4 = LLMClient(provider="openai", model="gpt-4", api_key="sk-...")

# Gemini
gemini = LLMClient(provider="google", model="gemini-pro", api_key="...")

# Llama (local)
llama = LLMClient(provider="ollama", model="llama2", base_url="http://localhost:11434")

# All use the same API!
for client in [claude, gpt4, gemini, llama]:
    response = client.chat("Hello!")
    print(f"{client.config.provider}: {response.content}")

Streaming

from socrates_nexus import LLMClient

client = LLMClient(provider="anthropic", model="claude-opus", api_key="...")

def on_chunk(chunk):
    print(chunk, end="", flush=True)

response = client.stream("Write a poem about AI", on_chunk=on_chunk)
print(f"\n\nTotal cost: ${response.usage.cost_usd}")
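When the complete text is needed after streaming finishes, the callback can keep a copy of each chunk while echoing it. The following is a minimal sketch, assuming chunks arrive as plain strings as in the example above. ChunkCollector is a hypothetical helper written for illustration and is not part of Socrates Nexus; the loop simulates a stream so the sketch runs without an API key.

```python
import sys
from typing import List, Optional, TextIO


class ChunkCollector:
    """Callable that echoes each streamed chunk and keeps a copy.

    Hypothetical helper for illustration; assumes chunks are plain strings.
    """

    def __init__(self, out: Optional[TextIO] = None) -> None:
        self.chunks: List[str] = []
        self.out = out  # stream to echo to; None disables echoing

    def __call__(self, chunk: str) -> None:
        self.chunks.append(chunk)
        if self.out is not None:
            self.out.write(chunk)
            self.out.flush()

    @property
    def text(self) -> str:
        """All chunks received so far, joined into one string."""
        return "".join(self.chunks)


# Simulated stream. With a real client this would be something like
# client.stream("Write a poem about AI", on_chunk=collector).
collector = ChunkCollector(out=sys.stdout)
for piece in ["Roses ", "are ", "red"]:
    collector(piece)
print()
print(len(collector.chunks), repr(collector.text))
```

Passing an instance instead of a bare function keeps the accumulated state next to the callback, which avoids module-level globals.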

Async

import asyncio
from socrates_nexus import AsyncLLMClient

async def main():
    client = AsyncLLMClient(
        provider="anthropic",
        model="claude-opus",
        api_key="..."
    )

    # Concurrent requests
    responses = await asyncio.gather(
        client.chat("Query 1"),
        client.chat("Query 2"),
        client.chat("Query 3"),
    )

    for response in responses:
        print(response.content)

asyncio.run(main())
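Launching many concurrent requests at once can trigger provider rate limits, so it is often useful to cap how many are in flight. The sketch below shows one way to do that with asyncio.Semaphore. The names gather_bounded and fake_chat are hypothetical; fake_chat stands in for client.chat so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, List


async def gather_bounded(
    call: Callable[[str], Awaitable[str]],
    prompts: List[str],
    limit: int = 2,
) -> List[str]:
    """Run call(prompt) for every prompt, with at most `limit` running at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(prompt: str) -> str:
        async with semaphore:
            return await call(prompt)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(run_one(p) for p in prompts))


async def fake_chat(prompt: str) -> str:
    """Stand-in for client.chat; sleeps briefly instead of calling an API."""
    await asyncio.sleep(0.01)
    return f"answer to {prompt}"


results = asyncio.run(gather_bounded(fake_chat, ["Query 1", "Query 2", "Query 3"]))
print(results)
```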

Configuration

Common Configuration Options

from socrates_nexus import LLMClient, LLMConfig

config = LLMConfig(
    # Provider and model
    provider="anthropic",
    model="claude-opus",
    api_key="sk-ant-...",

    # Retry behavior
    retry_attempts=3,
    retry_backoff_factor=2.0,
    request_timeout=60,

    # Response caching
    cache_responses=True,
    cache_ttl=300,  # 5 minutes

    # Optional
    temperature=0.7,
    max_tokens=1024,
)

client = LLMClient(config=config)

Environment Variables

Socrates Nexus reads these environment variables automatically when no explicit config is provided:

ANTHROPIC_API_KEY - Anthropic Claude
OPENAI_API_KEY - OpenAI GPT
GOOGLE_API_KEY - Google Gemini
ANTHROPIC_BASE_URL - Custom Anthropic endpoint
OPENAI_BASE_URL - Custom OpenAI endpoint
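The lookup can be pictured as "explicit value first, environment variable second". The sketch below illustrates that order using only the variable names listed above. The helper resolve_api_key and the precedence it applies are assumptions for illustration, not the library's confirmed behavior.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the list above; Ollama needs no key.
API_KEY_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def resolve_api_key(
    provider: str,
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the explicit key if given, else the provider's environment variable."""
    if explicit:
        return explicit
    var_name = API_KEY_ENV_VARS.get(provider)
    if var_name is None:
        return None  # e.g. "ollama": no key required
    key = env.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key


fake_env = {"OPENAI_API_KEY": "sk-test"}  # stand-in for os.environ
print(resolve_api_key("openai", env=fake_env))
print(resolve_api_key("anthropic", explicit="sk-ant-explicit", env=fake_env))
print(resolve_api_key("ollama", env=fake_env))
```

Failing early with the name of the missing variable tends to be easier to debug than an authentication error from the provider later on.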

Error Handling

Socrates Nexus provides specific exception types for programmatic error handling:

from socrates_nexus import (
    NexusError,
    RateLimitError,
    AuthenticationError,
    InvalidAPIKeyError,
    TimeoutError as NexusTimeoutError,  # alias so the built-in TimeoutError is not shadowed
    ContextLengthExceededError,
    ModelNotFoundError,
)

try:
    response = client.chat("Query")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after} seconds")
except AuthenticationError as e:
    print(f"Auth failed: {e.message}")
except ContextLengthExceededError as e:
    print(f"Input too long: {e.message}")
except NexusError as e:
    print(f"LLM Error ({e.error_code}): {e.message}")

All exceptions include:

message - Human-readable error description
error_code - Machine-readable error code
context - Dict with provider-specific details

Key Features

1. Automatic Retries

Handles transient failures automatically:

Rate limits (HTTP 429)
Timeout errors
Temporary server errors (5xx)
Exponential backoff with jitter

client = LLMClient(
    provider="anthropic",
    model="claude-opus",
    api_key="...",
    retry_attempts=3,           # Number of retries
    retry_backoff_factor=2.0,   # Exponential backoff multiplier
)
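To make the two retry parameters concrete, here is a small sketch of exponential backoff with jitter. It is an illustration of the general technique, not the library's implementation: the 1-second base delay, the additive jitter formula, and the names backoff_delays and call_with_retries are all assumptions.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 1.0, factor: float = 2.0,
                   jitter: float = 0.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Delay before each retry: base * factor**i, plus up to `jitter` fraction extra."""
    rng = rng or random.Random()
    delays = []
    for i in range(attempts):
        delay = base * factor ** i
        delays.append(delay + delay * jitter * rng.random())
    return delays


def call_with_retries(func: Callable[[], T], attempts: int = 3, base: float = 1.0,
                      factor: float = 2.0, jitter: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func; after each failure wait, then retry, for up to `attempts` retries."""
    for delay in backoff_delays(attempts, base, factor, jitter):
        try:
            return func()
        except Exception:  # a real client would retry only transient errors
            sleep(delay)
    return func()  # final try; its exception propagates to the caller


print(backoff_delays(3))  # jitter disabled, so the delays are exact

calls = {"count": 0}


def flaky() -> str:
    """Fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


waits: List[float] = []
result = call_with_retries(flaky, attempts=3, jitter=0.0, sleep=waits.append)
print(result, waits)
```

Jitter spreads retries from many clients over time so they do not all hit the provider again at the same instant.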

2. Token Tracking

Track usage and costs across all providers:

response = client.chat("Query")

print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
print(f"Total cost: ${response.usage.cost_usd}")

# Get cumulative stats
stats = client.get_usage_stats()
print(f"Total spent: ${stats.total_cost_usd}")
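The cost figure is, in principle, simple arithmetic over the two token counts. The sketch below shows that calculation under the assumption that prices are quoted per million tokens. The Pricing class, the cost_usd function, and the rates used are made up for illustration and do not reflect any provider's actual prices or the library's internal pricing table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens. The values used below are placeholders."""
    input_per_million: float
    output_per_million: float


def cost_usd(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request: tokens / 1e6 * price, summed over both directions."""
    return (
        input_tokens / 1_000_000 * pricing.input_per_million
        + output_tokens / 1_000_000 * pricing.output_per_million
    )


example_pricing = Pricing(input_per_million=10.0, output_per_million=30.0)  # made-up rates
cost = cost_usd(input_tokens=1_200, output_tokens=400, pricing=example_pricing)
print(f"${cost:.6f}")
```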

3. Multi-LLM Fallback & Resilience

Build resilient applications with multiple fallback strategies:

Sequential Fallback - Try providers in order:

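from socrates_nexus import LLMClient

def safe_chat(message: str):
    # Providers are tried in this order; the first successful response wins
    providers = [
        {"provider": "anthropic", "model": "claude-opus", "api_key": "..."},
        {"provider": "openai", "model": "gpt-4", "api_key": "..."},
        {"provider": "google", "model": "gemini-pro", "api_key": "..."},
    ]

    last_error = None
    for config in providers:
        try:
            client = LLMClient(**config)
            return client.chat(message)
        except Exception as exc:
            last_error = exc  # remember the failure and move on to the next provider
    # Chain the last failure so the root cause stays visible in the traceback
    raise RuntimeError("All providers failed") from last_error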

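Parallel Fallback - Query all providers concurrently, then use the first successful result in list order:

import asyncio
from socrates_nexus import AsyncLLMClient

async def parallel_fallback(message: str):
    clients = [
        AsyncLLMClient(provider="anthropic", model="claude-opus", api_key="..."),
        AsyncLLMClient(provider="openai", model="gpt-4", api_key="..."),
    ]

    # gather waits for every request to finish; failures come back as exception objects
    results = await asyncio.gather(
        *[c.chat(message) for c in clients],
        return_exceptions=True
    )

    for result in results:
        if not isinstance(result, Exception):
            return result
    # Without this, the function would silently return None when every provider fails
    raise RuntimeError("All providers failed")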
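Note that asyncio.gather waits for every provider to finish before anything is returned. If the goal is to return as soon as any one provider succeeds, asyncio.as_completed can be used instead. The following is a sketch of that variant; first_success and the three stand-in coroutines are hypothetical and replace real client.chat calls so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable, List


async def first_success(calls: List[Callable[[], Awaitable[str]]]) -> str:
    """Return the first result to finish without raising; cancel the rest."""
    tasks = [asyncio.ensure_future(call()) for call in calls]
    errors: List[BaseException] = []
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except Exception as exc:
                errors.append(exc)  # this provider failed; wait for the next one
        raise RuntimeError(f"All {len(tasks)} providers failed: {errors!r}")
    finally:
        for task in tasks:
            task.cancel()  # no effect on tasks that already finished
        # Collect the cancelled tasks so none are left pending
        await asyncio.gather(*tasks, return_exceptions=True)


async def slow_ok() -> str:
    await asyncio.sleep(0.2)
    return "slow provider"


async def fast_fail() -> str:
    await asyncio.sleep(0.01)
    raise ConnectionError("provider down")


async def fast_ok() -> str:
    await asyncio.sleep(0.05)
    return "fast provider"


winner = asyncio.run(first_success([slow_ok, fast_fail, fast_ok]))
print(winner)
```

The trade-off is cost: every provider is still called, so requests that are cancelled late may already have been billed.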

4. Token Usage Tracking

Real-time cost tracking with provider breakdowns:

client = LLMClient(provider="anthropic", model="claude-opus", api_key="...")

# Track per-request
response = client.chat("What is Python?")
print(f"This request cost: ${response.usage.cost_usd:.6f}")
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")

# Track cumulative usage
stats = client.get_usage_stats()
print(f"Total spent across all requests: ${stats.total_cost_usd:.2f}")
print(f"Total requests: {stats.total_requests}")

# Per-provider breakdown
for provider, p_stats in stats.by_provider.items():
    print(f"{provider}: {p_stats['requests']} requests, ${p_stats['cost_usd']:.2f}")

# Custom tracking callbacks
def log_expensive_requests(usage):
    if usage.cost_usd > 0.01:
        print(f"Expensive request: ${usage.cost_usd:.6f}")

client.add_usage_callback(log_expensive_requests)
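A callback does not have to be a plain function; a callable object can carry state between requests. The sketch below uses that to track spending against a budget. BudgetGuard is a hypothetical class, and it assumes only that the callback receives an object with a cost_usd attribute, as in log_expensive_requests. SimpleNamespace objects simulate usage records here.

```python
from types import SimpleNamespace


class BudgetGuard:
    """Usage callback that accumulates cost and reports when a budget is exceeded.

    Assumes the callback argument has a cost_usd attribute.
    """

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0
        self.requests = 0

    def __call__(self, usage) -> None:
        self.spent_usd += usage.cost_usd
        self.requests += 1

    @property
    def exceeded(self) -> bool:
        return self.spent_usd > self.limit_usd


guard = BudgetGuard(limit_usd=0.05)
# With a real client this would be registered via client.add_usage_callback(guard)
for cost in (0.02, 0.02, 0.02):
    guard(SimpleNamespace(cost_usd=cost))  # simulated usage objects
print(guard.requests, round(guard.spent_usd, 2), guard.exceeded)
```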

5. Response Caching

Cache identical requests to save cost and time:

client = LLMClient(
    provider="anthropic",
    model="claude-opus",
    api_key="...",
    cache_responses=True,
    cache_ttl=300,  # 5 minutes
)

# First call: hits API
response1 = client.chat("What is Python?")

# Second call within 5 min: uses cache (instant)
response2 = client.chat("What is Python?")

# A cache hit makes no API call, so the second request avoids roughly the cost of the first
print(f"Saved: ${response1.usage.cost_usd:.6f}")
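The idea behind cache_ttl can be illustrated with a small in-memory cache whose entries expire after a fixed time. This is a sketch of TTL caching in general, not the library's cache: the TTLCache class, the key layout (provider, model, prompt), and the injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, str]] = {}

    def get(self, key: Hashable) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: Hashable, value: str) -> None:
        self._store[key] = (self.clock(), value)


now = [0.0]  # fake clock for the demo
cache = TTLCache(ttl=300, clock=lambda: now[0])
key = ("anthropic", "claude-opus", "What is Python?")  # provider, model, prompt
cache.set(key, "Python is a programming language.")
hit = cache.get(key)   # still fresh
now[0] = 301.0
miss = cache.get(key)  # past the 5-minute TTL
print(hit, miss)
```

Including the provider and model in the key keeps answers from different models from being served for the same prompt.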

Supported Providers

Provider	Models	API Key	Status
Anthropic	Claude 3 (Opus, Sonnet, Haiku), Claude 3.5 Sonnet	Required	✅ Full
OpenAI	GPT-4, GPT-4o, GPT-3.5-turbo	Required	✅ Full
Google	Gemini 1.5 Pro, Gemini 1.5 Flash	Required	✅ Full
Ollama	Llama 2, Mistral, Neural Chat, Orca (local)	Not required	✅ Full

Setup Each Provider

Anthropic Claude:

export ANTHROPIC_API_KEY="sk-ant-..."

OpenAI GPT:

export OPENAI_API_KEY="sk-..."

Google Gemini:

export GOOGLE_API_KEY="..."

Ollama (Local):

# Install Ollama: https://ollama.ai
ollama pull llama2
ollama serve  # Starts on http://localhost:11434
# No API key needed!

Examples

See the examples/ directory for complete, runnable examples:

01_anthropic_basic.py - Basic Claude usage, token tracking, and cost calculation
02_openai_gpt4.py - OpenAI GPT-4 usage with streaming
03_google_gemini.py - Google Gemini basic and streaming calls
04_ollama_local.py - Local LLM with Ollama (no API key required)
05_streaming.py - Streaming patterns: real-time output, chunk accumulation, progress tracking
06_async_calls.py - Async/await, concurrent requests, multi-provider parallel execution
07_token_tracking.py - Usage statistics, cost monitoring, per-provider breakdowns
08_error_handling.py - Error types, safe error catching, automatic retry behavior
09_provider_fallback.py - Provider fallback strategies: sequential, parallel, cost-optimized, model escalation

Documentation

Quick Start - Get started in 5 minutes
Providers Guide - Setup for each LLM provider
API Reference - Complete API documentation
Advanced Usage - Caching, fallbacks, monitoring
Comparisons - vs raw SDKs

Development

Setup

# Clone repo
git clone https://github.com/Nireus79/socrates-nexus.git
cd socrates-nexus

# Install dev dependencies
pip install -e ".[dev,all]"

# Run tests
pytest tests/ -v

# Format code
black src/ tests/
ruff check src/ tests/

Testing

# All tests
pytest tests/ -v

# Only fast tests
pytest tests/ -v -m "not slow"

# With coverage
pytest tests/ --cov=socrates_nexus --cov-report=html

Contributing

Contributions welcome! Please:

Fork the repo
Create a feature branch
Add tests for new features
Submit a pull request

License

MIT License - see LICENSE

Origins

Socrates Nexus is extracted from Socrates AI, a collaborative AI platform. It's battle-tested in production and used for orchestrating multiple LLMs.

Roadmap

Phase 1: Foundation (Days 1-14) ✅ Complete

✅ Base client structure (sync + async)
✅ Provider implementations (Claude, GPT-4, Gemini, Ollama)
✅ Streaming support for all providers
✅ Automatic retry logic with exponential backoff
✅ Token tracking and cost calculation
✅ Response caching (TTL-based)
✅ Multi-provider fallback patterns
✅ 9 comprehensive examples
✅ Error handling with specific exception types

Phase 2: Enhancement (Days 15-21) 🔄 In Progress

🔄 Unit tests (75%+ coverage target)
🔄 Integration tests
⏳ Vision models support
⏳ Function calling for all providers
⏳ Batch processing API

Phase 3: Production (Days 22+) ⏳ Planned

⏳ Monitoring and observability
⏳ Rate limit optimization
⏳ Extended model support (Cohere, Replicate, etc.)
⏳ GitHub Actions CI/CD
⏳ PyPI publishing

Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Sponsor: GitHub Sponsors

Made with ❤️ as part of the Socrates ecosystem

Release history

0.3.1 - Mar 30, 2026
0.3.0 - Mar 10, 2026
0.2.0 - Mar 9, 2026
0.1.0 - Mar 9, 2026 (this version)
