UltraGPT: A modular multi-provider AI library for advanced reasoning and step pipelines with OpenAI and Claude support

🤖 UltraGPT

[UltraGPT cover image]

The "Write Once, Run Everywhere" AI library that handles ALL the heavy lifting


🎯 Why UltraGPT?

This is NOT just another LangChain wrapper. UltraGPT is a battle-tested, production-grade abstraction that solves the real problems developers face when building AI applications:

The Problems We Solve

| Problem | What Others Do | What UltraGPT Does |
| --- | --- | --- |
| Message Format Hell | Forces you to convert between formats | Auto-converts ANY format to LangChain, OpenAI, or provider-specific (example below) |
| Tool Call Orphans | Crashes when tool results are missing | Sanitizes history, removes orphans, validates pairing |
| Token Limits | Crashes on overflow | Smart truncation with atomic tool-call grouping |
| Provider Quirks | One provider = one codebase | True "write once, run everywhere" across ALL models |
| Structured Output | Breaks on Claude/Gemini | Universal schema support via tool-based fallback |
| Reasoning Models | Manual complexity | Auto-detects native thinking, preserves reasoning_details |
| Rate Limits | Crashes your app | Built-in exponential backoff with jitter |
| Streaming Issues | Connection pool leaks | Proper cleanup, mid-stream error detection |
| Schema Validation | 400 errors everywhere | Auto-sanitizes Pydantic → OpenAI strict mode |
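
A quick illustration of the first row: OpenAI-style dicts and LangChain message objects can sit side by side in the same messages list (the key and model name here are placeholders):

from langchain_core.messages import SystemMessage
from ultragpt import UltraGPT

ultra = UltraGPT(openrouter_api_key="your-openrouter-key")

# OpenAI-style dicts and LangChain message objects mixed in one list;
# UltraGPT normalizes both to the provider's wire format before sending.
response, tokens, details = ultra.chat(
    messages=[
        SystemMessage(content="You are terse."),
        {"role": "user", "content": "One-word greeting, please."},
    ],
    model="gpt-5",
)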

✨ Key Features

🌐 Universal Model Access via OpenRouter

One API key, ALL models:

  • GPT-5 (400k context, reasoning tokens)
  • Claude Sonnet 4.5 (1M extended context!)
  • Claude Opus 4 (200k context)
  • Gemini 3 Pro/Flash (1M context, reasoning)
  • Grok 4 (256k context, always-on reasoning)
  • Llama 3.3, DeepSeek v3.2, Mistral, and more

🧠 Native Thinking/Reasoning Support

  • Auto-detects models with native reasoning (Claude, o-series, GPT-5, Gemini 3)
  • Preserves reasoning_details across tool call loops
  • Falls back to simulated reasoning pipeline for non-reasoning models
  • Full token breakdown: input, output, reasoning tokens

🛠️ Production-Grade Tool Calling

  • Universal tool format that works across ALL providers
  • Automatic schema sanitization for strict mode compliance
  • Preserves reasoning context for multi-turn tool conversations
  • Parallel and sequential tool call support

📊 Structured Output That Actually Works

  • Pydantic schemas → provider-compatible JSON
  • Tool-based fallback for providers without native support
  • Handles Optional fields, nested objects, arrays
  • No more 400 errors from schema validation

💾 Intelligent Token Management

  • Auto-truncation with model-specific limits
  • Atomic tool-call grouping (never orphan a tool result)
  • Preserves system messages during truncation
  • Configurable: "AUTO", "OFF", or specific token count

🔄 Message History Sanitization

  • Removes orphaned tool results automatically
  • Drops unresolved tool calls before API submission
  • Consolidates multiple system messages safely
  • Strips whitespace (Claude is strict about this!)

🔧 LangChain Patches for OpenRouter

  • Preserves reasoning_details, cache_control, thinking fields
  • Future-proof: unknown fields pass through automatically
  • Works with streaming and non-streaming responses

🆕 What's New in v7.6.0

  • OpenAI streaming compatibility hardening: structured-output calls now gracefully recover from known client-side streaming serialization failures.
  • Model fallback chains: optional fallback_models support is now threaded through chat, schema, and tool-call paths.
  • Strict schema-fallback behavior: tool-based structured-output fallbacks now raise explicit parse errors instead of fabricating empty schema objects.
  • Packaging alignment: Python support is now documented and packaged as 3.10+, aligned with current LangChain dependency minimums.

📦 Installation

pip install ultragpt

Requirements

  • Python 3.10+ (aligned with current LangChain dependency minimums)

🚀 Quick Start

Basic Chat

from ultragpt import UltraGPT

# Initialize with OpenRouter (universal access)
ultra = UltraGPT(openrouter_api_key="your-openrouter-key")

# Simple chat - works with ANY model
response, tokens, details = ultra.chat(
    messages=[{"role": "user", "content": "Explain quantum computing in 3 sentences."}],
    model="gpt-5"  # or "claude:sonnet", "gemini", etc.
)

print(response)
print(f"Tokens used: {tokens}")

Model Selection (Friendly Names)

# GPT models
ultra.chat(messages=[...], model="gpt-5")
ultra.chat(messages=[...], model="gpt-5-pro")
ultra.chat(messages=[...], model="gpt-4o")

# Claude models (extended 1M context for Sonnet!)
ultra.chat(messages=[...], model="claude:sonnet")  # Claude 3.7 Sonnet
ultra.chat(messages=[...], model="claude:opus")    # Claude Opus 4
ultra.chat(messages=[...], model="claude-sonnet-4.5")  # Latest Sonnet

# Gemini models
ultra.chat(messages=[...], model="gemini")  # Gemini 3 Pro
ultra.chat(messages=[...], model="gemini-3-flash")

# Other models
ultra.chat(messages=[...], model="grok")  # Grok 4
ultra.chat(messages=[...], model="deepseek")  # DeepSeek v3.2
ultra.chat(messages=[...], model="llama-3.3")

🧠 Native Thinking/Reasoning

UltraGPT automatically detects and uses native reasoning for supported models:

# Native reasoning is auto-enabled for Claude, GPT-5, o-series, Gemini 3
response, tokens, details = ultra.chat(
    messages=[{"role": "user", "content": "Solve: If 3x + 7 = 22, find x"}],
    model="claude:sonnet",
    reasoning_pipeline=True,  # Triggers native thinking on supported models
)

# Access reasoning tokens and text
print(f"Reasoning tokens: {details.get('reasoning_tokens_api', 0)}")
print(f"Reasoning text: {details.get('reasoning_text')}")
print(f"Full details: {details.get('reasoning_details')}")

Fake Reasoning Pipeline (for non-reasoning models)

# For models without native reasoning (like GPT-4o), a simulated pipeline runs
response, tokens, details = ultra.chat(
    messages=[{"role": "user", "content": "Plan a trip to Japan"}],
    model="gpt-4o",
    reasoning_pipeline=True,
    reasoning_iterations=3,
)

# Get the thoughts generated
print(f"Reasoning thoughts: {details.get('reasoning')}")

📊 Structured Output

Using Pydantic Schemas

from pydantic import BaseModel, Field

class SentimentAnalysis(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(description="0.0 to 1.0")
    keywords: list[str] = Field(description="Key words from the text")

response, tokens, details = ultra.chat(
    messages=[{"role": "user", "content": "Analyze: 'I absolutely love this product!'"}],
    model="gpt-5",
    schema=SentimentAnalysis,
)

print(response)
# {'sentiment': 'positive', 'confidence': 0.95, 'keywords': ['love', 'absolutely', 'product']}

Works Across ALL Providers

# Same schema works with Claude (uses tool-based fallback automatically)
response, tokens, details = ultra.chat(
    messages=[{"role": "user", "content": "Analyze: 'This is terrible!'"}],
    model="claude:sonnet",
    schema=SentimentAnalysis,
)
# Still works! UltraGPT handles the differences automatically.

🛠️ Tool Calling

Define Custom Tools

from pydantic import BaseModel

class CalculatorParams(BaseModel):
    operation: str  # add, subtract, multiply, divide
    a: float
    b: float

calculator_tool = {
    "name": "calculator",
    "description": "Performs arithmetic calculations",
    "parameters_schema": CalculatorParams,
    "usage_guide": "Use for precise arithmetic calculations",
    "when_to_use": "When user needs numeric computation",
}

# Make a tool call
response, tokens, details = ultra.tool_call(
    messages=[{"role": "user", "content": "Calculate 25 * 8"}],
    user_tools=[calculator_tool],
    model="claude:sonnet",
)

print(response)
# [{'id': 'call_xxx', 'type': 'function', 'function': {'name': 'calculator', 'arguments': '{"operation": "multiply", "a": 25, "b": 8}'}}]

Parallel Tool Calls

# Allow multiple tools in single response
response, tokens, details = ultra.tool_call(
    messages=[{"role": "user", "content": "Add 10+5 and multiply 3*7"}],
    user_tools=[calculator_tool],
    allow_multiple=True,  # Returns array of tool calls
    model="gpt-5",
)

Tool Calling with Native Reasoning

# Reasoning models preserve context across tool loops
response, tokens, details = ultra.tool_call(
    messages=[{"role": "user", "content": "Use calculator to find 25 * 8"}],
    user_tools=[calculator_tool],
    model="claude:sonnet",
    reasoning_pipeline=True,  # Uses native thinking
)

# reasoning_details preserved for next turn
print(details.get("reasoning_details"))

Tool Call Return Format

tool_call() returns (response, tokens, details_dict). The response value depends on what the LLM produces:

| LLM Output | allow_multiple=True | allow_multiple=False |
| --- | --- | --- |
| Tool calls only (most common) | List of tool call dicts | Single tool call dict |
| Tool calls + text content | {"tool_calls": [...], "content": "text"} | {"tool_calls": [...], "content": "text"} |
| Text only (no tool calls) | {"content": "text"} | {"content": "text"} |
| Empty response | {"content": ""} | {"content": ""} |

Key guarantees:

  • tool_call() never returns None - always a list or dict
  • LLM text content is never dropped - even when returned alongside tool calls
  • Multi-modal content blocks (list-type) are automatically normalized to plain strings
  • details_dict["reasoning_details"] is always available separately (3rd return value)

Handling all cases:

response, tokens, details = ultra.tool_call(messages=msgs, user_tools=tools, allow_multiple=True)

if isinstance(response, list):
    # Tool calls only (backward-compatible array)
    for tc in response:
        name = tc["function"]["name"]
        args = tc["function"]["arguments"]

elif isinstance(response, dict):
    if "tool_calls" in response:
        # Mixed: tool calls + accompanying text from LLM
        tool_calls = response["tool_calls"]
        llm_text = response.get("content", "")  # "Let me check that..."
    else:
        # Text only (no tool calls)
        text = response.get("content", "")

📝 Pipelines

Steps Pipeline

Break complex tasks into manageable steps:

response, tokens, details = ultra.chat(
    messages=[{"role": "user", "content": "Plan a 2-week trip to Japan"}],
    model="gpt-5",
    steps_pipeline=True,
    steps_model="gpt-5.4-nano",  # Use cheaper model for planning
)

print(f"Steps: {details.get('steps')}")
print(f"Conclusion: {response}")

Reasoning Pipeline

Multi-iteration deep thinking:

response, tokens, details = ultra.chat(
    messages=[{"role": "user", "content": "What are the long-term implications of AI on employment?"}],
    model="gpt-4o",
    reasoning_pipeline=True,
    reasoning_iterations=5,
    reasoning_model="gpt-4o-mini",  # Use cheaper model for iterations
)

print(f"Thoughts generated: {len(details.get('reasoning', []))}")

💾 Token Management

Automatic Truncation

ultra = UltraGPT(
    openrouter_api_key="...",
    input_truncation="AUTO",  # Automatically fits model's context limit
)

# Or specify exact limit
ultra = UltraGPT(
    openrouter_api_key="...",
    input_truncation=50000,  # Max 50k tokens
)

# Or disable
ultra = UltraGPT(
    openrouter_api_key="...",
    input_truncation="OFF",
)

How Truncation Works

  1. Groups tool calls with their results (never orphans)
  2. Preserves system messages
  3. Removes oldest messages first (keeps newest)
  4. Ensures at least one HumanMessage remains (see the sketch below)
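
To make the atomic-grouping idea concrete, here is a minimal, hypothetical sketch of newest-first truncation that keeps each tool call together with its results. It is illustrative only (plain dict messages, a stand-in count_tokens function), not UltraGPT's actual implementation:

def truncate(messages, limit, count_tokens):
    # Keep system messages unconditionally.
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    # Group each assistant tool call with the tool results that follow it,
    # so a group is kept or dropped as a unit (no orphaned tool results).
    groups, i = [], 0
    while i < len(rest):
        group = [rest[i]]
        if rest[i]["role"] == "assistant" and rest[i].get("tool_calls"):
            while i + 1 < len(rest) and rest[i + 1]["role"] == "tool":
                i += 1
                group.append(rest[i])
        groups.append(group)
        i += 1

    # Walk newest-first, keeping whole groups while they fit the budget.
    # (The real implementation also guarantees a HumanMessage survives.)
    budget = limit - sum(count_tokens(m) for m in system)
    kept = []
    for group in reversed(groups):
        cost = sum(count_tokens(m) for m in group)
        if cost > budget:
            break
        kept.insert(0, group)
        budget -= cost

    return system + [m for g in kept for m in g]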

🌐 Web Search (Built-in Tool)

ultra = UltraGPT(
    openrouter_api_key="...",
    google_api_key="your-google-api-key",
    search_engine_id="your-search-engine-id",
)

response, tokens, details = ultra.chat(
    messages=[{"role": "user", "content": "What are the latest AI trends in 2026?"}],
    model="gpt-5",
    tools=["web-search"],
    tools_config={
        "web-search": {
            "max_results": 3,
            "enable_scraping": True,
            "max_scrape_length": 5000,
        }
    },
)

🔧 Advanced Configuration

Full Initialization Options

ultra = UltraGPT(
    # API Keys
    openrouter_api_key="...",  # Required: Universal access to all models
    google_api_key="...",      # Optional: For web search
    search_engine_id="...",    # Optional: For web search
    
    # Token Management
    max_tokens=4096,           # Max output tokens
    input_truncation="AUTO",   # "AUTO", "OFF", or int

    # Model Fallbacks (Optional)
    # None or [] = disabled
    # Example: ["openai/gpt-4.1", "claude-sonnet-4.5"]
    fallback_models=None,
    
    # Logging
    verbose=True,              # Show detailed logs
    logger_name="ultragpt",
    log_to_file=False,
    log_to_console=True,
    log_level="DEBUG",
)

Chat Method Full Signature

response, tokens, details = ultra.chat(
    messages=[...],
    
    # Model Selection
    model="gpt-5",              # Model to use
    temperature=0.7,            # Creativity (0-1)
    max_tokens=4096,            # Max output tokens
    
    # Structured Output
    schema=MyPydanticSchema,    # Optional: Force structured response
    
    # Pipelines
    steps_pipeline=False,       # Enable step-by-step planning
    reasoning_pipeline=False,   # Enable multi-iteration reasoning
    steps_model="gpt-5.4-nano",   # Model for steps
    reasoning_model="gpt-5.4-nano", # Model for reasoning
    reasoning_iterations=3,     # Reasoning depth
    
    # Tools
    tools=["web-search"],       # Built-in tools
    tools_config={...},         # Tool configuration
    
    # Token Management
    input_truncation="AUTO",    # Override instance setting

    # Optional per-call fallback chain
    # None uses instance default, [] disables for this call
    fallback_models=["openai/gpt-4.1"],
)

Model Fallbacks

# Fallbacks disabled by default
ultra = UltraGPT(openrouter_api_key="...")

# Set defaults at initialization
ultra_with_fallbacks = UltraGPT(
    openrouter_api_key="...",
    fallback_models=["openai/gpt-4.1", "claude-sonnet-4.5"],
)

# Disable fallback for one critical call
response, tokens, details = ultra_with_fallbacks.chat(
    messages=[{"role": "user", "content": "Summarize this contract."}],
    model="gpt-5",
    fallback_models=[],
)

print(details.get("selected_model"))
print(details.get("fallback_used"))
print(details.get("attempted_models"))

📊 Response Details

Every call returns (response, tokens, details):

response, tokens, details = ultra.chat(...)

# Token breakdown
print(f"Input tokens: {details.get('input_tokens')}")
print(f"Output tokens: {details.get('output_tokens')}")
print(f"Total tokens: {details.get('total_tokens')}")
print(f"Reasoning tokens: {details.get('reasoning_tokens_api')}")

# Pipeline metrics (if used)
print(f"Reasoning pipeline tokens: {details.get('reasoning_pipeline_total_tokens')}")
print(f"Steps pipeline tokens: {details.get('steps_pipeline_total_tokens')}")

# Reasoning content (for reasoning models)
print(f"Reasoning text: {details.get('reasoning_text')}")
print(f"Reasoning details: {details.get('reasoning_details')}")

# Tools used
print(f"Tools called: {details.get('tools_used')}")

# Fallback metadata (when fallback_models is configured)
print(f"Selected model: {details.get('selected_model')}")
print(f"Fallback used: {details.get('fallback_used')}")
print(f"Attempted models: {details.get('attempted_models')}")
print(f"Fallback failures: {details.get('fallback_failures')}")

🛡️ Production Features

Rate Limit Handling

# Built-in exponential backoff with jitter
# Configurable in config/config.py:
# - RATE_LIMIT_RETRIES = 5
# - RATE_LIMIT_BASE_DELAY = 10
# - RATE_LIMIT_MAX_DELAY = 60
# - RATE_LIMIT_BACKOFF_MULTIPLIER = 2
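
If you want to picture the schedule these settings produce, here is a minimal sketch of exponential backoff with jitter, assuming the default values above (illustrative only, not UltraGPT's actual retry code):

import random
import time

class RateLimitError(Exception):
    """Stand-in for a provider's 429 error."""

RETRIES, BASE_DELAY, MAX_DELAY, MULTIPLIER = 5, 10, 60, 2

def call_with_backoff(fn):
    for attempt in range(RETRIES):
        try:
            return fn()
        except RateLimitError:
            # Exponential delay, capped at MAX_DELAY, with random jitter
            # so concurrent clients don't retry in lockstep.
            delay = min(BASE_DELAY * MULTIPLIER ** attempt, MAX_DELAY)
            time.sleep(delay * random.uniform(0.5, 1.5))
    return fn()  # final attempt; let any exception propagate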

Stream Timeout Protection

# Streams have wall-clock deadlines (default 1 hour)
# Prevents infinite hanging on stalled connections
# Proper cleanup prevents connection pool leaks
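
Conceptually this amounts to checking a wall-clock deadline between chunks and always closing the stream; a hypothetical sketch (the real handling lives inside UltraGPT):

import time

def consume_stream(stream, deadline_seconds=3600):
    started = time.monotonic()
    chunks = []
    try:
        for chunk in stream:
            # Wall-clock check between chunks catches stalled connections
            # that never raise an error on their own.
            if time.monotonic() - started > deadline_seconds:
                raise TimeoutError("stream exceeded wall-clock deadline")
            chunks.append(chunk)
    finally:
        close = getattr(stream, "close", None)
        if callable(close):
            close()  # release the connection back to the pool
    return chunks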

Message Sanitization

# Before each API call, UltraGPT:
# 1. Removes orphaned tool results
# 2. Drops unresolved tool calls
# 3. Consolidates system messages
# 4. Strips trailing whitespace (Claude requirement)
# 5. Validates tool call pairing
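
As a simplified model of steps 1-2 (illustrative only, not the library's code), orphan removal on OpenAI-style dicts can be pictured like this: a tool result survives only if a matching tool call exists, and a tool call survives only if all its results do:

def remove_orphans(messages):
    call_ids = {
        tc["id"]
        for m in messages if m["role"] == "assistant"
        for tc in m.get("tool_calls") or []
    }
    result_ids = {m["tool_call_id"] for m in messages if m["role"] == "tool"}

    cleaned = []
    for m in messages:
        if m["role"] == "tool" and m["tool_call_id"] not in call_ids:
            continue  # orphaned tool result: no matching call
        if m["role"] == "assistant" and m.get("tool_calls"):
            if not all(tc["id"] in result_ids for tc in m["tool_calls"]):
                continue  # unresolved tool call: no matching result
        cleaned.append(m)
    return cleaned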

Schema Sanitization

# Pydantic schemas are automatically transformed:
# 1. anyOf/Optional patterns → direct types
# 2. additionalProperties: false added
# 3. required arrays completed
# 4. "default" keywords stripped (causes 400s)
# 5. Nested objects recursively fixed
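
For instance, a Pydantic Optional[str] field normally emits an anyOf with a null branch plus a default, which strict mode rejects; the effect of the transformation can be pictured with this hypothetical before/after (not the sanitizer's literal output):

# Before: what Pydantic emits for an Optional[str] field.
before = {
    "type": "object",
    "properties": {
        "note": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": None}
    },
}

# After: anyOf flattened to a direct type, "default" stripped,
# required array completed, additionalProperties pinned for strict mode.
after = {
    "type": "object",
    "properties": {"note": {"type": ["string", "null"]}},
    "required": ["note"],
    "additionalProperties": False,
}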

🔌 Using the LLM Directly

Need the raw LangChain ChatOpenAI instance?

# Get the underlying LLM for custom operations
llm = ultra.provider_manager.get_provider("openrouter").build_llm(
    model="gpt-5",
    temperature=0.7,
    max_tokens=4096,
)

# Use directly with LangChain
response = llm.invoke([...])

📁 Project Structure

ultragpt/
├── core/
│   ├── core.py          # Main UltraGPT orchestrator
│   ├── chat_flow.py     # Chat operations
│   └── pipelines.py     # Reasoning & Steps pipelines
├── providers/
│   ├── providers.py     # OpenRouter provider
│   └── _langchain_patches.py  # Field preservation patches
├── messaging/
│   ├── message_ops.py   # Message consolidation
│   ├── history_utils.py # Orphan removal, validation
│   ├── token_manager.py # Message normalization
│   └── token_limits/
│       └── langchain_limiter.py  # Smart truncation
├── schemas/
│   ├── schema_utils.py  # Pydantic → OpenAI conversion
│   └── tool_schemas.py  # Tool/ExpertTool definitions
├── tooling/
│   └── tools_manager.py # Tool loading & execution
├── tools/
│   ├── web_search/      # Google search & scraping
│   ├── calculator/      # Basic calculator
│   └── math_operations/ # Advanced math
├── prompts/
│   └── prompts.py       # Pipeline prompts
└── config/
    └── config.py        # Default settings

🧪 Running Tests

UltraGPT doesn't include tests in the package, but here are essential verification scripts you should run:

Basic Functionality Test

from ultragpt import UltraGPT
import os

# Initialize
ultra = UltraGPT(openrouter_api_key=os.getenv("OPENROUTER_API_KEY"))

# Test 1: Basic chat
response, tokens, details = ultra.chat(
    messages=[{"role": "user", "content": "What is 2+2?"}],
    model="gpt-5"
)
print(f"✓ Basic chat: {response} (tokens: {tokens})")

# Test 2: Structured output
from pydantic import BaseModel
class Answer(BaseModel):
    result: int
    explanation: str

response, tokens, details = ultra.chat(
    messages=[{"role": "user", "content": "What is 5*8? Explain."}],
    model="gpt-5",
    schema=Answer,
)
print(f"✓ Structured output: {response}")

# Test 3: Tool calling
calculator = {
    "name": "calculator",
    "description": "Performs arithmetic",
    "parameters_schema": {
        "type": "object",
        "properties": {
            "operation": {"type": "string", "enum": ["add", "multiply"]},
            "a": {"type": "number"},
            "b": {"type": "number"}
        },
        "required": ["operation", "a", "b"]
    },
    "usage_guide": "Use for calculations",
    "when_to_use": "When user needs math",
}

response, tokens, details = ultra.tool_call(
    messages=[{"role": "user", "content": "Calculate 25 * 8"}],
    user_tools=[calculator],
    model="gpt-5",
)
print(f"✓ Tool calling: {response}")

Native Thinking Test

# Test with reasoning model
response, tokens, details = ultra.chat(
    messages=[{"role": "user", "content": "Solve step by step: If 3x + 7 = 22, find x"}],
    model="claude:sonnet",
    reasoning_pipeline=True,  # Auto-detects native thinking
)

print(f"Response: {response}")
print(f"Reasoning tokens: {details.get('reasoning_tokens_api', 0)}")
print(f"Has reasoning: {'reasoning_text' in details}")

Multi-Provider Test

# Test different providers with same code
for model in ["gpt-5", "claude:sonnet", "gemini"]:
    response, tokens, details = ultra.chat(
        messages=[{"role": "user", "content": "Say hello"}],
        model=model,
    )
    print(f"✓ {model}: {response} ({tokens} tokens)")

Message Sanitization Test

# Test orphan removal (shouldn't crash)
from langchain_core.messages import AIMessage

messages = [
    {"role": "user", "content": "Hello"},
    AIMessage(content="", tool_calls=[{"id": "call_123", "name": "test", "args": {}}]),
    # Missing tool result - should be sanitized automatically
]

response, tokens, details = ultra.chat(
    messages=messages,
    model="gpt-5",
)
print("✓ Orphan tool calls handled gracefully")

🤝 Contributing

Contributions welcome! Please ensure:

  1. All tests pass
  2. Code follows existing patterns
  3. Documentation updated for new features

📄 License

MIT License - see LICENSE.rst


🙏 Acknowledgments

Built on top of LangChain with patches for OpenRouter compatibility.

Powered by OpenRouter for universal model access.


UltraGPT: Stop fighting with AI providers. Start building.
