UltraGPT: A modular multi-provider AI library for advanced reasoning and step pipelines with OpenAI and Claude support
🤖 UltraGPT
The "Write Once, Run Everywhere" AI library that handles ALL the heavy lifting
🎯 Why UltraGPT?
This is NOT just another LangChain wrapper. UltraGPT is a battle-tested, production-grade abstraction that solves the real problems developers face when building AI applications:
The Problems We Solve
| Problem | What Others Do | What UltraGPT Does |
|---|---|---|
| Message Format Hell | Forces you to convert between formats | Auto-converts ANY format to LangChain, OpenAI, or provider-specific |
| Tool Call Orphans | Crashes when tool results are missing | Sanitizes history, removes orphans, validates pairing |
| Token Limits | Crashes on overflow | Smart truncation with atomic tool-call grouping |
| Provider Quirks | One provider = one codebase | True "write once, run everywhere" across ALL models |
| Structured Output | Breaks on Claude/Gemini | Universal schema support via tool-based fallback |
| Reasoning Models | Manual complexity | Auto-detects native thinking, preserves reasoning_details |
| Rate Limits | Crashes your app | Built-in exponential backoff with jitter |
| Streaming Issues | Connection pool leaks | Proper cleanup, mid-stream error detection |
| Schema Validation | 400 errors everywhere | Auto-sanitizes Pydantic → OpenAI strict mode |
✨ Key Features
🌐 Universal Model Access via OpenRouter
One API key, ALL models:
- GPT-5 (400k context, reasoning tokens)
- Claude Sonnet 4.5 (1M extended context!)
- Claude Opus 4 (200k context)
- Gemini 3 Pro/Flash (1M context, reasoning)
- Grok 4 (256k context, always-on reasoning)
- Llama 3.3, DeepSeek v3.2, Mistral, and more
🧠 Native Thinking/Reasoning Support
- Auto-detects models with native reasoning (Claude, o-series, GPT-5, Gemini 3)
- Preserves `reasoning_details` across tool call loops
- Falls back to a simulated reasoning pipeline for non-reasoning models
- Full token breakdown: input, output, reasoning tokens
🛠️ Production-Grade Tool Calling
- Universal tool format that works across ALL providers
- Automatic schema sanitization for strict mode compliance
- Preserves reasoning context for multi-turn tool conversations
- Parallel and sequential tool call support
📊 Structured Output That Actually Works
- Pydantic schemas → provider-compatible JSON
- Tool-based fallback for providers without native support
- Handles `Optional` fields, nested objects, arrays
- No more 400 errors from schema validation
💾 Intelligent Token Management
- Auto-truncation with model-specific limits
- Atomic tool-call grouping (never orphan a tool result)
- Preserves system messages during truncation
- Configurable: `"AUTO"`, `"OFF"`, or a specific token count
🔄 Message History Sanitization
- Removes orphaned tool results automatically
- Drops unresolved tool calls before API submission
- Consolidates multiple system messages safely
- Strips whitespace (Claude is strict about this!)
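The orphan-removal idea can be sketched over plain OpenAI-style message dicts. This is a minimal illustration of the concept, not UltraGPT's actual sanitizer; the function name and simplifications are ours.

```python
def sanitize_history(messages: list[dict]) -> list[dict]:
    """Drop tool results whose originating call is missing, and
    assistant tool calls that never received a result."""
    # IDs of tool calls issued by assistant messages
    call_ids = {
        tc["id"]
        for m in messages
        if m.get("role") == "assistant"
        for tc in m.get("tool_calls", [])
    }
    # IDs of tool results present in the history
    result_ids = {m["tool_call_id"] for m in messages if m.get("role") == "tool"}

    sanitized = []
    for m in messages:
        if m.get("role") == "tool" and m["tool_call_id"] not in call_ids:
            continue  # orphaned tool result: no matching call
        if m.get("role") == "assistant" and m.get("tool_calls"):
            resolved = [tc for tc in m["tool_calls"] if tc["id"] in result_ids]
            if not resolved and not m.get("content"):
                continue  # unresolved call with no text: drop entirely
            m = {**m, "tool_calls": resolved} if resolved else {
                k: v for k, v in m.items() if k != "tool_calls"
            }
        sanitized.append(m)
    return sanitized
```

A stray `{"role": "tool", "tool_call_id": "call_404", ...}` left over from a trimmed conversation would be silently dropped instead of triggering a provider-side 400.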
🔧 LangChain Patches for OpenRouter
- Preserves `reasoning_details`, `cache_control`, and `thinking` fields
- Future-proof: unknown fields pass through automatically
- Works with streaming and non-streaming responses
🆕 What's New in v7.6.0
- OpenAI streaming compatibility hardening: structured-output calls now gracefully recover from known client-side streaming serialization failures.
- Model fallback chains: optional `fallback_models` support is now threaded through chat, schema, and tool-call paths.
- Strict schema-fallback behavior: tool-based structured-output fallbacks now raise explicit parse errors instead of fabricating empty schema objects.
- Packaging alignment: Python support is now documented and packaged as 3.10+, aligned with current LangChain dependency minimums.
📦 Installation
pip install ultragpt
Requirements
- Python 3.10+
- OpenRouter API key (get one at openrouter.ai/keys)
🚀 Quick Start
Basic Chat
from ultragpt import UltraGPT
# Initialize with OpenRouter (universal access)
ultra = UltraGPT(openrouter_api_key="your-openrouter-key")
# Simple chat - works with ANY model
response, tokens, details = ultra.chat(
messages=[{"role": "user", "content": "Explain quantum computing in 3 sentences."}],
model="gpt-5" # or "claude:sonnet", "gemini", etc.
)
print(response)
print(f"Tokens used: {tokens}")
Model Selection (Friendly Names)
# GPT models
ultra.chat(messages=[...], model="gpt-5")
ultra.chat(messages=[...], model="gpt-5-pro")
ultra.chat(messages=[...], model="gpt-4o")
# Claude models (extended 1M context for Sonnet!)
ultra.chat(messages=[...], model="claude:sonnet") # Claude 3.7 Sonnet
ultra.chat(messages=[...], model="claude:opus") # Claude Opus 4
ultra.chat(messages=[...], model="claude-sonnet-4.5") # Latest Sonnet
# Gemini models
ultra.chat(messages=[...], model="gemini") # Gemini 3 Pro
ultra.chat(messages=[...], model="gemini-3-flash")
# Other models
ultra.chat(messages=[...], model="grok") # Grok 4
ultra.chat(messages=[...], model="deepseek") # DeepSeek v3.2
ultra.chat(messages=[...], model="llama-3.3")
🧠 Native Thinking/Reasoning
UltraGPT automatically detects and uses native reasoning for supported models:
# Native reasoning is auto-enabled for Claude, GPT-5, o-series, Gemini 3
response, tokens, details = ultra.chat(
messages=[{"role": "user", "content": "Solve: If 3x + 7 = 22, find x"}],
model="claude:sonnet",
reasoning_pipeline=True, # Triggers native thinking on supported models
)
# Access reasoning tokens and text
print(f"Reasoning tokens: {details.get('reasoning_tokens_api', 0)}")
print(f"Reasoning text: {details.get('reasoning_text')}")
print(f"Full details: {details.get('reasoning_details')}")
Fake Reasoning Pipeline (for non-reasoning models)
# For models without native reasoning (like GPT-4o), a simulated pipeline runs
response, tokens, details = ultra.chat(
messages=[{"role": "user", "content": "Plan a trip to Japan"}],
model="gpt-4o",
reasoning_pipeline=True,
reasoning_iterations=3,
)
# Get the thoughts generated
print(f"Reasoning thoughts: {details.get('reasoning')}")
📊 Structured Output
Using Pydantic Schemas
from pydantic import BaseModel, Field
class SentimentAnalysis(BaseModel):
sentiment: str = Field(description="positive, negative, or neutral")
confidence: float = Field(description="0.0 to 1.0")
keywords: list[str] = Field(description="Key words from the text")
response, tokens, details = ultra.chat(
messages=[{"role": "user", "content": "Analyze: 'I absolutely love this product!'"}],
model="gpt-5",
schema=SentimentAnalysis,
)
print(response)
# {'sentiment': 'positive', 'confidence': 0.95, 'keywords': ['love', 'absolutely', 'product']}
Works Across ALL Providers
# Same schema works with Claude (uses tool-based fallback automatically)
response, tokens, details = ultra.chat(
messages=[{"role": "user", "content": "Analyze: 'This is terrible!'"}],
model="claude:sonnet",
schema=SentimentAnalysis,
)
# Still works! UltraGPT handles the differences automatically.
🛠️ Tool Calling
Define Custom Tools
from pydantic import BaseModel
class CalculatorParams(BaseModel):
operation: str # add, subtract, multiply, divide
a: float
b: float
calculator_tool = {
"name": "calculator",
"description": "Performs arithmetic calculations",
"parameters_schema": CalculatorParams,
"usage_guide": "Use for precise arithmetic calculations",
"when_to_use": "When user needs numeric computation",
}
# Make a tool call
response, tokens, details = ultra.tool_call(
messages=[{"role": "user", "content": "Calculate 25 * 8"}],
user_tools=[calculator_tool],
model="claude:sonnet",
)
print(response)
# [{'id': 'call_xxx', 'type': 'function', 'function': {'name': 'calculator', 'arguments': '{"operation": "multiply", "a": 25, "b": 8}'}}]
Parallel Tool Calls
# Allow multiple tools in single response
response, tokens, details = ultra.tool_call(
messages=[{"role": "user", "content": "Add 10+5 and multiply 3*7"}],
user_tools=[calculator_tool],
allow_multiple=True, # Returns array of tool calls
model="gpt-5",
)
Tool Calling with Native Reasoning
# Reasoning models preserve context across tool loops
response, tokens, details = ultra.tool_call(
messages=[{"role": "user", "content": "Use calculator to find 25 * 8"}],
user_tools=[calculator_tool],
model="claude:sonnet",
reasoning_pipeline=True, # Uses native thinking
)
# reasoning_details preserved for next turn
print(details.get("reasoning_details"))
Tool Call Return Format
`tool_call()` returns `(response, tokens, details_dict)`. The response value depends on what the LLM produces:
| LLM Output | `allow_multiple=True` | `allow_multiple=False` |
|---|---|---|
| Tool calls only (most common) | List of tool call dicts | Single tool call dict |
| Tool calls + text content | `{"tool_calls": [...], "content": "text"}` | `{"tool_calls": [...], "content": "text"}` |
| Text only (no tool calls) | `{"content": "text"}` | `{"content": "text"}` |
| Empty response | `{"content": ""}` | `{"content": ""}` |
Key guarantees:
- `tool_call()` never returns `None` - always a list or dict
- LLM text content is never dropped - even when returned alongside tool calls
- Multi-modal content blocks (list-type) are automatically normalized to plain strings
- `details_dict["reasoning_details"]` is always available separately (3rd return value)
Handling all cases:
response, tokens, details = ultra.tool_call(messages=msgs, user_tools=tools, allow_multiple=True)
if isinstance(response, list):
# Tool calls only (backward-compatible array)
for tc in response:
name = tc["function"]["name"]
args = tc["function"]["arguments"]
elif isinstance(response, dict):
if "tool_calls" in response:
# Mixed: tool calls + accompanying text from LLM
tool_calls = response["tool_calls"]
llm_text = response.get("content", "") # "Let me check that..."
else:
# Text only (no tool calls)
text = response.get("content", "")
📝 Pipelines
Steps Pipeline
Break complex tasks into manageable steps:
response, tokens, details = ultra.chat(
messages=[{"role": "user", "content": "Plan a 2-week trip to Japan"}],
model="gpt-5",
steps_pipeline=True,
steps_model="gpt-5.4-nano", # Use cheaper model for planning
)
print(f"Steps: {details.get('steps')}")
print(f"Conclusion: {response}")
Reasoning Pipeline
Multi-iteration deep thinking:
response, tokens, details = ultra.chat(
messages=[{"role": "user", "content": "What are the long-term implications of AI on employment?"}],
model="gpt-4o",
reasoning_pipeline=True,
reasoning_iterations=5,
reasoning_model="gpt-4o-mini", # Use cheaper model for iterations
)
print(f"Thoughts generated: {len(details.get('reasoning', []))}")
💾 Token Management
Automatic Truncation
ultra = UltraGPT(
openrouter_api_key="...",
input_truncation="AUTO", # Automatically fits model's context limit
)
# Or specify exact limit
ultra = UltraGPT(
openrouter_api_key="...",
input_truncation=50000, # Max 50k tokens
)
# Or disable
ultra = UltraGPT(
openrouter_api_key="...",
input_truncation="OFF",
)
How Truncation Works
- Groups tool calls with their results (never orphans)
- Preserves system messages
- Removes oldest messages first (keeps newest)
- Ensures at least one HumanMessage remains
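The bullets above can be sketched as a toy truncation pass. Token counting is faked with string length and the grouping logic is simplified; UltraGPT's real implementation uses model-specific tokenizers and LangChain message types.

```python
def truncate(messages: list[dict], limit: int) -> list[dict]:
    """Drop oldest message groups first, keeping system messages and
    never splitting a tool call from its results."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    # Group each assistant tool call with the tool results that follow it,
    # so a group is dropped or kept atomically.
    groups, i = [], 0
    while i < len(rest):
        group = [rest[i]]
        if rest[i]["role"] == "assistant" and rest[i].get("tool_calls"):
            while i + 1 < len(rest) and rest[i + 1]["role"] == "tool":
                i += 1
                group.append(rest[i])
        groups.append(group)
        i += 1

    def cost(msgs):
        return sum(len(str(m.get("content", ""))) for m in msgs)  # fake tokenizer

    # Remove oldest groups first; always keep at least the newest one.
    while len(groups) > 1 and cost(system) + sum(cost(g) for g in groups) > limit:
        groups.pop(0)
    return system + [m for g in groups for m in g]
```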
🌐 Web Search (Built-in Tool)
ultra = UltraGPT(
openrouter_api_key="...",
google_api_key="your-google-api-key",
search_engine_id="your-search-engine-id",
)
response, tokens, details = ultra.chat(
messages=[{"role": "user", "content": "What are the latest AI trends in 2026?"}],
model="gpt-5",
tools=["web-search"],
tools_config={
"web-search": {
"max_results": 3,
"enable_scraping": True,
"max_scrape_length": 5000,
}
},
)
🔧 Advanced Configuration
Full Initialization Options
ultra = UltraGPT(
# API Keys
openrouter_api_key="...", # Required: Universal access to all models
google_api_key="...", # Optional: For web search
search_engine_id="...", # Optional: For web search
# Token Management
max_tokens=4096, # Max output tokens
input_truncation="AUTO", # "AUTO", "OFF", or int
# Model Fallbacks (Optional)
# None or [] = disabled
# Example: ["openai/gpt-4.1", "claude-sonnet-4.5"]
fallback_models=None,
# Logging
verbose=True, # Show detailed logs
logger_name="ultragpt",
log_to_file=False,
log_to_console=True,
log_level="DEBUG",
)
Chat Method Full Signature
response, tokens, details = ultra.chat(
messages=[...],
# Model Selection
model="gpt-5", # Model to use
temperature=0.7, # Creativity (0-1)
max_tokens=4096, # Max output tokens
# Structured Output
schema=MyPydanticSchema, # Optional: Force structured response
# Pipelines
steps_pipeline=False, # Enable step-by-step planning
reasoning_pipeline=False, # Enable multi-iteration reasoning
steps_model="gpt-5.4-nano", # Model for steps
reasoning_model="gpt-5.4-nano", # Model for reasoning
reasoning_iterations=3, # Reasoning depth
# Tools
tools=["web-search"], # Built-in tools
tools_config={...}, # Tool configuration
# Token Management
input_truncation="AUTO", # Override instance setting
# Optional per-call fallback chain
# None uses instance default, [] disables for this call
fallback_models=["openai/gpt-4.1"],
)
Model Fallbacks
# Fallbacks disabled by default
ultra = UltraGPT(openrouter_api_key="...")
# Set defaults at initialization
ultra_with_fallbacks = UltraGPT(
openrouter_api_key="...",
fallback_models=["openai/gpt-4.1", "claude-sonnet-4.5"],
)
# Disable fallback for one critical call
response, tokens, details = ultra_with_fallbacks.chat(
messages=[{"role": "user", "content": "Summarize this contract."}],
model="gpt-5",
fallback_models=[],
)
print(details.get("selected_model"))
print(details.get("fallback_used"))
print(details.get("attempted_models"))
📊 Response Details
Every call returns (response, tokens, details):
response, tokens, details = ultra.chat(...)
# Token breakdown
print(f"Input tokens: {details.get('input_tokens')}")
print(f"Output tokens: {details.get('output_tokens')}")
print(f"Total tokens: {details.get('total_tokens')}")
print(f"Reasoning tokens: {details.get('reasoning_tokens_api')}")
# Pipeline metrics (if used)
print(f"Reasoning pipeline tokens: {details.get('reasoning_pipeline_total_tokens')}")
print(f"Steps pipeline tokens: {details.get('steps_pipeline_total_tokens')}")
# Reasoning content (for reasoning models)
print(f"Reasoning text: {details.get('reasoning_text')}")
print(f"Reasoning details: {details.get('reasoning_details')}")
# Tools used
print(f"Tools called: {details.get('tools_used')}")
# Fallback metadata (when fallback_models is configured)
print(f"Selected model: {details.get('selected_model')}")
print(f"Fallback used: {details.get('fallback_used')}")
print(f"Attempted models: {details.get('attempted_models')}")
print(f"Fallback failures: {details.get('fallback_failures')}")
🛡️ Production Features
Rate Limit Handling
# Built-in exponential backoff with jitter
# Configurable in config/config.py:
# - RATE_LIMIT_RETRIES = 5
# - RATE_LIMIT_BASE_DELAY = 10
# - RATE_LIMIT_MAX_DELAY = 60
# - RATE_LIMIT_BACKOFF_MULTIPLIER = 2
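The retry strategy those settings describe looks roughly like this generic sketch. The exception type and jitter range are stand-ins, not UltraGPT's internals.

```python
import random
import time

def with_backoff(call, retries=5, base_delay=10, max_delay=60, multiplier=2):
    """Retry `call` with capped exponential backoff plus jitter."""
    for attempt in range(retries):
        try:
            return call()
        except RuntimeError:  # stand-in for a provider rate-limit error
            if attempt == retries - 1:
                raise  # out of retries: surface the error
            delay = min(base_delay * multiplier ** attempt, max_delay)
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids thundering herd
```

Jitter matters in production: without it, many clients that hit a rate limit at the same moment retry at the same moment too.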
Stream Timeout Protection
# Streams have wall-clock deadlines (default 1 hour)
# Prevents infinite hanging on stalled connections
# Proper cleanup prevents connection pool leaks
Message Sanitization
# Before each API call, UltraGPT:
# 1. Removes orphaned tool results
# 2. Drops unresolved tool calls
# 3. Consolidates system messages
# 4. Strips trailing whitespace (Claude requirement)
# 5. Validates tool call pairing
Schema Sanitization
# Pydantic schemas are automatically transformed:
# 1. anyOf/Optional patterns → direct types
# 2. additionalProperties: false added
# 3. required arrays completed
# 4. "default" keywords stripped (causes 400s)
# 5. Nested objects recursively fixed
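The transforms listed above amount to a recursive walk over the JSON schema dict. This is an illustrative sketch of that walk under simplified assumptions (e.g. only `anyOf`-with-null Optionals), not UltraGPT's actual sanitizer.

```python
def sanitize_schema(schema: dict) -> dict:
    """Rewrite a Pydantic-emitted JSON schema into strict-mode-friendly form."""
    out = {k: v for k, v in schema.items() if k != "default"}  # strict mode rejects "default"

    # Collapse Optional[...] (anyOf with a null branch) to the non-null type.
    if "anyOf" in out:
        non_null = [s for s in out["anyOf"] if s.get("type") != "null"]
        if len(non_null) == 1:
            rest = {k: v for k, v in out.items() if k != "anyOf"}
            out = {**rest, **sanitize_schema(non_null[0])}

    if out.get("type") == "object":
        props = {k: sanitize_schema(v) for k, v in out.get("properties", {}).items()}
        out["properties"] = props
        out["required"] = list(props)          # strict mode: every key required
        out["additionalProperties"] = False    # strict mode: no extra keys
    if out.get("type") == "array" and "items" in out:
        out["items"] = sanitize_schema(out["items"])
    return out
```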
🔌 Using the LLM Directly
Need the raw LangChain ChatOpenAI instance?
# Get the underlying LLM for custom operations
llm = ultra.provider_manager.get_provider("openrouter").build_llm(
model="gpt-5",
temperature=0.7,
max_tokens=4096,
)
# Use directly with LangChain
response = llm.invoke([...])
📁 Project Structure
ultragpt/
├── core/
│ ├── core.py # Main UltraGPT orchestrator
│ ├── chat_flow.py # Chat operations
│ └── pipelines.py # Reasoning & Steps pipelines
├── providers/
│ ├── providers.py # OpenRouter provider
│ └── _langchain_patches.py # Field preservation patches
├── messaging/
│ ├── message_ops.py # Message consolidation
│ ├── history_utils.py # Orphan removal, validation
│ ├── token_manager.py # Message normalization
│ └── token_limits/
│ └── langchain_limiter.py # Smart truncation
├── schemas/
│ ├── schema_utils.py # Pydantic → OpenAI conversion
│ └── tool_schemas.py # Tool/ExpertTool definitions
├── tooling/
│ └── tools_manager.py # Tool loading & execution
├── tools/
│ ├── web_search/ # Google search & scraping
│ ├── calculator/ # Basic calculator
│ └── math_operations/ # Advanced math
├── prompts/
│ └── prompts.py # Pipeline prompts
└── config/
└── config.py # Default settings
🧪 Running Tests
UltraGPT doesn't include tests in the package, but here are essential verification scripts you should run:
Basic Functionality Test
from ultragpt import UltraGPT
import os
# Initialize
ultra = UltraGPT(openrouter_api_key=os.getenv("OPENROUTER_API_KEY"))
# Test 1: Basic chat
response, tokens, details = ultra.chat(
messages=[{"role": "user", "content": "What is 2+2?"}],
model="gpt-5"
)
print(f"✓ Basic chat: {response} (tokens: {tokens})")
# Test 2: Structured output
from pydantic import BaseModel
class Answer(BaseModel):
result: int
explanation: str
response, tokens, details = ultra.chat(
messages=[{"role": "user", "content": "What is 5*8? Explain."}],
model="gpt-5",
schema=Answer,
)
print(f"✓ Structured output: {response}")
# Test 3: Tool calling
calculator = {
"name": "calculator",
"description": "Performs arithmetic",
"parameters_schema": {
"type": "object",
"properties": {
"operation": {"type": "string", "enum": ["add", "multiply"]},
"a": {"type": "number"},
"b": {"type": "number"}
},
"required": ["operation", "a", "b"]
},
"usage_guide": "Use for calculations",
"when_to_use": "When user needs math",
}
response, tokens, details = ultra.tool_call(
messages=[{"role": "user", "content": "Calculate 25 * 8"}],
user_tools=[calculator],
model="gpt-5",
)
print(f"✓ Tool calling: {response}")
Native Thinking Test
# Test with reasoning model
response, tokens, details = ultra.chat(
messages=[{"role": "user", "content": "Solve step by step: If 3x + 7 = 22, find x"}],
model="claude:sonnet",
reasoning_pipeline=True, # Auto-detects native thinking
)
print(f"Response: {response}")
print(f"Reasoning tokens: {details.get('reasoning_tokens_api', 0)}")
print(f"Has reasoning: {'reasoning_text' in details}")
Multi-Provider Test
# Test different providers with same code
for model in ["gpt-5", "claude:sonnet", "gemini"]:
response, tokens, details = ultra.chat(
messages=[{"role": "user", "content": "Say hello"}],
model=model,
)
print(f"✓ {model}: {response} ({tokens} tokens)")
Message Sanitization Test
# Test orphan removal (shouldn't crash)
from langchain_core.messages import AIMessage, ToolMessage
messages = [
{"role": "user", "content": "Hello"},
AIMessage(content="", tool_calls=[{"id": "call_123", "name": "test", "args": {}}]),
# Missing tool result - should be sanitized automatically
]
response, tokens, details = ultra.chat(
messages=messages,
model="gpt-5",
)
print("✓ Orphan tool calls handled gracefully")
🤝 Contributing
Contributions welcome! Please ensure:
- All tests pass
- Code follows existing patterns
- Documentation updated for new features
📄 License
MIT License - see LICENSE.rst
🙏 Acknowledgments
Built on top of LangChain with patches for OpenRouter compatibility.
Powered by OpenRouter for universal model access.
UltraGPT: Stop fighting with AI providers. Start building.