Unified multi-provider LLM abstraction module with intelligent routing, cost tracking, and caching
StratifyAI — Unified Multi‑Provider LLM Interface
Status: Production Ready. MCP Ecosystem Complete (Server + Client Engine + Abstraction Layer)
Providers: 9 Operational
Features: Routing • RAG • Caching • Streaming • Observability • Security Hardening • CLI • Svelte 5 SPA • Vision • Smart Chunking • Prompt Templates • O(1) Cache • Concurrency Limits • MCP Server & Client Engine
StratifyAI is a production‑ready Python framework that provides a unified interface for 9+ LLM providers, including OpenAI, Anthropic, Google, DeepSeek, Groq, Grok, OpenRouter, Ollama, and AWS Bedrock. It eliminates vendor lock‑in, simplifies multi‑model development, and enables intelligent routing, cost tracking, caching, streaming, and RAG workflows.
Start here: docs/GETTING-STARTED.md • Web UI guide: docs/UI-OVERVIEW.md • Examples: examples/README.md • Vision guide: docs/VISION-SUPPORT.md
Features
Core
- Unified API for 9+ LLM providers
- Async-first architecture with sync wrappers
- Automatic provider detection
- Cost tracking and budget enforcement
- Latency tracking on all responses
- Retry logic with fallback models
- Streaming support for all providers
- O(1) Cache with LRU eviction + provider prompt caching
- Concurrent read-write locking (RWLockFair) for high-throughput caching
- Provider concurrency limits (max concurrent requests per provider)
- Correlation IDs for HTTP/WebSocket tracing
- Provider health and structured metrics endpoints
- Intelligent routing (cost, quality, latency, hybrid)
- Capability filtering (vision, tools, reasoning)
- Model metadata and context window awareness
- Builder pattern for fluent configuration
- Vision support for image analysis (GPT-4o, Claude, Gemini, Nova)
- Prompt templates with 10 built-in templates (code review, summarization, translation, etc.)
- User-defined template support via ~/.stratifyai/prompts/
- MCP Server with 8 tools, 5 resources, 13+ prompts
- MCP Client Engine — spawn and manage external MCP servers, tool aggregation, chat integration
- MCP Abstraction Layer — curated server catalog, CLI wizard, inline tool tester
- Permission system for MCP tool safety (allow/deny/confirm, destructive tool gating)
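To make the "O(1) Cache with LRU eviction" item above concrete, here is a minimal sketch of the general technique using Python's `OrderedDict`. It is not StratifyAI's actual cache; the class name `LRUCache` and its methods are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class LRUCache:
    """Toy LRU cache (hypothetical; not StratifyAI's implementation)."""

    def __init__(self, max_size: int = 128) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._items: "OrderedDict[Hashable, str]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._max_size:
            self._items.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(max_size=2)
cache.put("prompt-a", "response-a")
cache.put("prompt-b", "response-b")
cache.get("prompt-a")                # touch a, so b becomes the oldest entry
cache.put("prompt-c", "response-c")  # exceeds max_size, so b is evicted
```

Lookups, inserts, and evictions each take constant time on average because `OrderedDict` keeps a hash table alongside a doubly linked list. A concurrent cache would additionally need locking, which this sketch leaves out.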
Advanced
- Large‑file handling with smart chunking and progressive summarization
- File extraction (CSV schema, JSON schema, logs, code structure)
- Auto model selection for extraction tasks
- RAG pipeline with embeddings + vector DB (ChromaDB)
- Semantic search and citation tracking
- Rich/Typer CLI with interactive mode
- Svelte 5 SPA with tabbed interface, real-time streaming, and file attachments
- Web UI Features: Markdown rendering, syntax highlighting, cost tracking, model catalog browser
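The smart chunking mentioned above can be pictured with a small sketch. The function below is an assumption about how such a splitter might look, not the library's code: `chunk_text` and `max_chars` are hypothetical names, and the strategy (prefer paragraph breaks, hard-split oversized paragraphs) is one plausible choice among several.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 10_000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that is longer than the limit.
        pieces = [
            paragraph[i:i + max_chars]
            for i in range(0, len(paragraph), max_chars)
        ] or [""]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= max_chars:
                current = candidate  # still fits: keep growing this chunk
            else:
                chunks.append(current)  # flush the full chunk, start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks


small = chunk_text("aaaa\n\nbbbb\n\ncccc", max_chars=10)
print(small)  # ['aaaa\n\nbbbb', 'cccc']
```

Each resulting chunk could then be summarized separately and the summaries combined, which is the usual shape of progressive summarization.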
Operations & Observability
- GET /api/health for basic API liveness
- GET /health/providers and GET /api/health/providers for provider readiness snapshots
- GET /api/metrics for structured JSON metrics export
- X-Correlation-ID support for HTTP tracing and correlation_id in WebSocket payloads
- Streaming telemetry including first-token and total latency in final WebSocket usage payloads
Installation
git clone https://github.com/Bytes0211/stratifyai.git
cd stratifyai
pip install -e .
Or using uv:
uv sync
ℹ️ Python dependencies such as mcp, fastapi, and pydantic are declared in the package metadata and are installed automatically by pip when installing from PyPI.
⚠️ Optional MCP prerequisite: the published wheel/sdist does not bundle local Claude/Cursor/VS Code MCP config files. If you plan to use curated MCP servers that launch via npx, install Node.js 18+ and make sure npx is available on your PATH.
Configuration
cp .env.example .env
# Add your API keys
Check configured providers:
stratifyai check-keys
Security Runbook
For Phase 15 hardening guidance (threat model assumptions, staging/production environment settings, verification commands, and incident response), see:
docs/runbook/phase15-security-runbook.md
MCP Ecosystem
StratifyAI includes a complete MCP (Model Context Protocol) implementation:
- MCP Server: Exposes StratifyAI capabilities (chat, routing, cost tracking) as MCP tools
- MCP Client Engine: Spawns and manages external MCP servers, aggregates tools into chat
- MCP Abstraction Layer: Curated catalog of 20 MCP servers, CLI setup wizard, config generation
- Permission System: Safety defaults, destructive tool confirmation, per-server toggles
Documentation:
- docs/MCP-QUICKSTART.md: Install, configure, first tool call
- docs/MCP-TOOLS-REFERENCE.md: All tools, resources, and prompts
- docs/MCP-CLIENT-CONFIG.md: Client config for Claude Desktop, Claude Code, Cursor, VS Code
Local MCP chat integration notes
- StratifyAI can auto-discover enabled MCP servers from Claude Desktop, Cursor, and VS Code configs for chat use.
- The Web UI now persists the active MCP chat selection and auto-enables newly discovered servers on first load, so MCP access survives refreshes after the shared-state unification work.
- The MCP dashboard and chat settings now support live refresh from disk and show the config source client for each discovered server.
- Reset config in the MCP tab can now clear selected or all applied MCP server entries, including the matching stratifyai.mcpClient metadata.
- Anthropic-backed chats automatically receive provider-safe MCP tool aliases, so namespaced tools such as postgresql.query remain callable without hitting Anthropic's tool-name regex limits.
- The PostgreSQL MCP query tool is now treated as read-only by default, so chat sessions can execute safe SQL lookups without extra confirmation prompts.
- If a tool appears in the UI but is never used, verify the server permission allow-list matches the actual tool names. Common examples:
  - PostgreSQL MCP: "allow": ["query"]
  - Brave MCP: "allow": ["brave_*"]
- If PostgreSQL shows connected but the model still says it is unavailable, inspect the returned tool_results or server logs for a database auth error such as password authentication failed. In that case the MCP transport is healthy, but the configured connection string credentials still need to be corrected.
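The allow-list advice above is easier to debug with a quick local check. The sketch below assumes the patterns are shell-style globs (as `brave_*` suggests) and that they are compared against the bare tool name; both are assumptions, and `is_tool_allowed` is a hypothetical helper, not part of StratifyAI.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_tool_allowed(tool_name: str, allow_patterns: Iterable[str]) -> bool:
    """Return True if tool_name matches any glob pattern in the allow-list."""
    return any(fnmatchcase(tool_name, pattern) for pattern in allow_patterns)


# Allow-lists copied from the examples above; the surrounding dict is illustrative.
permissions = {
    "postgresql": {"allow": ["query"]},
    "brave": {"allow": ["brave_*"]},
}

print(is_tool_allowed("query", permissions["postgresql"]["allow"]))             # True
print(is_tool_allowed("brave_web_search", permissions["brave"]["allow"]))       # True
print(is_tool_allowed("postgresql.query", permissions["postgresql"]["allow"]))  # False
```

The last call illustrates the failure mode described above: an allow-list written for one naming form does not match a tool that is reported under a different name.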
Add a custom MCP server
If the server you want is not in the curated catalog (for example, an Excel connector), you can add it directly:
uv run stratifyai mcp add-custom excel \
--client claude-desktop \
--command npx \
--command-arg -y \
--command-arg your-excel-mcp-package
You can also pass --env KEY=VALUE and extra --command-arg ... values for custom servers, then refresh the MCP dashboard or restart the client.
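For orientation, Claude Desktop reads MCP servers from an `mcpServers` object in its JSON config, with `command`, `args`, and optional `env` per server. The sketch below builds the entry that the command above would plausibly correspond to. Whether `add-custom` writes exactly this shape, or adds extra metadata, is an assumption; `build_server_entry` is a hypothetical helper.

```python
import json
from typing import Dict, List, Optional


def build_server_entry(
    command: str,
    args: List[str],
    env: Optional[Dict[str, str]] = None,
) -> dict:
    """Build one server entry in the standard mcpServers shape."""
    entry: dict = {"command": command, "args": list(args)}
    if env:
        entry["env"] = dict(env)  # only include env when values were given
    return entry


# Mirrors: --command npx --command-arg -y --command-arg your-excel-mcp-package
config = {
    "mcpServers": {
        "excel": build_server_entry("npx", ["-y", "your-excel-mcp-package"]),
    }
}
print(json.dumps(config, indent=2))
```

Comparing this against the client's config file after running the command is a quick way to confirm the server was registered as intended.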
Quick Start
CLI Usage
stratifyai chat -p openai -m gpt-4o-mini -t "Hello"
stratifyai route "Explain relativity" --strategy hybrid
stratifyai interactive
stratifyai cache-stats
Python Example (LLMClient)
from stratifyai import LLMClient
from stratifyai.models import Message, ChatRequest, ChatResponse
client: LLMClient = LLMClient()
request: ChatRequest = ChatRequest(
    model="gpt-4o-mini",
    messages=[Message(role="user", content="Explain quantum computing")]
)
# Async (recommended); await must run inside an async function
response: ChatResponse = await client.chat_completion(request)
# Sync wrapper for scripts/CLI
response: ChatResponse = client.chat_completion_sync(request)
print(response.content)
print(f"Cost: ${response.usage.cost_usd:.6f}")
print(f"Latency: {response.latency_ms:.0f}ms")
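Because each response reports `usage.cost_usd`, a caller can keep a running total and stop before a spending limit is crossed. The sketch below shows that idea on the caller's side only; it is not StratifyAI's built-in budget enforcement, and `BudgetTracker` and `BudgetExceededError` are hypothetical names.

```python
from dataclasses import dataclass


class BudgetExceededError(RuntimeError):
    """Raised when recording a cost would push spending past the limit."""


@dataclass
class BudgetTracker:
    limit_usd: float
    spent_usd: float = 0.0

    def record(self, cost_usd: float) -> float:
        """Add one call's cost and return the remaining budget in USD."""
        if cost_usd < 0:
            raise ValueError("cost_usd must be non-negative")
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceededError(
                f"spending {cost_usd:.6f} would exceed the {self.limit_usd:.2f} USD limit"
            )
        self.spent_usd += cost_usd
        return self.limit_usd - self.spent_usd


budget = BudgetTracker(limit_usd=1.0)
remaining = budget.record(0.25)  # in real use: budget.record(response.usage.cost_usd)
print(f"Remaining: ${remaining:.2f}")  # Remaining: $0.75
```

Checking before the total is updated means a rejected call leaves the tracker unchanged, so the caller can retry with a cheaper model.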
Python Example (Chat Package - Simplified)
from stratifyai.chat import anthropic, openai
from stratifyai.models import ChatResponse
# Quick usage - model is always required
response: ChatResponse = await anthropic.chat("Hello!", model="claude-sonnet-4-5")
print(response.content)
# With options
response: ChatResponse = await openai.chat(
    "Explain quantum computing",
    model="gpt-4o-mini",
    system="Be concise",
    temperature=0.5
)
Builder Pattern (Fluent Configuration)
from stratifyai.chat import anthropic
from stratifyai.chat.builder import ChatBuilder
from stratifyai.models import ChatResponse
# Configure once, use multiple times
client: ChatBuilder = (
    anthropic
    .with_model("claude-sonnet-4-5")
    .with_system("You are a helpful assistant")
    .with_temperature(0.7)
)
# All subsequent calls use the configured settings
response: ChatResponse = await client.chat("Hello!")
response: ChatResponse = await client.chat("Tell me more")
# Stream with builder
async for chunk in client.chat_stream("Write a story"):
    print(chunk.content, end="", flush=True)
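When consuming a stream like the one above, first-token and total latency can be measured on the client side. The following sketch uses a stand-in async generator (`fake_stream`, hypothetical) instead of a real provider call so that it runs without API keys. It only illustrates the timing pattern and does not reproduce the telemetry StratifyAI reports.

```python
import asyncio
import time
from typing import AsyncIterator, List, Optional, Tuple


async def fake_stream() -> AsyncIterator[str]:
    """Stand-in for a streaming chat call; yields text pieces with small delays."""
    for piece in ["Once ", "upon ", "a time"]:
        await asyncio.sleep(0.01)
        yield piece


async def consume_with_timing(
    stream: AsyncIterator[str],
) -> Tuple[str, Optional[float], float]:
    """Collect the stream and return (text, first_token_ms, total_ms)."""
    start = time.perf_counter()
    first_token_ms: Optional[float] = None
    parts: List[str] = []
    async for piece in stream:
        if first_token_ms is None:
            # Time from request start until the first piece arrives.
            first_token_ms = (time.perf_counter() - start) * 1000
        parts.append(piece)
    total_ms = (time.perf_counter() - start) * 1000
    return "".join(parts), first_token_ms, total_ms


text, first_ms, total_ms = asyncio.run(consume_with_timing(fake_stream()))
print(f"{text!r}: first token after {first_ms:.0f}ms, total {total_ms:.0f}ms")
```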
Prompt Templates
from stratifyai.chat import anthropic
from stratifyai.prompts import registry
# Use a built-in template
response = await (
    anthropic
    .with_model("claude-sonnet-4-20250514")
    .with_template("code_review", code=source_code, language="python", focus="security")
    .chat("Review this code")
)
# CLI usage
# stratifyai templates # List all templates
# stratifyai chat --template summarize --params "style=bullet_points" --file document.txt
Web UI
StratifyAI includes a production-ready Svelte 5 SPA with modern UI/UX.
For the dedicated Web UI quick start and walkthrough, see
docs/UI-OVERVIEW.md.
Features
- Tabbed Interface: Config, Files, History, Cost tracking
- Real-time Streaming: WebSocket-based streaming chat with live token display
- File Attachments: Text files and images (for vision models)
- Smart Chunking: Configurable chunking (10k-100k chars) for large files
- Model Catalog Browser: Browse all models with filtering and capability badges
- Markdown Rendering: Syntax highlighting with highlight.js (190+ languages)
- Cost Tracking: Real-time cost analytics per message and session
- Theme Toggle: Dark/light themes with localStorage persistence
- Model Validation: Real-time API key validation and model availability
- MCP Tools in Chat: Enable discovered local MCP servers per conversation with live refresh and status visibility
Quick Start
# Install frontend dependencies
cd frontend
npm install
# Build the SPA
npm run build
# Start the API server (serves the built SPA)
cd ..
uvicorn api.main:app --reload --port 8080
# Open browser
open http://localhost:8080
Development Mode
# Terminal 1: Start backend
uvicorn api.main:app --reload --port 8080
# Terminal 2: Start frontend dev server
cd frontend
npm run dev
Observability Endpoints
# Basic health
curl http://localhost:8080/api/health
# Provider readiness
curl http://localhost:8080/health/providers
# Structured metrics (requires auth if STRATIFYAI_API_KEY is set)
curl -H "Authorization: Bearer $STRATIFYAI_API_KEY" http://localhost:8080/api/metrics
If you want request tracing in logs, send a correlation header:
curl -H "X-Correlation-ID: demo-trace-123" http://localhost:8080/api/health
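The same correlation header can be attached from Python using only the standard library. This is a sketch under the assumption that the server runs locally on port 8080 as in the commands above; `build_traced_request` is a hypothetical helper, and the actual network call is left commented out so the snippet runs without a server.

```python
import urllib.request
import uuid
from typing import Optional, Tuple


def build_traced_request(
    url: str,
    correlation_id: Optional[str] = None,
) -> Tuple[urllib.request.Request, str]:
    """Create a GET request carrying an X-Correlation-ID header."""
    correlation_id = correlation_id or uuid.uuid4().hex  # generate one if not given
    request = urllib.request.Request(url)
    request.add_header("X-Correlation-ID", correlation_id)
    return request, correlation_id


request, cid = build_traced_request(
    "http://localhost:8080/api/health", "demo-trace-123"
)
print(f"Sending {request.full_url} with correlation id {cid}")

# To actually send it (requires the API server to be running):
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(response.status, response.read().decode())
```

Generating the ID on the client and logging it there as well makes it possible to match a client-side error to the corresponding server log lines.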
Routing
- Cost: choose cheapest model
- Quality: choose highest‑quality model
- Latency: choose fastest model
- Hybrid (default): dynamic weighting based on complexity
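The four strategies above can be read as different weightings of the same three signals. The sketch below is one possible formulation, not the router's actual algorithm: the model names, numbers, and the rule that prompt complexity sets the quality weight in hybrid mode are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ModelStats:
    name: str
    cost_per_1k: float  # USD per 1k tokens, lower is better
    quality: float      # 0..1, higher is better
    latency_ms: float   # lower is better


# Hypothetical candidates; real values would come from the model catalog.
CANDIDATES = [
    ModelStats("small-model", 0.2, 0.60, 300.0),
    ModelStats("mid-model", 1.0, 0.80, 600.0),
    ModelStats("large-model", 3.0, 0.95, 1200.0),
]

# (cost, quality, latency) weights for the single-signal strategies.
FIXED_WEIGHTS = {"cost": (1.0, 0.0, 0.0), "quality": (0.0, 1.0, 0.0), "latency": (0.0, 0.0, 1.0)}


def _scaled(models: List[ModelStats], key: Callable[[ModelStats], float], invert: bool):
    """Return a function mapping a model to a 0..1 score for one signal."""
    values = [key(m) for m in models]
    lo, hi = min(values), max(values)

    def score(m: ModelStats) -> float:
        if hi == lo:
            return 1.0
        s = (key(m) - lo) / (hi - lo)
        return 1.0 - s if invert else s

    return score


def pick_model(models: List[ModelStats], strategy: str = "hybrid", complexity: float = 0.5) -> ModelStats:
    if strategy == "hybrid":
        # Assumption: more complex prompts shift weight toward quality.
        w_quality = complexity
        w_cost = w_latency = (1.0 - complexity) / 2
    else:
        w_cost, w_quality, w_latency = FIXED_WEIGHTS[strategy]
    cost = _scaled(models, lambda m: m.cost_per_1k, invert=True)
    quality = _scaled(models, lambda m: m.quality, invert=False)
    latency = _scaled(models, lambda m: m.latency_ms, invert=True)
    return max(models, key=lambda m: w_cost * cost(m) + w_quality * quality(m) + w_latency * latency(m))


print(pick_model(CANDIDATES, "cost").name)                    # small-model
print(pick_model(CANDIDATES, "hybrid", complexity=0.5).name)  # mid-model
```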
RAG
- Embeddings (OpenAI)
- ChromaDB vector storage
- Semantic search
- Document indexing
- Retrieval‑augmented generation
- Citation tracking
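The retrieval and citation steps listed above follow a common pattern: embed the query, rank stored documents by similarity, and pass the top hits to the model with their source labels. The sketch below swaps in a toy bag-of-words vector for real embeddings and a plain dict for ChromaDB so that it runs offline; every name in it is hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: Dict[str, str], top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, best match first."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


DOCS = {
    "cache.md": "The cache uses LRU eviction to drop old entries",
    "routing.md": "The router picks a model by cost quality or latency",
    "rag.md": "Documents are indexed as embeddings for semantic search",
}

hits = retrieve("how does the router choose a model", DOCS, top_k=1)
# Label each retrieved passage with its source so the answer can cite it.
context = "\n".join(f"[{doc_id}] {DOCS[doc_id]}" for doc_id, _ in hits)
print(context)  # [routing.md] The router picks a model by cost quality or latency
```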
Project Structure
stratifyai/
├── catalog/ # Model catalog (community-editable)
│ ├── models.json # Provider model metadata
│ ├── schema.json # JSON schema
│ └── README.md # Contribution guidelines
├── frontend/ # Svelte 5 SPA (48 files)
│ ├── src/ # SPA source code
│ │ ├── App.svelte # Main app component
│ │ ├── lib/ # Components, stores, API clients
│ │ └── styles/ # SCSS styling
│ ├── package.json # Frontend dependencies
│ └── vite.config.ts # Vite build configuration
├── api/ # FastAPI REST API + WebSocket
│ ├── main.py # API endpoints, streaming
│ └── static/ # Served assets
│ ├── dist/ # Built SPA (from frontend/)
│ └── index.html # Legacy fallback
├── stratifyai/ # Core package
│ ├── catalog_manager.py # Loads models from catalog/
│ ├── providers/ # Provider implementations (9 providers)
│ ├── router.py # Intelligent routing
│ ├── models.py # Data models
│ ├── chat/ # Simplified chat modules with builder
│ ├── mcp_server/ # MCP server (8 tools, 5 resources, 13+ prompts)
│ ├── mcp_client/ # MCP client engine (spawn/manage external servers)
│ ├── mcp_catalog/ # MCP server catalog (20 curated servers)
│ ├── prompts/ # Prompt template system (10 built-in)
│ ├── profiles/ # Configuration profiles
│ └── utils/ # Utilities (token counting, extraction)
├── cli/ # Typer CLI
├── examples/ # Usage examples
├── scripts/ # Validation and maintenance tools
└── docs/ # Technical documentation
Model Catalog
StratifyAI uses a community-editable JSON catalog (catalog/models.json) as the source of truth for provider model metadata. This enables:
- Easy Updates: Submit PRs to add/update/deprecate models
- Automated Validation: CI validates all changes via JSON schema
- Deprecation Tracking: Built-in lifecycle management
- Dated Model IDs: All models use dated IDs (e.g., claude-3-haiku-20240307) for reproducibility
Contributing:
To update the catalog (add new models, mark deprecations, update pricing):
- Edit catalog/models.json
- Validate: python scripts/validate_catalog.py
- Submit PR (CI automatically validates)
See docs/CATALOG_MANAGEMENT.md for full contribution guidelines.
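Before opening a PR, a contributor might want a quick local check that new IDs follow the dated-ID convention. The sketch below is not scripts/validate_catalog.py and makes no assumption about the layout of catalog/models.json: it works on a plain list of ID strings, and the accepted date suffixes (YYYYMMDD or YYYY-MM-DD) are an assumption based on how providers commonly version models.

```python
import re
from typing import Iterable, List

# Matches IDs that end in -YYYYMMDD or -YYYY-MM-DD (assumed conventions).
DATED_SUFFIX = re.compile(r"-(?:\d{8}|\d{4}-\d{2}-\d{2})$")


def find_undated_ids(model_ids: Iterable[str]) -> List[str]:
    """Return the IDs that do not end in a date suffix."""
    return [model_id for model_id in model_ids if not DATED_SUFFIX.search(model_id)]


ids = ["claude-3-haiku-20240307", "gpt-4o-mini-2024-07-18", "gpt-4o-mini"]
print(find_undated_ids(ids))  # ['gpt-4o-mini']
```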
Testing
pytest # Run all tests
pytest -v # Verbose output
Test Coverage: 877 tests across all modules (85% code coverage)
License
Internal project — All rights reserved.