Unified multi-provider LLM abstraction module with intelligent routing, cost tracking, and caching

StratifyAI

StratifyAI — Unified Multi‑Provider LLM Interface v2.0.6

Status: Production Ready, MCP Ecosystem Complete (Server + Client Engine + Abstraction Layer)
Providers: 9 Operational
Features: Routing • RAG • Caching • Streaming • Observability • Security Hardening • CLI • Svelte 5 SPA • Vision • Smart Chunking • Prompt Templates • O(1) Cache • Concurrency Limits • MCP Server & Client Engine

StratifyAI is a production‑ready Python framework that provides a unified interface for 9+ LLM providers, including OpenAI, Anthropic, Google, DeepSeek, Groq, Grok, OpenRouter, Ollama, and AWS Bedrock. It eliminates vendor lock‑in, simplifies multi‑model development, and enables intelligent routing, cost tracking, caching, streaming, and RAG workflows.

  • Start here: docs/GETTING-STARTED.md
  • Web UI guide: docs/UI-OVERVIEW.md
  • Examples: examples/README.md
  • Vision guide: docs/VISION-SUPPORT.md


Features

Core

  • Unified API for 9+ LLM providers
  • Async-first architecture with sync wrappers
  • Automatic provider detection
  • Cost tracking and budget enforcement
  • Latency tracking on all responses
  • Retry logic with fallback models
  • Streaming support for all providers
  • O(1) Cache with LRU eviction + provider prompt caching
  • Concurrent read-write locking (RWLockFair) for high-throughput caching
  • Provider concurrency limits (max concurrent requests per provider)
  • Correlation IDs for HTTP/WebSocket tracing
  • Provider health and structured metrics endpoints
  • Intelligent routing (cost, quality, latency, hybrid)
  • Capability filtering (vision, tools, reasoning)
  • Model metadata and context window awareness
  • Builder pattern for fluent configuration
  • Vision support for image analysis (GPT-4o, Claude, Gemini, Nova)
  • Prompt templates with 10 built-in templates (code review, summarization, translation, etc.)
  • User-defined template support via ~/.stratifyai/prompts/
  • MCP Server with 8 tools, 5 resources, 13+ prompts
  • MCP Client Engine — spawn and manage external MCP servers, tool aggregation, chat integration
  • MCP Abstraction Layer — curated server catalog, CLI wizard, inline tool tester
  • Permission system for MCP tool safety (allow/deny/confirm, destructive tool gating)
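
The "O(1) Cache with LRU eviction" item above can be illustrated with a small standalone sketch. This is not StratifyAI's cache implementation. The class name LRUCache and its methods are hypothetical, and the sketch omits locking and provider prompt caching. It only shows how an ordered dictionary gives constant-time lookups and least-recently-used eviction.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class LRUCache:
    """Hypothetical LRU cache: get and put run in O(1) on average."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, str]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.put("prompt-a", "response-a")
cache.put("prompt-b", "response-b")
cache.get("prompt-a")                # touch "prompt-a" so it is most recent
cache.put("prompt-c", "response-c")  # over capacity: "prompt-b" is evicted
```

After these calls the cache holds "prompt-a" and "prompt-c", because "prompt-b" was the least recently used entry when the third item arrived.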

Advanced

  • Large‑file handling with smart chunking and progressive summarization
  • File extraction (CSV schema, JSON schema, logs, code structure)
  • Auto model selection for extraction tasks
  • RAG pipeline with embeddings + vector DB (ChromaDB)
  • Semantic search and citation tracking
  • Rich/Typer CLI with interactive mode
  • Svelte 5 SPA with tabbed interface, real-time streaming, and file attachments
  • Web UI Features: Markdown rendering, syntax highlighting, cost tracking, model catalog browser
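
As a rough illustration of the chunking idea in the list above, the sketch below splits text into pieces no longer than a character budget and prefers paragraph boundaries. It is an assumption about how such chunking could work, not StratifyAI's algorithm. The function name chunk_text and the max_chars parameter are hypothetical, and progressive summarization is left out.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 10_000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that exceeds the budget on its own.
        pieces = [paragraph[i:i + max_chars]
                  for i in range(0, len(paragraph), max_chars)] or [""]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= max_chars:
                current = candidate      # still fits in the open chunk
            else:
                chunks.append(current)   # close the open chunk
                current = piece          # start a new one
    if current:
        chunks.append(current)
    return chunks


chunks = chunk_text("aaaa\n\nbbbb\n\ncccccccccccc", max_chars=10)
```

With a 10-character budget the two short paragraphs share one chunk and the 12-character paragraph is split in two, giving three chunks in total.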

Operations & Observability

  • GET /api/health for basic API liveness
  • GET /health/providers and GET /api/health/providers for provider readiness snapshots
  • GET /api/metrics for structured JSON metrics export
  • X-Correlation-ID support for HTTP tracing and correlation_id in WebSocket payloads
  • Streaming telemetry including first-token and total latency in final WebSocket usage payloads
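
The streaming telemetry item above mentions first-token and total latency. A minimal way to measure both around any async token stream is sketched here. The stream is a stand-in generator, and the dictionary keys first_token_ms and total_latency_ms are illustrative names, not the actual WebSocket payload schema.

```python
import asyncio
import time
from typing import AsyncIterator, Dict, List, Optional


async def fake_stream() -> AsyncIterator[str]:
    """Stand-in for a provider stream: yields tokens with small delays."""
    for token in ["Hello", ", ", "world"]:
        await asyncio.sleep(0.01)
        yield token


async def measure_stream(stream: AsyncIterator[str]) -> Dict[str, object]:
    """Consume a stream, recording first-token and total latency in milliseconds."""
    start = time.perf_counter()
    first_token_ms: Optional[float] = None
    parts: List[str] = []
    async for token in stream:
        if first_token_ms is None:
            first_token_ms = (time.perf_counter() - start) * 1000
        parts.append(token)
    total_ms = (time.perf_counter() - start) * 1000
    return {
        "text": "".join(parts),
        "first_token_ms": first_token_ms,
        "total_latency_ms": total_ms,
    }


usage = asyncio.run(measure_stream(fake_stream()))
```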

Installation

git clone https://github.com/Bytes0211/stratifyai.git
cd stratifyai
pip install -e .

Or using uv:

uv sync

ℹ️ Python dependencies such as mcp, fastapi, and pydantic are declared in the package metadata and are installed automatically by pip when installing from PyPI.

⚠️ Optional MCP prerequisite: the published wheel/sdist does not bundle local Claude/Cursor/VS Code MCP config files. If you plan to use curated MCP servers that launch via npx, install Node.js 18+ and make sure npx is available on your PATH.


Configuration

cp .env.example .env
# Add your API keys

Check configured providers:

stratifyai check-keys

Security Runbook

For Phase 15 hardening guidance (threat model assumptions, staging/production environment settings, verification commands, and incident response), see:

  • docs/runbook/phase15-security-runbook.md

MCP Ecosystem

StratifyAI includes a complete MCP (Model Context Protocol) implementation:

  • MCP Server: Exposes StratifyAI capabilities (chat, routing, cost tracking) as MCP tools
  • MCP Client Engine: Spawns and manages external MCP servers, aggregates tools into chat
  • MCP Abstraction Layer: Curated catalog of 20 MCP servers, CLI setup wizard, config generation
  • Permission System: Safety defaults, destructive tool confirmation, per-server toggles

Documentation:

  • docs/MCP-QUICKSTART.md — Install, configure, first tool call
  • docs/MCP-TOOLS-REFERENCE.md — All tools, resources, and prompts
  • docs/MCP-CLIENT-CONFIG.md — Client config for Claude Desktop, Claude Code, Cursor, VS Code

Local MCP chat integration notes

  • StratifyAI can auto-discover enabled MCP servers from Claude Desktop, Cursor, and VS Code configs for chat use.
  • The Web UI now persists the active MCP chat selection and auto-enables newly discovered servers on first load, so MCP access survives refreshes after the shared-state unification work.
  • The MCP dashboard and chat settings now support live refresh from disk and show the config source client for each discovered server.
  • Reset config in the MCP tab can now clear selected or all applied MCP server entries, including the matching stratifyai.mcpClient metadata.
  • Anthropic-backed chats automatically receive provider-safe MCP tool aliases, so namespaced tools such as postgresql.query remain callable without hitting Anthropic's tool-name regex limits.
  • The PostgreSQL MCP query tool is now treated as read-only by default, so chat sessions can execute safe SQL lookups without extra confirmation prompts.
  • If a tool appears in the UI but is never used, verify the server permission allow-list matches the actual tool names. Common examples:
    • PostgreSQL MCP: "allow": ["query"]
    • Brave MCP: "allow": ["brave_*"]
  • If PostgreSQL shows connected but the model still says it is unavailable, inspect the returned tool_results or server logs for a database auth error such as password authentication failed. In that case the MCP transport is healthy, but the configured connection string credentials still need to be corrected.
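
To check an allow-list against the tool names a server actually exposes, a quick standalone test like the one below can help. It assumes glob-style matching, which the "brave_*" example suggests, but the exact matching rules StratifyAI applies are not stated here. The function name allowed_tools and the sample tool names are illustrative.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def allowed_tools(tool_names: Iterable[str], allow: Iterable[str]) -> List[str]:
    """Return the tool names matched by at least one allow-list pattern."""
    patterns = list(allow)
    return [
        name for name in tool_names
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# An exact name matches only itself.
postgres = allowed_tools(["query", "list_tables"], ["query"])

# A wildcard pattern covers every tool sharing the prefix.
brave = allowed_tools(["brave_web_search", "brave_local_search"], ["brave_*"])

# A pattern that does not match the real tool name allows nothing,
# which is the "tool appears but is never used" symptom described above.
mismatch = allowed_tools(["brave_web_search"], ["search"])
```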

Add a custom MCP server

If the server you want is not in the curated catalog (for example, an Excel connector), you can add it directly:

uv run stratifyai mcp add-custom excel \
  --client claude-desktop \
  --command npx \
  --command-arg -y \
  --command-arg your-excel-mcp-package

You can also pass --env KEY=VALUE and extra --command-arg ... values for custom servers, then refresh the MCP dashboard or restart the client.
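
For orientation, the sketch below builds the kind of entry a Claude Desktop style config typically holds for such a server, using the "mcpServers" layout with command, args, and an optional env map. The exact content that stratifyai mcp add-custom writes is not documented here, including any stratifyai.mcpClient metadata, so treat this as an assumption. The helper name build_server_entry is hypothetical.

```python
import json
from typing import Dict, List, Optional


def build_server_entry(
    name: str,
    command: str,
    args: List[str],
    env: Optional[Dict[str, str]] = None,
) -> dict:
    """Build an mcpServers entry in the Claude Desktop config layout."""
    entry: dict = {"command": command, "args": list(args)}
    if env:
        entry["env"] = dict(env)  # only emitted when --env style values exist
    return {"mcpServers": {name: entry}}


config = build_server_entry("excel", "npx", ["-y", "your-excel-mcp-package"])
print(json.dumps(config, indent=2))
```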


Quick Start

CLI Usage

stratifyai chat -p openai -m gpt-4o-mini -t "Hello"
stratifyai route "Explain relativity" --strategy hybrid
stratifyai interactive
stratifyai cache-stats

Python Example (LLMClient)

import asyncio

from stratifyai import LLMClient
from stratifyai.models import Message, ChatRequest, ChatResponse

client: LLMClient = LLMClient()
request: ChatRequest = ChatRequest(
    model="gpt-4o-mini",
    messages=[Message(role="user", content="Explain quantum computing")]
)

# Async (recommended). `await` needs a coroutine, so scripts wrap the call.
async def main() -> ChatResponse:
    return await client.chat_completion(request)

response: ChatResponse = asyncio.run(main())

# Sync wrapper for scripts/CLI (an alternative to the async call above)
response = client.chat_completion_sync(request)

print(response.content)
print(f"Cost: ${response.usage.cost_usd:.6f}")
print(f"Latency: {response.latency_ms:.0f}ms")
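
The cost_usd value reported on each response could feed a simple budget guard. The sketch below shows one possible shape for that, independent of StratifyAI's own budget enforcement, whose API is not shown in this README. BudgetTracker and BudgetExceeded are hypothetical names.

```python
class BudgetExceeded(RuntimeError):
    """Raised when recording a cost would exceed the configured limit."""


class BudgetTracker:
    """Accumulate per-request costs and refuse to go past a USD limit."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def record(self, cost_usd: float) -> float:
        """Add one request's cost and return the remaining budget."""
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"limit ${self.limit_usd:.2f} would be exceeded "
                f"(spent ${self.spent_usd:.2f}, next ${cost_usd:.2f})"
            )
        self.spent_usd += cost_usd
        return self.limit_usd - self.spent_usd


tracker = BudgetTracker(limit_usd=1.0)
remaining_after_first = tracker.record(0.25)   # e.g. response.usage.cost_usd
remaining_after_second = tracker.record(0.5)

try:
    tracker.record(0.5)  # 0.75 + 0.5 would pass the 1.0 limit
    blocked = False
except BudgetExceeded:
    blocked = True
```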

Python Example (Chat Package - Simplified)

from stratifyai.chat import anthropic, openai
from stratifyai.models import ChatResponse

# Quick usage - model is always required (await must run inside an async function)
response: ChatResponse = await anthropic.chat("Hello!", model="claude-sonnet-4-5")
print(response.content)

# With options
response: ChatResponse = await openai.chat(
    "Explain quantum computing",
    model="gpt-4o-mini",
    system="Be concise",
    temperature=0.5
)

Builder Pattern (Fluent Configuration)

from stratifyai.chat import anthropic
from stratifyai.chat.builder import ChatBuilder
from stratifyai.models import ChatResponse

# Configure once, use multiple times
client: ChatBuilder = (
    anthropic
    .with_model("claude-sonnet-4-5")
    .with_system("You are a helpful assistant")
    .with_temperature(0.7)
)

# All subsequent calls use the configured settings (await inside an async function)
response: ChatResponse = await client.chat("Hello!")
response = await client.chat("Tell me more")

# Stream with builder
async for chunk in client.chat_stream("Write a story"):
    print(chunk.content, end="", flush=True)

Prompt Templates

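from stratifyai.chat import anthropic
from stratifyai.prompts import registry

# Example input for the template (any Python source string works)
source_code = "def add(a, b):\n    return a + b\n"

# Use a built-in template (await must run inside an async function)
response = await (
    anthropic
    .with_model("claude-sonnet-4-20250514")
    .with_template("code_review", code=source_code, language="python", focus="security")
    .chat("Review this code")
)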

# CLI usage
# stratifyai templates  # List all templates
# stratifyai chat --template summarize --params "style=bullet_points" --file document.txt

Web UI

StratifyAI includes a production-ready Svelte 5 SPA with modern UI/UX.

For the dedicated Web UI quick start and walkthrough, see docs/UI-OVERVIEW.md.

Features

  • Tabbed Interface: Config, Files, History, Cost tracking
  • Real-time Streaming: WebSocket-based streaming chat with live token display
  • File Attachments: Text files and images (for vision models)
  • Smart Chunking: Configurable chunking (10k-100k chars) for large files
  • Model Catalog Browser: Browse all models with filtering and capability badges
  • Markdown Rendering: Syntax highlighting with highlight.js (190+ languages)
  • Cost Tracking: Real-time cost analytics per message and session
  • Theme Toggle: Dark/light themes with localStorage persistence
  • Model Validation: Real-time API key validation and model availability
  • MCP Tools in Chat: Enable discovered local MCP servers per conversation with live refresh and status visibility

Quick Start

# Install frontend dependencies
cd frontend
npm install

# Build the SPA
npm run build

# Start the API server (serves the built SPA)
cd ..
python -m uvicorn api.main:app --host 127.0.0.1 --port 8080

# Or, if you use uv
uv run python -m uvicorn api.main:app --host 127.0.0.1 --port 8080

Open:

http://127.0.0.1:8080

Development Mode

# Terminal 1: Start backend
uvicorn api.main:app --reload --port 8080

# Terminal 2: Start frontend dev server
cd frontend
npm run dev

Observability Endpoints

# Basic health
curl http://localhost:8080/api/health

# Provider readiness
curl http://localhost:8080/health/providers

# Structured metrics (requires auth if STRATIFYAI_API_KEY is set)
curl -H "Authorization: Bearer $STRATIFYAI_API_KEY" http://localhost:8080/api/metrics

If you want request tracing in logs, send a correlation header:

curl -H "X-Correlation-ID: demo-trace-123" http://localhost:8080/api/health
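
The same correlation header can be set from Python with only the standard library. This sketch builds the request object and leaves the actual network call commented out, so it runs without a server. The function name build_health_request is hypothetical, and the base URL is the local address used in the curl examples.

```python
import urllib.request
import uuid
from typing import Optional


def build_health_request(
    base_url: str = "http://localhost:8080",
    correlation_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build a GET /api/health request carrying an X-Correlation-ID header."""
    cid = correlation_id or uuid.uuid4().hex  # generate an ID when none is given
    return urllib.request.Request(
        f"{base_url}/api/health",
        headers={"X-Correlation-ID": cid},
        method="GET",
    )


request = build_health_request(correlation_id="demo-trace-123")

# urllib stores header names capitalized ("X-correlation-id");
# HTTP header names are case-insensitive, so the server sees the same header.
# To send it against a running server:
# with urllib.request.urlopen(request, timeout=5) as resp:
#     print(resp.status)
```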

Routing

  • Cost: choose cheapest model
  • Quality: choose highest‑quality model
  • Latency: choose fastest model
  • Hybrid (default): dynamic weighting based on complexity
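
One way to picture the four strategies is the toy selector below. It is a sketch under stated assumptions, not the logic in stratifyai/router.py. The ModelInfo fields, the sample models, the complexity parameter, and the hybrid scoring formula are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelInfo:
    name: str
    cost_per_1k_usd: float
    quality: float      # 0..1, higher is better
    latency_ms: float


# Hypothetical models, not entries from the real catalog.
MODELS = [
    ModelInfo("small-fast", 0.10, 0.60, 200.0),
    ModelInfo("balanced", 0.50, 0.80, 500.0),
    ModelInfo("large-smart", 2.00, 0.95, 1200.0),
]


def pick_model(models: List[ModelInfo], strategy: str = "hybrid",
               complexity: float = 0.5) -> ModelInfo:
    """Pick a model by cost, quality, latency, or a complexity-weighted blend."""
    if strategy == "cost":
        return min(models, key=lambda m: m.cost_per_1k_usd)
    if strategy == "quality":
        return max(models, key=lambda m: m.quality)
    if strategy == "latency":
        return min(models, key=lambda m: m.latency_ms)
    if strategy == "hybrid":
        max_cost = max(m.cost_per_1k_usd for m in models)
        max_latency = max(m.latency_ms for m in models)

        def score(m: ModelInfo) -> float:
            cheapness = 1 - m.cost_per_1k_usd / max_cost
            speed = 1 - m.latency_ms / max_latency
            # Harder prompts weight quality; easier ones weight cost and speed.
            return complexity * m.quality + (1 - complexity) * (cheapness + speed) / 2

        return max(models, key=score)
    raise ValueError(f"unknown strategy: {strategy}")


easy_choice = pick_model(MODELS, "hybrid", complexity=0.1)
hard_choice = pick_model(MODELS, "hybrid", complexity=0.9)
```

With these sample numbers a low-complexity prompt lands on the cheapest, fastest model and a high-complexity prompt lands on the highest-quality one.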

RAG

  • Embeddings (OpenAI)
  • ChromaDB vector storage
  • Semantic search
  • Document indexing
  • Retrieval‑augmented generation
  • Citation tracking
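
The retrieval and citation steps listed above can be sketched without any external services. The example swaps OpenAI embeddings and ChromaDB for toy word-count vectors and an in-memory dictionary, so it only demonstrates the flow: index, search by similarity, then build a prompt that carries source ids. All names and documents are illustrative.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


DOCS: Dict[str, str] = {
    "routing.md": "The router picks a model by cost, quality, or latency.",
    "cache.md": "The cache stores responses and evicts the least recently used entry.",
    "rag.md": "Documents are embedded and stored in a vector database for semantic search.",
}
INDEX: Dict[str, Counter] = {doc_id: embed(text) for doc_id, text in DOCS.items()}


def retrieve(query: str, k: int = 1) -> List[Tuple[str, str]]:
    """Return the k most similar documents as (doc_id, text) pairs."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda d: cosine(q, INDEX[d]), reverse=True)
    return [(doc_id, DOCS[doc_id]) for doc_id in ranked[:k]]


def build_prompt(query: str, k: int = 1) -> str:
    """Prefix the question with retrieved passages tagged by source id."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query, k))
    return ("Answer using the sources below and cite them by id.\n"
            f"{context}\n\nQuestion: {query}")


prompt = build_prompt("How does the cache evict entries?")
```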

Project Structure

stratifyai/
├── catalog/              # Model catalog (community-editable)
│   ├── models.json       # Provider model metadata
│   ├── schema.json       # JSON schema
│   └── README.md         # Contribution guidelines
├── frontend/             # Svelte 5 SPA (48 files)
│   ├── src/              # SPA source code
│   │   ├── App.svelte    # Main app component
│   │   ├── lib/          # Components, stores, API clients
│   │   └── styles/       # SCSS styling
│   ├── package.json      # Frontend dependencies
│   └── vite.config.ts    # Vite build configuration
├── api/                  # FastAPI REST API + WebSocket
│   ├── main.py           # API endpoints, streaming
│   └── static/           # Served assets
│       ├── dist/         # Built SPA (from frontend/)
│       └── index.html    # Legacy fallback
├── stratifyai/           # Core package
│   ├── catalog_manager.py # Loads models from catalog/
│   ├── providers/        # Provider implementations (9 providers)
│   ├── router.py         # Intelligent routing
│   ├── models.py         # Data models
│   ├── chat/             # Simplified chat modules with builder
│   ├── mcp_server/       # MCP server (8 tools, 5 resources, 13+ prompts)
│   ├── mcp_client/       # MCP client engine (spawn/manage external servers)
│   ├── mcp_catalog/      # MCP server catalog (20 curated servers)
│   ├── prompts/          # Prompt template system (10 built-in)
│   ├── profiles/         # Configuration profiles
│   └── utils/            # Utilities (token counting, extraction)
├── cli/                  # Typer CLI
├── examples/             # Usage examples
├── scripts/              # Validation and maintenance tools
└── docs/                 # Technical documentation

Model Catalog

StratifyAI uses a community-editable JSON catalog (catalog/models.json) as the source of truth for provider model metadata. This enables:

  • Easy Updates: Submit PRs to add/update/deprecate models
  • Automated Validation: CI validates all changes via JSON schema
  • Deprecation Tracking: Built-in lifecycle management
  • Dated Model IDs: All models use dated IDs (e.g., claude-3-haiku-20240307) for reproducibility

Contributing:

To update the catalog (add new models, mark deprecations, update pricing):

  1. Edit catalog/models.json
  2. Validate: python scripts/validate_catalog.py
  3. Submit PR (CI automatically validates)

See docs/CATALOG_MANAGEMENT.md for full contribution guidelines.
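
As a quick pre-check before opening a PR, a contributor might scan model IDs for a date suffix. The sketch below is not scripts/validate_catalog.py and does not read catalog/models.json, since the file layout is not described here. It assumes a dated ID ends in either YYYYMMDD or YYYY-MM-DD, and the function name undated_model_ids is hypothetical.

```python
import re
from typing import Iterable, List

# Assumed convention: IDs end in -YYYYMMDD or -YYYY-MM-DD.
DATED_ID = re.compile(r"-(\d{8}|\d{4}-\d{2}-\d{2})$")


def undated_model_ids(model_ids: Iterable[str]) -> List[str]:
    """Return the IDs that do not end in a date suffix."""
    return [model_id for model_id in model_ids if not DATED_ID.search(model_id)]


flagged = undated_model_ids([
    "claude-3-haiku-20240307",
    "claude-sonnet-4-20250514",
    "gpt-4o-mini-2024-07-18",
    "gpt-4o-mini",
])
```

Only the last ID is flagged here, because it carries no date suffix.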


Testing

pytest           # Run all tests
pytest -v        # Verbose output

Test Coverage: 877 tests across all modules (85% code coverage)


License

Internal project — All rights reserved.
