Unified multi-provider LLM abstraction module with intelligent routing, cost tracking, and caching
Project description
StratifyAI — Unified Multi‑Provider LLM Interface
Status: Phase 7.8 Complete
Providers: 9 Operational
Features: Routing • RAG • Caching • Streaming • CLI • Web UI • Builder Pattern
StratifyAI is a production‑ready Python framework that provides a unified interface for 9+ LLM providers, including OpenAI, Anthropic, Google, DeepSeek, Groq, Grok, OpenRouter, Ollama, and AWS Bedrock. It eliminates vendor lock‑in, simplifies multi‑model development, and enables intelligent routing, cost tracking, caching, streaming, and RAG workflows.
Features
Core
- Unified API for 9+ LLM providers
- Async-first architecture with sync wrappers
- Automatic provider detection
- Cost tracking and budget enforcement (see the sketch after this list)
- Latency tracking on all responses
- Retry logic with fallback models
- Streaming support for all providers
- Response caching + provider prompt caching
- Intelligent routing (cost, quality, latency, hybrid)
- Capability filtering (vision, tools, reasoning)
- Model metadata and context window awareness
- Builder pattern for fluent configuration
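Budget enforcement builds on the per-response cost field shown in Quick Start. This README does not document the built-in enforcement API, so the cap below is a hypothetical client-side check layered on that field; a minimal sketch:

from stratifyai import LLMClient
from stratifyai.models import ChatRequest, Message

BUDGET_USD = 0.50  # hypothetical per-session cap, not a built-in setting
spent = 0.0

client = LLMClient()
for prompt in ["First question", "Second question"]:
    request = ChatRequest(
        model="gpt-4o-mini",
        messages=[Message(role="user", content=prompt)]
    )
    response = client.chat_completion_sync(request)
    spent += response.usage.cost_usd  # cost is reported on every response
    if spent > BUDGET_USD:
        raise RuntimeError(f"Session budget exceeded: ${spent:.4f}")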
Advanced
- Large‑file handling with chunking and progressive summarization
- File extraction (CSV schema, JSON schema, logs, code structure)
- Auto model selection for extraction tasks
- RAG pipeline with embeddings + vector DB (ChromaDB)
- Semantic search and citation tracking
- Rich/Typer CLI with interactive mode
- Optional FastAPI web interface
Installation
git clone https://github.com/Bytes0211/stratifyai.git
cd stratifyai
pip install -e .
Or using uv:
uv sync
Configuration
cp .env.example .env
# Add your API keys
Check configured providers:
stratifyai check-keys
Quick Start
CLI Usage
stratifyai chat -p openai -m gpt-4o-mini -t "Hello"
stratifyai route "Explain relativity" --strategy hybrid
stratifyai interactive
stratifyai cache-stats
Python Example (LLMClient)
from stratifyai import LLMClient
from stratifyai.models import Message, ChatRequest, ChatResponse
client: LLMClient = LLMClient()
request: ChatRequest = ChatRequest(
    model="gpt-4o-mini",
    messages=[Message(role="user", content="Explain quantum computing")]
)
# Async (recommended)
response: ChatResponse = await client.chat_completion(request)
# Sync wrapper for scripts/CLI
response: ChatResponse = client.chat_completion_sync(request)
print(response.content)
print(f"Cost: ${response.usage.cost_usd:.6f}")
print(f"Latency: {response.latency_ms:.0f}ms")
Python Example (Chat Package - Simplified)
from stratifyai.chat import anthropic, openai
from stratifyai.models import ChatResponse
# Quick usage - model is always required
response: ChatResponse = await anthropic.chat("Hello!", model="claude-sonnet-4-5")
print(response.content)
# With options
response: ChatResponse = await openai.chat(
    "Explain quantum computing",
    model="gpt-4o-mini",
    system="Be concise",
    temperature=0.5
)
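Because these chat functions are async, fanning one prompt out to several providers takes a single asyncio.gather call. A sketch built only on the calls shown above:

import asyncio
from stratifyai.chat import anthropic, openai

async def compare() -> None:
    # Send the same prompt to two providers concurrently.
    claude, gpt = await asyncio.gather(
        anthropic.chat("Define entropy in one sentence", model="claude-sonnet-4-5"),
        openai.chat("Define entropy in one sentence", model="gpt-4o-mini"),
    )
    print("Anthropic:", claude.content)
    print("OpenAI:", gpt.content)

asyncio.run(compare())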
Builder Pattern (Fluent Configuration)
from stratifyai.chat import anthropic
from stratifyai.chat.builder import ChatBuilder
from stratifyai.models import ChatResponse
# Configure once, use multiple times
client: ChatBuilder = (
    anthropic
    .with_model("claude-sonnet-4-5")
    .with_system("You are a helpful assistant")
    .with_temperature(0.7)
)
# All subsequent calls use the configured settings
response: ChatResponse = await client.chat("Hello!")
response: ChatResponse = await client.chat("Tell me more")
# Stream with builder
async for chunk in client.chat_stream("Write a story"):
    print(chunk.content, end="", flush=True)
Routing
- Cost: choose cheapest model
- Quality: choose highest‑quality model
- Latency: choose fastest model
- Hybrid (default): dynamic weighting based on complexity
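The hybrid strategy's exact weighting lives inside StratifyAI's router and is not documented here, but the idea can be sketched: score each candidate on quality, cost, and latency, shifting weight toward quality as prompt complexity rises. Everything below (the Candidate class, weights, and numbers) is illustrative, not the library's API:

from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    cost_per_1k: float   # USD per 1K output tokens (illustrative numbers)
    quality: float       # 0..1, higher is better
    latency_ms: float

def hybrid_score(c: Candidate, complexity: float) -> float:
    # Harder prompts (complexity near 1) shift weight from cost to quality.
    w_quality = 0.3 + 0.5 * complexity
    w_cost = 0.5 - 0.3 * complexity
    w_latency = 1.0 - w_quality - w_cost
    return (w_quality * c.quality
            - w_cost * c.cost_per_1k
            - w_latency * (c.latency_ms / 1000))

candidates = [
    Candidate("gpt-4o-mini", 0.0006, 0.75, 600),
    Candidate("claude-sonnet-4-5", 0.015, 0.92, 900),
]
best = max(candidates, key=lambda c: hybrid_score(c, complexity=0.8))
print(best.name)  # prints claude-sonnet-4-5: quality dominates on a hard prompt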
RAG
- Embeddings (OpenAI)
- ChromaDB vector storage
- Semantic search
- Document indexing
- Retrieval‑augmented generation
- Citation tracking
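The RAG pipeline's Python API is not shown in this README, so the retrieval step is illustrated with ChromaDB directly (the vector store the pipeline uses). Note that Chroma's default embedding function stands in here for the OpenAI embeddings StratifyAI actually uses:

import chromadb

# In-memory ChromaDB client; documents are embedded on add,
# and query_texts are embedded the same way for semantic search.
client = chromadb.Client()
collection = client.create_collection("docs")
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "StratifyAI routes requests across nine LLM providers.",
        "ChromaDB stores embeddings for semantic search.",
    ],
)
results = collection.query(query_texts=["How many providers are supported?"], n_results=1)
print(results["documents"][0][0])  # best-matching chunk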
Project Structure
stratifyai/
├── llm_abstraction/       # Core package
│   ├── providers/         # Provider implementations (9 providers)
│   ├── router.py          # Intelligent routing
│   ├── models.py          # Data models
│   └── utils/             # Utilities (token counting, extraction)
├── chat/                  # Simplified chat modules with builder pattern
│   ├── builder.py         # ChatBuilder class
│   └── stratifyai_*.py    # Provider-specific modules
├── cli/                   # Typer CLI
├── api/                   # Optional FastAPI server
├── examples/              # Usage examples
└── docs/                  # Technical documentation
Testing
pytest # Run all tests
pytest -v # Verbose output
Test Coverage: 300+ tests across all modules
License
Internal project — All rights reserved.
Download files
Download the file for your platform.
File details
Details for the file stratifyai-0.1.1.tar.gz.
File metadata
- Download URL: stratifyai-0.1.1.tar.gz
- Upload date:
- Size: 213.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1f1ed1c08f7fbbed363ff77b761a3e79280f53d72ee923101c4ae520758c9213 |
| MD5 | b71f29cc3145c8d61cd5b2ddea129116 |
| BLAKE2b-256 | 3742088daba6e0d7e72a974567a921495d2822be850c6ab117f5f9f313a83659 |
File details
Details for the file stratifyai-0.1.1-py3-none-any.whl.
File metadata
- Download URL: stratifyai-0.1.1-py3-none-any.whl
- Upload date:
- Size: 120.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fde769f8b962456ff8fb1ef975417669df36a9a2e4f31ffe8f9beef102a01fe6 |
| MD5 | c43eacc179d7e6697b515baa4064a14b |
| BLAKE2b-256 | 5f2b7b72d5a3654db68e6165bbe01c306561c41e466a41db9c4f54d37ca73301 |