Smart AI model cascading for cost optimization - Save 40-85% on LLM costs with 2-6x faster responses. Available for Python and TypeScript/JavaScript.

Python • TypeScript • n8n • 📖 Docs • 💡 Examples


Stop Bleeding Money on AI Calls. Cut Costs 30-65% in 3 Lines of Code.

40-70% of text prompts and 20-60% of agent calls don't need expensive flagship models. You're overpaying every single day.

Cascadeflow fixes this with intelligent model cascading, available in Python and TypeScript.

pip install cascadeflow
npm install @cascadeflow/core

Why Cascadeflow?

Cascadeflow is an intelligent AI model cascading library that dynamically selects the optimal model for each query or tool call through speculative execution. It builds on research showing that 40-70% of queries don't require slow, expensive flagship models, and that domain-specific smaller models often outperform large general-purpose models on specialized tasks. Queries that do need advanced reasoning are escalated to flagship models automatically.

Use Cases

Use Cascadeflow for:

  • Cost Optimization. Reduce API costs by 40-85% through intelligent model cascading and speculative execution with automatic per-query cost tracking.
  • Cost Control and Transparency. Built-in telemetry for query, model, and provider-level cost tracking with configurable budget limits and programmable spending caps.
  • Low Latency & Speed Optimization. Sub-2ms framework overhead with fast provider routing (Groq sub-50ms). Cascade simple queries to fast models while reserving expensive models for complex reasoning, achieving 2-10x latency reduction overall (use the PRESET_ULTRA_FAST preset).
  • Multi-Provider Flexibility. Unified API across OpenAI, Anthropic, Groq, Ollama, vLLM, Together, and Hugging Face with automatic provider detection and zero vendor lock-in. Optional LiteLLM integration for 100+ additional providers.
  • Edge & Local-Hosted AI Deployment. Get the best of both worlds: handle most queries with local models (vLLM, Ollama), then automatically escalate complex queries to cloud providers only when needed.
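
The budget limits and spending caps mentioned above can be pictured with a small sketch. Everything in it (the `BudgetTracker` class, its `record` method, and the `BudgetExceeded` exception) is hypothetical and written for illustration only; it is not Cascadeflow's API.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a recorded cost would push a user past the cap."""


class BudgetTracker:
    """Hypothetical per-user spending cap (illustration only)."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent = defaultdict(float)  # user id -> dollars spent so far

    def record(self, user: str, cost_usd: float) -> float:
        """Add a query's cost to the user's total, or refuse if over the cap."""
        new_total = self.spent[user] + cost_usd
        if new_total > self.cap_usd:
            raise BudgetExceeded(f"{user} would exceed ${self.cap_usd:.2f}")
        self.spent[user] = new_total
        return self.cap_usd - new_total  # remaining budget


tracker = BudgetTracker(cap_usd=0.01)
remaining = tracker.record("alice", 0.004)
print(f"Remaining: ${remaining:.3f}")
```

Checking the cap before committing the cost means a rejected query leaves the user's running total unchanged.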

โ„น๏ธ Note: SLMs (under 10B parameters) are sufficiently powerful for 60-70% of agentic AI tasks. Research paper


How Cascadeflow Works

Cascadeflow uses speculative execution with quality validation:

  1. Speculatively executes small, fast models first - optimistic execution ($0.15-0.30/1M tokens)
  2. Validates quality of responses using configurable thresholds (completeness, confidence, correctness)
  3. Dynamically escalates to larger models only when quality validation fails ($1.25-3.00/1M tokens)
  4. Learns patterns to optimize future cascading decisions and domain-specific routing
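
Steps 1 to 3 can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not Cascadeflow's implementation: the `Model` dataclass, the `cascade` function, the single confidence threshold, and the stub models are all hypothetical, and the learning in step 4 is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    name: str
    cost: float  # illustrative cost charged per call
    generate: Callable[[str], Tuple[str, float]]  # returns (text, confidence in [0, 1])


def cascade(query: str, models: List[Model], threshold: float = 0.7):
    """Try models cheapest-first; return the first answer that passes validation."""
    total_cost = 0.0
    for index, model in enumerate(models):
        text, confidence = model.generate(query)
        total_cost += model.cost  # every attempted model is paid for
        is_last = index == len(models) - 1
        if confidence >= threshold or is_last:
            return model.name, text, total_cost
    raise ValueError("models must not be empty")


# Stub models standing in for real provider calls.
small = Model("small", 0.00015, lambda q: ("Paris", 0.95 if len(q) < 40 else 0.30))
large = Model("large", 0.00125, lambda q: ("A detailed answer", 0.99))

print(cascade("What's the capital of France?", [small, large]))
print(cascade("Prove that the sum of two even integers is always even.", [small, large]))
```

Note that an escalated query pays for both attempts, which is why the share of queries accepted by the small model drives the overall savings.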

Zero configuration. Works with YOUR existing models (7 providers currently supported).

In practice, 60-70% of queries are handled by small, efficient models (8-20x cost difference) without requiring escalation.

Result: 40-85% cost reduction, 2-10x faster responses, zero quality loss.
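
A rough back-of-the-envelope check of these figures, assuming every query first pays for the small model and escalated queries additionally pay for the flagship model. The prices are the low ends of the ranges quoted above; the 65% acceptance rate is an assumed midpoint of the 60-70% range, and the function name is illustrative.

```python
def blended_cost(accept_rate: float, small_price: float, flagship_price: float) -> float:
    """Expected price per 1M tokens when the small model always runs first."""
    escalation_rate = 1.0 - accept_rate
    return small_price + escalation_rate * flagship_price


small_price = 0.15      # $/1M tokens, low end of the small-model range
flagship_price = 1.25   # $/1M tokens, low end of the flagship range
accept_rate = 0.65      # assumed share of queries the small model handles

cost = blended_cost(accept_rate, small_price, flagship_price)
savings = 1.0 - cost / flagship_price
print(f"Blended: ${cost:.4f}/1M tokens, savings vs flagship-only: {savings:.0%}")
```

Under these assumptions the saving comes out at about 53%, inside the quoted 40-85% range. A higher acceptance rate or a larger price gap between the two models pushes it toward the upper end.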

┌─────────────────────────────────────────────────────────────┐
│                      Cascadeflow Stack                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Cascade Agent                                        │  │
│  │                                                       │  │
│  │  Orchestrates the entire cascade execution            │  │
│  │  • Query routing & model selection                    │  │
│  │  • Drafter -> Verifier coordination                   │  │
│  │  • Cost tracking & telemetry                          │  │
│  └───────────────────────────────────────────────────────┘  │
│                          ↓                                  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Domain Pipeline                                      │  │
│  │                                                       │  │
│  │  Automatic domain classification                      │  │
│  │  • Rule-based detection (CODE, MATH, DATA, etc.)      │  │
│  │  • Optional ML semantic classification                │  │
│  │  • Domain-optimized pipelines & model selection       │  │
│  └───────────────────────────────────────────────────────┘  │
│                          ↓                                  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Quality Validation Engine                            │  │
│  │                                                       │  │
│  │  Multi-dimensional quality checks                     │  │
│  │  • Length validation (too short/verbose)              │  │
│  │  • Confidence scoring (logprobs analysis)             │  │
│  │  • Format validation (JSON, structured output)        │  │
│  │  • Semantic alignment (intent matching)               │  │
│  └───────────────────────────────────────────────────────┘  │
│                          ↓                                  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Cascading Engine (<2ms overhead)                     │  │
│  │                                                       │  │
│  │  Smart model escalation strategy                      │  │
│  │  • Try cheap models first (speculative execution)     │  │
│  │  • Validate quality instantly                         │  │
│  │  • Escalate only when needed                          │  │
│  │  • Automatic retry & fallback                         │  │
│  └───────────────────────────────────────────────────────┘  │
│                          ↓                                  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Provider Abstraction Layer                           │  │
│  │                                                       │  │
│  │  Unified interface for 7+ providers                   │  │
│  │  • OpenAI • Anthropic • Groq • Ollama                 │  │
│  │  • Together • vLLM • HuggingFace • LiteLLM            │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
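
The checks listed in the Quality Validation Engine box can be sketched roughly as follows. This is an illustrative approximation, not Cascadeflow's implementation: the function name, the thresholds, and the use of mean token probability as the confidence score are all assumptions, and semantic alignment is omitted.

```python
import json
import math
from typing import List, Optional


def validate_response(
    text: str,
    token_logprobs: Optional[List[float]] = None,
    expect_json: bool = False,
    min_chars: int = 2,
    max_chars: int = 4000,
    min_confidence: float = 0.7,
) -> List[str]:
    """Return a list of failed checks; an empty list means the draft is accepted."""
    failures = []

    # Length validation: reject empty or runaway answers.
    if not (min_chars <= len(text.strip()) <= max_chars):
        failures.append("length")

    # Confidence scoring: average per-token probability from log-probabilities.
    if token_logprobs:
        mean_prob = sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)
        if mean_prob < min_confidence:
            failures.append("confidence")

    # Format validation: the answer must parse as JSON when JSON was requested.
    if expect_json:
        try:
            json.loads(text)
        except json.JSONDecodeError:
            failures.append("format")

    return failures


print(validate_response('{"city": "Paris"}', [-0.05, -0.10], expect_json=True))
print(validate_response("maybe Paris?", [-1.6, -2.3], expect_json=True))
```

Returning the names of the failed checks, rather than a single boolean, makes it easier to see why a draft was escalated.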

Quick Start

Python

pip install cascadeflow[all]

import asyncio

from cascadeflow import CascadeAgent, ModelConfig

# Define your cascade - try cheap model first, escalate if needed
agent = CascadeAgent(models=[
    ModelConfig(name="gpt-4o-mini", provider="openai", cost=0.00015),  # Try first
    ModelConfig(name="gpt-5", provider="openai", cost=0.00125),        # Fallback
])

async def main():
    # Run query - automatically routes to optimal model
    result = await agent.run("What's the capital of France?")

    print(f"Answer: {result.content}")
    print(f"Model used: {result.model_used}")
    print(f"Cost: ${result.total_cost:.6f}")

# agent.run is a coroutine, so drive it with an event loop
asyncio.run(main())

💡 Optional: Enable ML-based Quality Engine & Domain Detection for Higher Accuracy

Step 1: Install the optional ML package:

pip install cascadeflow[ml]  # Adds semantic similarity detection via FastEmbed

Step 2: Enable semantic detection in your agent:

import asyncio

from cascadeflow import CascadeAgent, ModelConfig

# Enable ML-based semantic detection (optional parameter)
agent = CascadeAgent(
    models=[
        ModelConfig(name="gpt-4o-mini", provider="openai", cost=0.00015),
        ModelConfig(name="gpt-5", provider="openai", cost=0.00125),
    ],
    enable_semantic_detection=True  # Optional: Uses ML for domain detection
)

async def main():
    # ML semantic detection is now active for all queries
    result = await agent.run("Calculate the eigenvalues of matrix [[1,2],[3,4]]")

    # Check which detection method was used
    print(f"Domain: {result.metadata.get('domain_detected')}")
    print(f"Method: {result.metadata.get('detection_method')}")  # 'semantic' or 'rule-based'
    print(f"Confidence: {result.metadata.get('domain_confidence', 0):.1%}")

asyncio.run(main())

What you get:

  • 🎯 84-87% confidence on complex domains (MATH, CODE, DATA, STRUCTURED)
  • 🔄 Automatic fallback to rule-based if ML dependencies unavailable
  • 📈 Improved routing accuracy for specialized queries
  • 🚀 Works seamlessly with your existing cascade setup

Note: If enable_semantic_detection=True but FastEmbed is not installed, CascadeFlow automatically falls back to rule-based detection without errors.
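
The fallback described in this note follows a common optional-dependency pattern, sketched below. The function names, the keyword rules, the `GENERAL` label, and the assumption that the optional package is importable as `fastembed` are all illustrative; CascadeFlow's real rule set and internals may differ.

```python
import importlib.util
import re

# Illustrative keyword rules; the library's real rule set is not shown here.
RULES = {
    "MATH": re.compile(r"\b(eigenvalues?|matrix|integral|derivative|equation)\b", re.I),
    "CODE": re.compile(r"\b(function|python|typescript|bug|compile)\b", re.I),
    "DATA": re.compile(r"\b(json|csv|schema|sql|dataframe)\b", re.I),
}


def pick_detection_method(enable_semantic_detection: bool) -> str:
    """Use 'semantic' only if requested and the optional package is importable."""
    if enable_semantic_detection and importlib.util.find_spec("fastembed") is not None:
        return "semantic"
    return "rule-based"


def rule_based_domain(query: str) -> str:
    """Return the first domain whose keywords appear in the query."""
    for domain, pattern in RULES.items():
        if pattern.search(query):
            return domain
    return "GENERAL"  # hypothetical catch-all label


print(pick_detection_method(enable_semantic_detection=True))
print(rule_based_domain("Calculate the eigenvalues of matrix [[1,2],[3,4]]"))
```

Probing for the package with `importlib.util.find_spec` avoids importing a heavy dependency just to learn whether it is installed.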

โš ๏ธ GPT-5 Note: GPT-5 requires OpenAI organization verification. Go to OpenAI Settings and click "Verify Organization". Access is granted within ~15 minutes. Alternatively, use the recommended setup below which works immediately.

๐Ÿ“– Learn more: Python Documentation | Quickstart Guide | Providers Guide

TypeScript

npm install @cascadeflow/core
import { CascadeAgent, ModelConfig } from '@cascadeflow/core';

// Same API as Python!
const agent = new CascadeAgent({
  models: [
    { name: 'gpt-4o-mini', provider: 'openai', cost: 0.00015 },
    { name: 'gpt-4o', provider: 'openai', cost: 0.00625 },
  ],
});

const result = await agent.run('What is TypeScript?');
console.log(`Model: ${result.modelUsed}`);
console.log(`Cost: $${result.totalCost}`);
console.log(`Saved: ${result.savingsPercentage}%`);
💡 Optional: Enable ML-based Quality Engine & Domain Detection for Higher Accuracy

Note: ML semantic detection is currently available in Python only. TypeScript support is planned for a future release. Rule-based detection provides excellent accuracy out of the box.

For Python users:

Step 1: Install the ML package:

pip install cascadeflow[ml]

Step 2: Enable semantic detection:

from cascadeflow import CascadeAgent, ModelConfig

agent = CascadeAgent(
    models=[...],
    enable_semantic_detection=True  # Enables ML-based detection
)

Future TypeScript Support (Planned):

// Will be available in a future release
npm install @cascadeflow/ml

import { CascadeAgent, ModelConfig } from '@cascadeflow/core';

// Step 1: Enable semantic detection in configuration
const agent = new CascadeAgent({
  models: [
    { name: 'gpt-4o-mini', provider: 'openai', cost: 0.00015 },
    { name: 'gpt-4o', provider: 'openai', cost: 0.00625 },
  ],
  enableSemanticDetection: true  // Optional: Uses ML for domain detection
});

// Step 2: Query with ML-enhanced detection
const result = await agent.run('Parse this JSON and validate the schema');

// Check which detection method was used
console.log(`Domain: ${result.metadata.domainDetected}`);
console.log(`Method: ${result.metadata.detectionMethod}`);  // 'semantic' or 'rule-based'
console.log(`Confidence: ${(result.metadata.domainConfidence * 100).toFixed(1)}%`);

What you'll get (when available):

  • 🎯 84-87% confidence on complex domains (MATH, CODE, DATA, STRUCTURED)
  • 🔄 Automatic fallback to rule-based if ML unavailable
  • 📈 Improved routing accuracy for specialized queries
  • 🚀 Works seamlessly with your existing cascade setup

Currently, CascadeFlow for TypeScript uses rule-based domain detection, which works well for most use cases.

📖 Learn more: TypeScript Documentation | Node.js Examples | Browser/Edge Guide

🔄 Migration Example

Migrate in about 5 minutes from a direct provider integration to cascading, with cost savings, cost control, and transparency.

Before (Standard Approach)

Cost: $0.001250, Latency: 850ms

import openai

# Using expensive model for everything
result = openai.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "What's 2+2?"}]
)

After (With CascadeFlow)

Cost: $0.000150, Latency: 234ms

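from cascadeflow import CascadeAgent, ModelConfig

agent = CascadeAgent(models=[
    ModelConfig(name="gpt-4o-mini", provider="openai", cost=0.00015),
    ModelConfig(name="gpt-5", provider="openai", cost=0.00125),
])

# Call from inside an async function (or wrap it with asyncio.run)
result = await agent.run("What's 2+2?")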

🔥 Saved: $0.001100 (88% reduction), 3.6x faster

📊 Learn more: Cost Tracking Guide | Production Best Practices | Performance Optimization
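
The savings line can be reproduced from the before and after figures quoted in this example. The variable names are illustrative, and the numbers describe a query that the small model answers on its own; an escalated query would pay for both models.

```python
before_cost, after_cost = 0.001250, 0.000150   # dollars per query, from the example
before_ms, after_ms = 850, 234                  # latency in milliseconds

saved = before_cost - after_cost
reduction = saved / before_cost
speedup = before_ms / after_ms

print(f"Saved: ${saved:.6f} ({reduction:.0%} reduction), {speedup:.1f}x faster")
```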


n8n Integration

Use CascadeFlow in n8n workflows for no-code AI automation with automatic cost optimization!

Installation

  1. Open n8n
  2. Go to Settings → Community Nodes
  3. Search for: n8n-nodes-cascadeflow
  4. Click Install

Quick Example

Create a workflow:

Manual Trigger → CascadeFlow Node → Set Node

Configure CascadeFlow node:

  • Draft Model: gpt-4o-mini ($0.00015)
  • Verifier Model: gpt-4o ($0.00625)
  • Message: Your prompt
  • Output: Full Metrics

Result: 40-85% cost savings in your n8n workflows!

Features:

  • ✅ Visual workflow integration
  • ✅ Multi-provider support
  • ✅ Cost tracking in workflow
  • ✅ Tool calling support
  • ✅ Easy debugging with metrics

🔌 Learn more: n8n Integration Guide | n8n Documentation


Resources

Examples

Python Examples:

Basic Examples - Get started quickly

| Example | Description | Link |
|---|---|---|
| Basic Usage | Simple cascade setup with OpenAI models | View |
| Preset Usage | Use built-in presets for quick setup | View Guide |
| Multi-Provider | Mix multiple AI providers in one cascade | View |
| Reasoning Models 🆕 | Use reasoning models (o1/o3, Claude 3.7, DeepSeek-R1) | View |
| Tool Execution | Function calling and tool usage | View |
| Streaming Text | Stream responses from cascade agents | View |
| Cost Tracking | Track and analyze costs across queries | View |

Advanced Examples - Production & customization

| Example | Description | Link |
|---|---|---|
| Production Patterns | Best practices for production deployments | View |
| FastAPI Integration | Integrate cascades with FastAPI | View |
| Streaming Tools | Stream tool calls and responses | View |
| Batch Processing | Process multiple queries efficiently | View |
| Multi-Step Cascade | Build complex multi-step cascades | View |
| Edge Device | Run cascades on edge devices with local models | View |
| vLLM Example | Use vLLM for local model deployment | View |
| Custom Cascade | Build custom cascade strategies | View |
| Custom Validation | Implement custom quality validators | View |
| User Budget Tracking | Per-user budget enforcement and tracking | View |
| User Profile Usage | User-specific routing and configurations | View |
| Rate Limiting | Implement rate limiting for cascades | View |
| Guardrails | Add safety and content guardrails | View |
| Cost Forecasting | Forecast costs and detect anomalies | View |
| Semantic Quality Detection | ML-based domain and quality detection | View |
| Profile Database Integration | Integrate user profiles with databases | View |

TypeScript Examples:

Basic Examples - Get started quickly

| Example | Description | Link |
|---|---|---|
| Basic Usage | Simple cascade setup (Node.js) | View |
| Tool Calling | Function calling with tools (Node.js) | View |
| Multi-Provider | Mix providers in TypeScript (Node.js) | View |
| Reasoning Models 🆕 | Use reasoning models (o1/o3, Claude 3.7, DeepSeek-R1) | View |
| Streaming | Stream responses in TypeScript | View |

Advanced Examples - Production & edge deployment

| Example | Description | Link |
|---|---|---|
| Production Patterns | Production best practices (Node.js) | View |
| Browser/Edge | Vercel Edge runtime example | View |

📂 View All Python Examples → | View All TypeScript Examples →

Documentation

Getting Started - Core concepts and basics

| Guide | Description | Link |
|---|---|---|
| Quickstart | Get started with CascadeFlow in 5 minutes | Read |
| Providers Guide | Configure and use different AI providers | Read |
| Presets Guide | Using and creating custom presets | Read |
| Streaming Guide | Stream responses from cascade agents | Read |
| Tools Guide | Function calling and tool usage | Read |
| Cost Tracking | Track and analyze API costs | Read |

Advanced Topics - Production, customization & integrations

| Guide | Description | Link |
|---|---|---|
| Production Guide | Best practices for production deployments | Read |
| Performance Guide | Optimize cascade performance and latency | Read |
| Custom Cascade | Build custom cascade strategies | Read |
| Custom Validation | Implement custom quality validators | Read |
| Edge Device | Deploy cascades on edge devices | Read |
| Browser Cascading | Run cascades in the browser/edge | Read |
| FastAPI Integration | Integrate with FastAPI applications | Read |
| n8n Integration | Use CascadeFlow in n8n workflows | Read |

📚 View All Documentation →


Features

| Feature | Benefit |
|---|---|
| 🎯 Speculative Cascading | Tries cheap models first, escalates intelligently |
| 💰 40-85% Cost Savings | Research-backed, proven in production |
| ⚡ 2-10x Faster | Small models respond in <50ms vs 500-2000ms |
| ⚡ Low Latency 🆕 | Sub-2ms framework overhead, negligible performance impact |
| 🔄 Mix Any Providers 🆕 | OpenAI, Anthropic, Groq, Ollama, vLLM, Together + LiteLLM (optional) |
| 👤 User Profile System 🆕 | Per-user budgets, tier-aware routing, enforcement callbacks |
| ✅ Quality Validation 🆕 | Automatic checks + semantic similarity (optional ML, ~80MB, CPU) |
| 🎨 Cascading Policies 🆕 | Domain-specific pipelines, multi-step validation strategies |
| 🧠 Domain Understanding 🆕 | Auto-detects code/medical/legal/math/structured data, routes to specialists |
| 🤖 Drafter/Validator Pattern | 20-60% savings for agent/tool systems |
| 🔧 Tool Calling Support 🆕 | Universal format, works across all providers |
| 📊 Cost Tracking 🆕 | Built-in analytics + OpenTelemetry export (vendor-neutral) |
| 🚀 3-Line Integration | Zero architecture changes needed |
| 🏭 Production Ready 🆕 | Streaming, batch processing, tool handling, reasoning model support, caching, error recovery, anomaly detection |

License

MIT © see LICENSE file.

Free for commercial use. Attribution appreciated but not required.


Contributing

We โค๏ธ contributions!

๐Ÿ“ Contributing Guide - Python & TypeScript development setup


Roadmap

  • Cascade Profiler - Analyzes your AI API logs to calculate cost savings potential and generate optimized CascadeFlow configurations automatically
  • User Tier Management - Cost controls and limits per user tier with advanced routing
  • Semantic Quality Validators - Optional lightweight local quality scoring (200MB CPU model, no external API calls)
  • Code Complexity Detection - Dynamic cascading based on task complexity analysis
  • Domain Aware Cascading - Multi-stage pipelines tailored to specific domains
  • Benchmark Reports - Automated performance and cost benchmarking

Support


Citation

If you use CascadeFlow in your research or project, please cite:

@software{cascadeflow2025,
  author = {Lemony Inc., Sascha Buehrle and Contributors},
  title = {CascadeFlow: Smart AI model cascading for cost optimization},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/lemony-ai/cascadeflow}
}

Ready to cut your AI costs by 40-85%?

pip install cascadeflow
npm install @cascadeflow/core

Read the Docs • View Python Examples • View TypeScript Examples • Join Discussions


About

Built with ❤️ by Lemony Inc. and the CascadeFlow Community

One cascade. Hundreds of specialists.

New York | Zurich

โญ Star us on GitHub if CascadeFlow helps you save money!

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cascadeflow-0.1.1.tar.gz (98.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cascadeflow-0.1.1-py3-none-any.whl (28.4 kB view details)

Uploaded Python 3

File details

Details for the file cascadeflow-0.1.1.tar.gz.

File metadata

  • Download URL: cascadeflow-0.1.1.tar.gz
  • Upload date:
  • Size: 98.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for cascadeflow-0.1.1.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 5d185cc0352dcda8bd1833420b3e2aee52a306591c535292a9da7e155be00443 |
| MD5 | 373679aa6ee4af3d5a259e0c724f5d68 |
| BLAKE2b-256 | 52550ada67d8fd06b2ac7c75f95797464d8b7de8eb34f3f243f29c8486f426a2 |

See more details on using hashes here.
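
One way to check a downloaded archive against the SHA256 digest listed above is sketched here with Python's standard `hashlib`. The local file path is an assumption, and `sha256_of_file` is an illustrative helper rather than part of any packaging tool.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for cascadeflow-0.1.1.tar.gz
EXPECTED_SHA256 = "5d185cc0352dcda8bd1833420b3e2aee52a306591c535292a9da7e155be00443"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


archive = Path("cascadeflow-0.1.1.tar.gz")  # assumed local download location
if archive.exists():
    ok = sha256_of_file(archive) == EXPECTED_SHA256
    print("hash matches" if ok else "hash MISMATCH")
else:
    print(f"{archive} not found; download it first")
```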

File details

Details for the file cascadeflow-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: cascadeflow-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 28.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for cascadeflow-0.1.1-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | b42aaef9f396bfe99b0365f134e5e484ae36cbb465c603a2ecf1e062ac76a313 |
| MD5 | a2295f56ab9fa1bd7c96caa6f1ba6a8b |
| BLAKE2b-256 | f2589e2498a830b31a2b952c246862c928ae3aef0e5082d648bc748da49684c9 |

See more details on using hashes here.
