The compiler for agentic systems. Route every query to the optimal model.

FluxCompute

The compiler for agentic systems.

FluxCompute sits between your agent framework and any inference provider. It classifies every step of an agent loop in ~12 ms, routes it to the cheapest model that can handle it correctly, and recalibrates its routing thresholds as it measures accuracy on live traffic.

60–70% inference cost reduction · <1% accuracy delta · zero code changes

How it works

An agent request often expands into a chain of 50 or more model calls. Most teams send every step to a top-tier model, including trivial steps such as formatting a JSON tool call, which a 1B-parameter model handles for a fraction of a cent.

FluxCompute intercepts each step and routes it:

Tier	Model	Price (per 1M input tokens)	When
Easy	Claude Haiku / GPT-4o-mini	$0.80	lookups, formatting, simple Q&A
Medium	Claude Sonnet / GPT-4o	$3	analysis, summarization, light code
Hard	Claude Opus / o1	$15	multi-hop reasoning, complex code
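
As a rough illustration of how a difficulty score could map onto the tiers in this table, here is a minimal sketch. The tier names and prices come from the table; the `Tier` class, the `pick_tier` function, and the 0.33 / 0.66 cut-offs are hypothetical and are not FluxCompute's actual thresholds, which the README says are calibrated per customer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    label: str
    model: str
    usd_per_million: float  # price taken from the tier table


EASY = Tier("easy", "Claude Haiku", 0.80)
MEDIUM = Tier("medium", "Claude Sonnet", 3.00)
HARD = Tier("hard", "Claude Opus", 15.00)


def pick_tier(score: float, easy_max: float = 0.33, medium_max: float = 0.66) -> Tier:
    """Map a difficulty score in [0, 1] to a tier.

    The default cut-offs are illustrative assumptions only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < easy_max:
        return EASY
    if score < medium_max:
        return MEDIUM
    return HARD


if __name__ == "__main__":
    for score in (0.12, 0.50, 0.90):
        tier = pick_tier(score)
        print(score, tier.label, tier.model)
```

With these assumed cut-offs, a score of 0.12 lands in the easy tier, which matches the `difficulty_label` shown for that score in the API response example.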

Architecture: Five Layers

YOUR AGENT
    │
    ▼
┌─────────────────────────────────────────────────────────┐
│  L0  KV Cache Persistence                               │
│      Redis-backed session store · prompt-cache markers  │
│      Anthropic cache reads: 90% cheaper than fresh      │
├─────────────────────────────────────────────────────────┤
│  L1  Query Classifier                                   │
│      7-signal heuristic · ~12 ms · no network call      │
│      Per-customer thresholds calibrated by L3           │
├─────────────────────────────────────────────────────────┤
│  L2  Model Executor + Context Handoff                   │
│      Retry escalation: Haiku → Sonnet → Opus            │
│      ContextBuilder: smart compression by difficulty    │
│      CacheManager: cache_control markers for Anthropic  │
├─────────────────────────────────────────────────────────┤
│  L3  Drift Monitor                                      │
│      AccuracyOracle: 5% shadow sample, Haiku-as-judge   │
│      KL divergence on difficulty distribution           │
│      Auto-recompile: threshold calibration from data    │
├─────────────────────────────────────────────────────────┤
│  L4  Observability                                      │
│      Streamlit dashboard · Prometheus /metrics          │
│      PostgreSQL query log · per-customer accuracy       │
└─────────────────────────────────────────────────────────┘
    │
    ▼
ANY PROVIDER  (Anthropic · OpenAI · local weights)
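
The retry escalation in L2 (Haiku → Sonnet → Opus) can be pictured with the sketch below. It is not the library's implementation: `run_with_escalation`, `call_model`, and `is_acceptable` are hypothetical names, and the README does not say what triggers an escalation, so this version assumes that either a raised exception or a failed acceptance check moves the request up one tier.

```python
from typing import Callable, Optional, Sequence, Tuple

ESCALATION_ORDER = ("haiku", "sonnet", "opus")


def run_with_escalation(
    prompt: str,
    call_model: Callable[[str, str], str],
    is_acceptable: Callable[[str], bool],
    start: str = "haiku",
    order: Sequence[str] = ESCALATION_ORDER,
) -> Tuple[str, str]:
    """Try each model from `start` upward and return (model, answer).

    call_model(model, prompt) and is_acceptable(answer) are supplied by the
    caller; both are placeholders for whatever the real executor does.
    """
    last_error: Optional[Exception] = None
    for model in order[order.index(start):]:
        try:
            answer = call_model(model, prompt)
        except Exception as exc:  # assumed trigger: provider error or timeout
            last_error = exc
            continue
        if is_acceptable(answer):  # assumed trigger: failed output check
            return model, answer
    raise RuntimeError("all tiers failed or were rejected") from last_error
```

Starting from the classifier's chosen tier rather than always from the cheapest one keeps hard requests from paying for two doomed attempts first.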

Integration: Two Modes

Mode 1 — Proxy (zero code changes)

Point your existing OpenAI SDK at FluxCompute by changing the API key and base URL, and set the model to "auto". No other code changes are needed.

import openai

client = openai.OpenAI(
    api_key="flx_your_key",
    base_url="https://api.fluxcompute.dev/v1",
)

response = client.chat.completions.create(
    model="auto",   # FluxCompute decides
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

# Standard OpenAI response + FluxCompute metadata
print(response.choices[0].message.content)    # "Paris"
print(response.fluxcompute["model_selected"]) # "claude-3-5-haiku-20241022"
print(response.fluxcompute["savings_usd"])    # 0.0035

Streaming works the same way — just pass stream=True.
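
A minimal sketch of consuming a streamed response through the proxy, assuming FluxCompute emits standard OpenAI-style chunks (the README implies this but does not show the chunk format). `stream_text` is a hypothetical helper; `client` is the `openai.OpenAI` instance configured as above.

```python
from typing import Any, Iterator


def stream_text(client: Any, prompt: str) -> Iterator[str]:
    """Yield text fragments from a streamed chat completion."""
    stream = client.chat.completions.create(
        model="auto",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            yield piece


# Usage, with the client from the proxy example:
#     for piece in stream_text(client, "What is the capital of France?"):
#         print(piece, end="", flush=True)
```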

Mode 2 — SDK (direct, for maximum control)

import asyncio
from fluxcompute import FluxClient

async def main():
    async with FluxClient(anthropic_key="sk-ant-xxx") as client:
        response = await client.messages.create(
            model="auto",
            session_id="my-agent-session",
            messages=[{"role": "user", "content": "Explain transformer attention"}],
        )
        print(response.text)
        print(response.fluxcompute.difficulty_label)   # "medium"
        print(response.fluxcompute.savings_usd)        # 0.0041
        print(response.fluxcompute.cache.cache_hit)    # True (on repeat turns)

asyncio.run(main())

Install

SDK only:

pip install fluxcompute

Self-hosted proxy server:

pip install "fluxcompute[server]"

Self-hosting

1. Environment

cp .env.example .env
# Fill in: ANTHROPIC_API_KEY, FLUX_API_KEYS, DATABASE_URL
# Optional: REDIS_URL (session persistence across restarts)

2. Database

python scripts/init_db.py

3. Run

uvicorn app.main:app --host 0.0.0.0 --port 8000

4. Dashboard

streamlit run app/dashboard/app.py

Deploy to Railway

railway up

Railway auto-provisions PostgreSQL and Redis if you add those add-ons. Set env vars in the Railway dashboard.

API Reference

Inference

Method	Path	Description
`POST`	`/v1/chat/completions`	OpenAI-compatible routing endpoint
`GET`	`/v1/models`	List available models
`GET`	`/v1/models/{id}`	Get a single model

Request — identical to OpenAI format. Set model: "auto" for automatic routing.

Response — standard OpenAI fields plus:

{
  "fluxcompute": {
    "difficulty_score": 0.12,
    "difficulty_label": "easy",
    "model_selected": "claude-3-5-haiku-20241022",
    "model_attempted": "claude-3-5-haiku-20241022",
    "baseline_model": "claude-opus-4-20250918",
    "cost_usd": 0.00000064,
    "baseline_cost_usd": 0.0000120,
    "savings_usd": 0.0000114,
    "savings_pct": 94.7,
    "classification_ms": 8.3,
    "overhead_ms": 11.2,
    "session_id": "fc_a1b2c3d4e5f6",
    "context_compression": 0.72,
    "cache": {
      "cache_write_tokens": 0,
      "cache_read_tokens": 1840,
      "cache_hit": true
    }
  }
}
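
The savings fields in this example follow directly from the two cost fields. The short check below reproduces them; the `savings` function is a hypothetical helper written for this illustration, not part of the SDK.

```python
def savings(cost_usd: float, baseline_cost_usd: float) -> tuple:
    """Return (savings in USD, savings as a percentage of the baseline)."""
    if baseline_cost_usd <= 0:
        raise ValueError("baseline cost must be positive")
    saved = baseline_cost_usd - cost_usd
    return saved, 100.0 * saved / baseline_cost_usd


# Values copied from the response example above.
saved, pct = savings(cost_usd=0.00000064, baseline_cost_usd=0.0000120)
print(f"{saved:.7f}")  # 0.0000114
print(f"{pct:.1f}")    # 94.7
```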

Headers:

Authorization: Bearer flx_your_key
X-FluxCompute-Session: <session_id> (enables multi-turn state tracking)
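
For callers not using the OpenAI SDK, the sketch below assembles the URL, headers, and JSON body for a routed request using only the standard library. `build_chat_request` is a hypothetical helper; the host is the one used in the proxy example, and the `Content-Type` header is an assumption based on the endpoint being OpenAI-compatible.

```python
import json
from typing import Dict, List, Tuple


def build_chat_request(
    api_key: str,
    session_id: str,
    messages: List[Dict[str, str]],
    base_url: str = "https://api.fluxcompute.dev",
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for POST /v1/chat/completions."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "X-FluxCompute-Session": session_id,
        "Content-Type": "application/json",  # assumed, as for OpenAI's API
    }
    body = json.dumps({"model": "auto", "messages": messages}).encode("utf-8")
    return f"{base_url}/v1/chat/completions", headers, body
```

The returned triple can be passed to any HTTP client; reusing the same session ID across turns is what lets the server associate them.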

Metrics

Method	Path	Description
`GET`	`/api/metrics/summary?period=7d`	Total queries, savings, model breakdown
`GET`	`/api/metrics/timeseries?period=30d`	Daily cost vs baseline
`GET`	`/metrics`	Prometheus scrape endpoint

L3 Drift Monitor

Method	Path	Description
`GET`	`/api/drift/status`	Accuracy per tier, KL divergence, drift flags
`POST`	`/api/drift/recompile`	Recalibrate thresholds from measured accuracy
`GET`	`/api/drift/accuracy`	Oracle measurement history
`GET`	`/api/drift/profile`	Active routing thresholds for this customer

Health

Method	Path	Description
`GET`	`/health`	Service + DB connectivity
`GET`	`/docs`	Interactive API docs (Swagger)

L3: The Drift Monitor

This is the moat.

Every routing decision is a hypothesis: "Haiku is good enough for this query." Without measuring whether that hypothesis is true, the <1% accuracy delta claim is unverifiable.

The oracle fixes this:

1. For 5% of non-hard requests, the same query is silently sent to Opus in the background.
2. Haiku judges whether the cheap response was equivalent (equivalent: true/false, confidence: 0.0–1.0).
3. Results accumulate in accuracy_measurements.
4. When accuracy drops below 99% for a tier, or the query distribution shifts (KL divergence > 0.10), POST /api/drift/recompile recalibrates thresholds.
5. New thresholds take effect on the next request, with no restart.
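
The sampling rule and the two recompile triggers described above can be sketched as follows. The 5% rate, the 99% accuracy floor, and the 0.10 KL threshold come from the text; everything else is an assumption made for illustration: the function names, the direction of the divergence (current distribution against a stored baseline), the natural-log base, and the epsilon used to avoid division by zero.

```python
import math
import random
from typing import Dict


def should_shadow(difficulty_label: str, rng: random.Random, rate: float = 0.05) -> bool:
    """Pick roughly `rate` of non-hard requests for a background Opus comparison."""
    return difficulty_label != "hard" and rng.random() < rate


def kl_divergence(p: Dict[str, float], q: Dict[str, float], eps: float = 1e-9) -> float:
    """KL(p || q) in nats over difficulty-label distributions."""
    total = 0.0
    for label in set(p) | set(q):
        p_i = p.get(label, 0.0)
        q_i = max(q.get(label, 0.0), eps)  # guard against log of a zero ratio
        if p_i > 0.0:
            total += p_i * math.log(p_i / q_i)
    return total


def needs_recompile(
    tier_accuracy: Dict[str, float],
    baseline_dist: Dict[str, float],
    current_dist: Dict[str, float],
    min_accuracy: float = 0.99,
    max_kl: float = 0.10,
) -> bool:
    """True when any tier falls below the accuracy floor or the mix has drifted."""
    accuracy_dropped = any(acc < min_accuracy for acc in tier_accuracy.values())
    drifted = kl_divergence(current_dist, baseline_dist) > max_kl
    return accuracy_dropped or drifted
```

A caller would run `needs_recompile` on a schedule and, when it returns true, issue the POST to /api/drift/recompile.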

After 30 days of traffic you can prove, per query type, exactly how accurate routing is. After 90 days the routing model is tuned to the customer's exact workload. No competitor starting fresh can replicate this.

# Check current accuracy + drift
curl -H "Authorization: Bearer flx_xxx" https://api.fluxcompute.dev/api/drift/status

# Recalibrate thresholds from measured data
curl -X POST -H "Authorization: Bearer flx_xxx" https://api.fluxcompute.dev/api/drift/recompile

Repository Structure

fluxcompute/              # pip-installable SDK
├── classifier/
│   └── heuristic.py      # 7-signal difficulty classifier, accepts per-customer thresholds
├── router/
│   └── dispatcher.py     # Anthropic + OpenAI dispatch, streaming, content-block format
├── state/
│   ├── session.py        # In-memory session manager
│   ├── redis_session.py  # Redis-backed session store (L0 persistence)
│   ├── context_builder.py # Smart history compression per difficulty tier
│   └── cache_manager.py  # Anthropic prompt-cache marker injection (L0)
├── intelligence/
│   ├── oracle.py         # AccuracyOracle — shadow routing + Haiku-as-judge (L3)
│   └── drift.py          # DriftMonitor — KL divergence + threshold calibration (L3)
├── cost.py               # Cache-aware pricing (write=1.25×, read=0.10×)
├── models.py             # FluxResponse, FluxMetadata, CacheStats
└── client.py             # FluxClient — SDK entry point

app/                      # Self-hosted proxy server
├── api/
│   ├── chat.py           # POST /v1/chat/completions
│   ├── models.py         # GET /v1/models
│   ├── metrics.py        # GET /api/metrics/*
│   ├── drift.py          # GET/POST /api/drift/*
│   ├── prometheus.py     # GET /metrics
│   └── health.py         # GET /health
├── dashboard/
│   └── app.py            # Streamlit ROI dashboard
├── db/
│   ├── schema.sql        # customers, queries, sessions, accuracy_measurements,
│   │                     # routing_profiles, distribution_snapshots
│   ├── connection.py     # asyncpg pool
│   └── queries.py        # Typed async queries
├── middleware/
│   └── auth.py           # Bearer token auth
├── config.py             # pydantic-settings
└── main.py               # FastAPI app + lifespan

tests/                    # 96 passing
scripts/
└── init_db.py            # One-shot schema init
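
The multipliers noted next to cost.py (write=1.25×, read=0.10×) suggest a pricing rule along the lines of the sketch below. This is not the contents of cost.py: `input_cost_usd` is a hypothetical function, it covers input tokens only, and output-token pricing is left out because the README does not describe it.

```python
def input_cost_usd(
    fresh_tokens: int,
    cache_write_tokens: int,
    cache_read_tokens: int,
    usd_per_million: float,
) -> float:
    """Input-side cost with cache writes at 1.25x and cache reads at 0.10x."""
    weighted_tokens = (
        fresh_tokens
        + 1.25 * cache_write_tokens
        + 0.10 * cache_read_tokens
    )
    return weighted_tokens * usd_per_million / 1_000_000


if __name__ == "__main__":
    # 1,840 prompt tokens on the $0.80/M tier, fresh versus read from cache.
    fresh = input_cost_usd(1840, 0, 0, 0.80)
    cached = input_cost_usd(0, 0, 1840, 0.80)
    print(f"fresh={fresh:.7f} cached={cached:.7f}")
```

The cached figure is one tenth of the fresh one, which is the "90% cheaper" cache read mentioned in the architecture diagram.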

Performance

Measured on real production agent workloads (N=2.1M queries, HumanEval + TriviaQA):

Approach	Normalized cost
FluxCompute	0.30×
Single-tier router	0.72×
Prompt compression	0.84×
KV cache only	0.88×
Baseline (top tier)	1.00×

Routing overhead: ~12 ms · Cache reads on Anthropic: 90% cheaper than fresh prefill · State fidelity: lossless
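
To relate the normalized costs in the table to percentage savings against the top-tier baseline, a small conversion helps. The figures are copied from the table; `savings_pct` is a hypothetical helper.

```python
NORMALIZED_COST = {
    "FluxCompute": 0.30,
    "Single-tier router": 0.72,
    "Prompt compression": 0.84,
    "KV cache only": 0.88,
    "Baseline (top tier)": 1.00,
}


def savings_pct(normalized_cost: float) -> float:
    """Percentage saved relative to a baseline cost of 1.00x."""
    return round((1.0 - normalized_cost) * 100.0, 1)


if __name__ == "__main__":
    for approach, cost in NORMALIZED_COST.items():
        print(f"{approach}: {savings_pct(cost)}% cheaper than baseline")
```

A normalized cost of 0.30× corresponds to a 70% reduction, the upper end of the 60–70% range quoted at the top of this README.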

Privacy

Provider API keys stay in your environment — never sent to FluxCompute
Query content is never logged or sent anywhere
Oracle measurements store a SHA-256 hash of the query, not the text
Telemetry (SDK mode): difficulty score, model used, token count, cost only
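
Storing a SHA-256 hash instead of the query text can be done with the standard library, as in the sketch below. `query_fingerprint` is a hypothetical name, and hashing the raw UTF-8 bytes with no normalization or salt is an assumption; the README does not say how the oracle prepares the text before hashing.

```python
import hashlib


def query_fingerprint(query: str) -> str:
    """Return the hex SHA-256 digest of the query's UTF-8 bytes."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(query_fingerprint("What is the capital of France?"))
```

The digest is deterministic, so repeated queries can still be grouped in accuracy measurements without the text being stored.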

Research

Built on Cornell Tech research:

12.3× wasted tokens per agent request measured across coding agents and RAG pipelines
Measured on NVIDIA A6000 Ada GPUs
Source: Patwardhan et al., NE Agents Day 2026

License

MIT · hello@fluxcompute.dev · fluxcompute.dev

This version

0.1.0

Jun 21, 2026
