ThriftAI
Make multi-agent LLM development cheaper. Cache, replay, and tier — without changing your pipeline code.
ThriftAI sits between your orchestration layer (LangGraph, CrewAI, AutoGen, or raw Python) and your LLM provider. It intercepts every call to prevent redundant spend — transparently, without requiring you to change your pipeline logic.
ThriftAI is not an observability tool, a tracing platform, or an LLM gateway. Tools like MLflow, Langfuse, and Braintrust already do those jobs well. ThriftAI solves the problem they don't: making the next pipeline run cheaper based on what the last run produced.
The Problem
Developing multi-agent LLM pipelines is expensive because:
- Redundant calls — tweaking one agent's prompt re-runs the entire pipeline, paying for all unchanged agents
- Iteration loops — prompt engineering is trial-and-error; each experiment is a full API round-trip
- No selective re-execution — you can't iterate on agent 3 without re-paying for agents 1 and 2
Quick Start
pip install thriftai
import thriftai as ta

@ta.agent(name="researcher")
def research(session, topic):
    return session.completion(
        messages=[{"role": "user", "content": f"Research: {topic}"}],
        model="anthropic/claude-sonnet-4-20250514",
    )

@ta.agent(name="writer", depends_on=["researcher"])
def write(session, research):
    return session.completion(
        messages=[{"role": "user", "content": f"Summarize: {research}"}],
        model="anthropic/claude-sonnet-4-20250514",
    )

session = ta.Session()

# Run 1: both agents go live ($0.43)
with session.run() as run:
    data = research(run, "AI costs")
    summary = write(run, data)

# Run 2: only writer goes live, researcher replays from trace ($0.07)
with session.replay(trace_id=run.trace_id, live=["writer"]) as run:
    data = research(run, "AI costs")
    summary = write(run, data)

print(run.cost_report.summary())
ThriftAI Cost Report
──────────────────────────────────────────────────
researcher [replay] $0.0000 (saved $0.3600)
writer [live] $0.0700 (saved $0.0000)
──────────────────────────────────────────────────
Total cost: $0.0700
Total saved: $0.3600
Savings: 84%
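The percentage in the report is consistent with dividing the amount saved by what the run would have cost fully live (cost plus saved). The sketch below reproduces that arithmetic; the function name `savings_percent` is hypothetical, and the formula is inferred from the numbers above, not taken from ThriftAI's source.

```python
def savings_percent(total_cost: float, total_saved: float) -> float:
    """Share of the would-be spend that was avoided, as a percentage."""
    # Baseline: what the run would have cost if every agent had gone live.
    baseline = total_cost + total_saved
    if baseline == 0:
        return 0.0
    return 100.0 * total_saved / baseline


# Numbers from the report above: $0.07 spent, $0.36 saved.
print(round(savings_percent(0.07, 0.36)))  # 84
```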
How It Works
ThriftAI uses a decision cascade for every LLM call:
- Replay check → Is this agent being replayed? Serve exact output from trace.
- Cache check → Is there an exact-match hit? Serve cached response.
- Live call → Route to LLM. Record in cache and trace. Track cost.
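The cascade above can be sketched in a few lines of plain Python. This is an illustration of the ordering only, not ThriftAI's implementation: `resolve_call`, the dict-based `trace` and `cache`, and the `call_llm` callable are all hypothetical stand-ins, and cost tracking is left out.

```python
from typing import Callable, Dict, Set, Tuple


def resolve_call(
    agent: str,
    prompt: str,
    replayed_agents: Set[str],
    trace: Dict[str, str],
    cache: Dict[Tuple[str, str], str],
    call_llm: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (source, output), checking replay, then cache, then live."""
    # 1. Replay check: serve the recorded output for agents marked as replayed.
    if agent in replayed_agents and agent in trace:
        return "replay", trace[agent]
    # 2. Cache check: exact match on (agent, prompt).
    key = (agent, prompt)
    if key in cache:
        return "cache", cache[key]
    # 3. Live call: hit the provider, then record in cache and trace.
    output = call_llm(prompt)
    cache[key] = output
    trace[agent] = output
    return "live", output


trace, cache = {}, {}
fake_llm = lambda p: p.upper()  # stand-in for a real provider call
print(resolve_call("writer", "hi", set(), trace, cache, fake_llm))       # ('live', 'HI')
print(resolve_call("writer", "hi", set(), trace, cache, fake_llm))       # ('cache', 'HI')
print(resolve_call("writer", "hi", {"writer"}, trace, cache, fake_llm))  # ('replay', 'HI')
```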
Features
- Selective replay: Replay N-1 agents from trace, send 1 live
- Exact-match cache: Hash-based, scoped per agent + prompt template
- Downstream invalidation: If a live agent's output changes during replay, dependents auto-invalidate
- Cost-saved metric: Reports what you saved, not just what you spent
- Provider-agnostic: Works with any provider via LiteLLM (Anthropic, OpenAI, Google, etc.)
- Zero lock-in: Decorator/wrapper pattern — keep your existing pipeline code
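To make "hash-based, scoped per agent + prompt template" concrete, here is a minimal sketch of how such a key could be derived. The README does not say which fields ThriftAI actually hashes, so the field set (agent, template, messages, model) and the function name `cache_key` are assumptions.

```python
import hashlib
import json


def cache_key(agent: str, prompt_template: str, messages: list, model: str) -> str:
    """Build a deterministic exact-match key scoped to one agent and template."""
    # sort_keys keeps the serialization stable regardless of dict ordering.
    payload = json.dumps(
        {
            "agent": agent,
            "template": prompt_template,
            "messages": messages,
            "model": model,
        },
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


msgs = [{"role": "user", "content": "Research: AI costs"}]
k1 = cache_key("researcher", "Research: {topic}", msgs, "some-model")
k2 = cache_key("writer", "Research: {topic}", msgs, "some-model")
print(k1 == k2)  # False: the same prompt under a different agent gets a different key
```

Because the match is exact, any change to the messages, template, or model produces a new key and therefore a live call.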
Should I use ThriftAI in production?
Short answer: probably yes, with caveats. Use it where inputs recur, disable it where every call is unique.
When it pays off
- Batch / scheduled agent pipelines. Nightly summarization, weekly research bots, daily reports — same inputs recur. Cache hit rates often >50%.
- Eval and benchmark loops. Re-running the same prompts across models. Hit rate ≈100% after the first pass.
- RAG with long-tail recurrence. Many users asking the same questions of your docs.
When it's a net loss
- Interactive user-facing chat where every prompt is unique. Cache hit rate ≈0; you pay storage + lookup overhead for nothing.
- Cheap models with cheap embeddings. Below a per-call cost threshold, semantic caching costs more than it saves. See STRESS_REPORT.md for the per-model break-even table and the wrong-hit risk per query category.
- Hard p99 SLAs. SQLite writes add 1–2 ms; measure first.
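A rough way to reason about the trade-off in the cases above: a cache saves the call cost only on hits, but its lookup and storage overhead is paid on every call. The sketch below expresses that as arithmetic. The function names and all dollar figures are hypothetical; the measured per-model numbers live in STRESS_REPORT.md.

```python
def net_saving_per_call(call_cost: float, hit_rate: float, overhead: float) -> float:
    """Expected saving per call: hits avoid call_cost, every call pays overhead."""
    return hit_rate * call_cost - overhead


def break_even_hit_rate(call_cost: float, overhead: float) -> float:
    """Hit rate at which caching neither saves nor loses money."""
    if call_cost <= 0:
        raise ValueError("call_cost must be positive")
    return overhead / call_cost


# Illustrative numbers only: a $0.01 call with $0.0001 of per-call overhead.
print(f"{net_saving_per_call(0.01, 0.5, 0.0001):.4f}")  # 0.0049 (batch-style hit rate)
print(f"{net_saving_per_call(0.01, 0.0, 0.0001):.4f}")  # -0.0001 (unique prompts)
```

With a hit rate near zero the result is negative no matter how small the overhead, which is the interactive-chat case described above.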
Replay is dev-only
Session.replay() exists for prompt iteration during development. It has no production use; calling it on a session created with enabled=False raises an error.
Kill switch
Two equivalent ways to disable cache + replay (cost tracking stays on):
session = Session(enabled=False) # per-session
THRIFTAI_DISABLED=1 python my_app.py # global, wins over the kwarg
When disabled, Session is a thin pass-through to LiteLLM. No filesystem writes, no embedding calls, no traces. CostReport still summarizes per-agent spend.
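The precedence between the two switches can be pictured as follows. This is a sketch of the stated rule (the environment variable wins over the kwarg), not ThriftAI's code: the helper name `is_enabled` is hypothetical, and treating only the value "1" as "disabled" is an assumption based on the example above.

```python
import os
from typing import Mapping, Optional


def is_enabled(enabled_kwarg: bool = True, env: Optional[Mapping[str, str]] = None) -> bool:
    """Resolve the effective on/off state; the global env switch wins."""
    env = os.environ if env is None else env
    # Assumption: only the literal value "1" disables, as in the shell example.
    if env.get("THRIFTAI_DISABLED") == "1":
        return False
    return enabled_kwarg


print(is_enabled(True, {"THRIFTAI_DISABLED": "1"}))  # False: env overrides the kwarg
print(is_enabled(True, {}))                          # True
print(is_enabled(False, {}))                         # False: per-session opt-out
```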
Open production gaps
What's not solved yet:
- No TTL on cached responses. Invalidate manually with cache.invalidate_agent(name) after a model upgrade or data refresh.
- Single-instance cache. Each replica has its own SQLite. Use a shared volume, or wait for the planned Redis backend.
- Response text stored unencrypted. If agent inputs are sensitive, encrypt the cache directory at rest or run with enabled=False until the planned PII-redaction layer lands.
License
MIT