ThriftAI

Make multi-agent LLM development cheaper. Cache, replay, and tier — without changing your pipeline code.

ThriftAI sits between your orchestration layer (LangGraph, CrewAI, AutoGen, or raw Python) and your LLM provider. It intercepts every call to prevent redundant spend — transparently, without requiring you to change your pipeline logic.

ThriftAI is not an observability tool, a tracing platform, or an LLM gateway. Tools like MLflow, Langfuse, and Braintrust already do those jobs well. ThriftAI solves the problem they don't: making the next pipeline run cheaper based on what the last run produced.

The Problem

Developing multi-agent LLM pipelines is expensive because:

Redundant calls — tweaking one agent's prompt re-runs the entire pipeline, paying for all unchanged agents
Iteration loops — prompt engineering is trial-and-error; each experiment is a full API round-trip
No selective re-execution — you can't iterate on agent 3 without re-paying for agents 1 and 2

Quick Start

pip install thriftai

import thriftai as ta

@ta.agent(name="researcher")
def research(session, topic):
    return session.completion(
        messages=[{"role": "user", "content": f"Research: {topic}"}],
        model="anthropic/claude-sonnet-4-20250514",
    )

@ta.agent(name="writer", depends_on=["researcher"])
def write(session, research):
    return session.completion(
        messages=[{"role": "user", "content": f"Summarize: {research}"}],
        model="anthropic/claude-sonnet-4-20250514",
    )

session = ta.Session()

# Run 1: both agents go live — $0.43
with session.run() as run:
    data = research(run, "AI costs")
    summary = write(run, data)

# Run 2: only writer goes live, researcher replays from trace — $0.07
with session.replay(trace_id=run.trace_id, live=["writer"]) as run:
    data = research(run, "AI costs")
    summary = write(run, data)
    print(run.cost_report.summary())

ThriftAI Cost Report
──────────────────────────────────────────────────
  researcher           [replay]     $0.0000  (saved $0.3600)
  writer               [live]       $0.0700  (saved $0.0000)
──────────────────────────────────────────────────
  Total cost:  $0.0700
  Total saved: $0.3600
  Savings:     84%
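
The savings percentage in the report is consistent with dividing the saved amount by what the run would have cost fully live (cost plus saved). A minimal sketch of that arithmetic, assuming this definition; `savings_percent` is a hypothetical helper for illustration, not part of the ThriftAI API:

```python
def savings_percent(total_cost: float, total_saved: float) -> float:
    """Share of the would-be live spend that was avoided, in percent."""
    would_have_cost = total_cost + total_saved  # spend if every agent had gone live
    if would_have_cost == 0:
        return 0.0  # nothing was spent or saved, so there is nothing to report
    return 100.0 * total_saved / would_have_cost


# Figures taken from the report above: $0.07 spent, $0.36 saved.
print(round(savings_percent(0.07, 0.36)))  # 84
```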

How It Works

ThriftAI uses a decision cascade for every LLM call:

Replay check → Is this agent being replayed? Serve exact output from trace.
Cache check → Is there an exact-match hit? Serve cached response.
Live call → Route to LLM. Record in cache and trace. Track cost.
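
The cascade above can be sketched in a few lines. This is an illustration under stated assumptions, not ThriftAI's implementation: `resolve_call`, `cache_key`, the in-memory dicts, and the one-output-per-agent trace are all hypothetical simplifications, and the cache key here hashes agent name, model, and messages.

```python
import hashlib
import json
from typing import Callable, Dict, List, Set, Tuple


def cache_key(agent: str, model: str, messages: List[dict]) -> str:
    """Exact-match key: a stable hash of agent, model, and messages (assumed scope)."""
    payload = json.dumps(
        {"agent": agent, "model": model, "messages": messages}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def resolve_call(
    agent: str,
    model: str,
    messages: List[dict],
    *,
    trace: Dict[str, str],   # hypothetical: last recorded output per agent
    replayed: Set[str],      # agents that should be served from the trace
    cache: Dict[str, str],   # hypothetical in-memory exact-match cache
    live_call: Callable[[str, List[dict]], str],
) -> Tuple[str, str]:
    # 1. Replay check: serve the recorded output for replayed agents.
    if agent in replayed and agent in trace:
        return "replay", trace[agent]
    # 2. Cache check: serve an exact-match hit.
    key = cache_key(agent, model, messages)
    if key in cache:
        return "cache", cache[key]
    # 3. Live call: route to the provider, then record in cache and trace.
    output = live_call(model, messages)
    cache[key] = output
    trace[agent] = output
    return "live", output


def fake_llm(model: str, messages: List[dict]) -> str:
    return "stub response"  # stands in for a real provider call


trace: Dict[str, str] = {}
cache: Dict[str, str] = {}
msgs = [{"role": "user", "content": "Research: AI costs"}]
for replayed in (set(), set(), {"researcher"}):
    source, _ = resolve_call(
        "researcher", "some-model", msgs,
        trace=trace, replayed=replayed, cache=cache, live_call=fake_llm,
    )
    print(source)  # live, then cache, then replay
```

Cost tracking is left out of the sketch; in the cascade described above it happens on the live branch.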

Features

Selective replay: Replay N-1 agents from trace, send 1 live
Exact-match cache: Hash-based, scoped per agent + prompt template
Downstream invalidation: If a live agent's output changes during replay, dependents auto-invalidate
Cost-saved metric: Reports what you saved, not just what you spent
Provider-agnostic: Works with any provider via LiteLLM (Anthropic, OpenAI, Google, etc.)
Zero lock-in: Decorator/wrapper pattern — keep your existing pipeline code
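
Downstream invalidation amounts to a graph walk over the `depends_on` declarations. The sketch below shows one way to compute which agents become stale when an upstream output changes; `dependents_to_invalidate` and the `editor` agent are hypothetical names for illustration, and ThriftAI's actual mechanism may differ.

```python
from collections import deque
from typing import Dict, List, Set


def dependents_to_invalidate(depends_on: Dict[str, List[str]], changed: str) -> Set[str]:
    """Return every agent that transitively depends on `changed`."""
    # Invert the declared edges: parent -> agents that consume its output.
    children: Dict[str, List[str]] = {}
    for agent, parents in depends_on.items():
        for parent in parents:
            children.setdefault(parent, []).append(agent)

    stale: Set[str] = set()
    queue = deque([changed])
    while queue:  # breadth-first walk down the dependency graph
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale


# Same shape as the Quick Start pipeline, plus a hypothetical third agent.
graph = {"researcher": [], "writer": ["researcher"], "editor": ["writer"]}
print(sorted(dependents_to_invalidate(graph, "researcher")))  # ['editor', 'writer']
```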

Should I use ThriftAI in production?

Short answer: probably yes, with caveats. Use it where inputs recur, disable it where every call is unique.

When it pays off

Batch / scheduled agent pipelines. Nightly summarization, weekly research bots, daily reports: the same inputs recur, and cache hit rates often exceed 50%.
Eval and benchmark loops. Re-running the same prompts across models. Hit rate ≈100% after the first pass.
RAG with long-tail recurrence. Many users asking the same questions of your docs.

When it's a net loss

Interactive user-facing chat where every prompt is unique. Cache hit rate ≈0; you pay storage + lookup overhead for nothing.
Cheap models with cheap embeddings. Below a per-call cost threshold, semantic caching costs more than it saves. See STRESS_REPORT.md for the per-model break-even table and the wrong-hit risk per query category.
Hard p99 SLAs. SQLite writes add 1–2 ms; measure first.
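
The trade-off in the cases above reduces to simple arithmetic: with a cache you pay the lookup overhead on every call and the live price only on misses. A rough sketch of the break-even point under that model; the function names and the dollar figures are illustrative assumptions, not numbers from STRESS_REPORT.md:

```python
def net_saving_per_call(hit_rate: float, live_call_cost: float, lookup_overhead: float) -> float:
    """Expected saving per call: avoided live spend on hits minus overhead paid on every call."""
    return hit_rate * live_call_cost - lookup_overhead


def break_even_hit_rate(live_call_cost: float, lookup_overhead: float) -> float:
    """Hit rate above which caching starts to pay for itself."""
    if live_call_cost <= 0:
        raise ValueError("live_call_cost must be positive")
    return lookup_overhead / live_call_cost


# Made-up example: a $0.01 live call and $0.0001 of lookup/embedding overhead.
print(f"{break_even_hit_rate(0.01, 0.0001):.1%}")  # 1.0%
# At a hit rate of 0 the overhead is pure loss:
print(net_saving_per_call(0.0, 0.01, 0.0001) < 0)  # True
```

The cheaper the live call relative to the overhead, the higher the hit rate needed before caching saves anything.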

Replay is dev-only

Session.replay() exists for prompt iteration during development. It has no production use; calling it on a session created with enabled=False raises an exception.

Kill switch

Two ways to disable cache and replay; cost tracking stays on in both cases:

session = Session(enabled=False)             # per-session

THRIFTAI_DISABLED=1 python my_app.py         # global, wins over the kwarg

When disabled, Session is a thin pass-through to LiteLLM. No filesystem writes, no embedding calls, no traces. CostReport still summarizes per-agent spend.
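
The precedence between the two switches can be pictured as follows. This is a sketch of the stated rule only (the environment variable wins over the kwarg); `cache_enabled` is a hypothetical helper, and treating only the value `1` as "set" is an assumption about how the variable is parsed.

```python
import os
from typing import Mapping, Optional


def cache_enabled(enabled_kwarg: bool = True,
                  environ: Optional[Mapping[str, str]] = None) -> bool:
    """Resolve whether cache and replay are active for a session."""
    env = os.environ if environ is None else environ
    # Global switch: overrides whatever the caller passed as a kwarg.
    if env.get("THRIFTAI_DISABLED") == "1":
        return False
    # Otherwise the per-session kwarg decides.
    return enabled_kwarg


print(cache_enabled(True, {"THRIFTAI_DISABLED": "1"}))  # False
print(cache_enabled(False, {}))                         # False
print(cache_enabled(True, {}))                          # True
```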

Open production gaps

What is not solved yet:

No TTL on cached responses. Invalidate manually with cache.invalidate_agent(name) after a model upgrade or data refresh.
Single-instance cache. Each replica has its own SQLite. Use a shared volume, or wait for the planned Redis backend.
Response text stored unencrypted. If agent inputs are sensitive, encrypt the cache directory at rest or run with enabled=False until the planned PII-redaction layer lands.

License

MIT

Project details

This version

0.1.1

May 18, 2026

Download files

Source Distribution

thriftai-0.1.1.tar.gz (52.2 kB)

Uploaded May 18, 2026 Source

Built Distribution

thriftai-0.1.1-py3-none-any.whl (26.9 kB)

Uploaded May 18, 2026 Python 3

File details

Details for the file thriftai-0.1.1.tar.gz.

File metadata

Download URL: thriftai-0.1.1.tar.gz
Upload date: May 18, 2026
Size: 52.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for thriftai-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`b73bc2fb4da7593885d2423982cd2e7bf6d5e1e749b7341ab4dbaeb3c64423b3`
MD5	`fbb1495cb1fc7313c475a7598b0ee3b7`
BLAKE2b-256	`1e1fcba4385523cd40d7c249143ca59e2630563bebe8d55077fb843d4cbfd1a4`

Provenance

The following attestation bundles were made for thriftai-0.1.1.tar.gz:

Publisher: release.yml on rayabhik83/thriftai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: thriftai-0.1.1.tar.gz
- Subject digest: b73bc2fb4da7593885d2423982cd2e7bf6d5e1e749b7341ab4dbaeb3c64423b3
- Sigstore transparency entry: 1565987630
- Sigstore integration time: May 18, 2026
Source repository:
- Permalink: rayabhik83/thriftai@84cbda8a62ffcf46bd11f09b805350c6381805d4
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/rayabhik83
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@84cbda8a62ffcf46bd11f09b805350c6381805d4
- Trigger Event: push

File details

Details for the file thriftai-0.1.1-py3-none-any.whl.

File metadata

Download URL: thriftai-0.1.1-py3-none-any.whl
Upload date: May 18, 2026
Size: 26.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for thriftai-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6accbf6c343e2ff954210cc67a54860bc3b3f9c33834bffa64f655018d829293`
MD5	`613c5467a69550eaca15eabf0369b681`
BLAKE2b-256	`92f767a50e81fc1c1e34204fba2279494c871682a73c33dd248c846a0c613ddf`

Provenance

The following attestation bundles were made for thriftai-0.1.1-py3-none-any.whl:

Publisher: release.yml on rayabhik83/thriftai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: thriftai-0.1.1-py3-none-any.whl
- Subject digest: 6accbf6c343e2ff954210cc67a54860bc3b3f9c33834bffa64f655018d829293
- Sigstore transparency entry: 1565987652
- Sigstore integration time: May 18, 2026
Source repository:
- Permalink: rayabhik83/thriftai@84cbda8a62ffcf46bd11f09b805350c6381805d4
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/rayabhik83
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@84cbda8a62ffcf46bd11f09b805350c6381805d4
- Trigger Event: push
