

ThriftAI

Make multi-agent LLM development cheaper. Cache, replay, and tier — without changing your pipeline code.

ThriftAI sits between your orchestration layer (LangGraph, CrewAI, AutoGen, or raw Python) and your LLM provider. It intercepts every LLM call so you avoid paying twice for the same output, and it does so transparently: your pipeline logic stays unchanged.

ThriftAI is not an observability tool, a tracing platform, or an LLM gateway. Tools like MLflow, Langfuse, and Braintrust already do those jobs well. ThriftAI solves the problem they don't: making the next pipeline run cheaper based on what the last run produced.

The Problem

Developing multi-agent LLM pipelines is expensive because:

  • Redundant calls — tweaking one agent's prompt re-runs the entire pipeline, paying for all unchanged agents
  • Iteration loops — prompt engineering is trial-and-error; each experiment is a full API round-trip
  • No selective re-execution — you can't iterate on agent 3 without re-paying for agents 1 and 2

Quick Start

```shell
pip install thriftai
```

```python
import thriftai as ta

# Each decorated function is one agent; depends_on declares its upstream agents.
# The first parameter receives the active run (passed as `run` below).
@ta.agent(name="researcher")
def research(session, topic):
    return session.completion(
        messages=[{"role": "user", "content": f"Research: {topic}"}],
        model="anthropic/claude-sonnet-4-20250514",
    )

@ta.agent(name="writer", depends_on=["researcher"])
def write(session, research):
    return session.completion(
        messages=[{"role": "user", "content": f"Summarize: {research}"}],
        model="anthropic/claude-sonnet-4-20250514",
    )

session = ta.Session()

# Run 1: both agents go live ($0.43)
with session.run() as run:
    data = research(run, "AI costs")
    summary = write(run, data)

# Run 2: only writer goes live; researcher replays from Run 1's trace ($0.07)
with session.replay(trace_id=run.trace_id, live=["writer"]) as run:
    data = research(run, "AI costs")
    summary = write(run, data)
    print(run.cost_report.summary())
```

Output of Run 2:

```text
ThriftAI Cost Report
──────────────────────────────────────────────────
  researcher           [replay]     $0.0000  (saved $0.3600)
  writer               [live]       $0.0700  (saved $0.0000)
──────────────────────────────────────────────────
  Total cost:  $0.0700
  Total saved: $0.3600
  Savings:     84%
```
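The report totals appear to follow directly from the per-agent lines. Below is a minimal sketch of that arithmetic, assuming "Savings" means saved divided by what the run would have cost with every agent live (cost + saved). The `AgentLine` type and `summarize` helper are hypothetical, not ThriftAI's API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AgentLine:
    """One row of a cost report (hypothetical structure, for illustration)."""
    name: str
    mode: str      # "replay", "cache", or "live"
    cost: float    # dollars actually spent on this agent
    saved: float   # dollars avoided by replay or cache


def summarize(lines: List[AgentLine]) -> Dict[str, float]:
    total_cost = sum(line.cost for line in lines)
    total_saved = sum(line.saved for line in lines)
    # Assumed baseline: what the run would have cost with every agent live.
    baseline = total_cost + total_saved
    pct = 100 * total_saved / baseline if baseline else 0.0
    return {"cost": total_cost, "saved": total_saved, "savings_pct": pct}


# The numbers from the Run 2 report above.
report = summarize([
    AgentLine("researcher", "replay", cost=0.00, saved=0.36),
    AgentLine("writer", "live", cost=0.07, saved=0.00),
])
print(f"Savings: {report['savings_pct']:.0f}%")  # Savings: 84%
```

With $0.36 saved out of a $0.43 all-live baseline, the ratio is about 83.7%, which rounds to the 84% shown.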

How It Works

ThriftAI uses a decision cascade for every LLM call:

  1. Replay check → Is this agent being replayed? Serve exact output from trace.
  2. Cache check → Is there an exact-match hit? Serve cached response.
  3. Live call → Route to LLM. Record in cache and trace. Track cost.
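A minimal sketch of this three-step cascade follows. It assumes an in-memory trace (one recorded output per agent), an in-memory exact-match cache keyed by a SHA-256 hash of the request, and a stub `call_llm` in place of a real provider. `Cascade`, `complete`, and `fake_llm` are hypothetical names, and cost tracking is omitted; this is not ThriftAI's actual implementation.

```python
import hashlib
import json
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical stand-in for a provider call: (model, messages) -> response text.
LLMFn = Callable[[str, List[dict]], str]


class Cascade:
    """Replay -> exact-match cache -> live call (illustrative sketch only)."""

    def __init__(self, call_llm: LLMFn,
                 trace: Optional[Dict[str, str]] = None,
                 replay: Optional[Set[str]] = None) -> None:
        self.call_llm = call_llm
        self.trace: Dict[str, str] = dict(trace or {})  # agent -> recorded output
        self.replay: Set[str] = set(replay or ())       # agents served from the trace
        self.cache: Dict[str, str] = {}                 # request hash -> response
        self.log: List[Tuple[str, str]] = []            # (agent, source) per call

    def complete(self, agent: str, model: str, messages: List[dict]) -> str:
        # 1. Replay check: serve the exact output recorded in the trace.
        if agent in self.replay and agent in self.trace:
            self.log.append((agent, "replay"))
            return self.trace[agent]

        # 2. Cache check: exact match on agent + model + messages.
        payload = json.dumps([agent, model, messages], sort_keys=True)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key in self.cache:
            self.log.append((agent, "cache"))
            return self.cache[key]

        # 3. Live call: route to the LLM, then record in cache and trace.
        response = self.call_llm(model, messages)
        self.cache[key] = response
        self.trace[agent] = response
        self.log.append((agent, "live"))
        return response


# Usage with a fake LLM that records how often it is really called.
calls: List[str] = []


def fake_llm(model: str, messages: List[dict]) -> str:
    calls.append(model)
    return "reply to: " + messages[-1]["content"]


cascade = Cascade(fake_llm, trace={"researcher": "recorded notes"}, replay={"researcher"})
notes = cascade.complete("researcher", "some-model",
                         [{"role": "user", "content": "Research: AI costs"}])
prompt = [{"role": "user", "content": f"Summarize: {notes}"}]
cascade.complete("writer", "some-model", prompt)  # live
cascade.complete("writer", "some-model", prompt)  # exact-match cache hit
print(cascade.log)
# [('researcher', 'replay'), ('writer', 'live'), ('writer', 'cache')]
print(len(calls))  # 1
```

Ordering matters: replay is checked first so a replayed agent never touches the cache or the provider, and only a true miss reaches the paid call.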

Features

  • Selective replay: Replay N-1 agents from a recorded trace and run only the agent you are iterating on live
  • Exact-match cache: Hash-based, scoped per agent + prompt template
  • Downstream invalidation: If a live agent's output changes during replay, dependents auto-invalidate
  • Cost-saved metric: Reports what you saved, not just what you spent
  • Provider-agnostic: Works with any provider via LiteLLM (Anthropic, OpenAI, Google, etc.)
  • Zero lock-in: Decorator/wrapper pattern — keep your existing pipeline code
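Two of these features lend themselves to a short sketch. The code below assumes the cache key is a SHA-256 hash over the agent name, the prompt template, the model, and the rendered messages, and that downstream invalidation means walking the `depends_on` graph outward from the agent whose output changed. `cache_key`, `downstream`, and the agent names `editor` and `linter` are hypothetical, not ThriftAI's API.

```python
import hashlib
import json
from collections import deque
from typing import Dict, List, Set


def cache_key(agent: str, template: str, model: str, messages: List[dict]) -> str:
    """Exact-match key scoped to one agent and one prompt template (assumed scheme)."""
    payload = json.dumps(
        {"agent": agent, "template": template, "model": model, "messages": messages},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def downstream(changed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every agent that transitively depends on `changed`."""
    # Invert the depends_on edges: upstream agent -> its direct dependents.
    dependents: Dict[str, List[str]] = {}
    for agent, upstreams in depends_on.items():
        for up in upstreams:
            dependents.setdefault(up, []).append(agent)

    # Breadth-first walk from the changed agent.
    stale: Set[str] = set()
    queue = deque([changed])
    while queue:
        for child in dependents.get(queue.popleft(), []):
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale


# Same request from a different agent -> different key, so agents never share entries.
msgs = [{"role": "user", "content": "Summarize: notes"}]
k1 = cache_key("writer", "Summarize: {research}", "some-model", msgs)
k2 = cache_key("editor", "Summarize: {research}", "some-model", msgs)
print(k1 != k2)  # True

# If "researcher" produces new output, everything built on it is stale.
graph = {"researcher": [], "writer": ["researcher"], "editor": ["writer"], "linter": []}
print(sorted(downstream("researcher", graph)))  # ['editor', 'writer']
```

Scoping the key by agent and template means editing one agent's prompt only misses that agent's entries, while the graph walk catches dependents whose inputs changed even though their own prompts did not.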

Should I use ThriftAI in production?

Short answer: probably yes, with caveats. Use it where inputs recur, disable it where every call is unique.

When it pays off

  • Batch / scheduled agent pipelines. Nightly summarization, weekly research bots, daily reports — same inputs recur. Cache hit rates often >50%.
  • Eval and benchmark loops. Re-running the same prompts across models. Hit rate ≈100% after the first pass.
  • RAG with long-tail recurrence. Many users asking the same questions of your docs.

When it's a net loss

  • Interactive user-facing chat where every prompt is unique. Cache hit rate ≈0; you pay storage + lookup overhead for nothing.
  • Very cheap models. Below a per-call cost threshold, semantic caching costs more than it saves, because the embedding computed for every lookup can outweigh the occasional model call it avoids. See STRESS_REPORT.md for the per-model break-even table and the wrong-hit risk per query category.
  • Hard p99 SLAs. SQLite writes add 1–2 ms; measure first.
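The break-even point can be estimated with simple arithmetic. This sketch assumes every query pays a fixed embedding cost and each cache hit saves one full model call; it ignores storage, latency, and wrong-hit costs. The prices below are made-up placeholders, not figures from STRESS_REPORT.md, and both helper names are hypothetical.

```python
def break_even_hit_rate(model_cost_per_call: float, embed_cost_per_call: float) -> float:
    """Minimum cache hit rate at which semantic caching stops losing money.

    Assumes each query is embedded (fixed overhead) and each hit saves one
    full model call. Ignores storage, latency, and wrong-hit costs.
    """
    if model_cost_per_call <= 0:
        raise ValueError("model_cost_per_call must be positive")
    return embed_cost_per_call / model_cost_per_call


def net_saving_per_call(hit_rate: float, model_cost_per_call: float,
                        embed_cost_per_call: float) -> float:
    """Expected dollars saved per call (negative means caching is a net loss)."""
    return hit_rate * model_cost_per_call - embed_cost_per_call


# Hypothetical prices, for illustration only.
expensive = break_even_hit_rate(model_cost_per_call=0.02, embed_cost_per_call=0.00002)
cheap = break_even_hit_rate(model_cost_per_call=0.00005, embed_cost_per_call=0.00002)
print(f"{expensive:.1%}")  # 0.1%
print(f"{cheap:.1%}")      # 40.0%
```

Under these assumptions a pricey model needs almost no repeats to benefit, while a very cheap one needs 40% of queries to hit; unique-prompt chat (hit rate near 0) always loses the embedding cost.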

Replay is dev-only

Session.replay() exists for prompt iteration during development and has no production use. Calling it on a session created with enabled=False raises an error.

Kill switch

Two ways to disable caching and replay (cost tracking stays on):

```python
import thriftai as ta

session = ta.Session(enabled=False)  # per-session
```

```shell
THRIFTAI_DISABLED=1 python my_app.py  # global; takes precedence over enabled=
```

When disabled, Session is a thin pass-through to LiteLLM. No filesystem writes, no embedding calls, no traces. CostReport still summarizes per-agent spend.
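The precedence rule (the environment variable beats the keyword argument) can be sketched as a small resolver. This is a minimal sketch, assuming that an empty value, "0", "false", or "no" means the variable is unset; ThriftAI's actual parsing of THRIFTAI_DISABLED may differ, and `resolve_enabled` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def resolve_enabled(enabled: bool = True, env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether cache + replay are active for a session.

    The global THRIFTAI_DISABLED variable wins over the per-session kwarg.
    Treating "", "0", "false", "no" as unset is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    flag = env.get("THRIFTAI_DISABLED", "").strip().lower()
    if flag not in ("", "0", "false", "no"):
        return False  # global kill switch: always disabled
    return enabled    # otherwise honor Session(enabled=...)


print(resolve_enabled(True, {"THRIFTAI_DISABLED": "1"}))  # False
print(resolve_enabled(False, {}))                         # False
print(resolve_enabled(True, {}))                          # True
```

Because the variable can only disable, setting it in a deployment turns caching off everywhere regardless of what individual call sites pass.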

Open production gaps

Known gaps that are not solved yet:

  • No TTL on cached responses. Invalidate manually with cache.invalidate_agent(name) after a model upgrade or data refresh.
  • Single-instance cache. Each replica has its own SQLite. Use a shared volume, or wait for the planned Redis backend.
  • Response text stored unencrypted. If agent inputs are sensitive, encrypt the cache directory at rest or run with enabled=False until the planned PII-redaction layer lands.
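Until TTLs exist, one way to follow the manual-invalidation advice is to remember which model each agent last used and invalidate that agent when its model string changes. A sketch under stated assumptions: the JSON state file and `invalidate_on_model_change` are hypothetical, and the `invalidate` callable stands in for `cache.invalidate_agent`, whose exact location in ThriftAI's API is not shown here.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def invalidate_on_model_change(
    models: Dict[str, str],
    state_file: Path,
    invalidate: Callable[[str], None],
) -> List[str]:
    """Invalidate each agent whose configured model differs from the last recorded one."""
    previous: Dict[str, str] = {}
    if state_file.exists():
        previous = json.loads(state_file.read_text(encoding="utf-8"))

    changed: List[str] = []
    for agent, model in models.items():
        # Agents seen for the first time have nothing recorded, so they are skipped.
        if agent in previous and previous[agent] != model:
            invalidate(agent)  # e.g. cache.invalidate_agent(agent)
            changed.append(agent)

    # Record the current models for the next run.
    state_file.write_text(json.dumps(models, sort_keys=True), encoding="utf-8")
    return changed


# Usage: the second call upgrades only the writer's model.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "models.json"
    invalidated: List[str] = []
    invalidate_on_model_change({"researcher": "model-a", "writer": "model-a"}, state, invalidated.append)
    invalidate_on_model_change({"researcher": "model-a", "writer": "model-b"}, state, invalidated.append)
    print(invalidated)  # ['writer']
```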

License

MIT


Download files


Source Distribution

thriftai-0.1.1.tar.gz (52.2 kB)

Uploaded Source

Built Distribution


thriftai-0.1.1-py3-none-any.whl (26.9 kB)

Uploaded Python 3

File details

Details for the file thriftai-0.1.1.tar.gz.

File metadata

  • Download URL: thriftai-0.1.1.tar.gz
  • Upload date:
  • Size: 52.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for thriftai-0.1.1.tar.gz
Algorithm Hash digest
SHA256 b73bc2fb4da7593885d2423982cd2e7bf6d5e1e749b7341ab4dbaeb3c64423b3
MD5 fbb1495cb1fc7313c475a7598b0ee3b7
BLAKE2b-256 1e1fcba4385523cd40d7c249143ca59e2630563bebe8d55077fb843d4cbfd1a4


Provenance

The following attestation bundles were made for thriftai-0.1.1.tar.gz:

Publisher: release.yml on rayabhik83/thriftai


File details

Details for the file thriftai-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: thriftai-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 26.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for thriftai-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6accbf6c343e2ff954210cc67a54860bc3b3f9c33834bffa64f655018d829293
MD5 613c5467a69550eaca15eabf0369b681
BLAKE2b-256 92f767a50e81fc1c1e34204fba2279494c871682a73c33dd248c846a0c613ddf


Provenance

The following attestation bundles were made for thriftai-0.1.1-py3-none-any.whl:

Publisher: release.yml on rayabhik83/thriftai

