State Pack
The CDN for AI inference costs.
Agents pay per token. State Pack makes that cost invisible — the same way BlackBerry made per-character SMS costs invisible. Not by changing the model. Not by changing the API. By caching state at the infrastructure layer.
Proven on the OpenAI API
| Metric | Naive | State Pack | Saving |
|---|---|---|---|
| Input tokens (20-step agent loop) | 18,990 | 1,320 | 93% |
| Cost per loop (gpt-4o-mini) | $0.00361 | $0.00091 | 74% |
| Cost per loop (gpt-4o) | ~$0.190 | ~$0.048 | 74% |
| Latency per step | — | 50ms | — |
| Base cache hit (shared agents) | 0.951s | 0.003s | 99% |
Real numbers. Real API. Real dollars.
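The percentage columns follow directly from the raw ones. A quick sanity check (values copied from the table above):

```python
# Recompute the headline reductions from the table's raw columns.
print(f"{1 - 1_320 / 18_990:.1%}")     # 93.0% fewer input tokens
print(f"{1 - 0.00091 / 0.00361:.1%}")  # 74.8% lower cost (table rounds to 74%)
print(f"{1 - 0.003 / 0.951:.1%}")      # 99.7% faster on a base cache hit
```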
The Math at Scale
1,000 agents. 40-step loops. GPT-4o pricing.
| | Naive | State Pack | Saving |
|---|---|---|---|
| Per cycle | $144.40 | $36.32 | $108.08 |
| Per day (100 cycles) | $14,440 | $3,632 | $10,808 |
If 1,000 agents share the same system prompt, the base KV cache is computed once and served to all. Agents 2-1000 pay 0 tokens for context setup.
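A minimal sketch of how that dedup falls out of content addressing (illustrative only: `blob_store` and `create_base` are hypothetical stand-ins, not the shipped `store.py`):

```python
import hashlib

blob_store: dict[str, bytes] = {}  # hypothetical in-memory stand-in for the packet store

def create_base(base_text: str, build_kv) -> str:
    """Content-address the base prompt; compute its KV cache at most once."""
    state_hash = "sha256:" + hashlib.sha256(base_text.encode()).hexdigest()
    if state_hash not in blob_store:  # only the first agent pays for this
        blob_store[state_hash] = build_kv(base_text)
    return state_hash

# 1,000 agents, one shared system prompt: one KV build, 999 instant hits.
prompt = "You are a legal research agent..."
hashes = {create_base(prompt, lambda t: b"<serialized KV>") for _ in range(1000)}
assert len(hashes) == 1 and len(blob_store) == 1
```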
How It Works
naive: [system + history + delta] -> model (cost grows every step)
state pack: [delta only] -> model (cost stays flat)
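Back-of-envelope math for the two curves, under assumed sizes (a 500-token base and 50-token deltas; not the benchmark's exact inputs):

```python
# Naive resends base + full history every step; State Pack sends the delta only.
base, delta, steps = 500, 50, 20  # assumed token counts, not the benchmark's

naive = sum(base + i * delta for i in range(1, steps + 1))  # total grows quadratically
packed = steps * delta                                      # total grows linearly

print(naive, packed, f"{1 - packed / naive:.0%} fewer input tokens")
# -> 20500 1000 95% fewer input tokens, roughly in line with the measured 93%
```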
- CREATE - run base prompt once, serialize KV cache to content-addressed blob
- INFER - load cached state, process delta tokens only, emit verifiable receipt
- MERGE - fold deltas back into base on threshold (keeps savings compounding)
Every artifact is SHA-256 addressed. Every operation emits a tamper-evident receipt. Same inputs always produce same outputs. Fully auditable.
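A sketch of what a tamper-evident receipt can look like (field names here are hypothetical; the shipped protocol lives in the Rust CLI and may differ):

```python
import hashlib, json

def sha256_hex(b: bytes) -> str:
    return "sha256:" + hashlib.sha256(b).hexdigest()

def infer_receipt(state_hash: str, delta_text: str, output: str) -> dict:
    """Bind an operation's inputs and outputs under one digest.

    Anyone holding the receipt can recompute receipt_hash and detect
    any after-the-fact alteration of state, delta, or output.
    """
    body = {
        "op": "INFER",
        "state_hash": state_hash,
        "delta_hash": sha256_hex(delta_text.encode()),
        "output_hash": sha256_hex(output.encode()),
    }
    body["receipt_hash"] = sha256_hex(json.dumps(body, sort_keys=True).encode())
    return body
```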
v0.2: Stateless Inference Protocol
The server is a pure function. Zero session state. Client owns the hash chain.
POST /states -> { state_hash }
POST /infer -> { new_state_hash, output, savings }
POST /merge -> { new_state_hash }
POST /compact -> { new_state_hash, steps_folded }
GET /states/{hash} -> { tokens, bytes, hot }
GET /health -> { states_hot, states_cached }
Client chains hashes:
h0 -> infer -> h1 -> infer -> h2 -> ... -> compact -> h_fresh
The server never knows what a conversation is. The same state_hash from any client always returns the same result, so servers can be added or replaced freely: the protocol scales horizontally with no coordination and no shared session state.
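A client-side sketch of the chain using plain `requests`, with request shapes taken from the endpoint list above (assumes the stateless server from the quickstart below is running on localhost:8002):

```python
import requests

BASE = "http://localhost:8002"

# h0: create the base state once.
h0 = requests.post(f"{BASE}/states",
                   json={"base_text": "You are a legal research agent..."}).json()["state_hash"]

# h0 -> infer -> h1 -> infer -> h2: the client carries the chain.
h, deltas = h0, ["Step 1: clause affects indemnity.", "Step 2: check governing law."]
for d in deltas:
    r = requests.post(f"{BASE}/infer", json={"state_hash": h, "delta_text": d}).json()
    h = r["new_state_hash"]

# ... -> compact -> h_fresh: fold the accumulated deltas into a fresh base,
# mirroring the curl example below (compact takes the base hash plus the deltas).
h_fresh = requests.post(f"{BASE}/compact",
                        json={"state_hash": h0, "accumulated_deltas": deltas}
                        ).json()["new_state_hash"]
```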
Quickstart
Prove it against your own OpenAI key
git clone https://github.com/mauludsadiq/State-Pack.git
cd State-Pack
export OPENAI_API_KEY=sk-...
PYTHONPATH=. python3 examples/openai_benchmark.py
Stateless server (v0.2)
pip install state-pack
PYTHONPATH=. python3 -m state_pack.stateless_server --store my_store --model gpt2
# Create base state
curl -X POST http://localhost:8002/states \
-H 'Content-Type: application/json' \
-d '{"base_text": "You are a legal research agent..."}'
# -> {"state_hash": "sha256:abc..."}
# Infer - pure function
curl -X POST http://localhost:8002/infer \
-H 'Content-Type: application/json' \
-d '{"state_hash": "sha256:abc...", "delta_text": "Step 1: clause affects indemnity."}'
# -> {"new_state_hash": "sha256:def...", "output": "...", "savings": {...}}
# Compact accumulated deltas into fresh base
curl -X POST http://localhost:8002/compact \
-H 'Content-Type: application/json' \
-d '{"state_hash": "sha256:abc...", "accumulated_deltas": ["Step 1...", "Step 2..."]}'
# -> {"new_state_hash": "sha256:ghi...", "steps_folded": 2}
Session server (high-throughput, shared base)
PYTHONPATH=. python3 -m state_pack.session_server --store my_store --model gpt2 --port 8001
Python SDK
from state_pack.llm import StatePackLLM
llm = StatePackLLM.from_pretrained('gpt2', store='my_store', merge_every=10)
llm.set_base('You are a research agent...\n\n')
for delta in steps:  # steps: your agent's per-step delta strings
    output = llm(delta)
print(llm.stats())
# tokens_saved: 17785, savings_pct: 95.31, speedup: 3.958
Architecture
state_pack/
stateless_server.py Pure stateless protocol (v0.2) - hash chain API
session_server.py In-memory KV cache, base dedup, 1000-agent scale
server.py HTTP API (FastAPI, 43ms/step)
llm.py Drop-in LLM wrapper with automatic KV reuse
store.py In-process packet store (no subprocess)
serialize.py KV cache to .pt blob (float16, 50% smaller)
client.py High-level SDK
agent_loop.py Drop-in agent loop
openai_integration.py Benchmark against OpenAI API
src/main.rs Rust CLI - content addressing, receipts, protocol
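As a rough sketch of what serialize.py's one-line description implies (content-addressed float16 .pt blobs; the function name and layout here are assumptions, not the shipped API):

```python
import hashlib
import torch

def save_kv_blob(past_key_values, path_prefix: str) -> str:
    """Serialize a KV cache to a content-addressed .pt blob.

    Assumes the legacy tuple-of-(key, value) layout. Casting float32
    tensors to float16 halves the blob size, which is where the
    "50% smaller" figure comes from.
    """
    tensors = [(k.half(), v.half()) for k, v in past_key_values]
    blob_path = f"{path_prefix}.pt"
    torch.save(tensors, blob_path)
    with open(blob_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return f"sha256:{digest}"
```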
Model Support
| Model | Status |
|---|---|
| GPT-2 | Verified |
| Llama (tiny) | Verified |
| Any HuggingFace CausalLM | Works |
| OpenAI API | Verified (93% token reduction) |
Roadmap
- Phase 1 - Python SDK (serialize, client, agent_loop)
- Phase 2 - HTTP API (FastAPI, PacketStore, 43ms/step)
- Phase 3 - float16 blobs (50% smaller), DynamicCache compat
- Phase 4 - Session server (in-memory KV, base dedup, 99% cache hit)
- OpenAI integration (93% token reduction, 74% cost reduction, live API)
- v0.2 - Stateless inference protocol (hash chain, compact, pure function server)
- GPU/multi-device KV portability
- LangChain/LangGraph native integration
- Rust HTTP server (protocol layer in Rust, Python inference sidecar)
License
MIT
File details
Details for the file state_pack-0.2.0-py3-none-any.whl.
- Size: 23.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 41d19c67983961122619784aee89210cc112f32edf3daf84832dcfafa63598ef |
| MD5 | aa6bcf333103e23794ac6c95996aef3f |
| BLAKE2b-256 | f9c3148b65640440cee6793039a7f34be94fc113f5cbff5b6e93a6b667170baa |