Slash LLM costs with intelligent context compression, smart routing, and cost tracking

TokenPak — Cut your LLM token spend by 30–50%, zero config

PyPI version · Python 3.10+ · License: Apache 2.0

TokenPak is a local proxy that compresses your LLM context before it hits the API — fewer tokens, lower cost, same results. No code changes, no cloud, no credentials stored.

Status: early preview. Core compression engine and proxy are in place. Per-client auto-integration (the tokenpak integrate command) is not yet shipped — configure your client manually by pointing it at http://127.0.0.1:8766. See QUICKSTART at https://github.com/tokenpak/docs (rendered at tokenpak.ai/quickstart).


Quick start

pip install tokenpak
tokenpak start                      # start the local proxy at 127.0.0.1:8766

Point your LLM client at the proxy. For example, the Anthropic SDK:

export ANTHROPIC_BASE_URL=http://127.0.0.1:8766

Or for OpenAI-compatible clients:

export OPENAI_BASE_URL=http://127.0.0.1:8766
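
Before sending traffic through the proxy, it can help to confirm that something is actually listening on the address above. The sketch below is a plain TCP check using only the Python standard library. It does not call any TokenPak endpoint (none is documented here), so a True result only means that some process accepts connections on that port. The function name proxy_is_listening is illustrative and not part of TokenPak.

```python
import socket


def proxy_is_listening(host: str = "127.0.0.1", port: int = 8766, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if proxy_is_listening():
        print("Something is listening on 127.0.0.1:8766")
    else:
        print("Nothing is listening on 127.0.0.1:8766; run `tokenpak start` first")
```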

Then use your client normally. TokenPak compresses requests on the way out and logs savings to a local SQLite ledger.

See QUICKSTART at https://github.com/tokenpak/docs (rendered at tokenpak.ai/quickstart) for per-client setup (Claude Code, Cursor, Aider, and others).
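
To inspect the ledger without guessing at its layout, one option is to list its tables with Python's built-in sqlite3 module. This is a minimal sketch: the path ~/.tokenpak/monitor.db comes from the feature list in this README, but the schema is not documented here, so the code deliberately assumes no table or column names. The helper name list_tables is illustrative.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: Path) -> list[str]:
    """Return the names of all tables in a SQLite database, opened read-only."""
    # mode=ro avoids creating an empty file if the path does not exist yet.
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    ledger = Path.home() / ".tokenpak" / "monitor.db"
    if ledger.exists():
        print(list_tables(ledger))
    else:
        print(f"No ledger at {ledger} yet")
```

Opening the file read-only keeps the check from interfering with a running proxy that may be writing to the same database.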


What savings look like

After a few proxied requests, tokenpak savings reports the cumulative reduction:

┌──────────────────────────────────────────────────────┐
│  TokenPak — Savings                                  │
├──────────────────────────────────────────────────────┤
│  Sample scenario       DevOps agent (config + logs)  │
│  Savings drivers                      dedup + alias  │
├──────────────────────────────────────────────────────┤
│  Original                                747 tokens  │
│  Compressed                              502 tokens  │
│  Saved                          245 tokens  (32.8%)  │
│  Cost saved (est.)                $0.00073 per call  │
├──────────────────────────────────────────────────────┤
│  Stages: dedup, alias, segmentize, directives        │
└──────────────────────────────────────────────────────┘

Actual numbers depend on your workload. Agent-style prompts with lots of repeated context see the biggest gains.
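
The figures in the panel can be reproduced with simple arithmetic. The sketch below recomputes the token and percentage savings from the panel's own numbers. The price of $3.00 per million input tokens is an assumption, chosen because 245 × 3 / 1,000,000 = 0.000735, which is close to the panel's estimate; the README does not state which price TokenPak uses. The function name savings is illustrative.

```python
def savings(original_tokens: int, compressed_tokens: int, usd_per_million_tokens: float):
    """Return (tokens saved, percent saved, estimated USD saved per call)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    saved = original_tokens - compressed_tokens
    percent = 100.0 * saved / original_tokens
    cost = saved * usd_per_million_tokens / 1_000_000
    return saved, percent, cost


# 747 and 502 are taken from the panel; 3.00 USD per million tokens is assumed.
saved, pct, cost = savings(747, 502, usd_per_million_tokens=3.00)
print(f"Saved {saved} tokens ({pct:.1f}%), about ${cost:.6f} per call")
```

Substitute your own model's input price to estimate per-call savings for your workload.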


Works with

Any LLM client that respects a custom base URL:

Claude Code · Cursor · Cline · Continue.dev · Aider · OpenAI SDK · Anthropic SDK · LiteLLM · Codex

Per-client configuration steps are in QUICKSTART at https://github.com/tokenpak/docs (rendered at tokenpak.ai/quickstart). Auto-wiring via a single tokenpak integrate <client> command is tracked for a future release.


Install

pip install tokenpak

TokenPak's runtime dependencies include anthropic, openai, fastapi, flask, litellm, llmlingua, pandas, pydantic, requests, rich, scipy, sentence-transformers, tree-sitter-languages, watchdog, and a few others — all installed automatically. Note that sentence-transformers and scipy are large (several hundred MB of dependencies); expect pip install to take a few minutes on first install.

Requires Python 3.10+.

See QUICKSTART at https://github.com/tokenpak/docs (rendered at tokenpak.ai/quickstart) for virtual-env setup and first-run details.


What's included

  • Context compression — deterministic pipeline (dedup → alias → segmentize → directives); typical 30–50% token reduction on agent workloads.
  • Local proxy — runs at 127.0.0.1:8766; zero cloud component.
  • Model routing — configurable rules with fallback chains.
  • Cost & savings tracking — per model, per session, per agent; local SQLite (~/.tokenpak/monitor.db).
  • Dashboard — local web UI for visualizing savings (tokenpak dashboard).
  • Vault indexing + semantic search — index a directory; search without an LLM call.
  • A/B testing and request replay — compare compression configs; re-run past requests.
  • 50 built-in compression recipes — YAML, customizable.

See QUICKSTART at https://github.com/tokenpak/docs (rendered at tokenpak.ai/quickstart) and API reference at https://github.com/tokenpak/docs (rendered at tokenpak.ai/api) to get started.
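
To give a feel for why repeated context compresses well, here is a toy illustration of deduplication in general. It is not TokenPak's dedup stage, whose internals are not described in this README; it only drops exact repeats of blank-line-separated blocks and keeps the first copy. The function name dedup_blocks and the sample prompt are hypothetical.

```python
def dedup_blocks(text: str) -> str:
    """Drop exact repeats of blank-line-separated blocks, keeping the first copy."""
    seen: set[str] = set()
    kept: list[str] = []
    for block in text.split("\n\n"):
        key = block.strip()
        if not key or key in seen:
            # Skip empty blocks and blocks already emitted once.
            continue
        seen.add(key)
        kept.append(key)
    return "\n\n".join(kept)


# Hypothetical agent-style prompt where the same config block appears twice.
prompt = "config: a=1\n\nlog: started\n\nconfig: a=1\n\nlog: stopped"
print(dedup_blocks(prompt))
```

A real pipeline would need to handle near-duplicates and preserve ordering constraints that matter to the model; this sketch covers only the exact-match case.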


Current limitations

What isn't ready yet:

  • No tokenpak integrate <client> auto-wire command — configure clients by env var as shown above. Auto-wire is planned.
  • No published CI/CD — releases are manual; automation is tracked in the release-workflow standards.
  • tokenpak demo is a compression-recipes demo (shows recipes applied to a sample input), not the decorated savings panel above. The panel shows what tokenpak savings output can look like after real usage.

We'd rather ship an honest preview than an advertised product that doesn't match install-time reality.


Support


License

Apache 2.0. See LICENSE.
