Runtime token proxy + optimization toolkit for LLM developers and enterprises. Intercepts API calls, strips waste in real-time, tracks costs, and serves a web dashboard.

These details have not been verified by PyPI

Project links

Project description

`skim`

Stop paying for tokens you never meant to send.

The runtime layer that sits between your AI tools and the LLM API — stripping waste, injecting caching, and showing you exactly where every token goes.

⚡ Quickstart · 🔍 How it works · 📊 Dashboard · 🏢 Enterprise · ⌨️ CLI · 📚 Docs · ▶️ Live Demo

[!NOTE] One env var. Zero code changes. Claude Code reads a package-lock.json — 122k tokens, $0.37 — just to answer a question about a 200-line file. History compounds. Your context window fills silently and quality degrades while you fly blind. skim fixes this in the API call path, in real time.

flowchart LR
    A["🤖 Claude Code<br/>Cursor · your app"] -->|ANTHROPIC_BASE_URL| B1

    subgraph SKIM ["⚡ skim proxy"]
        direction TB
        B1["✂️ strip lock files<br/>& build artifacts"]
        B2["◈ inject prompt caching<br/>50–90% cheaper"]
        B3["🛡️ enforce budgets<br/>hard 429 block"]
        B4["📊 live dashboard<br/>+ local SQLite"]
        B1 --> B2 --> B3 --> B4
    end

    B4 --> C["☁️ Anthropic<br/>OpenAI · Gemini"]

    style A fill:#161920,stroke:#6c63ff,color:#e4e6f0
    style SKIM fill:#0d0f14,stroke:#6c63ff,color:#6c63ff
    style C fill:#161920,stroke:#00d4aa,color:#e4e6f0
    style B1 fill:#161920,stroke:#252a3a,color:#e4e6f0
    style B2 fill:#161920,stroke:#252a3a,color:#e4e6f0
    style B3 fill:#161920,stroke:#252a3a,color:#e4e6f0
    style B4 fill:#161920,stroke:#252a3a,color:#e4e6f0

⚡ Quickstart

1. Install

pip install skim-llm

2. Start the proxy

skim proxy

Browser opens automatically to your live dashboard.

3. Point your tool at it

export ANTHROPIC_API_KEY=sk-ant-...   # required for Claude Code
export ANTHROPIC_BASE_URL=http://localhost:7474

That's it. Every call now flows through skim.

┌────────────────────────────────────┐
│  skim v0.5.0  — runtime token proxy │
├────────────────────────────────────┤
│  listening  localhost:7474          │
│  dashboard  localhost:7474/dashboard│
│  filtering  ✓ on                    │
│  caching    ✓ on                    │
├────────────────────────────────────┤
│  ⠋ LIVE  waiting for calls...       │
└────────────────────────────────────┘

[!TIP] skim auto-detects your plan — x-api-key for API users, Authorization: Bearer for OAuth clients — and routes each accordingly, with full waste filtering and tracking either way.

[!WARNING] Claude Code on a Pro/Max subscription cannot use a local proxy. Subscription traffic ignores ANTHROPIC_BASE_URL and routes straight to Anthropic — the proxy will sit on "waiting for calls". To intercept Claude Code, use API-key auth (export ANTHROPIC_API_KEY=sk-ant-… alongside ANTHROPIC_BASE_URL, in the same shell before launching claude). skim also works as-is with Cursor, the SDK, and any OpenAI-compatible tool.

🔍 How it works

✂️

Waste filtering

Detects lock files, build artifacts & generated code inside tool_result blocks and strips them before they hit your context.

package-lock.json → a 12-token note instead of 122k tokens.

◈

Caching injection

Wraps your system prompt + large context with cache_control automatically.

First call caches it. Every call after is free. CLAUDE.md loads at zero cost on calls 2+.

📊

Live dashboard

Opens in your browser on start. No login, no setup. Persists to ~/.skim/events.db.

Real-time SSE updates — watch tokens & cost as they happen.

Auto-detected waste signatures

File	Detected by
`package-lock.json`	`"lockfileVersion"` + `"resolved": "https://"`
`yarn.lock`	`# yarn lockfile v1` + `resolved`
`pnpm-lock.yaml`	`lockfileVersion:` + `resolution:`
`Cargo.lock`	`@generated` + `[[package]]`
`poetry.lock`	`@generated` + `[[package]]`
`composer.lock`	`"content-hash":` + `"packages":`

Plus anything in your project's .llmignore. Stripped blocks are replaced with a one-line note showing what was removed and how to disable it.

How plan detection works

One method, _auth_type(), owns all routing logic:

_auth_type() → ("apikey", key)    # API plan      → filtering + caching + tracking
             → ("oauth",  token)  # Pro/Max plan  → filtering + tracking (no cache injection)
             → ("", "")           # no auth       → 401

Adding a new plan type (enterprise SSO, team tokens) is a single elif. Caching injection is skipped for Pro/OAuth because the Pro plan manages its own cache layer.

📊 Dashboard

Five fully-built pages. Dark theme, live charts, real-time SSE updates — no refresh button needed.

🟣 Overview	⚡ Sessions	📈 Usage	🤖 Models	💰 Savings
tokens, cost, savings, cache	full call log, searchable	hourly + daily charts	cost/1k, cache %, waste %	cumulative savings & ROI

skim proxy              # local dashboard, zero setup, opens in browser

The local dashboard works for everyone — solo devs, Pro users, anyone. Data never leaves your machine unless you explicitly connect a team server.

🏢 Enterprise

[!IMPORTANT] Everything below is open-source and self-hosted — same pip package, no paywall, no telemetry.

🛡️ Budget enforcement

Hard-block calls that exceed token/cost limits. Proxy returns 429 before forwarding.

skim admin budget set --owner-type team \
  --owner-id engineering --usd 500 --period monthly

🔔 Webhook alerts

Slack (& Teams) or any HTTP endpoint on budget events.

skim admin webhooks add --channel slack \
  --url https://hooks.slack.com/...

✉️ User invites

Self-registration via single-use links. No manual accounts.

skim admin users invite --email new@corp.com \
  --role user --team platform

🔑 Scoped API keys

ingest · read · admin — with expiry dates and revocation.

👥 RBAC

admin · team_admin · user — enforced data isolation per role.

📋 Audit log

Every sensitive action logged immutably. Queryable by action + date.

skim admin audit --days 30 --action auth.login

📤 Data export

CSV event logs + JSON summaries for accounting & BI.

skim admin export --days 30 --out report.csv

Team deployment in 3 commands

# 1. Run the server (auto-creates admin, uses gunicorn if installed)
pip install 'skim-llm[web]'
SKIM_ADMIN_EMAIL=you@corp.com skim server --host 0.0.0.0 --port 7475

# 2. Each developer connects their proxy
export SKIM_SERVER_URL=https://skim.corp.internal
export SKIM_SERVER_TOKEN=sk-skim-...     # generate in Settings

# 3. Manage from anywhere
skim admin users list

Auth: local password · LDAP/AD (SKIM_LDAP_*) · Google/GitHub/Azure/Okta (SKIM_OIDC_*)

Full guide → docs/enterprise.md · docs/deployment.md

⌨️ CLI Reference

🔬 Static analysis _{no API key}

skim scan       # token cost per file
skim analyze    # detect waste patterns
skim fix        # auto-write .llmignore
skim check      # CI budget gate
skim generate   # .llmignore + CLAUDE.md
skim secrets    # leaked credential scan

⚙️ Runtime & ops

skim proxy      # the interceptor
skim server     # team dashboard + API
skim admin      # manage users/budgets/keys
skim audit      # local operation log
skim hooks      # git pre-commit gate
skim baseline   # token regression checks

Example — skim fix auto-cleanup

  skim fix  —  ./my-project
  ──────────────────────────────────────────────────────
  Before  : 166.8k tokens  (83.4% ctx)  $0.50/session

  Pattern              Severity    Tokens saved  Rules
  ────────────────────────────────────────────────────
  Lock files           HIGH           160.3k     +7
  Test snapshots       MEDIUM           4.1k     +2

  ✓ Written to .llmignore

  After   : 6.5k tokens  (3.2% ctx)  $0.02/session
  Saved   : 160.3k tokens  (96.1% reduction)  $0.48/session
  Now     : 51 sessions / $1

🐍 Python API

from adapters import ClaudeAdapter

claude = ClaudeAdapter(
    model="claude-sonnet-4-6",
    system_prompt="You are a terse coding assistant.",
    enable_caching=True,          # prompt caching, automatic
)
response = claude.chat("Refactor the auth module")
claude.print_stats()
# Session: 12,400 tokens | Cache hit rate: 87% | Cost: $0.0037

_{Adapters: ClaudeAdapter · OpenAIAdapter · GeminiAdapter · OllamaAdapter}

📦 Install

pip install skim-llm                    # core — zero hard deps
pip install 'skim-llm[tiktoken]'        # accurate token counting
pip install 'skim-llm[web]'             # dashboard server
pip install 'skim-llm[web,sso,ldap]'    # enterprise auth
pip install 'skim-llm[all]'             # everything

📚 Documentation

Guide	What it covers
Quickstart	Zero to running in 2 minutes
Proxy	Deep-dive — every feature, every flag
Dashboard	Local & team dashboards
Enterprise	Budgets, webhooks, invites, RBAC, audit
Admin CLI	`skim admin` complete reference
REST API	All 31 endpoints with schemas
Configuration	Every env var & `.skimrc` option
Deployment	Docker, systemd, nginx, scaling
MCP Setup	Claude Desktop integration

🔌 MCP Server

{ "mcpServers": { "skim": { "command": "skim-mcp" } } }

_{Tools: scan_tokens · analyze_context · check_budget · fix_context · generate_llmignore}

_{GitHub · PyPI · Issues · Changelog · Live Demo
Built for developers who'd rather not pay for noise. · MIT License}

_{⭐ Star the repo if skim saved you some tokens.}

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.5.1

May 31, 2026

0.5.0

May 31, 2026

0.3.0

May 31, 2026

0.2.0

May 31, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

skim_llm-0.5.1.tar.gz (112.3 kB view details)

Uploaded May 31, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

skim_llm-0.5.1-py3-none-any.whl (121.5 kB view details)

Uploaded May 31, 2026 Python 3

File details

Details for the file skim_llm-0.5.1.tar.gz.

File metadata

Download URL: skim_llm-0.5.1.tar.gz
Upload date: May 31, 2026
Size: 112.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for skim_llm-0.5.1.tar.gz
Algorithm	Hash digest
SHA256	`a7a73ffff130986fd6fdb35e1fb7f6cf2ab75c8b166a056916596405a19bb1c4`
MD5	`b5db65477f8a49778989f2d7dc214db5`
BLAKE2b-256	`17bf00c4056944aeb106b2bbafd412a227438c3321e0202ee548a0c52d3d01ed`

See more details on using hashes here.

File details

Details for the file skim_llm-0.5.1-py3-none-any.whl.

File metadata

Download URL: skim_llm-0.5.1-py3-none-any.whl
Upload date: May 31, 2026
Size: 121.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for skim_llm-0.5.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`09a43a7b095693c0f5fd64dd2339081de7cb82c8b96337c8c4014206449eae04`
MD5	`721580d70b22e1b3426208f7edf788e4`
BLAKE2b-256	`e429f0d72801bfa500158f468722baf86b8f09a2eacec5121c803a044663896c`

See more details on using hashes here.

skim-llm 0.5.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

skim

Stop paying for tokens you never meant to send.

⚡ Quickstart

🔍 How it works

✂️

◈

📊

📊 Dashboard

🏢 Enterprise

🛡️ Budget enforcement

🔔 Webhook alerts

✉️ User invites

🔑 Scoped API keys

👥 RBAC

📋 Audit log

📤 Data export

⌨️ CLI Reference

🐍 Python API

📦 Install

📚 Documentation

🔌 MCP Server

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`skim`