Skip to main content

Runtime token proxy + optimization toolkit for LLM developers and enterprises. Intercepts API calls, strips waste in real-time, tracks costs, and serves a web dashboard.

Project description

skim

skim

Stop paying for tokens you never meant to send.

The runtime layer that sits between your AI tools and the LLM API — stripping waste, injecting caching, and showing you exactly where every token goes.


PyPI Downloads Python License Zero deps


⚡ Quickstart  ·  🔍 How it works  ·  📊 Dashboard  ·  🏢 Enterprise  ·  ⌨️ CLI  ·  📚 Docs  ·  ▶️ Live Demo


[!NOTE] One env var. Zero code changes. Claude Code reads a package-lock.json — 122k tokens, $0.37 — just to answer a question about a 200-line file. History compounds. Your context window fills silently and quality degrades while you fly blind. skim fixes this in the API call path, in real time.


flowchart LR
    A["🤖 Claude Code<br/>Cursor · your app"] -->|ANTHROPIC_BASE_URL| B1

    subgraph SKIM ["⚡ skim proxy"]
        direction TB
        B1["✂️ strip lock files<br/>& build artifacts"]
        B2["◈ inject prompt caching<br/>50–90% cheaper"]
        B3["🛡️ enforce budgets<br/>hard 429 block"]
        B4["📊 live dashboard<br/>+ local SQLite"]
        B1 --> B2 --> B3 --> B4
    end

    B4 --> C["☁️ Anthropic<br/>OpenAI · Gemini"]

    style A fill:#161920,stroke:#6c63ff,color:#e4e6f0
    style SKIM fill:#0d0f14,stroke:#6c63ff,color:#6c63ff
    style C fill:#161920,stroke:#00d4aa,color:#e4e6f0
    style B1 fill:#161920,stroke:#252a3a,color:#e4e6f0
    style B2 fill:#161920,stroke:#252a3a,color:#e4e6f0
    style B3 fill:#161920,stroke:#252a3a,color:#e4e6f0
    style B4 fill:#161920,stroke:#252a3a,color:#e4e6f0

⚡ Quickstart

1. Install

pip install skim-llm

2. Start the proxy

skim proxy

Browser opens automatically to your live dashboard.

3. Point your tool at it

export ANTHROPIC_API_KEY=sk-ant-...   # required for Claude Code
export ANTHROPIC_BASE_URL=http://localhost:7474

That's it. Every call now flows through skim.

┌────────────────────────────────────┐
│  skim v0.5.0  — runtime token proxy │
├────────────────────────────────────┤
│  listening  localhost:7474          │
│  dashboard  localhost:7474/dashboard│
│  filtering  ✓ on                    │
│  caching    ✓ on                    │
├────────────────────────────────────┤
│  ⠋ LIVE  waiting for calls...       │
└────────────────────────────────────┘

[!TIP] skim auto-detects your planx-api-key for API users, Authorization: Bearer for OAuth clients — and routes each accordingly, with full waste filtering and tracking either way.

[!WARNING] Claude Code on a Pro/Max subscription cannot use a local proxy. Subscription traffic ignores ANTHROPIC_BASE_URL and routes straight to Anthropic — the proxy will sit on "waiting for calls". To intercept Claude Code, use API-key auth (export ANTHROPIC_API_KEY=sk-ant-… alongside ANTHROPIC_BASE_URL, in the same shell before launching claude). skim also works as-is with Cursor, the SDK, and any OpenAI-compatible tool.


🔍 How it works

✂️

Waste filtering

Detects lock files, build artifacts & generated code inside tool_result blocks and strips them before they hit your context.

package-lock.json → a 12-token note instead of 122k tokens.

Caching injection

Wraps your system prompt + large context with cache_control automatically.

First call caches it. Every call after is free. CLAUDE.md loads at zero cost on calls 2+.

📊

Live dashboard

Opens in your browser on start. No login, no setup. Persists to ~/.skim/events.db.

Real-time SSE updates — watch tokens & cost as they happen.

Auto-detected waste signatures
File Detected by
package-lock.json "lockfileVersion" + "resolved": "https://"
yarn.lock # yarn lockfile v1 + resolved
pnpm-lock.yaml lockfileVersion: + resolution:
Cargo.lock @generated + [[package]]
poetry.lock @generated + [[package]]
composer.lock "content-hash": + "packages":

Plus anything in your project's .llmignore. Stripped blocks are replaced with a one-line note showing what was removed and how to disable it.

How plan detection works

One method, _auth_type(), owns all routing logic:

_auth_type()  ("apikey", key)    # API plan      → filtering + caching + tracking
              ("oauth",  token)  # Pro/Max plan  → filtering + tracking (no cache injection)
              ("", "")           # no auth       → 401

Adding a new plan type (enterprise SSO, team tokens) is a single elif. Caching injection is skipped for Pro/OAuth because the Pro plan manages its own cache layer.


📊 Dashboard

Five fully-built pages. Dark theme, live charts, real-time SSE updates — no refresh button needed.

🟣 Overview ⚡ Sessions 📈 Usage 🤖 Models 💰 Savings
tokens, cost,
savings, cache
full call log,
searchable
hourly +
daily charts
cost/1k,
cache %, waste %
cumulative
savings & ROI
skim proxy              # local dashboard, zero setup, opens in browser

The local dashboard works for everyone — solo devs, Pro users, anyone. Data never leaves your machine unless you explicitly connect a team server.


🏢 Enterprise

[!IMPORTANT] Everything below is open-source and self-hosted — same pip package, no paywall, no telemetry.

🛡️ Budget enforcement

Hard-block calls that exceed token/cost limits. Proxy returns 429 before forwarding.

skim admin budget set --owner-type team \
  --owner-id engineering --usd 500 --period monthly

🔔 Webhook alerts

Slack (& Teams) or any HTTP endpoint on budget events.

skim admin webhooks add --channel slack \
  --url https://hooks.slack.com/...

✉️ User invites

Self-registration via single-use links. No manual accounts.

skim admin users invite --email new@corp.com \
  --role user --team platform

🔑 Scoped API keys

ingest · read · admin — with expiry dates and revocation.

👥 RBAC

admin · team_admin · user — enforced data isolation per role.

📋 Audit log

Every sensitive action logged immutably. Queryable by action + date.

skim admin audit --days 30 --action auth.login

📤 Data export

CSV event logs + JSON summaries for accounting & BI.

skim admin export --days 30 --out report.csv
Team deployment in 3 commands
# 1. Run the server (auto-creates admin, uses gunicorn if installed)
pip install 'skim-llm[web]'
SKIM_ADMIN_EMAIL=you@corp.com skim server --host 0.0.0.0 --port 7475

# 2. Each developer connects their proxy
export SKIM_SERVER_URL=https://skim.corp.internal
export SKIM_SERVER_TOKEN=sk-skim-...     # generate in Settings

# 3. Manage from anywhere
skim admin users list

Auth: local password · LDAP/AD (SKIM_LDAP_*) · Google/GitHub/Azure/Okta (SKIM_OIDC_*)

Full guide → docs/enterprise.md · docs/deployment.md


⌨️ CLI Reference

🔬 Static analysis  no API key

skim scan       # token cost per file
skim analyze    # detect waste patterns
skim fix        # auto-write .llmignore
skim check      # CI budget gate
skim generate   # .llmignore + CLAUDE.md
skim secrets    # leaked credential scan

⚙️ Runtime & ops

skim proxy      # the interceptor
skim server     # team dashboard + API
skim admin      # manage users/budgets/keys
skim audit      # local operation log
skim hooks      # git pre-commit gate
skim baseline   # token regression checks
Example — skim fix auto-cleanup
  skim fix  —  ./my-project
  ──────────────────────────────────────────────────────
  Before  : 166.8k tokens  (83.4% ctx)  $0.50/session

  Pattern              Severity    Tokens saved  Rules
  ────────────────────────────────────────────────────
  Lock files           HIGH           160.3k     +7
  Test snapshots       MEDIUM           4.1k     +2

  ✓ Written to .llmignore

  After   : 6.5k tokens  (3.2% ctx)  $0.02/session
  Saved   : 160.3k tokens  (96.1% reduction)  $0.48/session
  Now     : 51 sessions / $1

🐍 Python API

from adapters import ClaudeAdapter

claude = ClaudeAdapter(
    model="claude-sonnet-4-6",
    system_prompt="You are a terse coding assistant.",
    enable_caching=True,          # prompt caching, automatic
)
response = claude.chat("Refactor the auth module")
claude.print_stats()
# Session: 12,400 tokens | Cache hit rate: 87% | Cost: $0.0037

Adapters: ClaudeAdapter · OpenAIAdapter · GeminiAdapter · OllamaAdapter


📦 Install

pip install skim-llm                    # core — zero hard deps
pip install 'skim-llm[tiktoken]'        # accurate token counting
pip install 'skim-llm[web]'             # dashboard server
pip install 'skim-llm[web,sso,ldap]'    # enterprise auth
pip install 'skim-llm[all]'             # everything

📚 Documentation

Guide What it covers
Quickstart Zero to running in 2 minutes
Proxy Deep-dive — every feature, every flag
Dashboard Local & team dashboards
Enterprise Budgets, webhooks, invites, RBAC, audit
Admin CLI skim admin complete reference
REST API All 31 endpoints with schemas
Configuration Every env var & .skimrc option
Deployment Docker, systemd, nginx, scaling
MCP Setup Claude Desktop integration

🔌 MCP Server

{ "mcpServers": { "skim": { "command": "skim-mcp" } } }

Tools: scan_tokens · analyze_context · check_budget · fix_context · generate_llmignore



GitHub · PyPI · Issues · Changelog · Live Demo

Built for developers who'd rather not pay for noise. · MIT License

⭐ Star the repo if skim saved you some tokens.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

skim_llm-0.5.1.tar.gz (112.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

skim_llm-0.5.1-py3-none-any.whl (121.5 kB view details)

Uploaded Python 3

File details

Details for the file skim_llm-0.5.1.tar.gz.

File metadata

  • Download URL: skim_llm-0.5.1.tar.gz
  • Upload date:
  • Size: 112.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for skim_llm-0.5.1.tar.gz
Algorithm Hash digest
SHA256 a7a73ffff130986fd6fdb35e1fb7f6cf2ab75c8b166a056916596405a19bb1c4
MD5 b5db65477f8a49778989f2d7dc214db5
BLAKE2b-256 17bf00c4056944aeb106b2bbafd412a227438c3321e0202ee548a0c52d3d01ed

See more details on using hashes here.

File details

Details for the file skim_llm-0.5.1-py3-none-any.whl.

File metadata

  • Download URL: skim_llm-0.5.1-py3-none-any.whl
  • Upload date:
  • Size: 121.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for skim_llm-0.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 09a43a7b095693c0f5fd64dd2339081de7cb82c8b96337c8c4014206449eae04
MD5 721580d70b22e1b3426208f7edf788e4
BLAKE2b-256 e429f0d72801bfa500158f468722baf86b8f09a2eacec5121c803a044663896c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page