Runtime token proxy + optimization toolkit for LLM developers and enterprises. Intercepts API calls, strips waste in real-time, tracks costs, and serves a web dashboard.
Project description
skim
Stop paying for tokens you never meant to send.
The runtime layer that sits between your AI tools and the LLM API — stripping waste, injecting caching, and showing you exactly where every token goes.
⚡ Quickstart · 🔍 How it works · 📊 Dashboard · 🏢 Enterprise · ⌨️ CLI · 📚 Docs · ▶️ Live Demo
[!NOTE] One env var. Zero code changes. Claude Code reads a
package-lock.json— 122k tokens, $0.37 — just to answer a question about a 200-line file. History compounds. Your context window fills silently and quality degrades while you fly blind. skim fixes this in the API call path, in real time.
flowchart LR
A["🤖 Claude Code<br/>Cursor · your app"] -->|ANTHROPIC_BASE_URL| B1
subgraph SKIM ["⚡ skim proxy"]
direction TB
B1["✂️ strip lock files<br/>& build artifacts"]
B2["◈ inject prompt caching<br/>50–90% cheaper"]
B3["🛡️ enforce budgets<br/>hard 429 block"]
B4["📊 live dashboard<br/>+ local SQLite"]
B1 --> B2 --> B3 --> B4
end
B4 --> C["☁️ Anthropic<br/>OpenAI · Gemini"]
style A fill:#161920,stroke:#6c63ff,color:#e4e6f0
style SKIM fill:#0d0f14,stroke:#6c63ff,color:#6c63ff
style C fill:#161920,stroke:#00d4aa,color:#e4e6f0
style B1 fill:#161920,stroke:#252a3a,color:#e4e6f0
style B2 fill:#161920,stroke:#252a3a,color:#e4e6f0
style B3 fill:#161920,stroke:#252a3a,color:#e4e6f0
style B4 fill:#161920,stroke:#252a3a,color:#e4e6f0
⚡ Quickstart
|
1. Install pip install skim-llm
2. Start the proxy skim proxy
Browser opens automatically to your live dashboard. 3. Point your tool at it export ANTHROPIC_API_KEY=sk-ant-... # required for Claude Code
export ANTHROPIC_BASE_URL=http://localhost:7474
|
That's it. Every call now flows through skim.
|
[!TIP] skim auto-detects your plan —
x-api-keyfor API users,Authorization: Bearerfor OAuth clients — and routes each accordingly, with full waste filtering and tracking either way.
[!WARNING] Claude Code on a Pro/Max subscription cannot use a local proxy. Subscription traffic ignores
ANTHROPIC_BASE_URLand routes straight to Anthropic — the proxy will sit on "waiting for calls". To intercept Claude Code, use API-key auth (export ANTHROPIC_API_KEY=sk-ant-…alongsideANTHROPIC_BASE_URL, in the same shell before launchingclaude). skim also works as-is with Cursor, the SDK, and any OpenAI-compatible tool.
🔍 How it works
✂️Waste filtering Detects lock files, build artifacts & generated code inside
|
◈Caching injection Wraps your system prompt + large context with First call caches it. Every call after is free. CLAUDE.md loads at zero cost on calls 2+. |
📊Live dashboard Opens in your browser on start. No login, no setup. Persists to Real-time SSE updates — watch tokens & cost as they happen. |
Auto-detected waste signatures
| File | Detected by |
|---|---|
package-lock.json |
"lockfileVersion" + "resolved": "https://" |
yarn.lock |
# yarn lockfile v1 + resolved |
pnpm-lock.yaml |
lockfileVersion: + resolution: |
Cargo.lock |
@generated + [[package]] |
poetry.lock |
@generated + [[package]] |
composer.lock |
"content-hash": + "packages": |
Plus anything in your project's .llmignore. Stripped blocks are replaced with a one-line note showing what was removed and how to disable it.
How plan detection works
One method, _auth_type(), owns all routing logic:
_auth_type() → ("apikey", key) # API plan → filtering + caching + tracking
→ ("oauth", token) # Pro/Max plan → filtering + tracking (no cache injection)
→ ("", "") # no auth → 401
Adding a new plan type (enterprise SSO, team tokens) is a single elif. Caching injection is skipped for Pro/OAuth because the Pro plan manages its own cache layer.
📊 Dashboard
Five fully-built pages. Dark theme, live charts, real-time SSE updates — no refresh button needed.
| 🟣 Overview | ⚡ Sessions | 📈 Usage | 🤖 Models | 💰 Savings |
|---|---|---|---|---|
| tokens, cost, savings, cache |
full call log, searchable |
hourly + daily charts |
cost/1k, cache %, waste % |
cumulative savings & ROI |
skim proxy # local dashboard, zero setup, opens in browser
The local dashboard works for everyone — solo devs, Pro users, anyone. Data never leaves your machine unless you explicitly connect a team server.
🏢 Enterprise
[!IMPORTANT] Everything below is open-source and self-hosted — same pip package, no paywall, no telemetry.
🛡️ Budget enforcementHard-block calls that exceed token/cost limits. Proxy returns skim admin budget set --owner-type team \
--owner-id engineering --usd 500 --period monthly
🔔 Webhook alertsSlack (& Teams) or any HTTP endpoint on budget events. skim admin webhooks add --channel slack \
--url https://hooks.slack.com/...
✉️ User invitesSelf-registration via single-use links. No manual accounts. skim admin users invite --email new@corp.com \
--role user --team platform
|
🔑 Scoped API keys
👥 RBAC
📋 Audit logEvery sensitive action logged immutably. Queryable by action + date. skim admin audit --days 30 --action auth.login
📤 Data exportCSV event logs + JSON summaries for accounting & BI. skim admin export --days 30 --out report.csv
|
Team deployment in 3 commands
# 1. Run the server (auto-creates admin, uses gunicorn if installed)
pip install 'skim-llm[web]'
SKIM_ADMIN_EMAIL=you@corp.com skim server --host 0.0.0.0 --port 7475
# 2. Each developer connects their proxy
export SKIM_SERVER_URL=https://skim.corp.internal
export SKIM_SERVER_TOKEN=sk-skim-... # generate in Settings
# 3. Manage from anywhere
skim admin users list
Auth: local password · LDAP/AD (SKIM_LDAP_*) · Google/GitHub/Azure/Okta (SKIM_OIDC_*)
Full guide → docs/enterprise.md · docs/deployment.md
⌨️ CLI Reference
|
🔬 Static analysis no API key skim scan # token cost per file
skim analyze # detect waste patterns
skim fix # auto-write .llmignore
skim check # CI budget gate
skim generate # .llmignore + CLAUDE.md
skim secrets # leaked credential scan
|
⚙️ Runtime & ops skim proxy # the interceptor
skim server # team dashboard + API
skim admin # manage users/budgets/keys
skim audit # local operation log
skim hooks # git pre-commit gate
skim baseline # token regression checks
|
Example — skim fix auto-cleanup
skim fix — ./my-project
──────────────────────────────────────────────────────
Before : 166.8k tokens (83.4% ctx) $0.50/session
Pattern Severity Tokens saved Rules
────────────────────────────────────────────────────
Lock files HIGH 160.3k +7
Test snapshots MEDIUM 4.1k +2
✓ Written to .llmignore
After : 6.5k tokens (3.2% ctx) $0.02/session
Saved : 160.3k tokens (96.1% reduction) $0.48/session
Now : 51 sessions / $1
🐍 Python API
from adapters import ClaudeAdapter
claude = ClaudeAdapter(
model="claude-sonnet-4-6",
system_prompt="You are a terse coding assistant.",
enable_caching=True, # prompt caching, automatic
)
response = claude.chat("Refactor the auth module")
claude.print_stats()
# Session: 12,400 tokens | Cache hit rate: 87% | Cost: $0.0037
Adapters: ClaudeAdapter · OpenAIAdapter · GeminiAdapter · OllamaAdapter
📦 Install
pip install skim-llm # core — zero hard deps
pip install 'skim-llm[tiktoken]' # accurate token counting
pip install 'skim-llm[web]' # dashboard server
pip install 'skim-llm[web,sso,ldap]' # enterprise auth
pip install 'skim-llm[all]' # everything
|
📚 Documentation
| Guide | What it covers |
|---|---|
| Quickstart | Zero to running in 2 minutes |
| Proxy | Deep-dive — every feature, every flag |
| Dashboard | Local & team dashboards |
| Enterprise | Budgets, webhooks, invites, RBAC, audit |
| Admin CLI | skim admin complete reference |
| REST API | All 31 endpoints with schemas |
| Configuration | Every env var & .skimrc option |
| Deployment | Docker, systemd, nginx, scaling |
| MCP Setup | Claude Desktop integration |
🔌 MCP Server
{ "mcpServers": { "skim": { "command": "skim-mcp" } } }
Tools: scan_tokens · analyze_context · check_budget · fix_context · generate_llmignore
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file skim_llm-0.5.1.tar.gz.
File metadata
- Download URL: skim_llm-0.5.1.tar.gz
- Upload date:
- Size: 112.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a7a73ffff130986fd6fdb35e1fb7f6cf2ab75c8b166a056916596405a19bb1c4
|
|
| MD5 |
b5db65477f8a49778989f2d7dc214db5
|
|
| BLAKE2b-256 |
17bf00c4056944aeb106b2bbafd412a227438c3321e0202ee548a0c52d3d01ed
|
File details
Details for the file skim_llm-0.5.1-py3-none-any.whl.
File metadata
- Download URL: skim_llm-0.5.1-py3-none-any.whl
- Upload date:
- Size: 121.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
09a43a7b095693c0f5fd64dd2339081de7cb82c8b96337c8c4014206449eae04
|
|
| MD5 |
721580d70b22e1b3426208f7edf788e4
|
|
| BLAKE2b-256 |
e429f0d72801bfa500158f468722baf86b8f09a2eacec5121c803a044663896c
|