Runtime token proxy + optimization toolkit for LLM developers and enterprises. Intercepts API calls, strips waste in real-time, tracks costs, and serves a web dashboard.
skim
The runtime layer between your AI tools and the LLM API.
Quickstart · Proxy · Dashboard · CLI · Enterprise · Demo
Most LLM tools waste tokens invisibly. Claude Code reads a package-lock.json (122k tokens, $0.37) before answering a question about a 200-line file. Because the full conversation is re-sent on every turn, cumulative input tokens grow quadratically with the number of turns. Your 200k context window fills up silently, quality degrades, and you're flying blind until the model forgets what it was doing.
skim sits in the API call path and fixes this in real-time — without touching any code.
```text
Your tool (Claude Code / Cursor / custom)
      │
      ▼
skim proxy  ← one env var activates this
  ├── strips lock files from tool outputs
  ├── auto-injects prompt caching (50–90% cheaper)
  ├── shows live context fill %
  └── ships usage data to your team dashboard
      │
      ▼
Anthropic / OpenAI / Gemini API
```
Quickstart
```bash
pip install skim-llm
# Start the proxy
skim proxy --port 7474 --path .
# In your shell (or .zshrc / .bashrc):
export ANTHROPIC_BASE_URL=http://localhost:7474
# That's it. Every Claude Code / Cursor call now goes through skim.
```
What you'll see in the terminal:
```text
[skim] 14:23:01 call #1 1,247ms
  Context ████░░░░░░░░░░░░░░░░░░ 12.4% 24.8k/200k
  This call: 24.8k in / 1.2k out stripped 122k waste (package-lock.json)
[skim] 14:23:45 call #2 892ms
  Context ███████░░░░░░░░░░░░░░░ 38.1% 76.2k/200k
  This call: 51.4k in / 2.1k out cache hit 18.6k tokens free
[skim] 14:24:55 call #4 788ms
  Context ████████████████░░░░░░ 78.4% 156.8k/200k
  ⚠ 78% full — /compact NOW before quality degrades
```
Proxy — the core
The proxy is what sets skim apart from other LLM cost tools: they scan files on disk, while skim intercepts the API calls themselves.
What it does on every API call
1. Waste filtering
Detects lock files, build artifacts, and generated code inside tool_result blocks (the content Claude Code gets back when it reads a file) and strips them before they enter context. A package-lock.json read that would cost 122k tokens becomes a 12-token note.
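A minimal sketch of this kind of filtering on an Anthropic Messages request body. It assumes the file path can be recovered from the matching tool_use block's input under a file_path key; the pattern list, the note text, and the function names are hypothetical, not skim's actual rules.

```python
import fnmatch
import os

# Hypothetical patterns for illustration; not skim's actual rule set.
WASTE_PATTERNS = ["package-lock.json", "yarn.lock", "pnpm-lock.yaml", "poetry.lock", "*.min.js"]


def is_waste(path: str) -> bool:
    """True if the file name matches one of the waste patterns."""
    name = os.path.basename(path)
    return any(fnmatch.fnmatch(name, pattern) for pattern in WASTE_PATTERNS)


def strip_waste(body: dict) -> list:
    """Replace tool_result content for waste-file reads with a short note.

    Mutates `body` (an Anthropic Messages request) in place and returns the
    stripped paths. Assumes read tools put the path in input["file_path"].
    """
    messages = body.get("messages", [])

    # Pass 1: map each tool_use id to the file it asked to read.
    paths = {}
    for msg in messages:
        if msg.get("role") == "assistant" and isinstance(msg.get("content"), list):
            for block in msg["content"]:
                if block.get("type") == "tool_use":
                    path = block.get("input", {}).get("file_path")
                    if path:
                        paths[block["id"]] = path

    # Pass 2: swap the bulky tool_result content for a one-line note.
    stripped = []
    for msg in messages:
        if msg.get("role") == "user" and isinstance(msg.get("content"), list):
            for block in msg["content"]:
                if block.get("type") != "tool_result":
                    continue
                path = paths.get(block.get("tool_use_id"))
                if path and is_waste(path):
                    block["content"] = f"[skim] omitted {os.path.basename(path)} (generated file)"
                    stripped.append(path)
    return stripped
```

Matching on the file path rather than on the content keeps the check cheap; a content-based detector (for example, spotting a lockfile's JSON shape) would also catch files read through shell commands.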
2. Prompt caching auto-injection (Anthropic only)
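Automatically adds cache_control: {"type": "ephemeral"} to your system prompt and large context blocks. On the first call, Anthropic writes that content to its prompt cache at 1.25× the normal input-token price (a 25% premium). On later calls that arrive within the cache lifetime (5 minutes by default, refreshed each time the cache is hit), the cached content is read at 10% of the normal input price. For Claude Code, that means the CLAUDE.md + project context costs about a tenth as much on calls 2+. Real savings: 50–90% on system prompt tokens.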
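A minimal sketch of the injection step, plus the cost arithmetic behind the savings range. It only marks the system prompt (not other large blocks), assumes the 1.25× write and 0.1× read multipliers for the default 5-minute cache, and assumes every later call hits the cache; the function names are hypothetical.

```python
import copy

EPHEMERAL = {"type": "ephemeral"}


def inject_cache_control(body: dict) -> dict:
    """Return a copy of an Anthropic Messages request with the system prompt marked cacheable."""
    body = copy.deepcopy(body)
    system = body.get("system")
    if isinstance(system, str) and system:
        # A plain-string system prompt must become a list of text blocks to carry cache_control.
        body["system"] = [{"type": "text", "text": system, "cache_control": dict(EPHEMERAL)}]
    elif isinstance(system, list) and system:
        # A breakpoint on the last block caches the whole system prefix up to that point.
        system[-1].setdefault("cache_control", dict(EPHEMERAL))
    return body


def relative_cost(n_calls: int, write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Cost of a cached prefix over n calls, relative to sending it uncached every time.

    Assumes the first call writes the cache and every later call reads it.
    """
    if n_calls < 1:
        raise ValueError("n_calls must be >= 1")
    return (write_mult + read_mult * (n_calls - 1)) / n_calls
```

With these multipliers a single call costs 1.25× (caching only pays off on reuse), ten calls cost about 0.215× (roughly 78% saved), and the ratio approaches 0.1× as the call count grows, which is where the upper end of the 50–90% range comes from.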
3. Live session health
After every call, prints context fill % with a progress bar. Warns at 65%, alerts at 85%. For Claude Code Pro users, this is the visibility you never had.
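A sketch of the health line, using the 65% / 85% thresholds above and a 200k-token limit. The bar width, the exact layout, and the level names are assumptions, so it will not reproduce skim's terminal output character for character.

```python
WARN_PCT = 65.0   # warn threshold from the text above
ALERT_PCT = 85.0  # alert threshold from the text above


def context_health(used_tokens: int, limit: int = 200_000, width: int = 20):
    """Return (status line, level) for the current context fill."""
    pct = 100.0 * used_tokens / limit
    filled = min(width, round(width * used_tokens / limit))
    bar = "█" * filled + "░" * (width - filled)
    line = f"Context {bar} {pct:.1f}%  {used_tokens / 1000:.1f}k/{limit // 1000}k"
    if pct >= ALERT_PCT:
        level = "alert"
    elif pct >= WARN_PCT:
        level = "warn"
    else:
        level = "ok"
    return line, level
```

For example, context_health(156_800) reports 78.4% at the "warn" level, matching the kind of warning shown in the terminal sample.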
4. Actual usage tracking
Reads usage.input_tokens from the API response — not estimates. Ships real numbers to ~/.skim/audit.log and optionally to a central team dashboard.
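A sketch of recording the usage block from an Anthropic Messages response. The field names read from usage are the API's own; the one-JSON-object-per-line log layout and the output key names are assumptions, since skim's audit.log format isn't documented here.

```python
import json
import time
from pathlib import Path


def record_usage(response: dict, log_path: str = "~/.skim/audit.log") -> dict:
    """Append the token counts reported in an Anthropic Messages response as one JSON line."""
    usage = response.get("usage") or {}
    entry = {
        "ts": time.time(),
        "model": response.get("model"),
        "input_tokens": usage.get("input_tokens") or 0,
        "output_tokens": usage.get("output_tokens") or 0,
        "cache_write_tokens": usage.get("cache_creation_input_tokens") or 0,
        "cache_read_tokens": usage.get("cache_read_input_tokens") or 0,
    }
    # input_tokens excludes cached tokens, so the full prompt size is the sum of all three.
    entry["prompt_tokens_total"] = (
        entry["input_tokens"] + entry["cache_write_tokens"] + entry["cache_read_tokens"]
    )
    path = Path(log_path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

Keeping cache reads and writes as separate fields matters for cost reporting, because they are billed at different rates from ordinary input tokens.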
OpenAI-compatible tools
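```bash
export OPENAI_BASE_URL=http://localhost:7474
```
Works with anything built on the official OpenAI SDK: it picks up OPENAI_BASE_URL automatically, or you can pass base_url=... to openai.OpenAI() explicitly.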
Dashboard
For teams, skim includes a web server with login, per-user cost attribution, and budget alerts.
```bash
# Install web extras
pip install 'skim-llm[web]'
# Start the server
SKIM_ADMIN_EMAIL=you@corp.com skim server --host 0.0.0.0 --port 7475
# Open http://localhost:7475/dashboard
```
Then connect each developer's proxy to it:
```bash
export SKIM_SERVER_URL=https://skim.corp.internal
export SKIM_SERVER_TOKEN=sk-skim-...  # generate in Settings
```
Auth options:
- Local password (default)
- LDAP / Active Directory: set SKIM_LDAP_URL + SKIM_LDAP_BASE_DN
- Google / GitHub / Azure AD / Okta: set the SKIM_OIDC_* env vars
Docker:
```bash
docker run -p 7474:7474 -p 7475:7475 \
  -e SKIM_ADMIN_EMAIL=you@corp.com \
  -v /data/skim:/data \
  ghcr.io/bb1nfosec/skim
```
CLI Reference
| Command | What it does |
|---|---|
| skim scan | Audit token costs per file |
| skim analyze | Detect waste patterns with severity + auto-fix |
| skim fix | Write .llmignore rules, showing before/after savings |
| skim check | CI budget gate (exits 1 if over threshold) |
| skim generate | Generate .llmignore, .skimrc, CLAUDE.md |
| skim secrets | Scan for leaked credentials (AWS, OpenAI, GitHub PATs, ...) |
| skim proxy | Runtime interceptor + query optimizer |
| skim server | Web dashboard + REST API |
| skim audit | View operation log (~/.skim/audit.log) |
| skim config | Manage .skimrc configuration |
| skim hooks | Install/remove git pre-commit budget gate |
| skim baseline | Save/compare token count snapshots |
Static analysis (no API key needed)
```bash
# See what's eating your tokens and what it costs
skim scan --path ./my-project
# Find waste patterns with one-line fixes
skim analyze --path .
# Auto-fix: write .llmignore rules, show before/after
skim fix --path . --min-severity medium
# Fail CI if project exceeds 30% of model context limit
skim check --path . --max-pct 30 --fail-on-waste
```
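To make the --max-pct 30 gate concrete, here is a rough sketch of the arithmetic a check like this performs. It assumes the common heuristic of about 4 characters per token (a real tokenizer such as tiktoken gives exact counts) and a 200k-token limit; the skipped directories and function names are hypothetical, not skim's implementation.

```python
import os

CHARS_PER_TOKEN = 4  # rough heuristic; a real tokenizer gives exact counts


def estimate_tokens(root: str, skip_dirs=(".git",)) -> int:
    """Roughly estimate the tokens an LLM would see if it read every text file under root."""
    total_chars = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            try:
                with open(os.path.join(dirpath, name), encoding="utf-8") as fh:
                    total_chars += len(fh.read())
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
    return total_chars // CHARS_PER_TOKEN


def budget_gate(tokens: int, max_pct: float, limit: int = 200_000) -> int:
    """Return a CI exit code: 0 if within budget, 1 if over max_pct of the context limit."""
    pct = 100.0 * tokens / limit
    print(f"{tokens:,} tokens = {pct:.1f}% of {limit:,} (limit {max_pct}%)")
    return 0 if pct <= max_pct else 1
```

In a CI script, sys.exit(budget_gate(estimate_tokens("."), max_pct=30)) would fail the job once the estimate passes 60k tokens.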
Example output — skim fix:
```text
distill fix — ./my-project
──────────────────────────────────────────────────────
Before : 166.8k tokens (83.4% ctx) $0.50/session
Pattern          Severity  Tokens saved  Rules
────────────────────────────────────────────────────
Lock files       HIGH      160.3k        +7
Test snapshots   MEDIUM    4.1k          +2
✓ Written to .llmignore
After  : 6.5k tokens (3.2% ctx) $0.02/session
Saved  : 160.3k tokens (96.1% reduction) $0.48/session
Now    : 51 sessions / $1
```
Secrets scan
```bash
# Scan before any LLM touches your codebase
skim secrets --path . --fail  # exits 1 if findings exist
```
Detects: AWS Access Key IDs, OpenAI API keys, Anthropic keys, GitHub PATs, private key blocks, Stripe live keys, Slack tokens, JWTs, and generic secrets/passwords.
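For a sense of how pattern-based secret detection works, here is a sketch covering a subset of the listed credential types. The regexes are widely used heuristics chosen for illustration, not skim's own rule set; OpenAI keys and generic passwords are left out.

```python
import re

# Illustrative patterns only; skim's own rule set is not reproduced here.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "anthropic_key": re.compile(r"\bsk-ant-[A-Za-z0-9_-]{20,}"),
    "stripe_live_key": re.compile(r"\bsk_live_[A-Za-z0-9]{24,}"),
    "slack_token": re.compile(r"\bxox[abprs]-[A-Za-z0-9-]{10,}"),
    "private_key_block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def scan_text(text: str):
    """Return (pattern name, 1-based line number) for each suspected secret in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((name, lineno))
    return findings
```

Prefix-anchored patterns like these have few false positives; "generic secret" detection usually needs extra heuristics such as entropy checks, which this sketch does not attempt.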
Baseline regression (CI)
```bash
# Save before a refactor
skim baseline save --name pre-refactor
# Compare after; fails CI if > 5k tokens regressed
skim baseline compare --name pre-refactor
```
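The baseline workflow boils down to storing a snapshot and diffing against it. This sketch assumes snapshots are JSON maps of file to token count in a directory you choose, and uses the 5k-token threshold from the comment above; the storage layout and function names are hypothetical.

```python
import json
from pathlib import Path


def save_baseline(name: str, tokens_by_file: dict, store: Path) -> Path:
    """Write a named snapshot of per-file token counts as JSON."""
    store.mkdir(parents=True, exist_ok=True)
    path = store / f"{name}.json"
    path.write_text(json.dumps(tokens_by_file, indent=2, sort_keys=True), encoding="utf-8")
    return path


def compare_baseline(name: str, tokens_by_file: dict, store: Path, max_regression: int = 5_000) -> int:
    """Return 1 if the total grew by more than max_regression tokens since the snapshot, else 0."""
    before = json.loads((store / f"{name}.json").read_text(encoding="utf-8"))
    delta = sum(tokens_by_file.values()) - sum(before.values())
    # List the files that grew the most; they usually explain the regression.
    growth = {f: n - before.get(f, 0) for f, n in tokens_by_file.items()}
    for f, d in sorted(growth.items(), key=lambda kv: kv[1], reverse=True)[:5]:
        if d > 0:
            print(f"  +{d:,}  {f}")
    print(f"total delta: {delta:+,} tokens (allowed: +{max_regression:,})")
    return 1 if delta > max_regression else 0
```

Comparing totals rather than individual files means a refactor that moves tokens between files without adding any still passes.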
Git hook
```bash
# Block commits that push context over budget
skim hooks install --max-pct 30 --fail-on-waste
```
Enterprise
| Need | Solution |
|---|---|
| Cost attribution by team | skim server dashboard, per-user breakdown |
| Budget enforcement | skim check in CI + git hooks + proxy hard limits |
| SSO / LDAP | SKIM_OIDC_* + SKIM_LDAP_* env vars |
| Audit trail | ~/.skim/audit.log + central server ingestion |
| Self-hosted deployment | Docker image + Helm chart (see deploy/) |
| Secrets governance | skim secrets --fail in pre-commit + CI |
| Regression prevention | skim baseline compare in PR pipelines |
| Air-gapped / Ollama | --model ollama — all analysis local, $0.00 |
Configuration
Create .skimrc in your project root (commit it for team-wide policy):
```
model = claude          # claude | openai | gemini | ollama
max_pct = 30            # fail CI if context exceeds X% of limit
fail_on_waste = false   # also fail on HIGH severity patterns
min_severity = high     # auto-fix: high | medium | low
audit = false           # log every operation to ~/.skim/audit.log
proxy_port = 7474
```
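If you want to read this file from your own scripts, a sketch like the following handles the format as shown, assuming simple `key = value` lines with `#` comments. The defaults simply copy the example above (they are not confirmed skim defaults), and skim's own parser may differ; for instance, this one would truncate a value containing `#`.

```python
# Values copied from the example .skimrc above; not confirmed skim defaults.
DEFAULTS = {
    "model": "claude",
    "max_pct": 30,
    "fail_on_waste": False,
    "min_severity": "high",
    "audit": False,
    "proxy_port": 7474,
}


def _coerce(raw: str):
    """Turn 'true'/'false' into bools and numeric strings into numbers; leave the rest as text."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        return float(raw)
    except ValueError:
        return raw


def parse_skimrc(text: str) -> dict:
    """Parse `key = value  # comment` lines on top of DEFAULTS."""
    config = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        config[key] = _coerce(value)
    return config
```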
MCP Server
Exposes skim as Claude Desktop tools (no CLI needed):
```json
{
  "mcpServers": {
    "skim": { "command": "skim-mcp" }
  }
}
```
Available tools: scan_tokens, analyze_context, check_budget, fix_context, generate_llmignore
Python API
```python
from adapters import ClaudeAdapter

claude = ClaudeAdapter(
    model="claude-sonnet-4-5",
    system_prompt="You are a terse coding assistant.",
    enable_caching=True,  # enables prompt caching automatically
)
response = claude.chat("Refactor the auth module")
claude.print_stats()
# Session: 12,400 tokens | Cache hit rate: 87% | Cost: $0.0037
```
Adapters: ClaudeAdapter, OpenAIAdapter, GeminiAdapter, OllamaAdapter
Install
```bash
# Core (zero hard deps — scan, analyze, check, fix, proxy)
pip install skim-llm
# With accurate token counting
pip install 'skim-llm[tiktoken]'
# With Claude adapter
pip install 'skim-llm[claude]'
# Web dashboard
pip install 'skim-llm[web]'
# Enterprise (SSO + LDAP)
pip install 'skim-llm[web,sso,ldap]'
# Everything
pip install 'skim-llm[all]'
```
Demo
Live demo (individual + org/enterprise): https://demo-mu-ten-60.vercel.app
Download files
File details
Details for the file skim_llm-0.2.0.tar.gz.
File metadata
- Download URL: skim_llm-0.2.0.tar.gz
- Upload date:
- Size: 72.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1f14333712833e321b19f8405c39ed3bd8f1ce6f37af901ef6b430e992fdfd56 |
| MD5 | 0719a1317001ebf0341adab6f20ab167 |
| BLAKE2b-256 | 940722c7f7f44b908d23b453d92c5caaeab618b0cd99bef94d1e9f41b3ec6dc1 |
File details
Details for the file skim_llm-0.2.0-py3-none-any.whl.
File metadata
- Download URL: skim_llm-0.2.0-py3-none-any.whl
- Upload date:
- Size: 82.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7fef1f51798adadd1a1798f123685195772b417f4326ecb63440a2481a5ed2d4 |
| MD5 | 305be67e3a261d060cb1db65741fb8f4 |
| BLAKE2b-256 | 3383a188619968ffc118934548e53ae8bfbf63d54d4b8cad1a416905ea41f972 |