Runtime token proxy and optimization toolkit for LLM developers and enterprises. Intercepts API calls, strips waste in real time, tracks costs, and serves a web dashboard.

skim

The runtime layer between your AI tools and the LLM API.

Python 3.10+ · License: MIT · Zero hard deps

Quickstart · Proxy · Dashboard · CLI · Enterprise · Demo


Most LLM tools waste tokens invisibly. Claude Code reads a package-lock.json (122k tokens, $0.37) before answering a question about a 200-line file. Every call resends the whole conversation history, so cumulative input tokens grow roughly quadratically with the number of turns. Your 200k context window fills up silently, quality degrades, and you have no visibility until the model forgets what it was doing.

skim sits in the API call path and fixes this in real time, without touching any code.

Your tool (Claude Code / Cursor / custom)
       │
       ▼
  skim proxy                    ← one env var activates this
  ├── strips lock files from tool outputs
  ├── auto-injects prompt caching (50–90% cheaper)
  ├── shows live context fill %
  └── ships usage data to your team dashboard
       │
       ▼
Anthropic / OpenAI / Gemini API

Quickstart

pip install skim-llm

# Start the proxy
skim proxy --port 7474 --path .

# In your shell (or .zshrc / .bashrc):
export ANTHROPIC_BASE_URL=http://localhost:7474

# That's it. Every Claude Code / Cursor call now goes through skim.

What you'll see in the terminal:

[skim] 14:23:01  call #1  1,247ms
  Context  ████░░░░░░░░░░░░░░░░░░ 12.4%  24.8k/200k
  This call: 24.8k in / 1.2k out  stripped 122k waste (package-lock.json)

[skim] 14:23:45  call #2  892ms
  Context  ███████░░░░░░░░░░░░░░░ 38.1%  76.2k/200k
  This call: 51.4k in / 2.1k out  cache hit 18.6k tokens free

[skim] 14:24:55  call #4  788ms
  Context  ████████████████░░░░░░ 78.4%  156.8k/200k
  ⚠  78% full — /compact NOW before quality degrades
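
The context line in this output is a ratio of tokens used to the window size. A minimal sketch of that calculation, assuming a 200k window, a 22-character bar, and the 65% / 85% thresholds described under "Live session health". The function names and the rounding rule are illustrative and are not taken from skim's source, so the bar length may differ from what skim prints.

```python
def context_line(used_tokens: int, limit: int = 200_000, width: int = 22) -> str:
    """Render a context-fill line: bar, percentage, and used/limit in thousands."""
    pct = 100 * used_tokens / limit
    # Assumed rounding rule: nearest whole bar cell, capped at the bar width.
    filled = min(width, round(width * used_tokens / limit))
    bar = "█" * filled + "░" * (width - filled)
    return f"Context  {bar} {pct:.1f}%  {used_tokens / 1000:.1f}k/{limit // 1000}k"


def health(used_tokens: int, limit: int = 200_000,
           warn_pct: float = 65.0, alert_pct: float = 85.0) -> str:
    """Classify session health as "ok", "warn", or "alert" from the fill percentage."""
    pct = 100 * used_tokens / limit
    if pct >= alert_pct:
        return "alert"
    if pct >= warn_pct:
        return "warn"
    return "ok"
```

With these assumptions, `health(156_800)` returns `"warn"`, which matches the 78% warning shown for call #4.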

Proxy — the core

The proxy is what makes skim different from every other LLM cost tool. They scan files. skim intercepts calls.

What it does on every API call

1. Waste filtering: Detects lock files, build artifacts, and generated code inside tool_result blocks (the content Claude Code gets back when it reads a file) and strips them before they enter context. A package-lock.json read that would cost 122k tokens becomes a 12-token note.

2. Prompt caching auto-injection (Anthropic only): Wraps your system prompt and large context blocks with cache_control: {"type": "ephemeral"} automatically. First call: Anthropic caches the content and bills those tokens at a 25% premium over the base input rate. Subsequent calls within the cache lifetime (5 minutes by default, refreshed on each hit): that content is billed at the cache-read rate, about 10% of the base input price, rather than at full price. For Claude Code, the CLAUDE.md + project context loads at a fraction of the normal cost on calls 2+. Real savings: 50–90% on system prompt tokens.

3. Live session health: After every call, prints context fill % with a progress bar. Warns at 65%, alerts at 85%. For Claude Code Pro users, this is the visibility you never had.

4. Actual usage tracking: Reads usage.input_tokens from the API response rather than estimating. Ships real numbers to ~/.skim/audit.log and optionally to a central team dashboard.
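
The first two steps can be pictured as transformations of an Anthropic Messages API request body. The sketch below is an illustration under stated assumptions, not skim's implementation: the `WASTE_NAMES` set, the replacement note, and the idea that the file path is found in the matching tool_use block under `input["file_path"]` are all hypothetical. The `cache_control` field and the `tool_use` / `tool_result` block types are part of the Anthropic API.

```python
import copy
from pathlib import PurePosixPath

# Hypothetical list of file names treated as waste; skim's real rules may differ.
WASTE_NAMES = {"package-lock.json", "yarn.lock", "poetry.lock", "Cargo.lock"}


def strip_waste(messages: list[dict]) -> list[dict]:
    """Replace tool_result content for lock-file reads with a short note."""
    messages = copy.deepcopy(messages)  # leave the caller's data untouched
    paths: dict[str, str] = {}  # tool_use id -> file path the tool was asked to read
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain string content carries no tool blocks
        for block in content:
            if block.get("type") == "tool_use":
                # Assumption: the tool passes its target as input["file_path"].
                path = block.get("input", {}).get("file_path")
                if path:
                    paths[block["id"]] = path
            elif block.get("type") == "tool_result":
                path = paths.get(block.get("tool_use_id", ""), "")
                if PurePosixPath(path).name in WASTE_NAMES:
                    block["content"] = f"[omitted lock file: {path}]"
    return messages


def inject_cache_control(body: dict) -> dict:
    """Mark the system prompt as cacheable using Anthropic's cache_control field."""
    body = copy.deepcopy(body)
    system = body.get("system")
    if isinstance(system, str) and system:
        # cache_control lives on content blocks, so convert a string prompt to a block.
        system = [{"type": "text", "text": system}]
    if isinstance(system, list) and system:
        # A breakpoint on the last system block caches the prefix up to and including it.
        system[-1]["cache_control"] = {"type": "ephemeral"}
        body["system"] = system
    return body
```

Both functions return modified copies, so a proxy built this way could log the original request next to the rewritten one when reporting how much was stripped.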

OpenAI-compatible tools

export OPENAI_BASE_URL=http://localhost:7474

Works with anything that uses openai.OpenAI(base_url=...).


Dashboard

For teams, skim includes a web server with login, per-user cost attribution, and budget alerts.

# Install web extras
pip install 'skim-llm[web]'

# Start the server
SKIM_ADMIN_EMAIL=you@corp.com skim server --host 0.0.0.0 --port 7475

# Open http://localhost:7475/dashboard

Then connect each developer's proxy to it:

export SKIM_SERVER_URL=https://skim.corp.internal
export SKIM_SERVER_TOKEN=sk-skim-...   # generate in Settings

Auth options:

  • Local password (default)
  • LDAP / Active Directory: set SKIM_LDAP_URL + SKIM_LDAP_BASE_DN
  • Google / GitHub / Azure AD / Okta: set SKIM_OIDC_* env vars

Docker:

docker run -p 7474:7474 -p 7475:7475 \
  -e SKIM_ADMIN_EMAIL=you@corp.com \
  -v /data/skim:/data \
  ghcr.io/bb1nfosec/skim

CLI Reference

skim scan       Audit token costs per file
skim analyze    Detect waste patterns with severity + auto-fix
skim fix        Write .llmignore rules — shows before/after savings
skim check      CI budget gate (exits 1 if over threshold)
skim generate   Generate .llmignore, .skimrc, CLAUDE.md
skim secrets    Scan for leaked credentials (AWS, OpenAI, GitHub PAT...)
skim proxy      Runtime interceptor + query optimizer
skim server     Web dashboard + REST API
skim audit      View operation log (~/.skim/audit.log)
skim config     Manage .skimrc configuration
skim hooks      Install/remove git pre-commit budget gate
skim baseline   Save/compare token count snapshots

Static analysis (no API key needed)

# See what's eating your tokens and what it costs
skim scan --path ./my-project

# Find waste patterns with one-line fixes
skim analyze --path .

# Auto-fix: write .llmignore rules, show before/after
skim fix --path . --min-severity medium

# Fail CI if project exceeds 30% of model context limit
skim check --path . --max-pct 30 --fail-on-waste
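
The budget gate amounts to a percentage comparison plus an optional severity check. A minimal sketch of that exit-code logic, assuming a 200k context limit; `budget_exit_code` and its parameters are hypothetical names chosen to mirror the flags above, not skim's internal API.

```python
def budget_exit_code(total_tokens: int, context_limit: int = 200_000,
                     max_pct: float = 30.0, high_waste_findings: int = 0,
                     fail_on_waste: bool = False) -> int:
    """Return 1 (fail CI) when over budget or, optionally, when HIGH waste exists."""
    used_pct = 100 * total_tokens / context_limit
    if used_pct > max_pct:
        return 1  # project exceeds the allowed share of the context window
    if fail_on_waste and high_waste_findings > 0:
        return 1  # within budget, but HIGH severity waste patterns were found
    return 0
```

For example, 166.8k tokens is 83.4% of a 200k window, so `budget_exit_code(166_800)` returns 1 against a 30% budget.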

Example output — skim fix:

  skim fix  —  ./my-project
  ──────────────────────────────────────────────────────
  Before  : 166.8k tokens  (83.4% ctx)  $0.50/session

  Pattern              Severity    Tokens saved  Rules
  ────────────────────────────────────────────────────
  Lock files           HIGH           160.3k     +7
  Test snapshots       MEDIUM           4.1k     +2

  ✓ Written to .llmignore

  After   : 6.5k tokens  (3.2% ctx)  $0.02/session
  Saved   : 160.3k tokens  (96.1% reduction)  $0.48/session
  Now     : 51 sessions / $1
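
The dollar figures in this output follow from simple arithmetic. The sketch below reproduces them under one assumption that the README does not state: an input price of $3.00 per million tokens, which is the rate implied by 166.8k tokens costing $0.50.

```python
PRICE_PER_MTOK = 3.00  # assumed USD per million input tokens, inferred from the figures above


def session_cost(tokens: int, price_per_mtok: float = PRICE_PER_MTOK) -> float:
    """Cost in USD of sending `tokens` input tokens once."""
    return tokens * price_per_mtok / 1_000_000


before, after = 166_800, 6_500  # token counts from the example output
saved = before - after

print(f"before  ${session_cost(before):.2f}")  # before  $0.50
print(f"after   ${session_cost(after):.2f}")   # after   $0.02
print(f"saved   {saved / 1000:.1f}k tokens ({100 * saved / before:.1f}% reduction)")
print(f"now     {int(1 / session_cost(after))} sessions per $1")  # now     51 sessions per $1
```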

Secrets scan

# Scan before any LLM touches your codebase
skim secrets --path . --fail    # exits 1 if findings exist

Detects: AWS Access Key IDs, OpenAI API keys, Anthropic keys, GitHub PATs, private key blocks, Stripe live keys, Slack tokens, JWTs, and generic secrets/passwords.
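
Scanners of this kind are usually built on regular expressions over each line of a file. A minimal sketch covering three of the categories listed above; the patterns are commonly used public formats and are an assumption here, since skim's actual rule set is not shown in this README.

```python
import re

# Illustrative patterns only; skim's real rules may be broader or stricter.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every pattern that matches a line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

A `--fail` style gate would then exit non-zero whenever `scan_text` returns a non-empty list for any file.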

Baseline regression (CI)

# Save before a refactor
skim baseline save --name pre-refactor

# Compare after — fails CI if > 5k tokens regressed
skim baseline compare --name pre-refactor
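
Conceptually, a baseline is a stored token count that a later run is compared against. The sketch below illustrates that idea with a JSON snapshot and the 5k-token regression threshold mentioned above. The file layout, function names, and `max_regression` parameter are hypothetical and do not describe how skim stores its baselines.

```python
import json
from pathlib import Path


def save_baseline(name: str, total_tokens: int, directory: Path) -> Path:
    """Write a named token-count snapshot as JSON and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.json"
    path.write_text(json.dumps({"name": name, "total_tokens": total_tokens}))
    return path


def compare_baseline(name: str, current_tokens: int, directory: Path,
                     max_regression: int = 5_000) -> tuple[int, bool]:
    """Return (delta, ok); ok is False when the count grew by more than max_regression."""
    snapshot = json.loads((directory / f"{name}.json").read_text())
    delta = current_tokens - snapshot["total_tokens"]
    return delta, delta <= max_regression
```

In a PR pipeline, a false `ok` would map to a non-zero exit code so the regression blocks the merge.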

Git hook

# Block commits that push context over budget
skim hooks install --max-pct 30 --fail-on-waste

Enterprise

Need                      Solution
Cost attribution by team  skim server dashboard, per-user breakdown
Budget enforcement        skim check in CI + git hooks + proxy hard limits
SSO / LDAP                SKIM_OIDC_* + SKIM_LDAP_* env vars
Audit trail               ~/.skim/audit.log + central server ingestion
Self-hosted deployment    Docker image + Helm chart (see deploy/)
Secrets governance        skim secrets --fail in pre-commit + CI
Regression prevention     skim baseline compare in PR pipelines
Air-gapped / Ollama       --model ollama (all analysis local, $0.00)

Configuration

Create .skimrc in your project root (commit it for team-wide policy):

model         = claude       # claude | openai | gemini | ollama
max_pct       = 30           # fail CI if context exceeds X% of limit
fail_on_waste = false        # also fail on HIGH severity patterns
min_severity  = high         # auto-fix: high | medium | low
audit         = false        # log every operation to ~/.skim/audit.log
proxy_port    = 7474
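
The file uses a flat `key = value  # comment` layout with no section headers, so Python's `configparser` would not read it as-is. A minimal sketch of a parser for this layout; `parse_skimrc` is a hypothetical helper for illustration and is not part of skim's documented API.

```python
def parse_skimrc(text: str) -> dict[str, object]:
    """Parse `key = value  # comment` lines into a dict with basic type coercion."""
    config: dict[str, object] = {}
    for raw in text.splitlines():
        # Drop trailing comments; this simple rule assumes values never contain "#".
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if value.lower() in ("true", "false"):
            config[key] = value.lower() == "true"
        elif value.isdigit():
            config[key] = int(value)
        else:
            config[key] = value
    return config
```

Applied to the example above, this yields `max_pct` as the integer 30, `fail_on_waste` as `False`, and `model` as the string `"claude"`.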

MCP Server

Exposes skim as Claude Desktop tools (no CLI needed):

{
  "mcpServers": {
    "skim": { "command": "skim-mcp" }
  }
}

Available tools: scan_tokens, analyze_context, check_budget, fix_context, generate_llmignore


Python API

from adapters import ClaudeAdapter

claude = ClaudeAdapter(
    model="claude-sonnet-4-5",
    system_prompt="You are a terse coding assistant.",
    enable_caching=True,   # enables prompt caching automatically
)
response = claude.chat("Refactor the auth module")
claude.print_stats()
# Session: 12,400 tokens | Cache hit rate: 87% | Cost: $0.0037

Adapters: ClaudeAdapter, OpenAIAdapter, GeminiAdapter, OllamaAdapter


Install

# Core (zero hard deps — scan, analyze, check, fix, proxy)
pip install skim-llm

# With accurate token counting
pip install 'skim-llm[tiktoken]'

# With Claude adapter
pip install 'skim-llm[claude]'

# Web dashboard
pip install 'skim-llm[web]'

# Enterprise (SSO + LDAP)
pip install 'skim-llm[web,sso,ldap]'

# Everything
pip install 'skim-llm[all]'

Demo

Live demo (individual + org/enterprise): https://demo-mu-ten-60.vercel.app


MIT License · GitHub · Issues · Changelog
