Runtime token proxy + optimization toolkit for LLM developers and enterprises. Intercepts API calls, strips waste in real-time, tracks costs, and serves a web dashboard.

skim

The runtime layer between your AI tools and the LLM API.

Quickstart · Proxy · Dashboard · CLI · Enterprise · Demo

Most LLM tools waste tokens invisibly. Claude Code reads a package-lock.json (122k tokens, $0.37) before answering a question about a 200-line file. Because every turn resends the full conversation, cumulative input tokens grow roughly quadratically with the number of turns. Your 200k context window fills up silently, quality degrades, and you have no visibility until the model forgets what it was doing.

skim sits in the API call path and fixes this in real-time — without touching any code.

Your tool (Claude Code / Cursor / custom)
       │
       ▼
  skim proxy                    ← one env var activates this
  ├── strips lock files from tool outputs
  ├── auto-injects prompt caching (50–90% cheaper)
  ├── shows live context fill %
  └── ships usage data to your team dashboard
       │
       ▼
Anthropic / OpenAI / Gemini API

Quickstart

pip install skim-llm

# Start the proxy
skim proxy --port 7474 --path .

# In your shell (or .zshrc / .bashrc):
export ANTHROPIC_BASE_URL=http://localhost:7474

# That's it. Every Claude Code / Cursor call now goes through skim.

What you'll see in the terminal:

[skim] 14:23:01  call #1  1,247ms
  Context  ████░░░░░░░░░░░░░░░░░░ 12.4%  24.8k/200k
  This call: 24.8k in / 1.2k out  stripped 122k waste (package-lock.json)

[skim] 14:23:45  call #2  892ms
  Context  ███████░░░░░░░░░░░░░░░ 38.1%  76.2k/200k
  This call: 51.4k in / 2.1k out  cache hit 18.6k tokens free

[skim] 14:24:55  call #4  788ms
  Context  ████████████████░░░░░░ 78.4%  156.8k/200k
  ⚠  78% full — /compact NOW before quality degrades
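
A gauge like the one in this sample output could be rendered roughly as follows. This is a minimal sketch, not skim's actual renderer: the function name `render_context_line`, the 20-character bar width, and the `WARN`/`ALERT` suffixes are assumptions made for illustration. The 65% and 85% thresholds come from the "Live session health" description in the Proxy section.

```python
def render_context_line(used_tokens: int, limit_tokens: int, width: int = 20) -> str:
    """Build a one-line context gauge: bar, fill percentage, and used/limit counts."""
    pct = 100.0 * used_tokens / limit_tokens
    # Cap the bar at full width in case usage exceeds the limit.
    filled = min(width, round(width * used_tokens / limit_tokens))
    bar = "█" * filled + "░" * (width - filled)
    line = f"Context  {bar} {pct:.1f}%  {used_tokens / 1000:.1f}k/{limit_tokens // 1000}k"
    if pct >= 85:
        line += "  ALERT"  # hypothetical label for the 85% threshold
    elif pct >= 65:
        line += "  WARN"   # hypothetical label for the 65% threshold
    return line


if __name__ == "__main__":
    for used in (24_800, 76_200, 156_800):
        print(render_context_line(used, 200_000))
```

The bar proportions here will not match the sample output character for character, since the real bar width and rounding are not documented.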

Proxy — the core

The proxy is what makes skim different from every other LLM cost tool. They scan files. skim intercepts calls.

What it does on every API call

1. Waste filtering. Detects lock files, build artifacts, and generated code inside tool_result blocks (the content Claude Code gets back when it reads a file) and strips them before they enter context. A package-lock.json read that would cost 122k tokens becomes a 12-token note.

2. Prompt caching auto-injection (Anthropic only). Wraps your system prompt and large context blocks with cache_control: {"type": "ephemeral"} automatically. On the first call, Anthropic writes the content to the cache at a 25% premium over the base input price. Subsequent calls that hit the cache within its lifetime (5 minutes by default, refreshed on each hit) read that content at about 10% of the base input price. For Claude Code, the CLAUDE.md + project context loads at a fraction of the normal cost on calls 2+. Savings: 50–90% on system prompt tokens.

3. Live session health. After every call, prints context fill % with a progress bar. Warns at 65%, alerts at 85%. For Claude Code Pro users, this is the visibility you never had.

4. Actual usage tracking. Reads usage.input_tokens from the API response rather than estimating. Ships real numbers to ~/.skim/audit.log and optionally to a central team dashboard.
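
Steps 1 and 2 amount to rewriting the request body before it is forwarded. The sketch below illustrates the idea on an Anthropic Messages API payload. It is not skim's implementation: the function names, the `LOCKFILE_MARKERS` heuristic, and the replacement note are all hypothetical, and the real detection rules are not documented here.

```python
import copy

# Hypothetical content markers used to recognise lock files (an assumption).
LOCKFILE_MARKERS = ('"lockfileVersion"', "# yarn lockfile v1")


def strip_waste(request: dict) -> dict:
    """Return a copy of the request with lock-file tool results replaced by a short note."""
    out = copy.deepcopy(request)
    for message in out.get("messages", []):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string messages carry no tool_result blocks
        for block in content:
            if block.get("type") != "tool_result":
                continue
            body = block.get("content")
            if isinstance(body, str):
                text = body
            else:
                # tool_result content may also be a list of typed blocks
                text = "".join(
                    part.get("text", "") for part in (body or []) if part.get("type") == "text"
                )
            if any(marker in text for marker in LOCKFILE_MARKERS):
                block["content"] = "[lock file omitted by proxy]"
    return out


def inject_cache_control(request: dict) -> dict:
    """Return a copy whose system prompt is marked as a cacheable block."""
    out = copy.deepcopy(request)
    system = out.get("system")
    if isinstance(system, str) and system:
        # Convert the string form into the block form so it can carry cache_control.
        out["system"] = [{"type": "text", "text": system}]
    if isinstance(out.get("system"), list) and out["system"]:
        out["system"][-1]["cache_control"] = {"type": "ephemeral"}
    return out
```

Both functions copy the request instead of mutating it, so the caller's original payload stays available for logging or comparison.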

OpenAI-compatible tools

export OPENAI_BASE_URL=http://localhost:7474

Works with anything that uses openai.OpenAI(base_url=...).

Dashboard

For teams, skim includes a web server with login, per-user cost attribution, and budget alerts.

# Install web extras
pip install 'skim-llm[web]'

# Start the server
SKIM_ADMIN_EMAIL=you@corp.com skim server --host 0.0.0.0 --port 7475

# Open http://localhost:7475/dashboard

Then connect each developer's proxy to it:

export SKIM_SERVER_URL=https://skim.corp.internal
export SKIM_SERVER_TOKEN=sk-skim-...   # generate in Settings

Auth options:

Local password (default)
LDAP / Active Directory: set SKIM_LDAP_URL + SKIM_LDAP_BASE_DN
Google / GitHub / Azure AD / Okta: set SKIM_OIDC_* env vars

Docker:

docker run -p 7474:7474 -p 7475:7475 \
  -e SKIM_ADMIN_EMAIL=you@corp.com \
  -v /data/skim:/data \
  ghcr.io/bb1nfosec/skim

CLI Reference

skim scan       Audit token costs per file
skim analyze    Detect waste patterns with severity + auto-fix
skim fix        Write .llmignore rules — shows before/after savings
skim check      CI budget gate (exits 1 if over threshold)
skim generate   Generate .llmignore, .skimrc, CLAUDE.md
skim secrets    Scan for leaked credentials (AWS, OpenAI, GitHub PAT...)
skim proxy      Runtime interceptor + query optimizer
skim server     Web dashboard + REST API
skim audit      View operation log (~/.skim/audit.log)
skim config     Manage .skimrc configuration
skim hooks      Install/remove git pre-commit budget gate
skim baseline   Save/compare token count snapshots

Static analysis (no API key needed)

# See what's eating your tokens and what it costs
skim scan --path ./my-project

# Find waste patterns with one-line fixes
skim analyze --path .

# Auto-fix: write .llmignore rules, show before/after
skim fix --path . --min-severity medium

# Fail CI if project exceeds 30% of model context limit
skim check --path . --max-pct 30 --fail-on-waste
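
The budget gate behind `skim check` can be pictured as a token estimate compared against a percentage of the context limit. The following is a minimal sketch under stated assumptions, not the tool's code: the four-characters-per-token heuristic, the 200k default limit, and all function names are illustrative.

```python
from pathlib import Path


def estimate_tokens(text: str) -> int:
    """Crude heuristic of about four characters per token (an assumption)."""
    return len(text) // 4


def project_tokens(root: str) -> int:
    """Sum estimated tokens over every readable text file under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            total += estimate_tokens(path.read_text(encoding="utf-8"))
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
    return total


def budget_exit_code(total_tokens: int, context_limit: int = 200_000, max_pct: float = 30.0) -> int:
    """Return 1 when total_tokens exceeds max_pct of context_limit, else 0."""
    used_pct = 100.0 * total_tokens / context_limit
    return 1 if used_pct > max_pct else 0
```

In a CI script this would be wired up as `sys.exit(budget_exit_code(project_tokens(".")))`, so a non-zero exit fails the job.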

Example output — skim fix:

  skim fix  —  ./my-project
  ──────────────────────────────────────────────────────
  Before  : 166.8k tokens  (83.4% ctx)  $0.50/session

  Pattern              Severity    Tokens saved  Rules
  ────────────────────────────────────────────────────
  Lock files           HIGH           160.3k     +7
  Test snapshots       MEDIUM           4.1k     +2

  ✓ Written to .llmignore

  After   : 6.5k tokens  (3.2% ctx)  $0.02/session
  Saved   : 160.3k tokens  (96.1% reduction)  $0.48/session
  Now     : 51 sessions / $1
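
The figures in this sample can be reproduced with simple arithmetic. The sketch below assumes a price of $3.00 per million input tokens and a 200k-token context limit. The price is not stated in the README; it is inferred from the sample ($0.50 for 166.8k tokens). The "Saved" total in the sample equals the lock-file row alone, so the sketch uses that figure.

```python
PRICE_PER_MTOK = 3.00    # assumed USD per million input tokens (inferred, not documented)
CONTEXT_LIMIT = 200_000  # tokens


def session_cost(tokens: int) -> float:
    """Cost in USD of sending `tokens` input tokens once."""
    return tokens / 1_000_000 * PRICE_PER_MTOK


before = 166_800
saved = 160_300
after = before - saved

summary = {
    "before_pct": round(100 * before / CONTEXT_LIMIT, 1),
    "after_tokens": after,
    "after_pct": 100 * after / CONTEXT_LIMIT,
    "reduction_pct": round(100 * saved / before, 1),
    "before_cost": session_cost(before),
    "after_cost": session_cost(after),
    "sessions_per_dollar": int(1 / session_cost(after)),
}

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value}")
```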

Secrets scan

# Scan before any LLM touches your codebase
skim secrets --path . --fail    # exits 1 if findings exist

Detects: AWS Access Key IDs, OpenAI API keys, Anthropic keys, GitHub PATs, private key blocks, Stripe live keys, Slack tokens, JWTs, and generic secrets/passwords.
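
Detection of this kind is typically pattern-based. As an illustration only, the sketch below matches three of the listed categories using their well-known public formats. The pattern set, names, and return shape are assumptions and do not describe skim's actual rules.

```python
import re

# Illustrative patterns for three secret types; a real scanner needs many more.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def exit_code(findings: list[tuple[int, str]]) -> int:
    """Mirror the documented --fail behaviour: non-zero when anything was found."""
    return 1 if findings else 0
```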

Baseline regression (CI)

# Save before a refactor
skim baseline save --name pre-refactor

# Compare after — fails CI if > 5k tokens regressed
skim baseline compare --name pre-refactor

Git hook

# Block commits that push context over budget
skim hooks install --max-pct 30 --fail-on-waste

Enterprise

Need	Solution
Cost attribution by team	`skim server` dashboard, per-user breakdown
Budget enforcement	`skim check` in CI + git hooks + proxy hard limits
SSO / LDAP	`SKIM_OIDC_*` + `SKIM_LDAP_*` env vars
Audit trail	`~/.skim/audit.log` + central server ingestion
Self-hosted deployment	Docker image + Helm chart (see `deploy/`)
Secrets governance	`skim secrets --fail` in pre-commit + CI
Regression prevention	`skim baseline compare` in PR pipelines
Air-gapped / Ollama	`--model ollama` — all analysis local, $0.00

Configuration

Create .skimrc in your project root (commit it for team-wide policy):

model         = claude       # claude | openai | gemini | ollama
max_pct       = 30           # fail CI if context exceeds X% of limit
fail_on_waste = false        # also fail on HIGH severity patterns
min_severity  = high         # auto-fix: high | medium | low
audit         = false        # log every operation to ~/.skim/audit.log
proxy_port    = 7474
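
For tooling that needs to read these settings, a file in this `key = value  # comment` layout can be parsed with a few lines of Python. This is a sketch that assumes the layout shown above is the whole format; skim's own parser and any additional syntax it accepts are not documented here, and `parse_skimrc` is a hypothetical helper.

```python
def parse_skimrc(text: str) -> dict:
    """Parse 'key = value  # comment' lines into a dict with basic type coercion."""
    config = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if value.lower() in ("true", "false"):
            config[key] = value.lower() == "true"
        elif value.isdigit():
            config[key] = int(value)
        else:
            config[key] = value
    return config
```

Note that this simple approach treats every `#` as the start of a comment, so values containing `#` would be truncated.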

MCP Server

Exposes skim as Claude Desktop tools (no CLI needed):

{
  "mcpServers": {
    "skim": { "command": "skim-mcp" }
  }
}

Available tools: scan_tokens, analyze_context, check_budget, fix_context, generate_llmignore

Python API

from adapters import ClaudeAdapter

claude = ClaudeAdapter(
    model="claude-sonnet-4-5",
    system_prompt="You are a terse coding assistant.",
    enable_caching=True,   # enables prompt caching automatically
)
response = claude.chat("Refactor the auth module")
claude.print_stats()
# Session: 12,400 tokens | Cache hit rate: 87% | Cost: $0.0037

Adapters: ClaudeAdapter, OpenAIAdapter, GeminiAdapter, OllamaAdapter

Install

# Core (zero hard deps — scan, analyze, check, fix, proxy)
pip install skim-llm

# With accurate token counting
pip install 'skim-llm[tiktoken]'

# With Claude adapter
pip install 'skim-llm[claude]'

# Web dashboard
pip install 'skim-llm[web]'

# Enterprise (SSO + LDAP)
pip install 'skim-llm[web,sso,ldap]'

# Everything
pip install 'skim-llm[all]'

Demo

Live demo (individual + org/enterprise): https://demo-mu-ten-60.vercel.app

MIT License · GitHub · Issues · Changelog

Release history

0.5.1 (May 31, 2026)
0.5.0 (May 31, 2026)
0.3.0 (May 31, 2026)
0.2.0 (May 31, 2026, this version)

Download files

Source Distribution

skim_llm-0.2.0.tar.gz (72.4 kB)

Uploaded May 31, 2026 Source

Built Distribution

skim_llm-0.2.0-py3-none-any.whl (82.9 kB)

Uploaded May 31, 2026 Python 3

File details

Details for the file skim_llm-0.2.0.tar.gz.

File metadata

Download URL: skim_llm-0.2.0.tar.gz
Upload date: May 31, 2026
Size: 72.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for skim_llm-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`1f14333712833e321b19f8405c39ed3bd8f1ce6f37af901ef6b430e992fdfd56`
MD5	`0719a1317001ebf0341adab6f20ab167`
BLAKE2b-256	`940722c7f7f44b908d23b453d92c5caaeab618b0cd99bef94d1e9f41b3ec6dc1`

File details

Details for the file skim_llm-0.2.0-py3-none-any.whl.

File metadata

Download URL: skim_llm-0.2.0-py3-none-any.whl
Upload date: May 31, 2026
Size: 82.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for skim_llm-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7fef1f51798adadd1a1798f123685195772b417f4326ecb63440a2481a5ed2d4`
MD5	`305be67e3a261d060cb1db65741fb8f4`
BLAKE2b-256	`3383a188619968ffc118934548e53ae8bfbf63d54d4b8cad1a416905ea41f972`
