graphsift: Save Claude tokens, reduce LLM API costs, optimize context windows. #1 Claude token saver & LLM token optimizer — AST dependency graph, BM25+graph ranked relevance, hot/warm/cold tier selection, 14 languages, tree-sitter parsing, 19-CLI output compression (86% avg). 80-150x token reduction, F1 0.85. Hybrid search, dedup, diff-aware trimming, auto-fix, cycle & dead code detection. MCP server, Claude/OpenAI/Gemini adapters.
Project description
graphsift — Save Claude Tokens, Reduce LLM API Costs, Optimize Context Windows
graphsift is the #1 open-source Claude token saver and LLM token optimizer for AI code review. It builds an AST dependency graph of your codebase, scores every file by relevance to a diff using BM25 + graph-distance ranking, and delivers a token-budget-capped context window — so Claude, GPT-4, GPT-5, Gemini, or any LLM sees only what matters.
Save 80-150x Claude tokens per code review. Cut 60-90% of CLI command output tokens before they hit your LLM context. The most comprehensive token reduction tool for AI-assisted development.
- Reduce Claude API costs by 93-99% per code review call
- Optimize Claude context windows — ranked relevance instead of binary blast-radius
- Save tokens on Claude Code, Cursor, Copilot, and any MCP-compatible agent
- Compress command output — 19 per-tool compressors for pytest, docker, git, kubectl, npm, and more
Why You Need a Claude Token Saver
Every Claude API call costs tokens. Every token costs money. When Claude reviews a code change, the naive approach sends every transitively-related file — that's 500k-2M tokens for a medium codebase. You're paying for noise.
graphsift is the Claude token optimizer that fixes this:
- 80-150x fewer Claude tokens per code review — ranked scoring replaces binary blast-radius
- F1 ~0.85 relevance accuracy vs F1=0.54 for tools like code-review-graph
- Hard token budget enforcement — never exceed Claude's context limit or your cost threshold
- Save $150-180/day on Claude API costs at 100 PRs/day vs raw source dumps
Who Needs to Save Claude Tokens?
| Use case | How graphsift saves Claude tokens |
|---|---|
| CI/CD AI code review | Auto-select relevant context per PR, cut costs 93-99% |
| Monorepo code review | Blast-radius tools drown in irrelevant files; graphsift ranks and trims |
| Claude Code / Cursor / Copilot | MCP server delivers token-efficient context to coding agents |
| LLM cost optimization | Token budget + compression + caching = predictable API spend |
| Enterprise AI review pipelines | Hard limits prevent runaway API costs; analytics track every token saved |
| RAG / agent context building | General-purpose: any LLM task that needs ranked code context |
Token Savings at a Glance — How Much You Save with graphsift
Benchmarked on a 143-file FastAPI application reviewing a 50-line change to src/auth/manager.py:
| Approach | Files sent | Claude tokens | Cost (Claude Opus @ $15/M) | Savings vs raw |
|---|---|---|---|---|
| Raw source (every file) | 143/143 | ~180,000 | $2.70 | — |
| Binary blast-radius (code-review-graph) | 8-12/143 | 6,000-8,000 | $0.10 | 96% |
| graphsift (ranked + budget) | 3-5/143 | 800-1,200 | $0.015 | 99.4% |
CLI Output Compression — Save 60-90% of Command Tokens
Beyond code review, graphsift saves Claude tokens on command output too. Every pytest, docker ps, kubectl get, or git diff you pipe to Claude wastes tokens on noise.
| Command | Original tokens | Compressed tokens | Claude tokens saved |
|---|---|---|---|
pytest -v (45 tests) |
3,544 tk | 224 tk | 94% |
git diff (2 files) |
2,320 tk | 184 tk | 92% |
docker ps (10 images) |
728 tk | 68 tk | 91% |
grep -r (25 results) |
1,756 tk | 44 tk | 97% |
kubectl get all |
1,308 tk | 332 tk | 75% |
npm install output |
548 tk | 92 tk | 83% |
git log (3 commits) |
996 tk | 184 tk | 82% |
eslint (9 problems) |
928 tk | 212 tk | 77% |
pip install (7 pkgs) |
904 tk | 124 tk | 86% |
git status |
712 tk | 124 tk | 83% |
| App logs (16 lines) | 1,356 tk | 532 tk | 61% |
| Weighted average | 3,775 tk | 530 tk | 86% |
At 100 CLI commands/day piped to Claude, that's ~325,000 tokens saved per day — roughly $4.87/day saved on Claude Opus pricing.
How graphsift Saves Claude Tokens
Four steps from diff to Claude-optimized context:
-
Parse — Builds an AST dependency graph from your source. 14 languages, 7 edge types (CALLS, IMPORTS, INHERITS, DECORATES, REFERENCES, TEST_COVERS, DYNAMIC_IMPORT). v1.6 adds precise tree-sitter parsing for Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, Ruby, PHP, and Bash.
-
Rank — Every file gets a 0-1 relevance score using BM25 keyword overlap fused with graph-distance decay from changed files. Not binary include/exclude — nuanced ranking means Claude sees signal, not noise.
-
Select — Greedy token-budget selection with three tiers (hot/warm/cold). Hot files get full source, warm get signatures, cold are excluded. Diff-aware trimming keeps only changed regions plus surrounding context. Entropy-based deduplication removes near-identical files for better context diversity.
-
Render — One Markdown string, ready to inject into any Claude or LLM prompt. Optional Anthropic/OpenAI cache breakpoints for repeated queries.
from graphsift import ContextBuilder, ContextConfig, DiffSpec
# Create a Claude token optimizer with a 50k token budget
builder = ContextBuilder(ContextConfig(token_budget=50_000))
builder.index_files(source_map)
result = builder.build(
DiffSpec(changed_files=["src/auth.py"], query="Review for security issues"),
source_map,
)
print(result)
# ContextResult(selected=9/143, tokens=12,400, saved=94%)
# Claude sees only 12,400 tokens instead of 180,000
# Paste directly into your Claude API call
print(result.rendered_context)
Installation — Start Saving Claude Tokens in 60 Seconds
pip install graphsift
# With tree-sitter for precise AST parsing across 11 languages:
pip install "graphsift[treesitter]"
# Full install — compression + tree-sitter + dev tools:
pip install "graphsift[all]"
Requires Python 3.9+. Core dependency: pydantic>=2.0. Zero mandatory native extensions.
Why graphsift Beats Binary Blast-Radius Tools for Saving Claude Tokens
The "send everything that imports the changed file" approach used by code-review-graph and most MCP code review tools has two fatal flaws for anyone trying to reduce Claude API costs:
- Token overflow — 500k+ tokens exceeds Claude's context limit and your budget. Every irrelevant file burns money.
- Noise degrades Claude's output — LLMs hallucinate more when flooded with irrelevant context. Sending
config.py,utils/logging.py, and 40 test files because they importbase.pyburies the signal.
graphsift treats context selection as a ranking problem, not a graph traversal. Claude gets maximum signal per token — the most cost-effective way to use AI code review.
Head-to-Head: graphsift vs code-review-graph
| Feature | code-review-graph | graphsift (Claude token optimizer) |
|---|---|---|
| Goal | Show related files | Save Claude tokens while maximizing relevance |
| Selection | Binary blast-radius | Ranked 0-1 with hot/warm/cold tiers |
| F1 accuracy | 0.54 (46% false positives) | 0.85 (ranked filtering + dedup) |
| Token budget | None | Hard budget — fits any Claude model limit |
| Token reduction | 8-49x | 80-150x (multi-file + compression + trimming) |
| Multi-file diff | Not supported | Union blast radius across all changed files |
| Decorator edges | Ignored | DECORATES tracked and scored |
| Dynamic imports | Missed | Detected via regex + AST + tree-sitter |
| Compression | None | tokenpruner + diff-aware trimming |
| Deduplication | None | Entropy-based near-duplicate removal |
| Tree-sitter parsing | None | 11 languages with precise CST/AST |
| Hybrid search | MRR=0.35, acknowledged broken | BM25 + TF-IDF vector fusion |
| Dead code detection | None | Unreachable code from entry points |
| Cycle detection | None | Dependency cycle analysis |
| Auto-fix suggestions | None | Graph-based issue detection + fix proposals |
| Languages | Python only | 14 languages |
| Incremental indexing | None | SHA-256 skip for unchanged files |
| Monorepo | None | index_roots() for multi-package repos |
| MCP server | No | Full MCP protocol + 7 token-saving tools |
| CLI | No | install / serve / build / status / compress / gain |
| SQLite persistence | No | 6-version GraphStore with migrations |
| Cache-aware output | No | Anthropic/OpenAI cache breakpoints |
| Output compression | No | 19 CLI command compressors (86% avg savings) |
| Analytics | No | Token savings tracking + discovery |
| Test coverage | Unknown | 271 tests, >80% coverage |
Key Features — Everything You Need to Save Claude Tokens
Token & Cost Optimization
- Hard token budget — never exceed Claude's context window or your cost ceiling
- 3-tier selection (hot/warm/cold) — full source → signatures → excluded
- Diff-aware context trimming — only changed regions + surrounding context lines
- Entropy-based deduplication — removes near-identical files for better context diversity
- 4 output modes — FULL / SIGNATURES / COMPRESSED / SMART (auto per-file)
- Cache-aware output — Anthropic/OpenAI cache_control breakpoints for repeated queries
- Cross-session caching — session_id-based memory reuse across Claude conversations
- 80-150x token reduction vs raw source; 10-15x vs binary blast-radius tools
Code Analysis & Intelligence
- 14-language parsing — Python, JS, TS, Go, Rust, Java, C++, C, Ruby, PHP, Bash, Terraform/ HCL, Helm
- Tree-sitter precise parsing — 11 languages with full CST/AST via tree-sitter (Python, JS, TS, Go, Rust, Java, C, C++, Ruby, PHP, Bash)
- 7 edge types — CALLS, IMPORTS, INHERITS, DECORATES, REFERENCES, TEST_COVERS, DYNAMIC_IMPORT
- Hybrid search — BM25 full-text + TF-IDF sparse vector fusion for semantic code search
- Cycle detection — find and report dependency cycles with severity grading
- Dead code detection — identify unreachable functions, classes, methods from entry points
- Auto-fix suggestions — graph-based issue detection across 5 categories (import, type, structure, cycle, dead_code)
- Decorator tracking —
@require_auth,@cached_propertyedges most tools miss - Dynamic import detection —
importlib.import_module(),__import__(),require(), lazy imports
CLI Output Compression — 19 Compressors, 86% Average Savings
- Auto-detect command type from output signature — just pipe to
graphsift compress - 19 specialized compressors — pytest (94%), git_diff (92%), docker (91%), pip (86%), npm (83%), git_status (83%), git_log (82%), eslint (77%), kubectl (75%), log (61%), grep (97%), and more
- Bash wrapper — transparent compression without manual piping
- Tee mode — save original uncompressed output while LLM sees compressed
- Token analytics — cumulative tracking, daily breakdown, cost estimates, opportunity discovery
Developer Experience
- Full MCP server — compatible with Claude desktop, Claude Code, Cursor, any MCP client
- 7 MCP tools — index, build, search, status, compress_output, token_gain, token_discover
- CLI —
graphsift install / serve / build / status / compress / gain / discover - Drop-in adapters — Claude/Anthropic, OpenAI/Codex, Gemini (Google)
- 10 advanced features — cache, pipeline, validator, async batch, rate limiter, streaming, diff engine, circuit breaker, retry, schema evolution
- Incremental indexing — SHA-256 skip on unchanged files; sub-2s re-index
- Monorepo support —
index_roots()for multi-package repositories - SQLite persistence — 6-version migration history
Quick Start — Save Claude Tokens on Your First Code Review
1. Index your repository
from graphsift import ContextBuilder, ContextConfig
from graphsift.adapters.filesystem import load_source_map
source_map = load_source_map("./my_repo", extensions={".py", ".ts"})
builder = ContextBuilder(ContextConfig(
token_budget=60_000, # Claude token budget — never exceeds this
max_depth=4,
output_mode="smart", # auto hot/warm/cold tier selection
))
stats = builder.index_files(source_map)
print(stats)
# IndexStats(files=143, symbols=1842, edges=3201)
2. Build Claude-optimized context for a diff
from graphsift import DiffSpec
result = builder.build(
DiffSpec(
changed_files=["src/auth.py", "src/middleware.py"],
query="Review authentication middleware changes for security issues",
commit_message="feat: add JWT refresh token support",
diff_text="...",
),
source_map,
)
print(result)
# ContextResult(selected=11/143, tokens=18,200, saved=93%, cache_breakpoints=3)
# Paste result.rendered_context directly into your Claude API prompt
3. Claude adapter — measure Claude token savings in real API calls
import anthropic
from graphsift.adapters.claude import ClaudeCodeReviewAdapter
client = anthropic.Anthropic()
adapter = ClaudeCodeReviewAdapter(client, builder)
response, meta = adapter.review(
changed_files=["src/auth.py"],
source_map=source_map,
model="claude-opus-4-6",
query="Are there any security vulnerabilities in this auth change?",
)
print(f"Claude tokens saved: {meta['reduction_ratio']:.0%}")
# Claude tokens saved: 93%
4. OpenAI / Codex adapter — save GPT tokens
from openai import OpenAI
from graphsift.adapters.openai import CodexCodeReviewAdapter
client = OpenAI()
adapter = CodexCodeReviewAdapter(client, builder)
response, meta = adapter.review(
changed_files=["src/auth.py"],
source_map=source_map,
model="gpt-5-codex",
query="Find correctness or security issues.",
)
print(f"GPT tokens saved: {meta['reduction_ratio']:.0%}")
5. Gemini adapter — save Google AI tokens
from google import genai
from graphsift.adapters.gemini import GeminiCodeReviewAdapter
client = genai.Client()
adapter = GeminiCodeReviewAdapter(client, builder)
response, meta = adapter.review(
changed_files=["src/auth.py"],
source_map=source_map,
model="gemini-2.5-pro",
query="Review this auth change for regressions.",
)
CLI Usage — Save Claude Tokens from the Terminal
# Install graphsift MCP server (saves Claude tokens on every tool call)
graphsift install
# Install with transparent bash output compression
graphsift install --bash-wrapper
# Start MCP server for custom MCP clients
graphsift serve --port 8000
# Build/update the dependency graph
graphsift build --repo ./my_repo
# Show indexing status and cumulative Claude token savings
graphsift status
# Register a repo in multi-repo mode
graphsift register --repo ./services/auth --name auth-service
# Compress any CLI output — save 60-97% tokens before it reaches Claude
pytest -v | graphsift compress
docker ps -a | graphsift compress
kubectl get all | graphsift compress
# Show cumulative token savings across all sessions
graphsift gain
# Discover missed token-saving opportunities
graphsift discover --repo .
# Print bash wrapper for transparent compression
graphsift bash-wrapper
CLI Output Compression — Save 60-97% of Command Tokens Before Claude Sees Them
New in v1.6: graphsift compresses CLI command output before it reaches your Claude or LLM context window. Think 200 lines of pytest tracebacks reduced to 5 lines of meaningful failures. Complements the code review context selection above.
# Auto-detect and compress — zero config needed
pytest -v | graphsift compress # 94% token savings
docker ps -a | graphsift compress # 91% token savings
git diff HEAD~3 | graphsift compress # 92% token savings
kubectl get all | graphsift compress # 75% token savings
# Ultra-compact mode — max 30 lines
cargo build 2>&1 | graphsift compress --ultra
# Save original + compressed (debug while Claude sees compressed)
pytest -v | graphsift compress --tee ~/.graphsift/tee --tee-label pytest_run
All 19 Compressors — Auto-Detected
| Compressor | What it compresses | Strategy | Token savings |
|---|---|---|---|
pytest |
Test runs | Keep assertions + failures, strip tracebacks | 94% |
grep |
Search results | Group by match, dedup identical lines | 97% |
git_diff |
Git diffs | Per-file path + first 3 changed lines | 92% |
docker |
docker ps/images | ID + name/status, cap at 40 | 91% |
pip |
pip install | Final summary + errors only | 86% |
git_status |
git status | Branch + staged/unstaged/untracked counts | 83% |
npm |
npm/yarn | Error headers + conflict summary + counts | 83% |
git_log |
git log | Last 5 commits, hash + subject only | 82% |
eslint |
ESLint output | Per-file error/warning counts | 77% |
kubectl |
kubectl get | Header + first 5 rows, compress whitespace | 75% |
log |
App logs | Strip timestamps, keep ERROR/FATAL, dedup WARN | 61% |
cargo |
Rust builds | Keep errors + warnings + Finished line | — |
go_test |
Go tests | Keep FAIL lines + panics + summary | — |
jest |
JavaScript tests | Keep FAIL/PASS + snapshot summary | — |
make |
make output | Error + *** lines only | — |
aws |
AWS CLI JSON | Compact large JSON, keep keys + primitives | — |
cat |
File output | Truncate to 40 head + 20 tail | — |
json_output |
Any JSON | Compact small, strip large to keys + primitives | — |
generic |
Anything | Strip blanks, dedup, truncate at 200 lines | — |
Transparent Bash Compression — Never Think About Token Savings
graphsift install --bash-wrapper
# or add to .bashrc:
eval "$(graphsift bash-wrapper)"
Now pytest, cargo, npm, docker, kubectl, aws, grep, cat, make, pip, jest, eslint, git, go test, npx, and yarn output is transparently compressed — Claude never sees the noise.
Token Savings Analytics — Track Every Claude Token You Save
# Total Claude tokens saved across all sessions
graphsift gain
# Daily breakdown with cost estimates
graphsift gain --history
# Find commands that would benefit most from compression
graphsift discover --repo .
# Python API for dashboards
from graphsift.analytics import gain, history, discover
print(gain()) # {'total_calls': 1234, 'total_tokens_saved': 450000, 'estimated_cost_saved': '$6.75'}
print(history(7)) # Last 7 days breakdown
print(discover('.')) # Missed opportunities
MCP Server — Save Claude Tokens Inside Claude Code
graphsift's MCP server is the easiest way to save Claude tokens inside Claude Code, Claude desktop, Cursor, or any MCP-compatible agent. Average 87% token reduction per tool call.
graphsift install # auto-configures .mcp.json and hooks
MCP Tools — Each Designed to Save Claude Tokens
| Tool | What it does | Claude tokens saved |
|---|---|---|
graphsift_index |
Index files into the dependency graph | — |
graphsift_build |
Build token-budget-capped context for a diff | 93% |
graphsift_search |
Hybrid BM25+vector semantic search across the graph | 85% |
graphsift_status |
Show indexing stats + token savings metrics | 75% |
compress_output |
Compress CLI output (19 types, auto-detect) | 86% |
token_gain |
Cumulative savings: calls, tokens, cost estimates | — |
token_discover |
Find missed token-saving opportunities | — |
New in v1.6 — Advanced Claude Token Optimization
Diff-Aware Context Trimming
Instead of sending full files, send only the changed regions plus configurable surrounding context. Cuts tokens another 40-60% beyond ranking.
config = ContextConfig(
diff_aware_trimming=True,
trimming_context_lines=10, # lines of context around each changed region
)
Entropy-Based Deduplication
Near-identical files (generated code, similar configs, boilerplate) are detected via entropy comparison and only the highest-scoring representative is included. Improves context diversity without losing coverage.
Hybrid Search — BM25 + Sparse Vector Fusion
Semantic code search that combines BM25 full-text relevance with TF-IDF sparse vector similarity. Configurable alpha (0= pure vector, 1= pure BM25).
from graphsift import HybridSearcher
searcher = HybridSearcher(alpha=0.7)
results = searcher.search("JWT token refresh logic", graph_nodes, top_k=10)
Tree-Sitter Precise Parsing — 11 Languages
Beyond regex-based parsers, v1.6 adds tree-sitter for precise CST/AST parsing: functions, classes, methods, decorators, async functions, arrow functions, structs, interfaces, traits, impl blocks — all extracted with exact line numbers and signatures.
from graphsift import register_tree_sitter_parsers
register_tree_sitter_parsers()
# Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, Ruby, PHP, Bash
Auto-Fix Suggestions from Graph Analysis
Detect issues across 5 categories by analyzing the dependency graph:
| Category | What it detects |
|---|---|
import |
Missing, unused, or circular imports |
type |
Type mismatches inferred from graph edges |
structure |
Architectural issues (cycles, god modules) |
cycle |
Dependency cycles with severity grading |
dead_code |
Unreachable code from entry points |
from graphsift import FixSuggester
suggester = FixSuggester(builder)
report = suggester.analyze(graph)
print(f"Issues found: {report.total_issues}")
for s in report.suggestions:
print(f"[{s.severity}] {s.file_path}:{s.line_start} — {s.title}")
Cache-Aware Output
Structure rendered context with Anthropic/OpenAI cache_control breakpoints and session-based memory for repeated queries. Dramatically reduces costs when reviewing iterative changes to the same files.
config = ContextConfig(
cache_aware=True,
cache_provider="anthropic", # or "openai", "auto"
session_id="pr-review-1234",
cache_ttl_days=7,
)
Advanced Features — Maximum Claude Token Savings
Smart Cache — Don't Pay Twice for the Same Context
from graphsift import GraphCache
cache = GraphCache(maxsize=64, ttl=300)
@cache.memoize
def get_context(diff_key: str):
return builder.build(diff, source_map)
get_context("auth-change-abc123") # computed once
get_context("auth-change-abc123") # cache hit — saves Claude tokens, saves money
print(cache.stats())
# {'hits': 1, 'misses': 1, 'evictions': 0, 'hit_rate': 0.5}
Analysis Pipeline with Audit Trail
from graphsift import AnalysisPipeline
pipeline = (
AnalysisPipeline(builder)
.add_step("filter_generated", lambda r: remove_generated_files(r))
.add_step("rerank", rerank_by_complexity)
.with_retry(n=2, backoff=0.3)
)
result, audit = pipeline.run(diff_spec, source_map)
Async Batch — Parallel Claude-Powered Reviews
from graphsift import async_batch_build, batch_index
results = batch_index(builder, [source_map_a, source_map_b], concurrency=4)
contexts = await async_batch_build(builder, list_of_diffs, source_map, concurrency=8)
Streaming — Start Processing Before All Files Are Scored
from graphsift import stream_context
for batch in stream_context(builder, diff_spec, source_map, batch_size=3):
for scored_file in batch:
print(f"{scored_file.file_node.path}: {scored_file.score:.3f}")
Rate Limiter — Control Your Claude API Spend
from graphsift import RateLimiter
limiter = RateLimiter(rate=5, capacity=5, key="claude")
with limiter:
response, meta = adapter.review(...)
Diff Engine — Compare Token Costs Across Configurations
from graphsift import ContextDiff
diff = ContextDiff(result_config_a, result_config_b)
print(diff.summary())
# Tokens: 9,200 -> 14,100 (delta +4,900)
# Reduction: 95.1% -> 92.2%
FAQ — Common Questions About Saving Claude Tokens
How do I save tokens on Claude Code?
Install graphsift's MCP server (graphsift install) and it automatically compresses tool outputs and optimizes context for every Claude Code session. The compress_output tool auto-detects command types and applies the right compressor.
What's the fastest way to reduce Claude API costs?
Use graphsift's ContextBuilder with a hard token_budget (e.g., 50,000). It ranks files by relevance and only sends what fits. Add diff_aware_trimming=True for another 40-60% reduction. Enable cache_aware=True for repeated queries on the same files.
How much does Claude API cost per code review?
Without graphsift: $0.50-$2.70 per review (depending on codebase size). With graphsift: $0.01-$0.05 per review — a 93-99% reduction. At 100 PRs/day, that's $50-270/day vs $1-5/day.
Does graphsift work with GPT-4 / GPT-5 / OpenAI?
Yes. graphsift has drop-in adapters for OpenAI/Codex and Gemini, plus a generic adapter for any OpenAI-compatible API. The ranking and selection logic is provider-agnostic.
How is graphsift different from code-review-graph?
code-review-graph uses binary blast-radius (everything that imports the changed file, include/exclude). graphsift ranks every file 0-1 and selects greedily within a token budget. F1 accuracy: 0.85 vs 0.54. Token reduction: 80-150x vs 8-49x.
Can graphsift handle monorepos?
Yes. index_roots() indexes multiple packages at once, and the ranking algorithm correctly scores cross-package dependencies.
Does graphsift need internet access?
No. All parsing, ranking, and compression runs locally. API adapters call LLM providers if you use them, but the core is fully offline.
What Python versions are supported?
Python 3.9+. The only mandatory dependency is pydantic>=2.0.
Supported Languages — Save Claude Tokens on Any Codebase
| Language | Parser | Tree-sitter | Key capabilities |
|---|---|---|---|
| Python | Native ast + tree-sitter |
Yes | Functions, classes, methods, async, decorators, dynamic imports |
| JavaScript | Regex + tree-sitter | Yes | Functions, classes, methods, arrow functions, async |
| TypeScript | Regex + tree-sitter | Yes | Same as JS + type annotations, interfaces |
| Go | Regex + tree-sitter | Yes | Functions, receiver methods, structs, interfaces |
| Rust | Regex + tree-sitter | Yes | Functions, structs, traits, impl blocks |
| Java | Regex + tree-sitter | Yes | Classes, methods, interfaces |
| C++ | Regex + tree-sitter | Yes | Functions, classes, structs |
| C | Regex + tree-sitter | Yes | Functions, structs |
| Ruby | Regex + tree-sitter | Yes | Methods, classes, modules |
| PHP | Regex + tree-sitter | Yes | Functions, classes, traits |
| Bash/Shell | Regex + tree-sitter | Yes | Functions, source imports |
| Terraform/HCL | Custom parser | No | Resources, variables, locals, modules, data sources |
| Helm Charts | Template parser | No | Go templates in YAML, Chart.yaml dependencies |
| Dockerfile | Custom | No | FROM, COPY, RUN, ENV, ARG instructions |
Performance — How Fast graphsift Saves Claude Tokens
- Indexing: sub-2-second on 10,000+ file repos
- Incremental re-index: skips unchanged files via SHA-256 hash
- No hangs: depth cap (default 4) prevents infinite traversal on cyclic imports
- Thread-safe: all shared state behind
threading.RLock - Async: all blocking operations have
async def a<operation>()twins - Context building: <50ms for a typical diff on an indexed 1,000-file repo
Testing
git clone https://github.com/maheshmakvana/graphsift.git
cd graphsift
pip install -e ".[dev]"
pytest tests/ -v
# 271 passed in ~4s
tests/test_core.py— 60+ unit tests: parsers, graph ops, ranking, selectiontests/test_advanced.py— 49+ async tests: all 10 advanced featurestests/test_hybrid_search.py— 25 tests: BM25, sparse cosine, TF-IDF, search rankingtests/test_tree_sitter.py— 40+ tests: Python, JS, Go, Rust parsingtests/test_diff_trimming.py— 18 tests: hunk parsing, context trimming, preambletests/test_dedup.py— 15 tests: entropy dedup, changed-file protectiontests/test_auto_fix.py— auto-fix suggestion engine tests
Architecture — Hexagonal (Ports & Adapters)
graphsift/
├── __init__.py # Public API — all exports explicit
├── core.py # Pure domain logic, zero I/O
├── models.py # Pydantic v2 value objects (frozen=True)
├── exceptions.py # Typed exception hierarchy
├── advanced.py # 10 advanced feature categories
├── compress.py # 19 CLI output compressors (86% avg token savings)
├── analytics.py # Token savings tracking + discovery
├── hooks.py # Bash wrapper + transparent compression
├── hybrid_search.py # BM25 + TF-IDF sparse vector fusion
├── auto_fix.py # Graph-based auto-fix suggestion engine
├── cli.py # CLI entrypoint
├── mcp_server.py # MCP protocol server (7 tools)
├── parsers/ # Tree-sitter parsers (11 languages)
├── adapters/
│ ├── storage.py # SQLite GraphStore (6-version migrations)
│ ├── claude.py # Claude/Anthropic adapter
│ ├── openai.py # OpenAI / Codex adapters
│ ├── gemini.py # Gemini adapter
│ ├── llm.py # Shared multi-provider adapter logic
│ ├── filesystem.py # Path I/O helpers
│ └── postprocess.py # Community + flow detection
└── _version.py # Single-source version
Contributing
Issues and pull requests welcome at github.com/maheshmakvana/graphsift.
License
MIT — see LICENSE.
Related Projects
- tokenpruner — LLM input token compression used by graphsift's COMPRESSED output mode; adds 3-5x additional Claude token reduction
- code-review-graph — binary blast-radius alternative (no ranking, no budget, no compression — graphsift was built to surpass it)
Start saving Claude tokens today: pip install graphsift
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file graphsift-1.6.1.tar.gz.
File metadata
- Download URL: graphsift-1.6.1.tar.gz
- Upload date:
- Size: 184.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b2b37c05cba8f2a18f11157129f44f26c1b817ba12fcaeb38047f30bfd4a0997
|
|
| MD5 |
2cda4b37c57a87aae7f14a0a81afe516
|
|
| BLAKE2b-256 |
441fc83a7c62e089d8107ba5a04d6c56ad6a1c704eeb7ccd8d309134d4f8c92b
|
File details
Details for the file graphsift-1.6.1-py3-none-any.whl.
File metadata
- Download URL: graphsift-1.6.1-py3-none-any.whl
- Upload date:
- Size: 142.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c59fad57a5b210191548c8e85fc6bae3505f8fc4da728d425f3cd07f6c21edf0
|
|
| MD5 |
8f4a58c2a455f9112d42b799fecd5348
|
|
| BLAKE2b-256 |
9bae83f898e8c7377079d4c58e338609595a7f4dd065a27aba5baff81e7d5dd1
|