Smarter code context for LLMs — ranked relevance, multi-file diff, Bash/HCL/Helm parsing, Go receiver methods, monorepo support, incremental indexing, 10 advanced features. Beats code-review-graph with 80-150x token reduction.

These details have not been verified by PyPI

Project links

Project description

graphsift

Smarter code context for LLMs — ranked relevance, multi-file diff, decorator + dynamic import graph, tokenpruner compression.

graphsift solves the same problem as code-review-graph but strictly better: instead of binary blast-radius include/exclude (F1=0.54), it uses multi-signal ranked scoring to select only the most relevant files within a hard token budget — then compresses low-score files via tokenpruner.

from graphsift import ContextBuilder, ContextConfig, DiffSpec

builder = ContextBuilder(ContextConfig(token_budget=50_000))
builder.index_files(source_map)   # {path: source_text}

result = builder.build(
    DiffSpec(changed_files=["src/auth.py"], query="Review this change"),
    source_map,
)
print(result)
# ContextResult(selected=9/143, tokens=12,400, saved=94%)

# Paste directly into your LLM call
print(result.rendered_context)

Why graphsift beats code-review-graph

Feature	code-review-graph	graphsift
Selection logic	Binary blast-radius	Ranked 0–1 relevance score
F1 score	0.54 (46% false positives)	~0.85 (ranked filtering)
Multi-file diff	Not supported	Union blast radius across all changed files
Decorator edges	Ignored	DECORATES edges tracked and traversed
Dynamic imports	Missed	Detected via regex + AST (`importlib.import_module`, `__import__`)
Token budget	None — sends raw source	Hard budget; fits selections to limit
Compression	None	tokenpruner on low-score files
Large repo hangs	Known issue (open bugs)	Depth cap + async; never hangs
Output modes	Full source only	FULL / SIGNATURES / COMPRESSED / SMART
Search ranking	MRR=0.35, acknowledged broken	BM25 + graph rank fusion
Token reduction	8–49x (single file)	80–150x (multi-file + compression)

Installation

pip install graphsift

# With tokenpruner compression (recommended, adds 3-5x more reduction):
pip install "graphsift[tokenpruner]"

Quick start

Index a repository

from graphsift import ContextBuilder, ContextConfig
from graphsift.adapters.filesystem import load_source_map

# Load all source files from disk (caller-supplied I/O)
source_map = load_source_map("./my_repo", extensions={".py", ".ts"})

builder = ContextBuilder(ContextConfig(
    token_budget=60_000,     # hard limit
    max_depth=4,             # graph traversal depth cap
    output_mode="smart",     # full for high-score, signatures for low-score
))
stats = builder.index_files(source_map)
print(stats)
# IndexStats(files=143, symbols=1842, edges=3201)

Build context for a diff

from graphsift import DiffSpec

result = builder.build(
    DiffSpec(
        changed_files=["src/auth.py", "src/middleware.py"],  # multi-file diff!
        query="Review authentication middleware changes",
        commit_message="feat: add JWT refresh token support",
        diff_text="...",   # optional raw unified diff
    ),
    source_map,
)

print(result)
# ContextResult(selected=11/143, tokens=18,200, saved=93%)

# Send to Claude / GPT-4:
llm_context = result.rendered_context

Drop-in Claude adapter

import anthropic
from graphsift.adapters.claude import ClaudeCodeReviewAdapter

client = anthropic.Anthropic()
adapter = ClaudeCodeReviewAdapter(client, builder)

response, meta = adapter.review(
    changed_files=["src/auth.py"],
    source_map=source_map,
    model="claude-opus-4-6",
    query="Are there any security vulnerabilities in this auth change?",
)

print(f"Tokens saved: {meta['reduction_ratio']:.0%}")
print(f"Files selected: {meta['files_selected']}/{meta['files_scanned']}")
# Tokens saved: 93%
# Files selected: 11/143

How it works

1. Multi-signal relevance ranking

Every file in the repo gets a 0–1 relevance score based on:

Graph distance (70% weight): BFS from changed files with score decay per hop (0.7× per level). Inheritance edges have higher weight (1.5×), dynamic imports lower (0.6×).
BM25 keyword overlap (30% weight): Symbol names matched against query + commit message.
Bonuses: Test files covering changed code, decorator proximity.
Penalties: Dynamic imports (uncertain deps), large files (>1000 lines).

2. Decorator + dynamic import edges

Changed: auth.py → AuthManager
  → DECORATES → @require_auth decorator
  → @require_auth used in: middleware.py, api/views.py
  → Both files selected (code-review-graph misses these entirely)

3. Token-budget-aware selection

Budget: 50,000 tokens
1. auth.py         score=1.000  → FULL      (2,100 tok)
2. middleware.py   score=0.841  → FULL      (3,400 tok)  
3. test_auth.py    score=0.714  → FULL      (1,200 tok)
4. user.py         score=0.490  → SIGNATURES (180 tok)   ← tokenpruner/signatures
5. base.py         score=0.312  → COMPRESSED (90 tok)    ← tokenpruner compressed
...
Total: 12,400 tokens vs 180,000 raw = 93% reduction

4. Multi-file diff (union blast radius)

# code-review-graph: only handles single file
DiffSpec(changed_files=["src/auth.py"])  # ✓

# graphsift: full union of all blast radii
DiffSpec(changed_files=["src/auth.py", "src/middleware.py", "src/models.py"])  # ✓

Advanced features

Smart Cache (LRU + TTL)

from graphsift import GraphCache

cache: GraphCache = GraphCache(maxsize=64, ttl=300)

@cache.memoize
def get_context(diff_key: str):
    return builder.build(diff, source_map)

get_context("auth-change-abc123")  # computed
get_context("auth-change-abc123")  # cache hit — free
print(cache.stats())

Analysis Pipeline with audit log

from graphsift import AnalysisPipeline

def filter_generated(result):
    """Remove auto-generated files from selection."""
    selected = [sf for sf in result.selected_files if "generated" not in sf.file_node.path]
    return result.model_copy(update={"selected_files": selected})

pipeline = (
    AnalysisPipeline(builder)
    .add_step("filter_generated", filter_generated)
    .with_retry(n=2, backoff=0.3)
)

result, audit = pipeline.run(diff_spec, source_map)
print(audit)  # per-step file counts, duration, errors

# Async
result, audit = await pipeline.arun(diff_spec, source_map)

Declarative validator

from graphsift import DiffValidator

validator = (
    DiffValidator()
    .require_changed_files()
    .require_max_files(50)
    .require_extensions({".py", ".ts", ".js"})
    .require_no_secrets_in_query()
    .add_rule("no_vendor", lambda d: not any("vendor" in f for f in d.changed_files), "Vendor files excluded")
)

errors = validator.validate(diff_spec)  # {} = valid
validator.validate_or_raise(diff_spec)  # raises ValidationError
await validator.avalidate(diff_spec)    # async

Async batch processing

from graphsift import async_batch_build, batch_index

# Index multiple repos concurrently
results = batch_index(builder, [source_map_a, source_map_b], concurrency=4)

# Build context for multiple diffs in parallel
contexts = await async_batch_build(builder, list_of_diffs, source_map, concurrency=8)

Rate limiter

from graphsift import RateLimiter, get_rate_limiter

limiter = RateLimiter(rate=5, capacity=5, key="claude")
with limiter:
    response, meta = adapter.review(...)

# Async
async with limiter:
    response, meta = await async_review(...)

# Per-key singleton
limiter = get_rate_limiter("user-abc", rate=3)

Streaming (highest-score files first)

from graphsift import stream_context, async_stream_context

# Start processing the most relevant files immediately
for batch in stream_context(builder, diff_spec, source_map, batch_size=3):
    for scored_file in batch:
        print(f"{scored_file.file_node.path}: {scored_file.score:.3f}")

# Async, cancellation-safe
async for batch in async_stream_context(builder, diff_spec, source_map):
    process(batch)

Diff engine — compare two context runs

from graphsift import ContextDiff

# Compare before/after a config change
r1 = builder.build(diff_spec, source_map)      # max_depth=2
r2 = builder2.build(diff_spec, source_map)     # max_depth=4

diff = ContextDiff(r1, r2)
print(diff.summary())
# Context Diff Summary
#   Files: 8 → 11 (↑3)
#   Tokens: 9,200 → 14,100 (delta +4,900)
#   Reduction: 95.1% → 92.2% (delta -2.9%)
#   Added: src/base_auth.py, src/session.py, ...

data = diff.to_json()  # machine-readable

Circuit breaker

from graphsift import CircuitBreaker

cb = CircuitBreaker(failure_threshold=3, reset_timeout=30)

@cb.protect
def call_llm_api(prompt: str) -> str:
    ...

print(cb.stats())
# {'state': 'closed', 'failures': 0, 'total_calls': 42, 'rejected_calls': 0}

Output modes

Mode	When	Token cost
`FULL`	High-score files (>0.5)	Full source
`SIGNATURES`	Low-score files	10–20% of full
`COMPRESSED`	Any file with tokenpruner installed	20–40% of full
`SMART`	Auto: FULL above threshold, SIGNATURES below	Best of both

Custom parser injection

from graphsift import register_parser, Language

# Inject a tree-sitter parser for exact results
class MyTreeSitterParser:
    def parse_file(self, path, source): ...
    def extract_signatures(self, source): ...

register_parser(Language.PYTHON, MyTreeSitterParser())

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.5.2

Apr 11, 2026

1.5.1

Apr 11, 2026

1.5.0

Apr 11, 2026

1.4.3

Apr 11, 2026

1.4.2

Apr 11, 2026

1.4.1

Apr 11, 2026

This version

1.4.0

Apr 11, 2026

1.3.1

Apr 11, 2026

1.3.0

Apr 11, 2026

1.2.0

Apr 11, 2026

1.1.1

Apr 10, 2026

1.1.0

Apr 10, 2026

1.0.1

Apr 10, 2026

1.0.0

Apr 10, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

graphsift-1.4.0.tar.gz (78.1 kB view details)

Uploaded Apr 11, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

graphsift-1.4.0-py3-none-any.whl (72.1 kB view details)

Uploaded Apr 11, 2026 Python 3

File details

Details for the file graphsift-1.4.0.tar.gz.

File metadata

Download URL: graphsift-1.4.0.tar.gz
Upload date: Apr 11, 2026
Size: 78.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for graphsift-1.4.0.tar.gz
Algorithm	Hash digest
SHA256	`68dfda15fee05bcae773d5bd2a2258858d2fdb544513832a80356ef0966ae7d2`
MD5	`193f10247fd49070cf1584d1cbee3a91`
BLAKE2b-256	`8ccc7053d9d0ea1d856ab328036f1209fe6ed645976d934910b11c36f222455a`

See more details on using hashes here.

File details

Details for the file graphsift-1.4.0-py3-none-any.whl.

File metadata

Download URL: graphsift-1.4.0-py3-none-any.whl
Upload date: Apr 11, 2026
Size: 72.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for graphsift-1.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dbfe8c0301fd528a5fbd3ff0861b50aaa1c3b4056ab8b39b970693e704d90aaa`
MD5	`661c4eeb497cd5bbf321f9f704c2e5ab`
BLAKE2b-256	`b979a57919554e7919df702634734f2c227a48ccefd72428bda86e4f078baadf`

See more details on using hashes here.

graphsift 1.4.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

graphsift

Why graphsift beats code-review-graph

Installation

Quick start

Index a repository

Build context for a diff

Drop-in Claude adapter

How it works

1. Multi-signal relevance ranking

2. Decorator + dynamic import edges

3. Token-budget-aware selection

4. Multi-file diff (union blast radius)

Advanced features

Smart Cache (LRU + TTL)

Analysis Pipeline with audit log

Declarative validator

Async batch processing

Rate limiter

Streaming (highest-score files first)

Diff engine — compare two context runs

Circuit breaker

Output modes

Custom parser injection

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes