
Token-budget-aware graph navigation for AI coding agents. Serve exactly the noodles your LLM needs.


slurp


graphify builds the bowl. slurp serves exactly the noodles your LLM needs.

A knowledge graph is a bowl of ramen — thousands of nodes tangled together. Your LLM doesn't need the whole bowl. Slurp scores every node against your query, then greedily selects the highest-relevance subgraph that fits within your token budget — and tells you exactly what it picked and why.
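The core selection step can be sketched roughly as below. This is an illustration of greedy budgeted selection under stated assumptions, not slurp's actual code: the Node dataclass, the per-node token costs, and greedy_select are hypothetical, and neighbor expansion (--neighbor-decay) is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    id: str
    score: float  # query relevance in [0, 1] (hypothetical field)
    tokens: int   # token cost of the node's rendered text (hypothetical field)


def greedy_select(nodes: list[Node], budget: int, min_score: float = 0.15) -> list[Node]:
    """Take nodes in descending score order, keeping each one that still fits."""
    candidates = [n for n in nodes if n.score >= min_score]  # drop weak matches first
    candidates.sort(key=lambda n: (-n.score, n.tokens))      # best first; cheaper wins ties
    selected: list[Node] = []
    used = 0
    for node in candidates:
        if used + node.tokens <= budget:
            selected.append(node)
            used += node.tokens
        # A node that does not fit is skipped, not a stopping point:
        # a cheaper, lower-scoring node may still fit afterwards.
    return selected


nodes = [
    Node("authenticate_user", 0.94, 300),
    Node("JWTMiddleware", 0.87, 400),
    Node("hash_password", 0.71, 200),
    Node("README", 0.10, 50),
]
print([n.id for n in greedy_select(nodes, budget=600)])  # ['authenticate_user', 'hash_password']
```

A greedy pass is fast and easy to explain, but it is a heuristic: picking the best-scoring combination under a hard token limit is a knapsack problem, and greedy order does not always find the optimum.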


Benchmark

Tested on a real PrismaStats codebase: 2,111 nodes, 28,412 tokens total.

| Query | Budget 2k | Budget 4k | Budget 8k |
|---|---|---|---|
| "auth flow" | 97.1% saved | 96.3% saved | 95.2% saved |
| "prisma schema" | 95.8% saved | 94.2% saved | 93.8% saved |
| "database pool" | 93.1% saved | 89.1% saved | 85.1% saved |

Mean savings: 93.3% · p50: 94.2% · Best case: 97.1%

Even the worst case — "database pool" at budget 8k — injects 85% fewer tokens than the full graph.
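As a quick check, the summary line can be recomputed from the nine cells of the table above. Savings for one run is 1 - used/total, so a savings figure also converts back into an approximate token count. The values are copied from the table (rounded to 0.1%), so the recovered token count is approximate.

```python
from statistics import mean, median

TOTAL_TOKENS = 28_412  # full PrismaStats graph

# Savings (%) from the benchmark table: rows are queries, columns are the 2k/4k/8k budgets.
savings = {
    "auth flow":     [97.1, 96.3, 95.2],
    "prisma schema": [95.8, 94.2, 93.8],
    "database pool": [93.1, 89.1, 85.1],
}
all_runs = [s for row in savings.values() for s in row]

print(f"mean {mean(all_runs):.1f}%  p50 {median(all_runs):.1f}%  best {max(all_runs):.1f}%")

# used = total * (1 - saved): the worst case still fits well inside its 8k budget.
worst = min(all_runs)
print(f"worst case injects about {round(TOTAL_TOKENS * (1 - worst / 100))} tokens")
```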


Install

pip install slurp-graph

# or with uv
uv add slurp-graph

PyPI package: slurp-graph — CLI command: slurp


Quickstart

slurp "auth flow" --graph graph.json --budget 4000
╭─ Slurp — Subgraph for: "auth flow" (budget: 4,000 tokens) ──────────────╮
│ Selected 5/2111 nodes · 847/4,000 tokens used (21.2%)                    │
╰───────────────────────────────────────────────────────────────────────────╯

## Relevant Nodes

### authenticate_user (function) · score: 0.94
Validates user credentials and returns JWT token.
→ File: src/auth/service.py

### JWTMiddleware (class) · score: 0.87
Intercepts HTTP requests and validates Authorization header.
→ File: src/middleware/jwt.py

### hash_password (function) · score: 0.71
Hashes password using bcrypt with a cost factor of 12.
→ File: src/auth/utils.py

## Key Relationships
- JWTMiddleware → calls → authenticate_user
- authenticate_user → calls → hash_password

---
💡 2106 additional connected nodes available — increase --budget to include them

Add --inject-code to embed the actual function body next to each node:

slurp "auth flow" --graph graph.json --budget 4000 --inject-code
### authenticate_user (function) · score: 0.94
Validates user credentials and returns JWT token.
→ File: src/auth/service.py

```python
def authenticate_user(username: str, password: str) -> dict | None:
    user = db.query(User).filter_by(username=username).first()
    if not user or not bcrypt.checkpw(password.encode(), user.password_hash):
        return None
    return {"token": jwt.encode({"sub": user.id}, SECRET_KEY)}
```

Pipe the output directly into your LLM prompt, save it to a file, or use slurp export to format it as a ready-to-paste system prompt block.


Commands

slurp QUERY

The main command. Scores all graph nodes against your query and greedily selects the highest-scoring subgraph that fits within the token budget.

slurp "auth flow" --graph graph.json --budget 4000
slurp "payment processing" --format json
slurp "JWT validation" --explain
slurp "database schema" --inject-code --min-score 0.3
slurp "prisma models" --backend openai

| Flag | Default | Description |
|---|---|---|
| --graph, -g | auto-discover | Path to graph.json. |
| --budget, -b | 4000 | Token budget for subgraph selection. |
| --format, -f | markdown | Output format: markdown, json, or yaml. |
| --model, -m | cl100k_base | Tiktoken encoding for token counting. |
| --explain | off | Print per-node score breakdown: final / structural / semantic. |
| --no-audit | off | Skip writing to .slurp/audit.jsonl. |
| --neighbor-decay | 0.7 | Score multiplier applied to neighbors of each selected node. |
| --min-score | 0.15 | Minimum relevance score; nodes below this are excluded before selection. |
| --viz | off | Open an interactive graph visualization in the browser. |
| --ignore-file | .slurpignore | Path to node exclusion rules. |
| --backend | tfidf | Scoring backend: tfidf (default), openai, or anthropic. |
| --inject-code | off | Embed source code blocks for each selected node (requires ≤30 nodes). |
| --project-root | graph file's directory | Root directory for resolving source_file paths. |

Auto-discovery (when --graph is omitted):

  1. ./graph.json
  2. ./graphify-out/graph.json
  3. ./.graphify/graph.json
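The lookup order can be sketched as a first-match search over the three paths listed above. This assumes the first existing file wins and that paths are resolved relative to a root directory (the current directory by default); discover_graph is a hypothetical name, not slurp's API.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Candidate locations, checked in order, when --graph is omitted.
CANDIDATES = ("graph.json", "graphify-out/graph.json", ".graphify/graph.json")


def discover_graph(root: Path = Path(".")) -> Path | None:
    """Return the first candidate graph file that exists under root, or None."""
    for rel in CANDIDATES:
        path = root / rel
        if path.is_file():
            return path
    return None


# Usage: a directory containing only graphify-out/graph.json resolves to the second candidate.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "graphify-out").mkdir()
    (root / "graphify-out" / "graph.json").write_text('{"nodes": [], "links": []}')
    found = discover_graph(root)
    print(found.relative_to(root).as_posix())  # graphify-out/graph.json
```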

slurp stats

Print node and edge counts for a graph file.

slurp stats --graph graph.json
Graph: graph.json
Nodes: 2111
Edges: 4823

slurp audit

Show the history of queries logged to .slurp/audit.jsonl, plus the most frequently selected nodes.

slurp audit
slurp audit --top-nodes 20
slurp audit --audit-dir /custom/.slurp

Every query is appended as a JSON line (unless --no-audit is passed). Useful for tracking which parts of your codebase an AI agent visits most.
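The append-and-count pattern behind the audit log looks roughly like this. The README does not document the record schema, so the query/selected fields are assumptions, and log_query and top_nodes are hypothetical helpers.

```python
from __future__ import annotations

import json
import tempfile
from collections import Counter
from pathlib import Path

AUDIT_FILE = Path(".slurp/audit.jsonl")


def log_query(query: str, selected_ids: list[str], path: Path = AUDIT_FILE) -> None:
    """Append one JSON object per line (JSON Lines format)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"query": query, "selected": selected_ids}  # hypothetical schema
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def top_nodes(path: Path = AUDIT_FILE, n: int = 10) -> list[tuple[str, int]]:
    """Count how often each node id was selected across all logged queries."""
    counts: Counter[str] = Counter()
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                counts.update(json.loads(line)["selected"])
    return counts.most_common(n)


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / ".slurp" / "audit.jsonl"
    log_query("auth flow", ["authenticate_user", "JWTMiddleware"], log)
    log_query("JWT validation", ["JWTMiddleware"], log)
    busiest = top_nodes(log, n=1)
    print(busiest)  # [('JWTMiddleware', 2)]
```

Appending one line per query keeps writes cheap and crash-tolerant: a partially written last line affects only that record.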


slurp diff

Compare two graph versions and report the impact of changes.

slurp diff old.json new.json
slurp diff old.json new.json --hops 2 --viz
slurp diff old.json new.json --budget 4000

Reports added/removed/modified nodes and edges, computes an impact score based on centrality, and optionally opens a diff-colored visualization (green=added, red=removed, yellow=modified, grey=unchanged). Pass --budget to further select the most relevant affected nodes.
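The node-level part of such a diff can be sketched by comparing node ids between the two versions. This is a hypothetical helper (diff_nodes) that assumes both files use a top-level nodes list; edge diffs and the centrality-based impact score are omitted because their exact definitions are not documented here.

```python
from __future__ import annotations


def diff_nodes(old: dict, new: dict) -> dict[str, set[str]]:
    """Classify node ids as added, removed, or modified between two graph versions."""
    old_nodes = {n["id"]: n for n in old["nodes"]}
    new_nodes = {n["id"]: n for n in new["nodes"]}
    shared = old_nodes.keys() & new_nodes.keys()
    return {
        "added": set(new_nodes.keys() - old_nodes.keys()),
        "removed": set(old_nodes.keys() - new_nodes.keys()),
        # "Modified" here means any attribute changed (label, description, file, ...).
        "modified": {i for i in shared if old_nodes[i] != new_nodes[i]},
    }
```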


slurp export

Export a context block ready to paste into an AI system prompt.

slurp export "auth flow" --format claude     # <context> XML tags
slurp export "auth flow" --format chatgpt    # [CODEBASE CONTEXT] block
slurp export "auth flow" --format claudemd   # ## Codebase Context for CLAUDE.md
slurp export "auth flow" --output context.md

All three formats include query, nodes selected/total, tokens used/budget, and coverage %.


slurp serve

Start an MCP stdio server (JSON-RPC 2.0) that exposes the slurp_query tool.

slurp serve --graph graph.json

See MCP Integration for configuration.


slurp benchmark

Measure real token savings across queries and budgets.

slurp benchmark \
  --graph graph.json \
  --queries "auth flow" --queries "schema validation" \
  --budget 2000 --budget 4000 --budget 8000

Outputs a per-run table and aggregate stats: mean savings, p50/p90/p95, best/worst case, and precision (fraction of relevant nodes captured).


Works with graphify

Slurp is the query layer for graphify: run graphify on your codebase, then point slurp at the output.

graphify .                                        # generates graphify-out/graph.json
slurp "auth flow" --budget 4000                   # auto-discovers graphify-out/graph.json

Supported node fields:

{
  "id": "authenticate_user",
  "label": "authenticate_user",
  "type": "function",
  "description": "Validates credentials and returns JWT.",
  "importance": 9,
  "source_file": "src/auth/service.py",
  "source_location": "L42"
}

The type, description, importance, source_file, and source_location fields are optional but improve scoring and enable --inject-code. Any graph with id + label on nodes and source/target on edges will work.
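A minimal sketch of that contract, assuming a JSON document with a top-level nodes list and edge objects stored under either a links or an edges key. load_node_link is a hypothetical name; optional fields are passed through untouched rather than validated.

```python
from __future__ import annotations

import json


def load_node_link(text: str) -> tuple[list[dict], list[tuple[str, str]]]:
    """Parse a node-link style JSON graph into (nodes, edges)."""
    data = json.loads(text)
    nodes = data["nodes"]
    for node in nodes:
        # The minimum contract: every node carries an id and a label.
        if "id" not in node or "label" not in node:
            raise ValueError(f"node is missing id/label: {node!r}")
    # Edge lists may live under "links" (NetworkX node-link style) or "edges".
    raw_edges = data["links"] if "links" in data else data.get("edges", [])
    edges = [(e["source"], e["target"]) for e in raw_edges]
    return nodes, edges
```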

Edges may be listed under either a links key (graphify/NetworkX node-link serialization) or an edges key. Additional formats are auto-detected by extension:

| Extension | Format |
|---|---|
| .json | graphify or generic JSON |
| .graphml | GraphML (NetworkX / yEd / Gephi) |
| .csv | Neo4j export (nodes CSV + sibling relationships CSV) |

Use slurp convert or the convert_graph() API to export between formats.


MCP Integration

Run slurp as an MCP server so Claude Code (or any MCP-compatible agent) can query the graph directly.

.mcp.json:

{
  "mcpServers": {
    "slurp": {
      "command": "/path/to/.venv/bin/slurp",
      "args": ["serve", "--graph", "/path/to/graphify-out/graph.json"]
    }
  }
}

Tool exposed: slurp_query(query: str, budget: int = 4000) → str

Claude Code calls this automatically when it needs codebase context. The server runs over stdio and returns the formatted markdown subgraph — no HTTP, no ports.
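For reference, this is roughly what an MCP client writes to the server's stdin to invoke the tool. The method name (tools/call), the params shape (name plus arguments), and newline-delimited framing come from the MCP specification; the argument names follow the slurp_query signature above, and tools_call_message is a hypothetical helper. A real client must complete the initialize handshake before calling any tool.

```python
import json


def tools_call_message(request_id: int, query: str, budget: int = 4000) -> str:
    """Build one JSON-RPC 2.0 tools/call request for slurp_query as a stdio line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "slurp_query",
            "arguments": {"query": query, "budget": budget},
        },
    }
    # The stdio transport sends newline-delimited JSON: one message per line, no embedded newlines.
    return json.dumps(message) + "\n"


print(tools_call_message(1, "auth flow"), end="")
```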


.slurpignore

Exclude nodes by type, file path, or ID pattern. Create .slurpignore in your project root:

# Exclude documentation nodes
type:document
type:markdown

# Exclude test files
file:tests/**
file:**/*.test.ts

# Exclude generated code
id:generated_*

Pass a custom path with --ignore-file path/to/.slurpignore.
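Rule matching can be approximated with shell-style patterns. This sketch assumes each rule is kind:pattern, that file: matches the node's source_file, and that fnmatch semantics are close enough to glob; parse_ignore and is_ignored are hypothetical names, not slurp's API. Note that fnmatch's * also matches /, so tests/** covers nested paths, but the result only approximates full glob rules.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def parse_ignore(text: str) -> list[tuple[str, str]]:
    """Turn .slurpignore lines into (kind, pattern) pairs, skipping comments and blanks."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        kind, _, pattern = line.partition(":")
        rules.append((kind, pattern))
    return rules


def is_ignored(node: dict, rules: list[tuple[str, str]]) -> bool:
    """True if any rule matches the node's type, source_file, or id."""
    fields = {
        "type": node.get("type", ""),
        "file": node.get("source_file", ""),
        "id": node.get("id", ""),
    }
    return any(kind in fields and fnmatchcase(fields[kind], pattern) for kind, pattern in rules)


RULES = parse_ignore("""
# Exclude documentation nodes
type:document
file:tests/**
file:**/*.test.ts
id:generated_*
""")
```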


Design decisions

Power-iteration PageRank without numpy. nx.pagerank() requires numpy. Slurp implements a 20-line pure-Python power-iteration algorithm (convergence: Σ|rank_new − rank_old| < N × tol). The result matches NetworkX up to that convergence tolerance, with no heavy dependency.
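A pure-Python power iteration with that stopping rule might look like the sketch below. It is not slurp's code: the edge-list input, the defaults (alpha = 0.85, tol = 1e-6, max_iter = 100, which mirror NetworkX's), and uniform redistribution of rank from dangling nodes are assumptions. Nodes are inferred from edges, so isolated nodes are not represented.

```python
from __future__ import annotations


def pagerank(edges: list[tuple[str, str]], alpha: float = 0.85,
             max_iter: int = 100, tol: float = 1e-6) -> dict[str, float]:
    """Pure-Python power-iteration PageRank over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    if n == 0:
        return {}
    out_links: dict[str, list[str]] = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank sitting on dangling nodes (no out-links) is spread evenly over all nodes.
        dangling = alpha * sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - alpha) / n + dangling / n for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = alpha * rank[v] / len(out_links[v])
                for dst in out_links[v]:
                    new[dst] += share
        # Stop once the total L1 change drops below N * tol.
        if sum(abs(new[v] - rank[v]) for v in nodes) < n * tol:
            return new
        rank = new
    return rank  # not converged within max_iter; return the last iterate
```

Unlike NetworkX, which raises an error when power iteration fails to converge, this sketch simply returns the last iterate.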

TF-IDF without scikit-learn. Hand-rolled TF-IDF with smoothed IDF (log((N+1)/(df+1)) + 1) and cosine similarity. The tokenizer splits camelCase and snake_case, so authenticate_user scores on both authenticate and user. The score_nodes() interface is backend-agnostic — swap to real embeddings with --backend openai or --backend anthropic without touching any caller.
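The tokenizer and the smoothed-IDF cosine scoring described above can be sketched as follows. The regexes, the raw-count term frequency, and the function names (tokenize, tfidf_scores) are assumptions for illustration, not slurp's internals; the tokenizer is ASCII-only.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-cased tokens, split on snake_case underscores and camelCase humps."""
    tokens: list[str] = []
    # The character class excludes "_", so "authenticate_user" yields two words here.
    for word in re.findall(r"[A-Za-z0-9]+", text):
        # "JWTMiddleware" -> JWT, Middleware; "authenticateUser" -> authenticate, User
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", word)
    return [t.lower() for t in tokens]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def tfidf_scores(query: str, node_texts: list[str]) -> list[float]:
    """Cosine similarity between the query and each node text under smoothed TF-IDF."""
    docs = [tokenize(text) for text in node_texts]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: log((N + 1) / (df + 1)) + 1
    idf = {term: math.log((n + 1) / (d + 1)) + 1 for term, d in df.items()}

    def vectorize(tokens: list[str]) -> dict[str, float]:
        # Raw term counts times IDF; query terms absent from every node are dropped.
        return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

    query_vec = vectorize(tokenize(query))
    return [cosine(query_vec, vectorize(doc)) for doc in docs]
```

For example, tfidf_scores("authenticate user", ["authenticate_user", "hash_password"]) gives the first node a score of 1.0 (up to rounding) and the second 0.0, because the snake_case name splits into exactly the query's terms.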

YAML serializer without PyYAML. _yaml_scalar() renders Python primitives as valid YAML scalars using json.dumps() for strings that need quoting (JSON string literals are valid YAML 1.1). No PyYAML dependency.
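_yaml_scalar() itself is not shown in this README, so yaml_scalar below is a hypothetical stand-in that illustrates the approach. The quoting rules (a reserved-word set plus a conservative "safe to leave unquoted" regex) are assumptions; any string that fails them falls back to json.dumps(), which YAML reads as a double-quoted scalar.

```python
from __future__ import annotations

import json
import math
import re

# Plain words that YAML 1.1 (and some 1.2 loaders) would read as booleans or null.
_RESERVED = {"true", "false", "yes", "no", "on", "off", "y", "n", "null"}
# Conservative plain-scalar shape: starts with a letter or underscore, no spaces or colons.
_PLAIN_SAFE = re.compile(r"[A-Za-z_][A-Za-z0-9_./-]*")


def yaml_scalar(value: object) -> str:
    """Render None/bool/int/float/str as a YAML scalar (illustrative only)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, float) and math.isnan(value):
        return ".nan"
    if isinstance(value, float) and math.isinf(value):
        return ".inf" if value > 0 else "-.inf"
    if isinstance(value, (int, float)):
        return repr(value)
    text = str(value)
    if text.lower() not in _RESERVED and _PLAIN_SAFE.fullmatch(text):
        return text  # safe to emit unquoted
    # Anything else becomes a JSON string literal, i.e. a YAML double-quoted scalar.
    return json.dumps(text)
```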

lru_cache on the tiktoken encoder. tiktoken.get_encoding() reads tokenizer data from disk on first call. Caching with lru_cache(maxsize=8) means repeated token-counting calls within a single run hit memory, not disk.

+0.3 score boost for file_type == "code" nodes (clamped to 1.0). Without it, documentation nodes tend to outrank code on technical queries. The boost is bounded: a documentation node that outscores a code node by more than 0.3 on structural+semantic relevance still ranks first.
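The boost-and-clamp rule is small enough to show directly. This sketch assumes the boost is applied after the structural and semantic scores are combined; boosted_score is a hypothetical name.

```python
from __future__ import annotations

CODE_BOOST = 0.3


def boosted_score(score: float, file_type: str | None) -> float:
    """Add a fixed boost for code nodes and clamp the result to 1.0."""
    if file_type == "code":
        return min(score + CODE_BOOST, 1.0)
    return score


# A mid-scoring code node can overtake a documentation node...
print(boosted_score(0.65, "code") > boosted_score(0.9, "document"))  # True
# ...but not one that leads by more than the 0.3 boost.
print(boosted_score(0.5, "code") > boosted_score(0.85, "document"))  # False
```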

--inject-code capped at 30 nodes. Code blocks are 50–200 tokens each. At 30 nodes, that's up to 6,000 extra tokens — manageable. At 200 nodes it would explode the context budget. The cap is enforced in both the CLI (warning message) and inject_code() (hard guard), so the formatter never receives oversized input.
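A hard guard of this kind, together with the worst-case arithmetic (30 nodes × 200 tokens = 6,000 tokens), might look like the sketch below. The constant names and check_inject_code are hypothetical; only the 30-node cap and the 50-200 token estimate come from the paragraph above.

```python
from __future__ import annotations

MAX_INJECT_NODES = 30
MAX_BLOCK_TOKENS = 200  # upper end of the 50-200 token estimate per code block


def check_inject_code(selected_ids: list[str]) -> int:
    """Reject selections above the cap; return the worst-case extra token cost."""
    if len(selected_ids) > MAX_INJECT_NODES:
        raise ValueError(
            f"--inject-code supports at most {MAX_INJECT_NODES} nodes, got {len(selected_ids)}"
        )
    return len(selected_ids) * MAX_BLOCK_TOKENS
```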


Roadmap

  • v0.1.0: loader, scorer, budget, formatter, audit (core pipeline), full tests, slurp QUERY + slurp stats
  • v0.2.0: --explain, .slurpignore, --viz interactive HTML, --min-score, camelCase/snake_case tokenizer, --neighbor-decay
  • v0.3.0: slurp serve (MCP stdio), slurp diff, slurp export (claude/chatgpt/claudemd), PyPI publish as slurp-graph
  • v0.4.0: --backend openai|anthropic (optional embeddings), slurp benchmark, GraphML + Neo4j CSV loader, convert_graph()
  • v0.5.0: --inject-code: extract real function bodies from source files and embed them in the context output

License

MIT © Juan Carlos Vallejo Ruiz



Download files


Source Distribution

slurp_graph-0.1.0.tar.gz (6.9 MB)

Uploaded Source

Built Distribution


slurp_graph-0.1.0-py3-none-any.whl (43.0 kB)

Uploaded Python 3

File details

Details for the file slurp_graph-0.1.0.tar.gz.

File metadata

  • Download URL: slurp_graph-0.1.0.tar.gz
  • Size: 6.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for slurp_graph-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 30b1a3b7971ea733231f563e68432a23cc647c94592d1f84a78361e3ebffff0a |
| MD5 | 1e793d99f8e9db94659bc5e9c6ce0fdb |
| BLAKE2b-256 | 112746c54459885ecc2a09efdd724f67210906dc27cb80e0764cda19d732cbcf |


File details

Details for the file slurp_graph-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: slurp_graph-0.1.0-py3-none-any.whl
  • Size: 43.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for slurp_graph-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | b52a2b92bc706997bcf5a38b180c7ac26d3d75b09f0d73fdc2a4765615cdf374 |
| MD5 | 53ca6397db90dec3258a3bebf0402ce3 |
| BLAKE2b-256 | 1a2c51eb994b213ce4bb037c68f0bcba533a3879e3b9febb7be556a365b352e1 |

