ctxgraph
Context graph engine for AI coding assistants. Builds a multi-layer knowledge graph from your Python codebase and generates token-efficient context capsules for Claude, OpenAI, Ollama, and other AI tools.
pip install ctxgraph
# Build knowledge graph
ctx build
# Generate context for a task (92-99% fewer tokens than raw code)
ctx capsule "fix JWT expiry in auth module"
# Launch Claude with context pre-loaded
ccg "fix the login redirect bug"
# Visualize your codebase
ctx view
# Search the graph
ctx query "auth jwt validate"
How It Works
ctxgraph parses your Python codebase with static AST analysis and stores the result as a multi-layer knowledge graph in SQLite:
Repository (.py files)
│
▼
┌──────────────────────────────────────────────┐
│ ctx build │
│ │
│ 1. importer.py (AST) │
│ └── Extract imports → file-to-file edges │
│ │
│ 2. symbols.py (AST) │
│ └── Extract classes, functions, methods │
│ calls, inheritance → symbol nodes │
│ │
│ 3. semantic.py (docstrings) │
│ └── Extract summaries → node enrichment │
│ │
│ Store: SQLite (nodes + edges tables) │
└──────────────────┬───────────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ Context Capsule Generation │
│ │
│ 1. Tokenize query → keyword search │
│ 2. Score: name matches (2x), text (0.5x) │
│ 3. BFS neighborhood expansion (depth=2) │
│ 4. Render token-efficient DSL format │
└──────────────────────────────────────────────┘
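The import and symbol passes of `ctx build` can be approximated with Python's standard `ast` module. This is an illustrative sketch only, not ctxgraph's actual `importer.py` or `symbols.py`; the function names `extract_imports` and `extract_symbols` and the `SAMPLE` source are assumptions made for this example.

```python
import ast

# Hypothetical sample module used only for this illustration.
SAMPLE = '''
import os
from calc.core import add

class Calculator:
    """Main calculator class."""
    def evaluate(self, expr):
        return add(1, 2)

def tokenize(text):
    return text.split()
'''


def extract_imports(source: str) -> list[str]:
    """Return the sorted, de-duplicated module names imported by the source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module)
    return sorted(modules)


def extract_symbols(source: str) -> list[tuple[str, str]]:
    """Return (kind, name) pairs for top-level classes, their methods, and functions."""
    symbols = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(("function", node.name))
        elif isinstance(node, ast.ClassDef):
            symbols.append(("class", node.name))
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    symbols.append(("method", f"{node.name}.{item.name}"))
    return symbols


print(extract_imports(SAMPLE))
print(extract_symbols(SAMPLE))
```

In a graph like the one described above, each imported module would become a file-to-file edge and each symbol a node; how ctxgraph resolves module names to file paths is not covered here.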
Architecture
┌─────────┐ ┌──────────────┐ ┌──────────────┐
│ CLI │───▶│ Analyzers │───▶│ SQLite DB │
│ typer │ │ AST-based │ │ .ctxgraph/ │
└────┬────┘ └──────────────┘ └──────┬───────┘
│ │
├── ctx build ───────────────────────▶│ Graph build
│ │
├── ctx capsule ◀─────────────────────│ Query + BFS
│ │
├── ctx query ◀──────────────────────│ Search
│ │
├── ctx view ◀───────────────────────│ D3.js viz
│ │
├── ctx serve ◀──────────────────────│ MCP server
│ │
└── ccg wrapper ───▶ Claude Code ────┘ AI tool
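A two-table SQLite layout is enough to hold a graph like this. The sketch below uses Python's built-in `sqlite3` module; the README only states that there are nodes and edges tables, so the column names, the `open_graph` and `neighbors` helpers, and the sample rows are all assumptions.

```python
import sqlite3


def open_graph(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a graph database with hypothetical nodes/edges tables."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS nodes (
            id      TEXT PRIMARY KEY,
            type    TEXT NOT NULL,
            name    TEXT NOT NULL,
            summary TEXT NOT NULL DEFAULT ''
        );
        CREATE TABLE IF NOT EXISTS edges (
            src  TEXT NOT NULL,
            dst  TEXT NOT NULL,
            kind TEXT NOT NULL,
            PRIMARY KEY (src, dst, kind)
        );
        """
    )
    return conn


def neighbors(conn: sqlite3.Connection, node_id: str) -> list[str]:
    """Return ids connected to node_id by an edge in either direction."""
    rows = conn.execute(
        "SELECT dst FROM edges WHERE src = ? "
        "UNION SELECT src FROM edges WHERE dst = ?",
        (node_id, node_id),
    ).fetchall()
    return sorted(row[0] for row in rows)


conn = open_graph()
conn.executemany(
    "INSERT INTO nodes (id, type, name) VALUES (?, ?, ?)",
    [
        ("file:calc/parser.py", "file", "parser.py"),
        ("file:calc/core.py", "file", "core.py"),
        ("file:calc/plugins.py", "file", "plugins.py"),
    ],
)
conn.executemany(
    "INSERT INTO edges (src, dst, kind) VALUES (?, ?, ?)",
    [
        ("file:calc/parser.py", "file:calc/core.py", "imports"),
        ("file:calc/parser.py", "file:calc/plugins.py", "imports"),
    ],
)
print(neighbors(conn, "file:calc/parser.py"))
```

Treating edges as undirected in `neighbors` is a design choice for this sketch: context for a file usually benefits from both what it imports and what imports it.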
Token-Efficient DSL Format
ctxgraph renders capsules in a custom DSL instead of JSON, using about 4.7× fewer tokens on average across the benchmark queries:
JSON (426 tokens):
{
  "nodes": [
    {
      "id": "file:calc/parser.py",
      "type": "file",
      "name": "parser.py",
      "path": "calc/parser.py",
      "summary": "Tokenize..."
    },
    ...
  ],
  "edges": [...]
}

DSL (143 tokens):
[CTX]calculator expression parsing

[F]calc/parser.py
D:Tokenize and parse math expressions
S:tokenize, parse, Expression
[F]calc/core.py
D:Core math operations
[C]Calculator
D:Main calculator class

[DEP]
parser.py → core.py
parser.py → plugins.py
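To make the format comparison concrete, the sketch below renders the same small node set as a DSL capsule and as indented JSON, then compares whitespace-split word counts. The `render_dsl` function, the node dictionary keys, and the sample data are assumptions for illustration; the real renderer lives in `capsule/renderer.py` and may differ.

```python
import json


def render_dsl(query: str, nodes: list[dict], edges: list[tuple[str, str]]) -> str:
    """Render nodes and edges in a compact tag-prefixed text format."""
    lines = [f"[CTX]{query}"]
    for node in nodes:
        lines.append(f"[{node['tag']}]{node['name']}")
        if node.get("summary"):
            lines.append(f"D:{node['summary']}")
        if node.get("symbols"):
            lines.append("S:" + ", ".join(node["symbols"]))
    if edges:
        lines.append("[DEP]")
        lines.extend(f"{src} → {dst}" for src, dst in edges)
    return "\n".join(lines)


def word_count(text: str) -> int:
    """Whitespace-split word count, used here as a rough token proxy."""
    return len(text.split())


# Hypothetical sample data mirroring the example above.
nodes = [
    {
        "tag": "F",
        "name": "calc/parser.py",
        "summary": "Tokenize and parse math expressions",
        "symbols": ["tokenize", "parse", "Expression"],
    },
    {"tag": "F", "name": "calc/core.py", "summary": "Core math operations"},
]
edges = [("parser.py", "core.py")]
query = "calculator expression parsing"

dsl = render_dsl(query, nodes, edges)
as_json = json.dumps({"query": query, "nodes": nodes, "edges": edges}, indent=2)

print(dsl)
print("DSL words:", word_count(dsl), "JSON words:", word_count(as_json))
```

Most of the saving comes from dropping repeated key names, quotes, and braces; the exact ratio depends on how the JSON is laid out and on the tokenizer used.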
Commands
ctx build — Build knowledge graph
# Current directory
ctx build
# Specific repo
ctx build /path/to/project
# Custom exclude patterns
ctx build --exclude "vendor/*" --exclude "legacy/*"
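How ctxgraph matches `--exclude` patterns is not documented here. A minimal sketch, assuming shell-style globs matched against repository-relative POSIX paths, could look like this (the `is_excluded` helper and the sample paths are hypothetical):

```python
from fnmatch import fnmatchcase


def is_excluded(rel_path: str, patterns: list[str]) -> bool:
    """Return True if rel_path matches any shell-style exclude pattern.

    fnmatchcase is used so matching is case-sensitive on every platform;
    its '*' wildcard also matches '/' characters, so 'vendor/*' covers
    nested files such as 'vendor/lib/util.py'.
    """
    return any(fnmatchcase(rel_path, pattern) for pattern in patterns)


patterns = ["vendor/*", "legacy/*"]
paths = ["src/app.py", "vendor/lib/util.py", "legacy/old.py"]
kept = [p for p in paths if not is_excluded(p, patterns)]
print(kept)
```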
ctx capsule <query> — Generate context
# Balanced (default: 20 nodes, depth 2)
ctx capsule "fix JWT token validation"
# Fast (10 nodes, depth 1)
ctx capsule "fix JWT token validation" --mode fast
# Deep (40 nodes, depth 3)
ctx capsule "fix JWT token validation" --mode deep
# Project architecture overview
ctx capsule --overview
ctx query <search> — Search graph
ctx query "user auth"
ctx query "payment gateway" --mode deep
Returns ranked nodes with relevance scores.
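The ranking can be pictured as simple weighted keyword matching. The sketch below uses the 2× name and 0.5× text weights mentioned under How It Works; the substring matching rule, the `tokenize`, `score`, and `rank` functions, and the sample nodes are assumptions rather than the actual logic in `graph/query.py`.

```python
import re

NAME_WEIGHT = 2.0  # weight for a query token found in the node name
TEXT_WEIGHT = 0.5  # weight for a query token found in the node summary


def tokenize(query: str) -> list[str]:
    """Lowercase the query and split it on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", query.lower()) if tok]


def score(node: dict, tokens: list[str]) -> float:
    """Sum the weights of every token that appears in the name or summary."""
    name = node["name"].lower()
    text = node.get("summary", "").lower()
    total = 0.0
    for tok in tokens:
        if tok in name:
            total += NAME_WEIGHT
        if tok in text:
            total += TEXT_WEIGHT
    return total


def rank(nodes: list[dict], query: str) -> list[tuple[float, str]]:
    """Return (score, name) pairs for matching nodes, best first."""
    tokens = tokenize(query)
    scored = [(score(node, tokens), node["name"]) for node in nodes]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))


# Hypothetical nodes for illustration.
NODES = [
    {"name": "auth/jwt.py", "summary": "Validate and decode JWT tokens"},
    {"name": "auth/login.py", "summary": "Login flow using jwt sessions"},
    {"name": "core/database.py", "summary": "Database connection pool"},
]

print(rank(NODES, "auth jwt validate"))
# [(5.0, 'auth/jwt.py'), (2.5, 'auth/login.py')]
```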
ctx view — Visualize graph
ctx view
ctx view --output graph.html
ctx view --port 8080 --no-open
Generates an interactive HTML page with a D3.js force-directed layout; no JavaScript toolchain is required.
ctx serve — MCP server
# Quote the extra so shells such as zsh do not expand the brackets
pip install "ctxgraph[mcp]"
ctx serve
Starts an MCP server over stdio. Claude Desktop config:
{
  "mcpServers": {
    "ctxgraph": {
      "command": "ctx",
      "args": ["serve"]
    }
  }
}
Tools: search_graph, get_context_capsule, get_file_dependencies, get_project_overview.
ctx info — Graph statistics
ctx info
# ┌────────────────────┬───────┐
# │ Total Nodes │ 1090 │
# │ Total Edges │ 1565 │
# │ files │ 147 │
# │ classes │ 45 │
# │ functions │ 312 │
# └────────────────────┴───────┘
Claude Wrapper (ccg)
# Single-shot
ccg "fix the JWT expiry bug in auth module"
# Interactive session with context pre-loaded
ccg --chat "refactor the payment flow"
# Project overview
ccg --overview
# With specific mode
ccg --mode deep "redesign the database schema"
Modes
| Mode | Max Nodes | BFS Depth | Use Case |
|---|---|---|---|
| fast | 10 | 1 | Quick questions, small fixes |
| balanced (default) | 20 | 2 | General development |
| deep | 40 | 3 | Complex refactoring, architecture |
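The two limits in the table map naturally onto a bounded breadth-first search. The following is a sketch under that assumption, not the code in `graph/query.py`; the `MODES` mapping, the `expand` function, and the sample adjacency list are hypothetical.

```python
from collections import deque

# (max_nodes, max_depth) per mode, taken from the table above.
MODES = {"fast": (10, 1), "balanced": (20, 2), "deep": (40, 3)}


def expand(adjacency: dict, seeds: list[str], mode: str = "balanced") -> list[str]:
    """Breadth-first expansion from seed nodes, bounded by node count and depth."""
    max_nodes, max_depth = MODES[mode]
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append((seed, 0))

    visited = []
    while queue and len(visited) < max_nodes:
        node, depth = queue.popleft()
        visited.append(node)
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return visited


# Hypothetical dependency chain: a -> b -> c -> d -> e
CHAIN = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}

print(expand(CHAIN, ["a"], "fast"))      # ['a', 'b']
print(expand(CHAIN, ["a"], "balanced"))  # ['a', 'b', 'c']
print(expand(CHAIN, ["a"], "deep"))      # ['a', 'b', 'c', 'd']
```

Because the search is breadth-first, the node cap trims the most distant neighbors first, which keeps the capsule focused on code closest to the matched seeds.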
Configuration
.ctxgraph/config.toml (or .ctxgraph/config.json):
[graph]
exclude = ["legacy/*", "vendor/*"]
[ai]
provider = "ollama" # ollama, claude, openai, custom
model = "qwen2.5-coder:7b"
endpoint = "http://localhost:11434"
[context]
mode = "balanced"
max_nodes = 20
max_depth = 2
Environment overrides:
| Variable | Overrides |
|---|---|
| CTXGRAPH_PROVIDER | ai.provider |
| CTXGRAPH_MODEL | ai.model |
| CTXGRAPH_ENDPOINT | ai.endpoint |
| ANTHROPIC_API_KEY | Claude API key |
| OPENAI_API_KEY | OpenAI API key |
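The override behavior can be sketched as a merge in which environment variables win over file values. This is an assumption about precedence based on the table above, not the code in `config/settings.py`; `resolve_config` and `ENV_OVERRIDES` are hypothetical names, and the file config is passed in as an already-parsed dictionary.

```python
import os

# Environment variable -> (section, key) in the parsed config file.
ENV_OVERRIDES = {
    "CTXGRAPH_PROVIDER": ("ai", "provider"),
    "CTXGRAPH_MODEL": ("ai", "model"),
    "CTXGRAPH_ENDPOINT": ("ai", "endpoint"),
}


def resolve_config(file_config: dict, environ=None) -> dict:
    """Return a copy of file_config with environment overrides applied."""
    environ = os.environ if environ is None else environ
    merged = {section: dict(values) for section, values in file_config.items()}
    for variable, (section, key) in ENV_OVERRIDES.items():
        value = environ.get(variable)
        if value:  # ignore unset or empty variables
            merged.setdefault(section, {})[key] = value
    return merged


file_config = {
    "ai": {
        "provider": "ollama",
        "model": "qwen2.5-coder:7b",
        "endpoint": "http://localhost:11434",
    }
}
overridden = resolve_config(
    file_config, {"CTXGRAPH_PROVIDER": "openai", "CTXGRAPH_MODEL": "gpt-4o"}
)
print(overridden["ai"])
```

Passing `environ` explicitly keeps the function easy to test; calling it without that argument reads the real process environment.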
Provider Switching
# Ollama (default, no API key)
ctx capsule "query"
# Claude
CTXGRAPH_PROVIDER=claude CTXGRAPH_MODEL=claude-sonnet-4-20250514 ctx capsule "query"
# OpenAI
CTXGRAPH_PROVIDER=openai CTXGRAPH_MODEL=gpt-4o ctx capsule "query"
# Custom (OpenAI-compatible API)
CTXGRAPH_PROVIDER=custom CTXGRAPH_ENDPOINT=http://my-api/v1 ctx capsule "query"
Benchmark Results
Methodology
All benchmarks measure token count (whitespace-split word count) as a reproducible proxy for LLM token usage.
Token Efficiency (Capsule vs Raw Files)
Baseline ("without graph"): All .py files in the project directory (excluding build artifacts like __pycache__, .git, venv). This represents what an AI assistant would need to read to understand the codebase without ctxgraph.
Measurement: For each project, we build the graph once, then run multiple queries across all three modes (fast/balanced/deep). The capsule token count is averaged across queries and compared against the raw file token count.
Savings formula: (1 - capsule_tokens / raw_tokens) × 100
| Project | Files | Raw Tokens | Avg Capsule Tokens | Avg Saved | Build Time |
|---|---|---|---|---|---|
| tiny_app | 7 | 1,558 | ~112 | 92.8% | ~82ms |
| web_api | 23 | 6,567 | ~136 | 97.9% | ~474ms |
| microsvc | 22 | 10,587 | ~63 | 99.4% | ~916ms |
| dataflow | 35 | ~12,500 | ~78 | ~99.4% | ~560ms |
Overall: 97.0% average token savings across all 4 projects and 42 benchmark runs.
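The per-project percentages follow directly from the savings formula. The sketch below recomputes them from the table's raw and capsule token counts (the dataflow figures are approximate in the table); `count_tokens` and `savings_percent` are hypothetical helper names, not the benchmark script's API.

```python
def count_tokens(text: str) -> int:
    """Whitespace-split word count, the token proxy described in the methodology."""
    return len(text.split())


def savings_percent(capsule_tokens: float, raw_tokens: float) -> float:
    """Apply (1 - capsule_tokens / raw_tokens) x 100."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return (1 - capsule_tokens / raw_tokens) * 100


# (project, raw tokens, average capsule tokens) from the table above.
ROWS = [
    ("tiny_app", 1558, 112),
    ("web_api", 6567, 136),
    ("microsvc", 10587, 63),
    ("dataflow", 12500, 78),
]

for project, raw, capsule in ROWS:
    print(f"{project}: {savings_percent(capsule, raw):.1f}% saved")
```

Word counts are only a proxy: a model tokenizer typically produces more tokens than whitespace splitting, for both the raw files and the capsule.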
DSL vs JSON Format Efficiency
Methodology: For the same set of nodes and edges, we render both a DSL capsule and an equivalent JSON structure. We compare token counts across both representations.
| Project | Query | DSL Tokens | JSON Tokens | Ratio |
|---|---|---|---|---|
| tiny_app | calculator | 147 | 434 | 3.0× |
| tiny_app | parse expression | 137 | 451 | 3.3× |
| web_api | user management | 126 | 403 | 3.2× |
| web_api | JWT auth login | 136 | 308 | 2.3× |
| microsvc | auth service | 32 | 219 | 6.8× |
| microsvc | payment billing | 42 | 395 | 9.4× |
Overall: 4.7× fewer tokens than equivalent JSON representation.
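The overall figure appears to be the unweighted mean of the six per-query ratios; that is an inference from the table rather than something the methodology states. The sketch below recomputes it and also shows the pooled ratio (total JSON tokens over total DSL tokens) for comparison.

```python
from statistics import mean

# (project, query, DSL tokens, JSON tokens) from the table above.
ROWS = [
    ("tiny_app", "calculator", 147, 434),
    ("tiny_app", "parse expression", 137, 451),
    ("web_api", "user management", 126, 403),
    ("web_api", "JWT auth login", 136, 308),
    ("microsvc", "auth service", 32, 219),
    ("microsvc", "payment billing", 42, 395),
]

ratios = [json_tokens / dsl_tokens for _, _, dsl_tokens, json_tokens in ROWS]
mean_of_ratios = mean(ratios)
pooled_ratio = sum(row[3] for row in ROWS) / sum(row[2] for row in ROWS)

print(f"mean of per-query ratios: {mean_of_ratios:.1f}x")  # 4.7x
print(f"pooled ratio: {pooled_ratio:.1f}x")                # 3.6x
```

The mean of ratios is pulled up by the two small microsvc capsules, so the pooled ratio is the more conservative estimate for larger capsules.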
Ollama Comparison (With vs Without Graph)
Methodology: We compare LLM answer quality (keyword recall coverage) with and without ctxgraph context. For each query:
- Without graph: Ask Ollama the question directly (no code context)
- With graph: Build a context capsule from the graph, prepend it to the same question
- Coverage score: % of predefined keywords (file names, concepts) that appear in the answer
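The coverage score can be sketched as a case-insensitive keyword recall check. The `coverage` function, the sample answer, and the keyword list below are hypothetical; the actual benchmark script may match keywords differently.

```python
def coverage(answer: str, keywords: list[str]) -> float:
    """Return the percentage of keywords that appear in the answer (case-insensitive)."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    text = answer.lower()
    hits = [kw for kw in keywords if kw.lower() in text]
    return 100 * len(hits) / len(keywords)


# Hypothetical answer and keywords for illustration.
answer = "The PluginRegistry in plugins.py registers handlers at import time."
keywords = ["plugins.py", "PluginRegistry", "register_plugin"]

print(f"{coverage(answer, keywords):.0f}%")  # 67%
```

Substring matching rewards answers that name concrete files and classes, but it cannot tell whether those names are used correctly, so the score measures recall rather than correctness.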
| Query | Coverage (no ctx) | Coverage (with ctx) | Δ |
|---|---|---|---|
| Calculator expression parsing (tiny_app) | 100% | 100% | — |
| Plugin registration system (tiny_app) | 33% | 100% | +67pp |
| JWT authentication (web_api) | 75% | 100% | +25pp |
| Middleware pipeline (web_api) | 100% | 100% | — |
| Circuit breaker (microsvc) | 75% | 75% | — |
| Services & communication (microsvc) | 50% | 100% | +50pp |
| PipelineBuilder pattern (dataflow) | 100% | 75% | -25pp |
| Processor registration (dataflow) | 33% | 67% | +34pp |
| Event bus & error handling (dataflow) | 100% | 100% | — |
Results: Coverage improved by +16.7pp on average. It improved on 4 of 9 queries (44%), was unchanged on 4, and regressed on 1. For project-specific questions (plugin system, services, processors), the graph provides concrete file and class names that the model cannot guess from training data alone.
Note: The one regression (PipelineBuilder) occurred because, without context, the model gave a generic answer that happened to match every keyword. With context, it focused on the actual codebase implementation and missed the "scheduler" keyword. The grounded answer is arguably more useful to the developer even though it scores lower on keyword recall.
Examples
Debug a failing test
ctx build
ctx capsule "test_user_login is failing with auth error" --mode deep
# Output →
# [F]tests/test_auth.py
# [F]src/auth/login.py
# [C]AuthService
# [DEP] auth/login.py → core/database.py, auth/session.py
Understand a new codebase
ctx capsule "project architecture" --overview
ccg --chat "explain the overall architecture and data flow"
Refactor across modules
ctx capsule "extract payment processing into separate module" --mode deep
Development
git clone https://github.com/shashi3070/ctxgraph.git
cd ctxgraph
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run benchmarks
python benchmarks/run_benchmarks.py
# Ollama comparison (requires local Ollama)
python benchmarks/run_ollama_comparison.py
Project Structure
src/ctxgraph/
├── cli/main.py — Typer CLI (6 commands)
├── graph/
│ ├── models.py — Node, Edge, Graph dataclasses
│ ├── storage.py — SQLite persistence
│ ├── builder.py — Graph build orchestrator
│ └── query.py — Tokenizer + BFS + relevance scoring
├── capsule/renderer.py — DSL context generation
├── analyzers/python/
│ ├── importer.py — AST import extraction
│ ├── symbols.py — AST class/function/method analysis
│ └── semantic.py — Docstring summarization
├── config/
│ ├── settings.py — TOML/JSON/env config loading
│ └── providers.py — Ollama, Claude, OpenAI API clients
├── clients/models.py — Model mode enum (fast/balanced/deep)
├── exclude/patterns.py — Exclusion pattern matching
├── view/visualizer.py — D3.js HTML graph generator
├── wrapper/claude.py — ccg Claude wrapper
└── mcp/server.py — MCP protocol server
Known Limitations
- Python-only analysis: other languages get file-level nodes only
- Keyword-based search: no semantic/embedding matching (planned)
- No incremental rebuild: full rebuild on every ctx build (planned)
- MCP server: stdio mode only, SSE not yet supported
License
MIT