
Context graph engine for AI coding assistants — build knowledge graphs, generate context capsules


ctxgraph

Context graph engine for AI coding assistants. Builds a multi-layer knowledge graph from your Python codebase and generates token-efficient context capsules for Claude, OpenAI, Ollama, and other AI tools.

pip install ctxgraph

# Build knowledge graph
ctx build

# Generate context for a task (92-99% fewer tokens than raw code)
ctx capsule "fix JWT expiry in auth module"

# Launch Claude with context pre-loaded
ccg "fix the login redirect bug"

# Visualize your codebase
ctx view

# Search the graph
ctx query "auth jwt validate"

How It Works

ctxgraph analyzes your Python codebase using static AST analysis to build a multi-layer knowledge graph in SQLite:

Repository (.py files)
    │
    ▼
┌───────────────────────────────────────────────┐
│                   ctx build                   │
│                                               │
│  1. importer.py (AST)                         │
│     └── Extract imports → file-to-file edges  │
│                                               │
│  2. symbols.py (AST)                          │
│     └── Extract classes, functions, methods   │
│         calls, inheritance → symbol nodes     │
│                                               │
│  3. semantic.py (docstrings)                  │
│     └── Extract summaries → node enrichment   │
│                                               │
│  Store: SQLite (nodes + edges tables)         │
└──────────────────┬────────────────────────────┘
                   │
                   ▼
┌───────────────────────────────────────────────┐
│          Context Capsule Generation           │
│                                               │
│  1. Tokenize query → keyword search           │
│  2. Score: name matches (2x), text (0.5x)     │
│  3. BFS neighborhood expansion (depth=2)      │
│  4. Render token-efficient DSL format         │
└───────────────────────────────────────────────┘
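Each build pass above works on Python's standard ast module. The following minimal sketch shows what one pass over a single file might collect. It is not ctxgraph's actual importer.py, symbols.py or semantic.py. The function name analyze_source and its result layout are hypothetical. The sketch returns raw module names; turning those into file-to-file edges would need a module-to-path lookup. It also skips call and inheritance edges for brevity.

```python
import ast


def analyze_source(path: str, source: str) -> dict:
    """Hypothetical single-file pass: imports, symbols, docstring summaries."""
    tree = ast.parse(source, filename=path)
    imports = []    # module names, later resolvable to file-to-file edges
    symbols = []    # (kind, qualified name) pairs for symbol nodes
    summaries = {}  # node name -> first docstring line (node enrichment)

    def summarize(node, name):
        doc = ast.get_docstring(node)
        if doc:
            summaries[name] = doc.strip().splitlines()[0]

    summarize(tree, path)  # the module docstring describes the file itself

    # Imports can appear anywhere (e.g. inside functions), so walk the whole tree.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Bare relative imports such as "from . import x" have module=None.
            imports.append(node.module)

    # Top-level classes (with their methods) and top-level functions.
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            symbols.append(("class", node.name))
            summarize(node, node.name)
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    qualname = f"{node.name}.{item.name}"
                    symbols.append(("method", qualname))
                    summarize(item, qualname)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(("function", node.name))
            summarize(node, node.name)

    return {"imports": imports, "symbols": symbols, "summaries": summaries}


SAMPLE = '''"""Tokenize and parse math expressions."""
import re
from calc.core import Calculator


class Expression:
    """Parsed expression tree."""

    def evaluate(self):
        return 0


def tokenize(text):
    """Split text into tokens."""
    return text.split()
'''

result = analyze_source("calc/parser.py", SAMPLE)
print(result["imports"])  # ['re', 'calc.core']
print(result["symbols"])
print(result["summaries"])
```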

Architecture

┌─────────┐    ┌──────────────┐    ┌──────────────┐
│   CLI   │───▶│  Analyzers   │───▶│   SQLite DB  │
│  typer  │    │  AST-based   │    │  .ctxgraph/  │
└────┬────┘    └──────────────┘    └──────┬───────┘
     │                                    │
     ├── ctx build ──────────────────────▶│  Graph build
     │                                    │
     ├── ctx capsule ◀────────────────────│  Query + BFS
     │                                    │
     ├── ctx query ◀──────────────────────│  Search
     │                                    │
     ├── ctx view ◀───────────────────────│  D3.js viz
     │                                    │
     ├── ctx serve ◀──────────────────────│  MCP server
     │                                    │
     └── ccg wrapper ───▶ Claude Code ────┘  AI tool

Token-Efficient DSL Format

ctxgraph renders capsules in a custom DSL instead of JSON. On average, the DSL uses about 4.7× fewer tokens:

JSON (426 tokens):

{
  "nodes": [
    {
      "id": "file:calc/parser.py",
      "type": "file",
      "name": "parser.py",
      "path": "calc/parser.py",
      "summary": "Tokenize..."
    },
    ...
  ],
  "edges": [...]
}

DSL (143 tokens):

[CTX]calculator expression parsing

[F]calc/parser.py
  D:Tokenize and parse math expressions
  S:tokenize, parse, Expression
[F]calc/core.py
  D:Core math operations
[C]Calculator
  D:Main calculator class

[DEP]
  parser.py → core.py
  parser.py → plugins.py
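Here is a hypothetical renderer for the DSL layout, useful for checking the token arithmetic yourself. The tags ([CTX], [F], [C], [DEP], D:, S:) come from the example. The function names and input dictionaries are assumptions. The JSON built here is simpler than the one shown, so its counts will differ from the 426/143 figures.

```python
import json

TAGS = {"file": "F", "class": "C"}  # the two node tags shown in the example


def render_dsl(query, nodes, edges):
    """Render nodes and dependency edges in a tag-per-line layout (hypothetical)."""
    lines = [f"[CTX]{query}", ""]
    for node in nodes:
        lines.append(f"[{TAGS[node['type']]}]{node['name']}")
        if node.get("summary"):
            lines.append(f"  D:{node['summary']}")
        if node.get("symbols"):
            lines.append("  S:" + ", ".join(node["symbols"]))
    if edges:
        lines += ["", "[DEP]"]
        lines += [f"  {src} → {dst}" for src, dst in edges]
    return "\n".join(lines)


def whitespace_tokens(text):
    """Whitespace-split word count, the benchmarks' token proxy."""
    return len(text.split())


NODES = [
    {"type": "file", "name": "calc/parser.py",
     "summary": "Tokenize and parse math expressions",
     "symbols": ["tokenize", "parse", "Expression"]},
    {"type": "file", "name": "calc/core.py", "summary": "Core math operations"},
    {"type": "class", "name": "Calculator", "summary": "Main calculator class"},
]
EDGES = [("parser.py", "core.py"), ("parser.py", "plugins.py")]

dsl = render_dsl("calculator expression parsing", NODES, EDGES)
as_json = json.dumps({"nodes": NODES, "edges": [list(e) for e in EDGES]}, indent=2)
print(dsl)
print(whitespace_tokens(dsl), "DSL tokens vs", whitespace_tokens(as_json), "JSON tokens")
```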

Commands

ctx build — Build knowledge graph

# Current directory
ctx build

# Specific repo
ctx build /path/to/project

# Custom exclude patterns
ctx build --exclude "vendor/*" --exclude "legacy/*"
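Below is a minimal sketch of glob-style exclusion. It assumes patterns are matched against repository-relative POSIX paths using shell-style wildcards (Python's fnmatch). ctxgraph's exclude/patterns.py may use different rules. The names is_excluded, python_files and ALWAYS_SKIPPED are hypothetical. The always-skipped set simply mirrors the directories named in the benchmark methodology.

```python
from fnmatch import fnmatchcase
from pathlib import Path, PurePosixPath

# Hypothetical: directories that are never analyzed, regardless of --exclude.
ALWAYS_SKIPPED = {"__pycache__", ".git", "venv"}


def is_excluded(rel_path, patterns):
    """True if a repo-relative POSIX path should be skipped."""
    if ALWAYS_SKIPPED.intersection(PurePosixPath(rel_path).parts):
        return True
    # In fnmatch, "*" also matches "/", so "vendor/*" covers nested files too.
    return any(fnmatchcase(rel_path, pattern) for pattern in patterns)


def python_files(root, patterns):
    """Yield repo-relative .py paths under root that survive the exclusions."""
    root = Path(root)
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        if not is_excluded(rel, patterns):
            yield rel


candidates = ["app/main.py", "vendor/lib/x.py", "legacy/old.py",
              "app/__pycache__/main.py", "app/util.py"]
print([p for p in candidates if not is_excluded(p, ["vendor/*", "legacy/*"])])
```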

ctx capsule <query> — Generate context

# Balanced (default: 20 nodes, depth 2)
ctx capsule "fix JWT token validation"

# Fast (10 nodes, depth 1)
ctx capsule "fix JWT token validation" --mode fast

# Deep (40 nodes, depth 3)
ctx capsule "fix JWT token validation" --mode deep

# Project architecture overview
ctx capsule --overview

ctx query <search> — Search graph

ctx query "user auth"
ctx query "payment gateway" --mode deep

Returns ranked nodes with relevance scores.
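The scoring step from the How It Works diagram weights name matches at 2x and text matches at 0.5x. It could look roughly like the sketch below. The tokenizer rule, the per-term counting and the function names are assumptions for illustration, not the code in graph/query.py.

```python
import re

NAME_WEIGHT = 2.0  # weights as listed in the How It Works diagram
TEXT_WEIGHT = 0.5


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def score_node(terms, name, text):
    """Add NAME_WEIGHT per query term found in the name, TEXT_WEIGHT per term in the text."""
    name_tokens, text_tokens = set(tokenize(name)), set(tokenize(text))
    score = 0.0
    for term in terms:
        if term in name_tokens:
            score += NAME_WEIGHT
        if term in text_tokens:
            score += TEXT_WEIGHT
    return score


def rank(query, nodes):
    """Return (score, name) pairs for matching nodes, best first."""
    terms = tokenize(query)
    scored = [(score_node(terms, n["name"], n.get("summary", "")), n["name"]) for n in nodes]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))


NODES = [
    {"name": "auth/jwt.py", "summary": "Validate JWT tokens"},
    {"name": "auth/login.py", "summary": "Login flow using jwt"},
    {"name": "billing.py", "summary": "Invoices"},
]
print(rank("auth jwt validate", NODES))  # [(5.0, 'auth/jwt.py'), (2.5, 'auth/login.py')]
```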

ctx view — Visualize graph

ctx view
ctx view --output graph.html
ctx view --port 8080 --no-open

Generates an interactive, force-directed D3.js graph as a standalone HTML file (no JavaScript toolchain required).

ctx serve — MCP server

pip install "ctxgraph[mcp]"
ctx serve

Starts a Model Context Protocol (MCP) server over stdio. Claude Desktop config:

{
  "mcpServers": {
    "ctxgraph": {
      "command": "ctx",
      "args": ["serve"]
    }
  }
}

Tools: search_graph, get_context_capsule, get_file_dependencies, get_project_overview.

ctx info — Graph statistics

ctx info
# ┌────────────────────┬───────┐
# │ Total Nodes        │ 1090  │
# │ Total Edges        │ 1565  │
# │   files            │ 147   │
# │   classes          │ 45    │
# │   functions        │ 312   │
# └────────────────────┴───────┘
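ctx info summarizes the SQLite store, which holds a nodes table and an edges table. The column names below are assumptions; only the id format file:calc/parser.py appears in the JSON example. The sketch shows how such counts could be derived with grouped SQL queries.

```python
import sqlite3

# Hypothetical schema; ctxgraph's storage.py may name or type columns differently.
SCHEMA = """
CREATE TABLE nodes (id TEXT PRIMARY KEY, type TEXT NOT NULL, name TEXT NOT NULL, summary TEXT);
CREATE TABLE edges (src TEXT NOT NULL REFERENCES nodes(id),
                    dst TEXT NOT NULL REFERENCES nodes(id),
                    kind TEXT NOT NULL);
"""


def graph_stats(conn):
    """Total node/edge counts plus a per-type node breakdown."""
    stats = {
        "total_nodes": conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0],
        "total_edges": conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0],
    }
    for node_type, count in conn.execute(
        "SELECT type, COUNT(*) FROM nodes GROUP BY type ORDER BY type"
    ):
        stats[node_type] = count
    return stats


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", [
    ("file:calc/parser.py", "file", "parser.py", "Tokenize and parse math expressions"),
    ("file:calc/core.py", "file", "core.py", "Core math operations"),
    ("class:calc/core.py:Calculator", "class", "Calculator", "Main calculator class"),
    ("function:calc/parser.py:tokenize", "function", "tokenize", None),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("file:calc/parser.py", "file:calc/core.py", "imports"),
    ("file:calc/core.py", "class:calc/core.py:Calculator", "defines"),
    ("file:calc/parser.py", "function:calc/parser.py:tokenize", "defines"),
])
print(graph_stats(conn))
# {'total_nodes': 4, 'total_edges': 3, 'class': 1, 'file': 2, 'function': 1}
```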

Claude Wrapper (ccg)

# Single-shot
ccg "fix the JWT expiry bug in auth module"

# Interactive session with context pre-loaded
ccg --chat "refactor the payment flow"

# Project overview
ccg --overview

# With specific mode
ccg --mode deep "redesign the database schema"
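A wrapper like ccg has to do three things: generate a capsule, prepend it to the request, and launch Claude Code. The rough sketch below is not ccg's source. It assumes the following:

- The Claude Code CLI is on PATH as claude.
- claude -p PROMPT answers once and exits.
- claude PROMPT opens an interactive session seeded with PROMPT.

The prompt layout and all function names are hypothetical.

```python
import shutil
import subprocess


def build_prompt(capsule, task):
    """Prepend the capsule to the developer's request (hypothetical layout)."""
    return f"{capsule.strip()}\n\nTask: {task}"


def build_claude_argv(prompt, chat=False):
    # Assumption: "claude -p PROMPT" is single-shot (print mode), while
    # "claude PROMPT" starts an interactive session with PROMPT as the first message.
    return ["claude", prompt] if chat else ["claude", "-p", prompt]


def run_ccg(task, mode="balanced", chat=False):
    """Hypothetical ccg flow: build a capsule with ctx, then hand it to Claude Code."""
    for tool in ("ctx", "claude"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool!r} not found on PATH")
    capsule = subprocess.run(
        ["ctx", "capsule", task, "--mode", mode],
        capture_output=True, text=True, check=True,
    ).stdout
    argv = build_claude_argv(build_prompt(capsule, task), chat=chat)
    return subprocess.run(argv).returncode


print(build_claude_argv(build_prompt("[CTX]auth jwt\n", "fix the JWT expiry bug")))
```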

Modes

Mode                 Max Nodes   BFS Depth   Use Case
fast                 10          1           Quick questions, small fixes
balanced (default)   20          2           General development
deep                 40          3           Complex refactoring, architecture
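The BFS neighborhood expansion behind these modes can be sketched as a breadth-first walk from the top-ranked seed nodes. The walk stops expanding at the mode's depth and stops collecting once it reaches the mode's node cap. The adjacency-dict graph, the seed order and the expand function are illustrative assumptions. graph/query.py may break ties or treat edge direction differently.

```python
from collections import deque

# (max nodes, max BFS depth) per mode, from the Modes table
MODES = {"fast": (10, 1), "balanced": (20, 2), "deep": (40, 3)}


def expand(seeds, adjacency, mode="balanced"):
    """Breadth-first expansion from ranked seeds, capped by depth and node count."""
    max_nodes, max_depth = MODES[mode]
    seen = set()
    queue = deque()
    for seed in seeds:  # seeds keep their relevance order
        if seed not in seen:
            seen.add(seed)
            queue.append((seed, 0))
    selected = []
    while queue and len(selected) < max_nodes:
        node, depth = queue.popleft()
        selected.append(node)
        if depth == max_depth:
            continue  # depth limit reached: keep the node but do not expand it
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return selected


GRAPH = {
    "auth/login.py": ["auth/session.py", "core/database.py"],
    "auth/session.py": ["auth/crypto.py"],
    "core/database.py": ["core/config.py"],
    "auth/crypto.py": ["auth/hashing.py"],
}
for mode in MODES:
    print(mode, expand(["auth/login.py"], GRAPH, mode))
```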

Configuration

.ctxgraph/config.toml (or .ctxgraph/config.json):

[graph]
exclude = ["legacy/*", "vendor/*"]

[ai]
provider = "ollama"           # ollama, claude, openai, custom
model = "qwen2.5-coder:7b"
endpoint = "http://localhost:11434"

[context]
mode = "balanced"
max_nodes = 20
max_depth = 2

Environment overrides:

Variable            Overrides
CTXGRAPH_PROVIDER   ai.provider
CTXGRAPH_MODEL      ai.model
CTXGRAPH_ENDPOINT   ai.endpoint
ANTHROPIC_API_KEY   Claude API key
OPENAI_API_KEY      OpenAI API key
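The following minimal sketch shows the layering described above. It assumes precedence runs from built-in defaults, to the config file, to environment variables. It parses the config.json variant; a TOML file could be read into the same dictionary shape with tomllib.loads (Python 3.11+). The names load_config, DEFAULTS and ENV_OVERRIDES are hypothetical. The API keys are left to the provider clients.

```python
import copy
import json
import os

# Stated defaults: Ollama provider, balanced mode (20 nodes, depth 2).
DEFAULTS = {
    "ai": {"provider": "ollama"},
    "context": {"mode": "balanced", "max_nodes": 20, "max_depth": 2},
}

# Environment variable -> (section, key) it overrides.
ENV_OVERRIDES = {
    "CTXGRAPH_PROVIDER": ("ai", "provider"),
    "CTXGRAPH_MODEL": ("ai", "model"),
    "CTXGRAPH_ENDPOINT": ("ai", "endpoint"),
}


def load_config(file_text=None, env=None):
    """Merge defaults, then config.json contents, then environment (assumed order)."""
    env = os.environ if env is None else env
    config = copy.deepcopy(DEFAULTS)  # never mutate the shared defaults
    if file_text:
        for section, values in json.loads(file_text).items():
            config.setdefault(section, {}).update(values)
    for var, (section, key) in ENV_OVERRIDES.items():
        value = env.get(var)
        if value:
            config.setdefault(section, {})[key] = value
    return config


cfg = load_config('{"ai": {"model": "qwen2.5-coder:7b"}}', env={"CTXGRAPH_PROVIDER": "claude"})
print(cfg["ai"])
```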

Provider Switching

# Ollama (default, no API key)
ctx capsule "query"

# Claude
CTXGRAPH_PROVIDER=claude CTXGRAPH_MODEL=claude-sonnet-4-20250514 ctx capsule "query"

# OpenAI
CTXGRAPH_PROVIDER=openai CTXGRAPH_MODEL=gpt-4o ctx capsule "query"

# Custom (OpenAI-compatible API)
CTXGRAPH_PROVIDER=custom CTXGRAPH_ENDPOINT=http://my-api/v1 ctx capsule "query"

Benchmark Results

Methodology

All benchmarks measure token count (whitespace-split word count) as a reproducible proxy for LLM token usage.

Token Efficiency (Capsule vs Raw Files)

Baseline ("without graph"): all .py files in the project directory, excluding __pycache__, .git, and venv directories. This approximates what an AI assistant would have to read to understand the codebase without ctxgraph.

Measurement: For each project, we build the graph once, then run multiple queries across all three modes (fast/balanced/deep). The capsule token count is averaged across queries and compared against the raw file token count.

Savings formula: (1 - capsule_tokens / raw_tokens) × 100

Project    Files   Raw Tokens   Avg Capsule Tokens   Avg Saved   Build Time
tiny_app   7       1,558        ~112                 92.8%       ~82ms
web_api    23      6,567        ~136                 97.9%       ~474ms
microsvc   22      10,587       ~63                  99.4%       ~916ms
dataflow   35      ~12,500      ~78                  ~99.4%      ~560ms

Overall: 97.0% average token savings across all 4 projects and 42 benchmark runs.
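The measurement can be reproduced with a few lines of Python. The raw_token_count function follows the stated baseline: every .py file, skipping __pycache__, .git and venv. The function names are hypothetical. The loop only recomputes the Avg Saved column from the table's own token counts.

```python
from pathlib import Path

SKIPPED_DIRS = {"__pycache__", ".git", "venv"}  # the exclusions named in the methodology


def whitespace_tokens(text):
    """Token proxy used by the benchmarks: whitespace-split word count."""
    return len(text.split())


def raw_token_count(root):
    """Baseline: token proxy summed over every .py file outside the skipped dirs."""
    root = Path(root)
    total = 0
    for path in root.rglob("*.py"):
        if SKIPPED_DIRS.intersection(path.relative_to(root).parts):
            continue
        total += whitespace_tokens(path.read_text(encoding="utf-8", errors="replace"))
    return total


def savings_percent(capsule_tokens, raw_tokens):
    """Savings formula: (1 - capsule_tokens / raw_tokens) * 100."""
    return (1 - capsule_tokens / raw_tokens) * 100


# Recompute "Avg Saved" from the table's raw and capsule token counts.
for name, raw, capsule in [("tiny_app", 1558, 112), ("web_api", 6567, 136), ("microsvc", 10587, 63)]:
    print(name, f"{savings_percent(capsule, raw):.1f}%")
```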

DSL vs JSON Format Efficiency

Methodology: For the same set of nodes and edges, we render both a DSL capsule and an equivalent JSON structure. We compare token counts across both representations.

Project    Query               DSL Tokens   JSON Tokens   Ratio
tiny_app   calculator          147          434           3.0×
tiny_app   parse expression    137          451           3.3×
web_api    user management     126          403           3.2×
web_api    JWT auth login      136          308           2.3×
microsvc   auth service        32           219           6.8×
microsvc   payment billing     42           395           9.4×

Overall: 4.7× fewer tokens than equivalent JSON representation.
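The headline 4.7× appears to be the mean of the six per-query ratios above. Dividing the summed JSON tokens by the summed DSL tokens gives about 3.6× instead. A quick check using only the table's numbers:

```python
from statistics import mean

# (project, query, DSL tokens, JSON tokens) from the table above
ROWS = [
    ("tiny_app", "calculator", 147, 434),
    ("tiny_app", "parse expression", 137, 451),
    ("web_api", "user management", 126, 403),
    ("web_api", "JWT auth login", 136, 308),
    ("microsvc", "auth service", 32, 219),
    ("microsvc", "payment billing", 42, 395),
]

mean_ratio = mean(json_t / dsl_t for _, _, dsl_t, json_t in ROWS)
pooled_ratio = sum(r[3] for r in ROWS) / sum(r[2] for r in ROWS)
print(f"mean of per-query ratios: {mean_ratio:.1f}x")  # 4.7x
print(f"ratio of summed tokens:   {pooled_ratio:.1f}x")  # 3.6x
```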

Ollama Comparison (With vs Without Graph)

Methodology: We compare LLM answer quality (keyword recall coverage) with and without ctxgraph context. For each query:

  1. Without graph: Ask Ollama the question directly (no code context)
  2. With graph: Build a context capsule from the graph, prepend it to the same question
  3. Coverage score: % of predefined keywords (file names, concepts) that appear in the answer

Query                                      Coverage (no ctx)   Coverage (with ctx)   Δ
Calculator expression parsing (tiny_app)   100%                100%                  0pp
Plugin registration system (tiny_app)      33%                 100%                  +67pp
JWT authentication (web_api)               75%                 100%                  +25pp
Middleware pipeline (web_api)              100%                100%                  0pp
Circuit breaker (microsvc)                 75%                 75%                   0pp
Services & communication (microsvc)        50%                 100%                  +50pp
PipelineBuilder pattern (dataflow)         100%                75%                   -25pp
Processor registration (dataflow)          33%                 67%                   +34pp
Event bus & error handling (dataflow)      100%                100%                  0pp

Results: the average coverage improvement is +16.7pp. Coverage improved on 4 of 9 queries (44%), stayed the same on 4, and dropped on 1. For project-specific questions (plugin system, services, processors), the graph supplies concrete file and class names. The model cannot guess these from its training data alone.

Note: The one regression (PipelineBuilder) occurred because without context the model gave a generic answer matching all keywords, while with context it focused on the actual codebase implementation and missed the "scheduler" keyword — a more honest and useful answer for the developer.
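The coverage score can be sketched as a case-insensitive substring check of each predefined keyword against the answer. The exact matching rule used by benchmarks/run_ollama_comparison.py is an assumption here. With this kind of matching, a generic answer that happens to mention every keyword scores 100%. That is consistent with the PipelineBuilder note above.

```python
def coverage(answer, keywords):
    """Percent of expected keywords found in the answer (case-insensitive substring match)."""
    if not keywords:
        raise ValueError("need at least one keyword")
    text = answer.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return 100.0 * hits / len(keywords)


answer = "A PipelineBuilder chains processors in order"
keywords = ["PipelineBuilder", "processor", "scheduler", "chain"]
print(coverage(answer, keywords))  # 75.0 ("scheduler" is missing)
```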


Examples

Debug a failing test

ctx build
ctx capsule "test_user_login is failing with auth error" --mode deep
# Output →
# [F]tests/test_auth.py
# [F]src/auth/login.py
# [C]AuthService
# [DEP] auth/login.py → core/database.py, auth/session.py

Understand a new codebase

ctx capsule "project architecture" --overview
ccg --chat "explain the overall architecture and data flow"

Refactor across modules

ctx capsule "extract payment processing into separate module" --mode deep

Development

git clone https://github.com/shashi3070/ctxgraph.git
cd ctxgraph

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run benchmarks
python benchmarks/run_benchmarks.py

# Ollama comparison (requires local Ollama)
python benchmarks/run_ollama_comparison.py

Project Structure

src/ctxgraph/
├── cli/main.py              — Typer CLI (6 commands)
├── graph/
│   ├── models.py            — Node, Edge, Graph dataclasses
│   ├── storage.py           — SQLite persistence
│   ├── builder.py           — Graph build orchestrator
│   └── query.py             — Tokenizer + BFS + relevance scoring
├── capsule/renderer.py      — DSL context generation
├── analyzers/python/
│   ├── importer.py          — AST import extraction
│   ├── symbols.py           — AST class/function/method analysis
│   └── semantic.py          — Docstring summarization
├── config/
│   ├── settings.py          — TOML/JSON/env config loading
│   └── providers.py         — Ollama, Claude, OpenAI API clients
├── clients/models.py        — Model mode enum (fast/balanced/deep)
├── exclude/patterns.py      — Exclusion pattern matching
├── view/visualizer.py       — D3.js HTML graph generator
├── wrapper/claude.py        — ccg Claude wrapper
└── mcp/server.py            — MCP protocol server

Known Limitations

  • Python-only analysis — other languages get file-level nodes only
  • Keyword-based search — no semantic/embedding matching (planned)
  • No incremental rebuild — full rebuild on every ctx build (planned)
  • MCP server — stdio mode only, SSE not yet supported

License

MIT



Download files


Source Distribution

ctxgraph-0.1.0.tar.gz (39.5 kB)

Built Distribution


ctxgraph-0.1.0-py3-none-any.whl (32.9 kB)

File details

Details for the file ctxgraph-0.1.0.tar.gz.

File metadata

  • Download URL: ctxgraph-0.1.0.tar.gz
  • Size: 39.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.3

File hashes

Hashes for ctxgraph-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ea1e80a8ff43c708b22194ba7743a88fd1fd06a30a3f22738e723b6ad75dea7d
MD5 62629fd6f8f39b8612bd022e6fbceb87
BLAKE2b-256 1676671bf2c82b2b82dbcf152aa47209c0e96f35e98da5e051e3b882207a5b98


File details

Details for the file ctxgraph-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ctxgraph-0.1.0-py3-none-any.whl
  • Size: 32.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.3

File hashes

Hashes for ctxgraph-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c467fde36f6c2790ac464656fc450e039ccfdcd0481ed1f8d9adb05ca5b799f2
MD5 24ed47a5a04b312d42c914b4517d6a5e
BLAKE2b-256 6ba85f8047e6530cc3501fd9f0a9ca359859ecad39e6a049ec458c398db3006c

