Self-hosted codebase intelligence platform — graph + vector indexing with MCP tools for IDE-embedded LLMs
Ripple
Code intelligence from git history — not static analysis.
Works inside Claude Code · Cursor · GitHub Copilot Agent · OpenHands · Windsurf
Static analysis tells you what could break. Git history tells you what actually breaks together. The gap between those two is where production incidents live — and where Ripple operates.
Try it in 60 seconds — no GPU, no config
Analyze any public repo's blast radius from its commit history:
git clone https://github.com/Amitshukla2308/Index-the-code
cd Index-the-code
pip install -e .
python3 apps/cli/demo.py https://github.com/your-org/your-repo
open hr-demo-report.html
You get an HTML report of the highest-risk files ranked by co-change history. On Flask: src/flask/app.py scores 1120 — the single most-coupled file across 2000 commits. No surprise. Immediately useful.
What it does
Blast Radius — +322% recall over static import graph
Every other tool counts import edges. Ripple counts how often files actually changed together across your git history. The difference matters: only 14.9% of import neighbors ever co-change, so the static graph flags risk for files that don't need review while missing real coupling. Temporal signals catch the roughly 85% the graph misses.
| Metric | Static (v1) | Temporal (v2) | Delta |
|---|---|---|---|
| recall@10 | 0.11 | 0.47 | +322% |
| MRR | 0.08 | 0.36 | +359% |
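As a rough illustration of the temporal signal (not Ripple's actual implementation), co-change counts can be mined from commit file lists in a few lines of Python; the history below is synthetic:

```python
from collections import Counter
from itertools import combinations

def cochange_counts(commits):
    """Count how often each pair of files appears in the same commit.

    `commits` is an iterable of file-path lists, one list per commit.
    In a real repo you'd derive these from `git log --name-only`.
    """
    pairs = Counter()
    for files in commits:
        # sorted() gives a canonical (a, b) ordering for each pair
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs

# Synthetic history: app.py and routes.py ship together often.
history = [
    ["app.py", "routes.py"],
    ["app.py", "routes.py", "models.py"],
    ["models.py"],
    ["app.py", "routes.py"],
]
counts = cochange_counts(history)
print(counts[("app.py", "routes.py")])  # → 3
```

Ranking files by the sum of their pair counts gives a crude version of the blast-radius score the demo report visualizes.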
Guard — static semantic checks at 2.4ms/file
AI-generated code passes every review gate because it looks correct. Guard verifies that the code does what it claims: that comments match the code that follows, that locks aren't released before promised mutations complete, that auth happens before action. It catches the class of bugs where the AI wrote a plausible lie.
# Run on any Python/Haskell/Rust/Go/JS codebase
python3 -m ripple.guard path/to/changed_file.py
Patterns: lock scope, premature release, transaction boundaries, auth-before-action, error swallowing.
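The error-swallowing pattern, for example, can be approximated with a small `ast` pass — a hedged sketch of the idea, not Guard's actual checker:

```python
import ast

def find_swallowed_errors(source: str):
    """Flag `except` handlers whose body is a lone `pass`, i.e. errors
    silently discarded. A rough stand-in for one Guard pattern."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler):
            if len(node.body) == 1 and isinstance(node.body[0], ast.Pass):
                findings.append(node.lineno)
    return findings

code = """
try:
    commit()
except Exception:
    pass
"""
print(find_swallowed_errors(code))  # → [4]
```

The real checks are semantic (comment/code agreement, lock scope), so they need more than a single-node match, but the AST-walk skeleton is the same.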
15 MCP Tools — plug into any AI coding assistant
One config block gives your entire team's AI assistants access to your codebase's history:
{
"mcpServers": {
"ripple": {
"type": "sse",
"url": "http://127.0.0.1:8002/sse"
}
}
}
Setup guides: Cursor · GitHub Copilot · OpenHands
| Tool | What it answers |
|---|---|
| check_my_changes | Full PR verdict: blast radius + Guard + risk score + reviewers |
| get_blast_radius | Which files co-change with these? Tiered by confidence. |
| get_why_context | WHY is this code the way it is? Ownership, activity trend, Granger causal direction, anti-patterns. |
| predict_missing_changes | What files are likely missing from this PR? |
| score_change_risk | Composite 0-100 risk score for a changeset |
| suggest_reviewers | Who owns these modules from git history? |
| check_criticality | How critical is this module? (blast + coupling + recency) |
| get_guardrails | What must stay true when touching this module? |
| list_critical_modules | Top-N highest-risk modules in the codebase |
| fast_search | Zero-GPU BM25 keyword search, ~40ms p50. No embed server needed. |
| search_symbols | Semantic + keyword + co-change fusion search |
| search_modules | Find which namespace contains relevant code |
| get_module | All symbols in a module |
| get_function_body | Source code of a function by ID |
| trace_callers | Who calls this? (upstream impact) |
| trace_callees | What does this call? (downstream deps) |
| get_context | Large context block — last resort |
Full setup
Prerequisites
python3 --version # 3.11+
pip install chainlit openai lancedb sentence-transformers networkx \
pyarrow leidenalg igraph rank-bm25 mcp ijson pyyaml \
tree-sitter tree-sitter-haskell tree-sitter-rust
Step 1 — Prepare workspace
mkdir -p ~/projects/workspaces/YOUR_ORG/{source,artifacts,output}
cp -r path/to/your/repos ~/projects/workspaces/YOUR_ORG/source/
cp config.example.yaml ~/projects/workspaces/YOUR_ORG/config.yaml
Step 2 — Choose embedding provider
# Local GPU (no API cost)
EMBED_MODEL=/path/to/model python3 serve/embed_server.py
# Cloud (any, no GPU needed)
EMBED_PROVIDER=openai OPENAI_API_KEY=sk-... python3 serve/embed_server.py
EMBED_PROVIDER=voyage VOYAGE_API_KEY=... python3 serve/embed_server.py
EMBED_PROVIDER=cohere COHERE_API_KEY=... python3 serve/embed_server.py
EMBED_PROVIDER=jina JINA_API_KEY=... python3 serve/embed_server.py
EMBED_PROVIDER=ollama EMBED_PROVIDER_MODEL=nomic-embed-text python3 serve/embed_server.py
Step 3 — Build the index
export REPO_ROOT=~/projects/workspaces/YOUR_ORG/source
export OUTPUT_DIR=~/projects/workspaces/YOUR_ORG/output
export ARTIFACT_DIR=~/projects/workspaces/YOUR_ORG/artifacts
bash build/run_pipeline.sh # 30 min – 2 h depending on codebase size
Step 4 — Start the servers
python3 serve/embed_server.py # start first — other servers share it
ARTIFACT_DIR=~/projects/workspaces/YOUR_ORG/artifacts python3 serve/mcp_server.py
Add .mcp.json to your project and your AI assistant has all 15 tools.
Language support
| Language | Symbols | Call graph | Guard | Co-change |
|---|---|---|---|---|
| Python | ✓ | ✓ | ✓ | ✓ |
| Haskell | ✓ | ✓ (approx) | ✓ | ✓ |
| Rust | ✓ | ✓ | ✓ | ✓ |
| JavaScript/TypeScript | ✓ | ✓ | ✓ | ✓ |
| Go | ✓ | ✓ | ✓ | ✓ |
| Groovy | ✓ | — | ✓ | ✓ |
| Java | ✓ | ✓ | — | ✓ |
Guardian Mode — CI/CD
# PR completeness analysis from the command line
git diff main...HEAD --name-only | python3 serve/pr_analyzer.py
# Zero-config guardian on any repo (no GPU, no AST)
python3 apps/cli/guardian_init.py --repo /path/to/repo
Copy .github/workflows/guardian-lite.yml to any repo for automatic PR risk scoring.
Architecture
embed_server.py (:8001) — loads embedding model once; all servers connect to it
mcp_server.py (:8002) — MCP tool server over SSE
demo_server_v6.py (:8000) — Chainlit chat UI (optional)
retrieval_engine.py — core: all indexes, all retrieval logic, imported by everything
Indexes built once, loaded at startup: symbol graph, vector store (LanceDB), co-change, cross-repo co-change, Granger causality, activity metrics, ownership, guardrail docs.
TurboQuant: optional 7.7x vector compression at 3-bit (312MB vs 1.5GB), recall@10 preserved at 0.91. Set QUANT_BITS=3 at build time to deploy on laptops.
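The compression ratio follows from first principles: 32-bit floats reduced to 3-bit codes is a ~10.7× raw reduction, which lands near the reported 7.7× once per-vector metadata is stored. A toy sketch of scalar quantization (TurboQuant's actual scheme may differ):

```python
def quantize_3bit(vec):
    """Quantize a float vector to 3-bit codes (8 levels), storing min and
    scale once per vector. A toy sketch, not TurboQuant itself."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 7 or 1.0      # 8 levels -> codes 0..7
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return [c * scale + lo for c in codes]

vec = [0.12, -1.5, 0.9, 2.3, -0.7, 0.0]
codes, lo, scale = quantize_3bit(vec)
approx = dequantize(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(vec, approx))
# Rounding bounds reconstruction error by half a quantization step.
print(max_err <= scale / 2 + 1e-9)  # → True
```

Recall survives because nearest-neighbor ranking only needs relative distances, which coarse per-dimension levels preserve well for high-dimensional embeddings.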
Troubleshooting
| Symptom | Fix |
|---|---|
| Embed server shows device=cpu | Check nvidia-smi and CUDA |
| Semantic search misses domain terms | Add short acronyms to kw_allowlist in config |
| MCP tools missing in IDE | Verify type: sse in .mcp.json, check port 8002 |
| LanceDB write fails on WSL2 | Write to /home/, not /mnt/d/ (ext4 only) |
| Co-change builder OOM | Use 06_build_cochange.py — streams at O(1) memory |
Research
Ripple's temporal signals thesis was validated on a 94K-symbol, 12-service production codebase (113,916 commits):
- Cross-repo co-change: signal is real (p < 10⁻¹³), orthogonal to import graph (0.54% overlap), 1.91× weight when import edge present
- Change prediction model: activity features dominate (79–84% importance); structural features add near-zero at short horizons (K=3)
- Cross-ecosystem: activity dominance holds on Flask (Python) as on Haskell — 84% activity importance
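As a toy illustration of the lagged-activity idea behind the Granger test (not the experiment's actual code), ask: does file A's activity at week t help predict file B's at week t+1? The data below is synthetic:

```python
def lagged_correlation(a, b, lag=1):
    """Pearson correlation between a[t] and b[t+lag]: a crude proxy for the
    'past activity of A predicts activity of B' direction that a proper
    Granger test formalizes with lagged regressions and an F-test."""
    x, y = a[:-lag], b[lag:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x) ** 0.5
    vy = sum((yi - my) ** 2 for yi in y) ** 0.5
    return cov / (vx * vy)

# Synthetic weekly commit counts: b echoes a one week later,
# so the lag-1 correlation is perfect in the a -> b direction.
a = [5, 1, 4, 0, 6, 2, 7, 1]
b = [0] + a[:-1]
print(round(lagged_correlation(a, b, lag=1), 3))  # → 1.0
```

An asymmetry between the two directions (a→b strong, b→a weak) is what a causal-direction signal looks like at this toy scale.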
Full artifacts: ~/lab/experiments/ · Active threads: ~/lab/OPEN_QUESTIONS.md
Self-hosted. Your code never leaves your machines.
Built by Amit Shukla · Research by Carlsbert — an autonomous Claude agent