
Self-hosted codebase intelligence platform — graph + vector indexing with MCP tools for IDE-embedded LLMs


Ripple


Code intelligence from git history — not static analysis.

Works inside Claude Code · Cursor · GitHub Copilot Agent · OpenHands · Windsurf

Static analysis tells you what could break. Git history tells you what actually breaks together. The gap between those two is where production incidents live — and where Ripple operates.


Try it in 60 seconds — no GPU, no config

Analyze any public repo's blast radius from its commit history:

git clone https://github.com/Amitshukla2308/Index-the-code
cd Index-the-code
pip install -e .
python3 apps/cli/demo.py https://github.com/your-org/your-repo
open hr-demo-report.html

You get an HTML report of the highest-risk files ranked by co-change history. On Flask: src/flask/app.py scores 1120 — the single most-coupled file across 2000 commits. No surprise. Immediately useful.


What it does

Blast Radius — +322% recall over static import graph

Every other tool counts import edges. Ripple counts how often files actually changed together across your git history. The difference: only 14.9% of import neighbors ever co-change — the static graph predicts risk for files that don't need review. Temporal signals catch the 85% the graph misses.

| Metric    | Static (v1) | Temporal (v2) | Delta |
| --------- | ----------- | ------------- | ----- |
| recall@10 | 0.11        | 0.47          | +322% |
| MRR       | 0.08        | 0.36          | +359% |
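The co-change signal itself is cheap to compute. A minimal sketch (not Ripple's implementation) that counts how often file pairs appear in the same commit, given per-commit file lists such as those parsed from `git log --name-only`:

```python
from collections import Counter
from itertools import combinations

def cochange_counts(commits):
    """Count how often each file pair appears in the same commit.

    `commits` is an iterable of file-path lists, one list per commit.
    Illustrative sketch only; Ripple's builder streams this at O(1) memory.
    """
    pairs = Counter()
    for files in commits:
        # sorted() makes the pair key order-independent
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs

commits = [
    ["app.py", "routes.py"],
    ["app.py", "routes.py", "models.py"],
    ["models.py", "migrations/001.py"],
]
counts = cochange_counts(commits)
print(counts[("app.py", "routes.py")])  # 2
```

Ranking files by their total co-change degree is what produces the demo report's risk ordering.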

Guard — static semantic checks at 2.4ms/file

AI-generated code passes review gates because it looks correct. Guard verifies what the code claims: that comments match the code that follows, that locks aren't released before promised mutations complete, that auth happens before action. It catches the class of bugs where the AI wrote a plausible lie.

# Run on any Python/Haskell/Rust/Go/JS codebase
python3 -m ripple.guard path/to/changed_file.py

Patterns: lock scope, premature release, transaction boundaries, auth-before-action, error swallowing.
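Guard's checks are AST-level. As an illustration of the idea (not Ripple's implementation), the error-swallowing pattern can be sketched in a few lines of stdlib `ast`:

```python
import ast

def find_swallowed_errors(source: str):
    """Flag `except` handlers whose body is only `pass` — silently
    swallowed errors, one of the pattern classes Guard looks for.
    Illustrative sketch, not Ripple's implementation.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler):
            if len(node.body) == 1 and isinstance(node.body[0], ast.Pass):
                findings.append(node.lineno)  # line of the offending handler
    return findings

code = """
try:
    commit()
except Exception:
    pass
"""
print(find_swallowed_errors(code))  # [4]
```

The real checks cross-reference intent (comments, names) against behavior, but they stay static, which is how Guard holds its per-file millisecond budget.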

15 MCP Tools — plug into any AI coding assistant

One config block gives your entire team's AI assistants access to your codebase's history:

{
  "mcpServers": {
    "ripple": {
      "type": "sse",
      "url": "http://127.0.0.1:8002/sse"
    }
  }
}

Setup guides: Cursor · GitHub Copilot · OpenHands

| Tool | What it answers |
| --- | --- |
| `check_my_changes` | Full PR verdict: blast radius + Guard + risk score + reviewers |
| `get_blast_radius` | Which files co-change with these? Tiered by confidence. |
| `get_why_context` | WHY is this code the way it is? Ownership, activity trend, Granger causal direction, anti-patterns. |
| `predict_missing_changes` | Which files are likely missing from this PR? |
| `score_change_risk` | Composite 0-100 risk score for a changeset |
| `suggest_reviewers` | Who owns these modules, per git history? |
| `check_criticality` | How critical is this module? (blast + coupling + recency) |
| `get_guardrails` | What must stay true when touching this module? |
| `list_critical_modules` | Top-N highest-risk modules in the codebase |
| `fast_search` | Zero-GPU BM25 keyword search, ~40 ms p50; no embed server needed |
| `search_symbols` | Semantic + keyword + co-change fusion search |
| `search_modules` | Which namespace contains the relevant code? |
| `get_module` | All symbols in a module |
| `get_function_body` | Source code of a function, by ID |
| `trace_callers` | Who calls this? (upstream impact) |
| `trace_callees` | What does this call? (downstream deps) |
| `get_context` | Large context block; last resort |
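A composite 0-100 risk score can be pictured as a weighted blend of the signals above. The weights and feature names below are purely illustrative, not Ripple's actual model:

```python
def score_change_risk(blast_radius, guard_findings, churn_percentile, owner_active):
    """Hypothetical 0-100 composite risk score (illustrative weights only).

    blast_radius     — number of historically co-changing files (capped)
    guard_findings   — count of Guard pattern hits in the diff
    churn_percentile — 0.0-1.0 recent-activity rank of touched modules
    owner_active     — whether a recent owner is still contributing
    """
    score = 0.0
    score += min(blast_radius, 20) / 20 * 40   # coupling: up to 40 pts
    score += min(guard_findings, 5) / 5 * 30   # semantic findings: up to 30 pts
    score += churn_percentile * 20             # hot code: up to 20 pts
    score += 0 if owner_active else 10         # orphaned code: 10 pts
    return round(score)

print(score_change_risk(8, 1, 0.9, True))  # 40
```

The useful property of such a blend is that no single signal can saturate the score; a wide blast radius in quiet, well-owned code stays mid-range.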

Full setup

Prerequisites

python3 --version   # 3.11+
pip install chainlit openai lancedb sentence-transformers networkx \
            pyarrow leidenalg igraph rank-bm25 mcp ijson pyyaml \
            tree-sitter tree-sitter-haskell tree-sitter-rust

Step 1 — Prepare workspace

mkdir -p ~/projects/workspaces/YOUR_ORG/{source,artifacts,output}
cp -r path/to/your/repos ~/projects/workspaces/YOUR_ORG/source/
cp config.example.yaml ~/projects/workspaces/YOUR_ORG/config.yaml

Step 2 — Choose embedding provider

# Local GPU (no API cost)
EMBED_MODEL=/path/to/model python3 serve/embed_server.py

# Cloud (any, no GPU needed)
EMBED_PROVIDER=openai  OPENAI_API_KEY=sk-...  python3 serve/embed_server.py
EMBED_PROVIDER=voyage  VOYAGE_API_KEY=...     python3 serve/embed_server.py
EMBED_PROVIDER=cohere  COHERE_API_KEY=...     python3 serve/embed_server.py
EMBED_PROVIDER=jina    JINA_API_KEY=...       python3 serve/embed_server.py
EMBED_PROVIDER=ollama  EMBED_PROVIDER_MODEL=nomic-embed-text  python3 serve/embed_server.py

Step 3 — Build the index

export REPO_ROOT=~/projects/workspaces/YOUR_ORG/source
export OUTPUT_DIR=~/projects/workspaces/YOUR_ORG/output
export ARTIFACT_DIR=~/projects/workspaces/YOUR_ORG/artifacts

bash build/run_pipeline.sh   # 30 min – 2 h depending on codebase size

Step 4 — Start the servers

python3 serve/embed_server.py   # start first — other servers share it
ARTIFACT_DIR=~/projects/workspaces/YOUR_ORG/artifacts python3 serve/mcp_server.py

Add .mcp.json to your project and your AI assistant has all 15 tools.


Language support

Supported languages, with per-language coverage of symbol extraction, call graph, Guard checks, and co-change analysis: Python, Haskell (call graph approximate), Rust, JavaScript/TypeScript, Go, Groovy, Java.

Guardian Mode — CI/CD

# PR completeness analysis from the command line
git diff main...HEAD --name-only | python3 serve/pr_analyzer.py

# Zero-config guardian on any repo (no GPU, no AST)
python3 apps/cli/guardian_init.py --repo /path/to/repo

Copy .github/workflows/guardian-lite.yml to any repo for automatic PR risk scoring.
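The PR-completeness idea behind pr_analyzer.py can be sketched with the co-change map: files that historically changed alongside the diff, but are absent from it, get flagged. A toy stand-in, not the shipped analyzer:

```python
def predict_missing(changed, cochange, min_support=2):
    """Flag files that historically co-change with the diff but are absent
    from it. `cochange` maps a file to {neighbor: co-change count}.
    Illustrative stand-in for pr_analyzer.py's logic.
    """
    missing = {}
    for f in changed:
        for neighbor, count in cochange.get(f, {}).items():
            if neighbor not in changed and count >= min_support:
                missing[neighbor] = max(missing.get(neighbor, 0), count)
    # strongest historical coupling first
    return sorted(missing, key=missing.get, reverse=True)

cochange = {
    "api/routes.py": {"api/schemas.py": 5, "docs/api.md": 2, "README.md": 1},
}
print(predict_missing({"api/routes.py"}, cochange))
# ['api/schemas.py', 'docs/api.md']
```

The `min_support` threshold is what keeps one-off coincidental commits from producing noisy suggestions.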


Architecture

embed_server.py (:8001)   — loads embedding model once; all servers connect to it
mcp_server.py   (:8002)   — MCP tool endpoints over SSE
demo_server_v6.py (:8000) — Chainlit chat UI (optional)
retrieval_engine.py       — core: all indexes, all retrieval logic, imported by everything

Indexes built once, loaded at startup: symbol graph, vector store (LanceDB), co-change, cross-repo co-change, Granger causality, activity metrics, ownership, guardrail docs.

TurboQuant: optional 3-bit vector compression (312 MB vs 1.5 GB at full precision), recall@10 preserved at 0.91. Set QUANT_BITS=3 at build time to deploy on laptops.
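TurboQuant's internals aren't shown here, but the space math of low-bit scalar quantization is easy to sketch: map each float component to one of 2³ = 8 levels. A pure-Python illustration (assumed value range, not Ripple's scheme):

```python
def quantize3(vec, lo=-1.0, hi=1.0):
    """Scalar-quantize each component to a 3-bit code (8 levels).

    Assumes values lie in [lo, hi]; real quantizers learn these
    bounds per dimension or per block.
    """
    step = (hi - lo) / 7
    return [min(7, max(0, round((x - lo) / step))) for x in vec]

def dequantize3(codes, lo=-1.0, hi=1.0):
    """Map 3-bit codes back to approximate float values."""
    step = (hi - lo) / 7
    return [lo + c * step for c in codes]

v = [0.12, -0.5, 0.98, -1.0]
codes = quantize3(v)          # [4, 2, 7, 0]
approx = dequantize3(codes)   # lossy reconstruction
# 3 bits vs a 32-bit float per dimension is ~10.7x raw compression;
# real systems pay overhead for scales/offsets, hence smaller ratios.
```

The surprise in such schemes is how little nearest-neighbor recall degrades: ranking only needs relative distances, which coarse levels preserve well.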


Troubleshooting

| Symptom | Fix |
| --- | --- |
| Embed server shows `device=cpu` | Check `nvidia-smi` and CUDA |
| Semantic search misses domain terms | Add short acronyms to `kw_allowlist` in config |
| MCP tools missing in IDE | Verify `type: sse` in `.mcp.json`; check port 8002 |
| LanceDB write fails on WSL2 | Write to `/home/`, not `/mnt/d/` (ext4 only) |
| Co-change builder OOM | Use `06_build_cochange.py`; it streams at O(1) memory |

Research

Ripple's temporal signals thesis was validated on a 94K-symbol, 12-service production codebase (113,916 commits):

  • Cross-repo co-change: signal is real (p < 10⁻¹³), orthogonal to import graph (0.54% overlap), 1.91× weight when import edge present
  • Change prediction model: activity features dominate (79–84% importance); structural features add near-zero at short horizons (K=3)
  • Cross-ecosystem: activity dominance holds on Flask (Python) just as on the Haskell codebase: 84% activity importance

Full artifacts: ~/lab/experiments/ · Active threads: ~/lab/OPEN_QUESTIONS.md


Self-hosted. Your code never leaves your machines.

Built by Amit Shukla · Research by Carlsbert — an autonomous Claude agent
