AI-powered code retrieval — index any GitHub repo by URL, search with natural language, get token-budget-aware context packs for LLMs
RepoMemory
AI-powered code retrieval engine — index any GitHub repo, search with natural language
Point it at any GitHub URL. Get token-budget-aware context packs — ready to paste into any LLM. Free to run, free to deploy.
What is RepoMemory?
When you ask an LLM to fix a bug or trace a feature, it needs the right source files. Pasting the whole codebase wastes the context window. Guessing which files to include misses critical pieces.
RepoMemory solves this with a hybrid retrieval pipeline that runs on any public or private GitHub repo:
repomemory index https://github.com/pallets/flask
repomemory search "Where is request routing handled?"
Query → Task Classification (trace_flow)
→ BM25 Lexical + FAISS Semantic + Fuzzy Path + Symbol search (parallel)
→ Reciprocal Rank Fusion → Top-20 ranked files
→ Dependency-graph expansion → Re-rank with adaptive weights
→ Token-budget packer → Context pack ready to paste into any LLM
→ (optional) Groq AI summary
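The fusion step in the pipeline above can be illustrated with a minimal Reciprocal Rank Fusion sketch. This is not RepoMemory's actual implementation: the function name, the smoothing constant `k=60` (a conventional choice for RRF), and the sample file lists are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=20):
    """Fuse several ranked lists of file paths into a single ranking.

    Each list contributes 1 / (k + rank) to every path it contains,
    so files that rank well in several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, path in enumerate(ranking, start=1):
            scores[path] += 1.0 / (k + rank)
    # Highest fused score first; the path breaks ties deterministically.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [path for path, _ in fused[:top_n]]


# Hypothetical per-retriever results for a single query.
bm25 = ["app.py", "routing.py", "ctx.py"]
semantic = ["routing.py", "wrappers.py", "app.py"]
fuzzy_path = ["routing.py"]

fused = reciprocal_rank_fusion([bm25, semantic, fuzzy_path])
print(fused)
```

Because RRF works on ranks rather than raw scores, it can combine BM25 scores and vector similarities without normalizing them onto a common scale.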
Install
# Core — CLI + library, uses HuggingFace API for embeddings (no GPU needed)
pip install repomemory
# With local embeddings (~80 MB model download, fully offline)
pip install "repomemory[local]"
# With FastAPI web server
pip install "repomemory[server]"
# With Groq LLM explanations (free API key)
pip install "repomemory[llm]"
# Everything
pip install "repomemory[all]"
Quick Start
CLI
# Index a repo
repomemory index https://github.com/pallets/flask
# Search it
repomemory search "How does request routing work?"
# With AI explanations (free Groq key)
export REPOMEMORY_GROQ_API_KEY=gsk_...
repomemory search "Where is token rotation handled?"
# Adjust result count and token budget
repomemory search "auth middleware" --top-k 10 --budget 4000
# Force a task mode
repomemory search "test coverage for auth" --mode test_lookup
# Private repos
repomemory index https://github.com/myorg/private-repo --token ghp_...
# List indexed repos
repomemory list
# Start the web UI + API server
repomemory serve
Python library
from repomemory import RepoMemory
rm = RepoMemory()
# Index
repo = rm.index("https://github.com/pallets/flask")
# Search
result = rm.search("How does request dispatching work?")
for file in result.context_pack.files:
    print(f"{file.path} score={file.relevance_score:.2f}")
    print(file.snippets[0].content[:300])
REST API
# Start the server
pip install "repomemory[server]"
repomemory serve
# Index a repo
curl -X POST http://localhost:8000/api/repos \
-H "Content-Type: application/json" \
-d '{"url": "https://github.com/pallets/flask"}'
# Search
curl -X POST http://localhost:8000/api/search \
-H "Content-Type: application/json" \
-d '{"repo_id": 1, "query": "How does routing work?", "token_budget": 8000}'
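The same search call can be made from Python with only the standard library. The sketch below reuses the endpoint and JSON fields from the curl example; the helper names are hypothetical, and since the response schema is not documented here, the response is returned as parsed JSON without assuming any fields.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address used in the curl examples


def build_search_request(repo_id, query, token_budget=8000, base_url=BASE_URL):
    """Build the POST request for /api/search without sending it."""
    payload = {"repo_id": repo_id, "query": query, "token_budget": token_budget}
    return urllib.request.Request(
        url=f"{base_url}/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(repo_id, query, token_budget=8000, base_url=BASE_URL):
    """Send the search request; requires a running `repomemory serve`."""
    request = build_search_request(repo_id, query, token_budget, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network.
request = build_search_request(1, "How does routing work?")
print(request.get_method(), request.full_url)
```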
Features
| Feature | Details |
|---|---|
| Hybrid search | BM25 lexical + FAISS semantic (384-dim) + fuzzy path + symbol name, fused with Reciprocal Rank Fusion |
| Dependency-graph retrieval | Builds file-level import edges at index time; expands results through related files via BFS |
| Adaptive weight learning | Online SGD learning from user feedback (accept / dismiss / thumbs); falls back to static mode weights |
| Symbol-aware indexing | tree-sitter extracts functions, classes, and methods from Python, JavaScript, and TypeScript |
| 5 Task Modes | bug_fix, trace_flow, test_lookup, config_lookup, general — auto-detected from query or set manually |
| Token-budget packer | Greedy packer respects any token limit (default 8 000 tokens, configurable up to 100 000+) |
| Behavioral memory | Frecency scoring from opened / accepted / thumbs-up actions; boosts relevant files in future queries |
| RAG evaluation | End-to-end pipeline scoring retrieval impact on LLM answer quality (relevance, completeness, faithfulness) |
| Flexible embeddings | Local sentence-transformers (offline) or free HuggingFace Inference API |
| AI explanations | Optional Groq LLM (free tier) explains why each result matters |
| Incremental indexing | SHA-256 per file; only changed files are re-embedded on re-index |
| Web UI + REST API | React 19 frontend + FastAPI backend; deploy on Render + Vercel (both free) |
| Export as Markdown | Copy context pack as a formatted Markdown block to paste directly into an LLM prompt |
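To make the token-budget packer row concrete, here is a minimal greedy packing sketch. It assumes each candidate file already has a relevance score and a token count; the function name, the tuple layout, and the sample numbers are illustrative and not taken from RepoMemory's code.

```python
def pack_greedy(candidates, budget=8000):
    """Select files in descending score order while they fit the budget.

    candidates: list of (path, score, token_count) tuples.
    Returns (selected_paths, tokens_used).
    """
    selected, used = [], 0
    for path, _score, tokens in sorted(candidates, key=lambda c: c[1], reverse=True):
        # Skip a file that would overflow the budget, but keep scanning:
        # a smaller, lower-ranked file may still fit.
        if used + tokens <= budget:
            selected.append(path)
            used += tokens
    return selected, used


# Hypothetical ranked candidates: (path, relevance score, token count).
candidates = [
    ("routing.py", 0.92, 3000),
    ("app.py", 0.85, 4500),
    ("ctx.py", 0.60, 1200),
    ("helpers.py", 0.40, 400),
]

packed, used = pack_greedy(candidates, budget=8000)
print(packed, used)
```

A greedy pass is not guaranteed to fill the budget optimally, but it is fast and keeps the highest-ranked files, which is usually the priority for a context pack.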
Task Modes
RepoMemory classifies each query and adjusts retrieval weights automatically:
| Mode | Auto-detected from | What it boosts |
|---|---|---|
| bug_fix | error, exception, crash, fix, traceback | Lexical signal, error-adjacent files |
| trace_flow | trace, flow, route, handler, how does...work | Symbol matching, call-chain ordering |
| test_lookup | test, spec, mock, fixture, coverage | Path matching for tests/ / spec/ dirs |
| config_lookup | config, env, setting, yaml, toml | Path matching for config-like files |
| general | (fallback) | Balanced across all signals |
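A keyword-based classifier along the lines of the table above might look like the following sketch. The keyword lists come from the table; the scoring rule (count exact word hits, pick the mode with the most) and the function name are assumptions, and the real classifier may work differently.

```python
import re

# Trigger words per mode, taken from the table above.
MODE_KEYWORDS = {
    "bug_fix": ["error", "exception", "crash", "fix", "traceback"],
    "trace_flow": ["trace", "flow", "route", "handler"],
    "test_lookup": ["test", "spec", "mock", "fixture", "coverage"],
    "config_lookup": ["config", "env", "setting", "yaml", "toml"],
}

# Matches phrasing such as "how does X work".
HOW_DOES_WORK = re.compile(r"\bhow does\b.*\bwork", re.IGNORECASE)


def classify_query(query):
    """Return the mode with the most keyword hits, or 'general' if none match."""
    words = set(re.findall(r"[a-z_]+", query.lower()))
    best_mode, best_hits = "general", 0
    for mode, keywords in MODE_KEYWORDS.items():
        # Exact word matches only: "routing" does not count as "route".
        hits = sum(1 for keyword in keywords if keyword in words)
        if mode == "trace_flow" and HOW_DOES_WORK.search(query):
            hits += 1
        if hits > best_hits:
            best_mode, best_hits = mode, hits
    return best_mode


print(classify_query("test coverage for auth"))
print(classify_query("How does request routing work?"))
```

Passing `--mode` on the CLI bypasses detection, which is useful when a query's wording does not contain any of the trigger words.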
Configuration
All settings use the REPOMEMORY_ env prefix (powered by pydantic-settings):
export REPOMEMORY_HF_API_KEY=hf_... # HuggingFace free API key
export REPOMEMORY_GROQ_API_KEY=gsk_... # Groq free API key (for AI summaries)
export REPOMEMORY_EMBEDDING_PROVIDER=local # 'local' or 'huggingface'
export REPOMEMORY_DATA_DIR=/data/repomemory # where SQLite + FAISS live
export REPOMEMORY_TOKEN_BUDGET=16000 # default context pack size
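The prefix convention can be sketched with the standard library alone. This is an illustration of how `REPOMEMORY_`-prefixed variables map onto settings fields, not the package's pydantic-settings class: the `Settings` dataclass, its field names, and the `huggingface` default for the provider are assumptions, while the 8000-token default comes from the features table.

```python
import os
from dataclasses import dataclass, fields
from typing import Optional

PREFIX = "REPOMEMORY_"


@dataclass
class Settings:
    hf_api_key: Optional[str] = None
    groq_api_key: Optional[str] = None
    embedding_provider: str = "huggingface"  # assumed default
    data_dir: Optional[str] = None
    token_budget: int = 8000


def load_settings(environ=None):
    """Read REPOMEMORY_* variables into a Settings instance."""
    environ = os.environ if environ is None else environ
    settings = Settings()
    for field in fields(settings):
        raw = environ.get(PREFIX + field.name.upper())
        if raw is None:
            continue  # keep the default when the variable is unset
        current = getattr(settings, field.name)
        # Integer fields are parsed; everything else stays a string.
        setattr(settings, field.name, int(raw) if isinstance(current, int) else raw)
    return settings


# A plain dict stands in for the process environment in this demo.
demo = load_settings({
    "REPOMEMORY_EMBEDDING_PROVIDER": "local",
    "REPOMEMORY_TOKEN_BUDGET": "16000",
})
print(demo)
```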
Links
- GitHub — github.com/aayushakumar/RepoMemory (full docs, architecture diagram, contributing guide)
- Issues — github.com/aayushakumar/RepoMemory/issues
- Changelog — github.com/aayushakumar/RepoMemory/releases
- API Docs (when running locally): http://localhost:8000/docs
License
MIT © Aayush Kumar