Chat with any codebase using AI — local or GitHub repos
askmy-codebase
Chat with any codebase using AI — from your terminal or via REST API. Point it at a local folder or a GitHub URL, ask questions in plain English, and get answers grounded in the actual source code.
Key Features
- Terminal CLI: `askmy-codebase --repo_path .` works from any directory after a one-time install
- GitHub URL support: clones and indexes any public repo on the fly
- Hybrid search: combines FAISS vector search with BM25 keyword search for better retrieval
- PR review mode: feed it a `.diff` file and get a structured code review
- CLAUDE.md generator: auto-generate a codebase summary file for any repo
- REST API: run it as a FastAPI server for programmatic access
- Incremental indexing: only re-embeds files that changed since the last run
- No cloud embedding: uses `BAAI/bge-small-en-v1.5` locally (free, private)
Table of Contents
- Tech Stack
- Prerequisites
- Installation
- Quick Start
- Usage
- Environment Variables
- REST API
- Deployment on Render
- Architecture
- Project Structure
- Running Tests
- Troubleshooting
Tech Stack
| Layer | Technology |
|---|---|
| Language | Python 3.9+ |
| LLM | OpenAI GPT (default: gpt-4.1-nano-2025-04-14) |
| Embeddings | HuggingFace BAAI/bge-small-en-v1.5 (local) |
| Vector Store | FAISS (CPU) |
| Keyword Search | BM25 (rank-bm25) |
| Code Parsing | tree-sitter (Python, JS) |
| Orchestration | LangChain |
| REST API | FastAPI + Uvicorn |
| Deployment | Docker / Render |
Prerequisites
- Python 3.9 or higher
- An OpenAI API key
- Git (for cloning GitHub repos)
Installation
Option 1 — Install from PyPI (recommended)
pip install askmy-codebase
Option 2 — Install from source
git clone https://github.com/Nachiket1904/askmy-codebase.git
cd askmy-codebase
pip install -e .
Save your API key (one time only)
askmy-codebase configure --api-key sk-xxxxx
This saves the key to ~/.config/askmy-codebase/config.json so you never need a .env file. The key is loaded automatically on every run.
Alternative: Set `OPENAI_API_KEY` as an environment variable or add it to a `.env` file in your working directory.
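The key lookup described above can be sketched in Python. This is an illustration only, not the package's actual code: the `resolve_api_key` helper, the `api_key` field name inside `config.json`, and the precedence order (environment variable first, then the saved config) are all assumptions, and `.env` handling is left out.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# Location stated in this README; the JSON field name used below is an assumption.
CONFIG_FILE = Path.home() / ".config" / "askmy-codebase" / "config.json"


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    config_file: Path = CONFIG_FILE,
) -> Optional[str]:
    """Return the OpenAI key from the environment, else from the saved config."""
    key = env.get("OPENAI_API_KEY")
    if key:
        return key
    try:
        data = json.loads(config_file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # config file is missing or is not valid JSON
    if not isinstance(data, dict):
        return None
    return data.get("api_key")  # hypothetical field name
```

Passing `env` and `config_file` as parameters keeps the lookup easy to exercise without touching the real environment or home directory.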
Quick Start
# Chat with the current directory
askmy-codebase --repo_path .
# Chat with a GitHub repo (clones automatically)
askmy-codebase --repo_path https://github.com/username/reponame
First run downloads the embedding model (~130 MB) and builds the index. Subsequent runs reuse the index and only re-embed changed files.
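One common way to decide which files need re-embedding is to compare content hashes against a manifest saved by the previous run. The sketch below shows that idea under stated assumptions: the README does not say how askmy-codebase detects changes, and `changed_files` and the manifest format (relative path mapped to SHA-256 digest) are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple

# File types this README lists as supported.
SOURCE_SUFFIXES = (".py", ".js", ".ts", ".java")


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(
    root: Path,
    previous: Dict[str, str],
    suffixes: Tuple[str, ...] = SOURCE_SUFFIXES,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (paths that need re-embedding, updated manifest).

    `previous` maps relative path -> digest from the last run (hypothetical format).
    """
    current: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            current[path.relative_to(root).as_posix()] = file_digest(path)
    # A file is stale if it is new or its digest differs from the stored one.
    stale = [rel for rel, digest in current.items() if previous.get(rel) != digest]
    return stale, current
```

On a first run `previous` is empty, so every supported file is reported; on later runs only new or modified files are.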
Usage
Chat mode (default)
askmy-codebase --repo_path .
[1/4] Indexing codebase from: /your/project
Index saved to ./index/abc123/
[2/4] Building repository map...
Mapped 12 file(s)
[3/4] Loading retrieval chain...
Ready.
[4/4] Starting chat session.
Ask questions about the codebase. Type 'exit' to quit.
> How does authentication work?
The authentication flow uses JWT tokens issued at login...
Sources: src/auth.py, src/middleware.py
> exit
Bye.
PR review mode
# Generate a diff first
git diff main...my-branch > changes.diff
# Review it
askmy-codebase --repo_path . --mode pr-review --diff changes.diff
Outputs a JSON object with a summary, file-by-file feedback, and a risk score.
Generate CLAUDE.md
Creates a structured CLAUDE.md context file for the repo — useful for AI assistants like Claude Code.
askmy-codebase --repo_path . --mode generate-claude-md
All flags
| Flag | Default | Description |
|---|---|---|
| `--repo_path` | required | Local path or GitHub URL |
| `--index_path` | `./index` | Where to store/load the FAISS index |
| `--model` | `gpt-4.1-nano-2025-04-14` | OpenAI chat model to use |
| `--mode` | `chat` | `chat`, `pr-review`, or `generate-claude-md` |
| `--diff` | none | Path to `.diff` file (required for `pr-review`) |
| `--rebuild-index` | false | Force re-index even if index exists |
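When scripting the CLI from Python, it can help to assemble the argument list in one place. The helper below is a hypothetical wrapper, not part of the package; it uses only the flags listed above and assumes `--rebuild-index` is a switch that takes no value.

```python
from typing import List, Optional

MODES = ("chat", "pr-review", "generate-claude-md")


def build_command(
    repo_path: str,
    mode: str = "chat",
    diff: Optional[str] = None,
    index_path: Optional[str] = None,
    model: Optional[str] = None,
    rebuild_index: bool = False,
) -> List[str]:
    """Build an argv list for the askmy-codebase CLI from the documented flags."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "pr-review" and not diff:
        raise ValueError("--diff is required for pr-review mode")
    cmd = ["askmy-codebase", "--repo_path", repo_path, "--mode", mode]
    if diff:
        cmd += ["--diff", diff]
    if index_path:
        cmd += ["--index_path", index_path]
    if model:
        cmd += ["--model", model]
    if rebuild_index:
        cmd.append("--rebuild-index")  # assumed to be a value-less switch
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, for example with `build_command(".", mode="pr-review", diff="changes.diff")`.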
Environment Variables
| Variable | Description | Required |
|---|---|---|
| `OPENAI_API_KEY` | Your OpenAI API key | Yes (or use `configure`) |
| `OPENAI_CHAT_MODEL` | Override the default chat model | No |
| `API_SECRET_KEY` | Secret key for REST API auth | No (dev mode if unset) |
| `REPO_PATH` | Pre-load a repo at API server startup | No |
| `INDEX_PATH` | Base directory for FAISS indexes | No (default: `./index`) |
REST API
Run as an API server for programmatic access:
uvicorn src.api:app --host 0.0.0.0 --port 8000
Interactive docs available at http://localhost:8000/docs.
Endpoints
POST /index
Index a repository (required before querying).
curl -X POST http://localhost:8000/index \
-H "Content-Type: application/json" \
-H "X-API-Key: your-secret" \
-d '{"repo_path": "https://github.com/username/repo", "rebuild": false}'
{ "status": "ok", "files_indexed": 24 }
POST /query
Ask a question about the indexed codebase.
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-H "X-API-Key: your-secret" \
-d '{"question": "How does the ingestion pipeline work?"}'
{
"answer": "The ingestion pipeline loads source files using GenericLoader...",
"sources": ["src/ingestion.py", "src/embedder.py"]
}
POST /review
Review a diff string.
curl -X POST http://localhost:8000/review \
-H "Content-Type: application/json" \
-H "X-API-Key: your-secret" \
-d '{"diff": "--- a/src/api.py\n+++ b/src/api.py\n..."}'
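The same calls can be made from Python using only the standard library. This is a minimal client sketch: the endpoints, headers, and payload fields come from the curl examples above, while the `build_request` and `post_json` helper names and the 60-second timeout are assumptions.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(
    base_url: str,
    path: str,
    payload: Dict[str, Any],
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build a JSON POST request, adding X-API-Key when a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def post_json(
    base_url: str,
    path: str,
    payload: Dict[str, Any],
    api_key: Optional[str] = None,
    timeout: float = 60.0,
) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    request = build_request(base_url, path, payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical session would call `post_json("http://localhost:8000", "/index", {"repo_path": "https://github.com/username/repo", "rebuild": False}, "your-secret")` once, then `post_json("http://localhost:8000", "/query", {"question": "How does the ingestion pipeline work?"}, "your-secret")`. Indexing a large repo may take longer than the default timeout used here.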
Deployment on Render
The repo includes a render.yaml and Dockerfile for one-click deployment.
- Fork/push this repo to GitHub
- Go to render.com → New Web Service → connect your repo
- Render detects `render.yaml` and configures automatically
- Set these environment variables in the Render dashboard:
  - `OPENAI_API_KEY`: your OpenAI key
  - `API_SECRET_KEY`: a random secret for API auth
- Deploy
Note: The free tier uses ephemeral storage (`/tmp/index`). The index is rebuilt after each redeploy. For persistence, attach a Render Disk and set `INDEX_PATH` to a persistent path.
Calling the deployed API
# Index a repo
curl -X POST https://your-app.onrender.com/index \
-H "X-API-Key: your-secret" \
-H "Content-Type: application/json" \
-d '{"repo_path": "https://github.com/username/repo"}'
# Query it
curl -X POST https://your-app.onrender.com/query \
-H "X-API-Key: your-secret" \
-H "Content-Type: application/json" \
-d '{"question": "What does this project do?"}'
Architecture
repo_path (local or GitHub URL)
│
▼
[github_loader] ← clones GitHub URLs to temp dir
│
▼
[ingestion] ← loads .py/.js/.ts/.java files, respects .claudeignore
│
▼
[embedder] ← splits by language, embeds with BAAI/bge-small-en-v1.5
│ ← caches embeddings on disk (incremental re-indexing)
▼
[FAISS index] + [BM25 index]
│ │
└──────┬──────────┘
▼
[retriever] ← hybrid search: vector + keyword, re-ranked
│
▼
[LangChain chain] ← ConversationalRetrievalChain with GPT
│
▼
answer + sources
How hybrid retrieval works
For each query, two retrievers run in parallel:
- FAISS finds semantically similar chunks (meaning-based)
- BM25 finds chunks with matching keywords (exact-match)
Results are merged and de-duplicated. This handles both conceptual questions ("how does auth work?") and exact lookups ("find where load_codebase is called").
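The merge-and-de-duplicate step can be sketched as follows. This assumes each retriever returns an ordered list of chunk identifiers; the round-robin interleaving and the `merge_results` name are illustrative choices, since the README does not specify how the merged list is ordered.

```python
from itertools import zip_longest
from typing import Hashable, List, Sequence


def merge_results(
    vector_hits: Sequence[Hashable],
    keyword_hits: Sequence[Hashable],
    k: int = 5,
) -> List[Hashable]:
    """Interleave two ranked lists, keeping the first occurrence of each chunk."""
    merged: List[Hashable] = []
    seen = set()
    missing = object()  # sentinel that pads the shorter list
    for pair in zip_longest(vector_hits, keyword_hits, fillvalue=missing):
        for chunk_id in pair:
            if chunk_id is missing or chunk_id in seen:
                continue
            seen.add(chunk_id)
            merged.append(chunk_id)
    return merged[:k]
```

Interleaving gives both retrievers an equal chance of placing results near the top; a chunk found by both appears once, at its earliest position.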
Index isolation
Each repo gets its own index directory, keyed by a SHA-256 hash of the repo path, so running against two different repos never overwrites either repo's index.
./index/
a3f7c12b4e/ ← hash of /projects/repo-a
9d2e1f8c03/ ← hash of /projects/repo-b
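A directory name of this kind can be derived with `hashlib`. The sketch below makes two assumptions that the README does not confirm: the digest is truncated to 10 hex characters (matching the length of the names in the listing above), and the repo path string is hashed as given, without normalization. The `index_dir_for` name is hypothetical.

```python
import hashlib
from pathlib import Path


def index_dir_for(repo_path: str, base: str = "./index", length: int = 10) -> Path:
    """Map a repo path or URL to its own index directory under `base`."""
    digest = hashlib.sha256(repo_path.encode("utf-8")).hexdigest()
    return Path(base) / digest[:length]  # truncation length is an assumption
```

Because the mapping is deterministic, re-running against the same path finds the existing index, while a different path lands in a different directory.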
Project Structure
askmy-codebase/
├── src/
│ ├── main.py # CLI entry point, all modes
│ ├── api.py # FastAPI REST server
│ ├── ingestion.py # File loading, .claudeignore support
│ ├── embedder.py # FAISS index build/save/load, incremental
│ ├── retriever.py # Hybrid BM25+FAISS retrieval, LangChain chain
│ ├── github_loader.py # Clone GitHub URLs to temp dir
│ ├── ast_parser.py # tree-sitter repo map (functions, classes)
│ ├── pr_reviewer.py # Diff review pipeline
│ ├── claude_md_generator.py # CLAUDE.md generation
│ └── context_builder.py # Load/save CLAUDE.md context
├── tests/ # pytest test suite (21 tests)
├── .github/
│ └── workflows/
│ └── publish.yml # Auto-publish to PyPI on GitHub release
├── Dockerfile # Docker build (pre-downloads embedding model)
├── render.yaml # Render deployment config
├── setup.py # Package entry point
├── pyproject.toml # Build metadata
└── requirements.txt # All dependencies
Running Tests
pip install pytest
pytest tests/ -v
Expected: 21 tests passing.
# Run a specific file
pytest tests/test_ingestion.py -v
pytest tests/test_retriever.py -v
Troubleshooting
IndexError: list index out of range on /index
The repo has no supported source files (.py, .js, .ts, .java). Check that repo_path points to a repo with code in those languages, or that the GitHub URL is correct and the repo is public.
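Before indexing, a quick check can confirm that a local folder contains supported files. This is a standalone sketch with a hypothetical `count_supported_files` helper; it uses the extensions listed above and does not apply `.claudeignore` rules, so its counts may be higher than what the tool actually indexes.

```python
from collections import Counter
from pathlib import Path

SUPPORTED_SUFFIXES = {".py", ".js", ".ts", ".java"}


def count_supported_files(repo_path: str) -> Counter:
    """Count files under repo_path by supported extension."""
    counts: Counter = Counter()
    for path in Path(repo_path).rglob("*"):
        if path.is_file() and path.suffix in SUPPORTED_SUFFIXES:
            counts[path.suffix] += 1
    return counts
```

If `sum(count_supported_files(".").values())` is zero, the folder has nothing the indexer can load.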
OPENAI_API_KEY is not set
Run askmy-codebase configure --api-key sk-xxxxx or export the variable:
export OPENAI_API_KEY=sk-xxxxx
First run is slow
The embedding model (BAAI/bge-small-en-v1.5, ~130 MB) downloads on first use and is cached in ~/.cache/huggingface/. Subsequent runs are fast.
Render free tier — index lost after redeploy
Free tier uses ephemeral storage. The index rebuilds on every deploy. To persist it, add a Render Disk and set INDEX_PATH=/data/index in your environment variables.
API returns 503 "Index not loaded"
Call POST /index first to build the index before calling /query or /review.
License
MIT