Codebase Intelligence AI — chat with any codebase using semantic search and LLMs
Project description
AskRepo
A local-first code intelligence system that indexes source code and documentation into a vector database and answers natural language questions about it using semantic search and an LLM.
Overview
AskRepo parses your codebase at the AST level, assigns each function, class, and file a natural language description using an LLM, stores everything in a local ChromaDB vector store, and lets you query it in plain English.
It supports any public GitHub repository via shallow Git clone, and handles both structured code (Python, JavaScript, TypeScript) and unstructured documents (Markdown, JSON, TOML, YAML, plain text, config files).
Both the indexing descriptions and the query answering use pluggable LLM backends — Groq (cloud) or Ollama (local) — configurable independently of each other.
Retrieval-Augmented Generation (RAG)
AskRepo is built on the RAG pattern — a technique that improves LLM answers by grounding them in retrieved facts rather than relying on the model's training memory alone.
The problem RAG solves: a codebase is too large to send to an LLM in a single prompt, and even if it fit, the model has no specific knowledge of your code. RAG solves this by:
- Indexing — splitting the codebase into small, searchable chunks and storing them in a vector database.
- Retrieval — when a question arrives, finding the chunks most semantically relevant to it.
- Augmented generation — constructing a prompt that contains only those relevant chunks and sending it to the LLM, so the model answers from actual context rather than guessing.
Phase 1: Indexing
Source file
└── Parsed into chunks (one per function / class / file / document)
└── LLM writes a verbal description of each chunk
└── Description is converted to a vector (embedding)
└── Vector + metadata stored in ChromaDB
Each chunk is described in plain English by an LLM before being embedded. This is important: raw code contains a lot of syntactic noise (brackets, keywords, indentation) that degrades embedding quality. A natural language description captures intent, which embeds and retrieves far more accurately.
Phase 2: Retrieval and Generation
User question
└── Question converted to a vector using the same embedding model
└── Cosine similarity search against all stored vectors
└── Top-k most similar chunks retrieved
└── Chunks assembled into a context prompt
└── LLM generates the answer
What is an Embedding?
An embedding is a fixed-size array of floating-point numbers (a vector) that represents the semantic meaning of a piece of text. Text with similar meaning produces vectors that are geometrically close to each other in high-dimensional space.
For example, the phrase "function that hashes a password" and the phrase "bcrypt-based password encryption routine" will produce vectors with a high cosine similarity score — even though they share no words. This is what enables natural language search over code.
AskRepo uses all-MiniLM-L6-v2, a 22M parameter model that produces 384-dimensional vectors. It runs entirely on CPU and takes under a second per query on modern hardware.
Why not just send all the code to the LLM?
| Approach | Problem |
|---|---|
| Send full codebase in prompt | Exceeds context window; expensive; model loses focus in large contexts |
| Keyword search (grep) | Finds exact text matches, misses semantic relationships |
| RAG with embeddings | Retrieves semantically relevant chunks; fits in context; accurate |
How It Works
Source Code / GitHub Repo
│
▼
┌─────────────┐
│ parser.py │ AST extraction (tree-sitter) for .py / .js / .ts / .tsx
│ │ Raw content read for .md / .json / .toml / .yaml / etc.
└──────┬───────┘
│ Structured data: functions, classes, imports, globals
▼
┌─────────────┐
│ chunker.py │ One chunk per function, class, file overview, or document
└──────┬───────┘
│ List of typed chunks with metadata
▼
┌──────────────┐
│ describer.py │ LLM generates a 2-4 sentence verbal description per chunk
│ │ Backend: Ollama (local) or Groq (cloud) — set in config.py
└──────┬────────┘
│ Chunks with `verbal` field populated
▼
┌──────────────┐
│ store.py │ Embeds the verbal description via sentence-transformers
│ │ Stores vectors + metadata in local ChromaDB
└──────┬────────┘
│
◆ Index complete
│
┌──────┴────────────────────────────────────────────────────┐
│ query.py │
│ 1. Embed the user's question │
│ 2. Retrieve top-k chunks via cosine similarity │
│ 3. Build a context prompt from the retrieved chunks │
│ 4. Call LLM (Groq or Ollama) → synthesise answer │
└───────────────────────────────────────────────────────────┘
Key design decisions
- AST over full-file embedding — Each function and class is indexed independently. This gives precise semantic hits instead of retrieving large, diluted file blobs.
- Verbal descriptions as the embedding target — Rather than embedding raw code (which encodes syntax, not intent), an LLM first writes a plain English description of each chunk. That description is what gets embedded. This dramatically improves retrieval relevance.
- Fully local storage — ChromaDB persists all vectors to
./chroma_db/on disk. No cloud vector database, no data leaves the machine (unless you use the Groq backend). - Shallow Git clones — GitHub repositories are fetched with
git clone --depth=1, avoiding API rate limits and keeping clone sizes small. - Lazy model loading — The embedding model is only loaded into memory when a command actually needs it (
query,index). Commands likelistandcountrun instantly without touching the model.
Project Structure
askrepo/
├── main.py CLI entry point — all commands route through here
├── askrepo.bat Windows launcher (run `askrepo` from the project directory)
├── config.py Single source of truth for all settings
├── parser.py AST extraction (Python, JS, TS) + simple file reader
├── chunker.py Splits parsed output into indexable chunks
├── describer.py LLM description generation (Groq / Ollama)
├── store.py ChromaDB wrapper — add, search, count, metadata
├── query.py Query pipeline — retrieve → prompt → LLM → answer
├── github_fetcher.py Git clone / pull for public GitHub repositories
├── requirements.txt
├── .env GROQ_API_KEY goes here
├── chroma_db/ Local vector store (auto-created on first index)
└── repos/ Cached GitHub repository clones
Requirements
- Python 3.10+
- Git (must be in PATH — used for
index-repo) - Ollama with
gemma:2bpulled (if using the Ollama backend) - A Groq API key (if using the Groq backend — free tier available)
Installation
git clone <this-repo>
cd askrepo
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # macOS / Linux
pip install -r requirements.txt
Create a .env file in the project root:
GROQ_API_KEY=your_key_here
If you intend to use only Ollama, the .env file and Groq key are not required.
CLI setup (Windows)
The project includes askrepo.bat. To use askrepo as a command from anywhere, add the project directory to your system PATH, or simply run it from within the project directory:
askrepo query "how does authentication work?"
Alternatively, you can always invoke it directly:
python main.py query "how does authentication work?"
Configuration
All settings are in config.py. Edit this file directly — no CLI flags, no environment variable hunting.
# config.py
# Which LLM generates verbal descriptions during indexing
# "ollama" — local, unlimited, no API key needed (default)
# "groq" — cloud, faster, 100k token/day free tier
DESCRIBER_BACKEND = "ollama"
# Which LLM synthesises answers during queries
# "groq" — cloud, better reasoning quality (default)
# "ollama" — local, unlimited
QUERY_BACKEND = "groq"
# Ollama
OLLAMA_BASE_URL = "http://localhost:11434"
OLLAMA_MODEL = "gemma:2b"
# Groq
GROQ_MODEL = "llama-3.3-70b-versatile"
GROQ_CALL_DELAY = 0.5 # seconds between calls, respects free-tier limits
# Retrieval
TOP_K = 5 # chunks returned per query
# Directories never descended into during indexing
SKIP_DIRS = {"venv", ".venv", "__pycache__", "node_modules", "docs", ...}
Backend matrix
| Use case | DESCRIBER_BACKEND | QUERY_BACKEND |
|---|---|---|
| Default (local index, cloud query) | "ollama" |
"groq" |
| Fully offline | "ollama" |
"ollama" |
| Groq daily limit hit | "ollama" |
"ollama" |
| Fastest indexing (burns tokens) | "groq" |
"groq" |
Usage
Index a local path
askrepo index ./myproject
askrepo index ./src/auth.py
Walks the directory recursively. Skips test files, dependency directories (node_modules, .venv, etc.), and documentation folders (docs/). Accepts a single file or any directory.
Index a GitHub repository
askrepo index-repo fastapi/fastapi
askrepo index-repo https://github.com/psf/requests
askrepo index-repo django/django --branch stable/4.2.x
Performs a shallow clone (--depth=1) into ./repos/<owner>_<repo>/. If the repository is already cached, runs git pull to update it instead of re-cloning.
Note on large repositories — Repos with large
docs/folders (translations, tutorials) will generate hundreds of chunks and exhaust the Groq free-tier token budget quickly. Thedocs/directory is inSKIP_DIRSby default. AdjustSKIP_DIRSinconfig.pyif needed.
Query
askrepo query "how does authentication work?"
askrepo query "what does the Timers class track?"
askrepo query "what python version does this require?"
Runs the full pipeline: embed the question → retrieve top-k chunks → build a context prompt → call the LLM → print the answer.
List the index
askrepo list
Prints a structured breakdown of everything currently indexed, grouped by source:
==============================================================
INDEX BREAKDOWN
==============================================================
Sources : 1
Files : 7
Chunks : 25
--------------------------------------------------------------
Source : aswin-2005/MONOL-Server (7 files | 25 chunks)
--------------------------------------------------------------
auth.py python 12 chunks [file, 11x function] -> generate_challenge, ...
crypt.py python 4 chunks [file, 3x function] -> encrypt_with_aesgcm, ...
entries.py python 5 chunks [file, 4x function] -> add_entry, get_entries, ...
requirements.txt text 1 chunk [document]
...
==============================================================
Count chunks
askrepo count
Prints the total number of indexed chunks. Does not load the embedding model.
Clear the index
askrepo clear
Wipes the ChromaDB collection. Does not delete cached repository clones in ./repos/.
Supported File Types
Structured (AST-parsed)
These files are parsed with tree-sitter. Each function, class, and method becomes its own chunk with extracted metadata (parameters, return type, calls, docstring).
| Extension | Language |
|---|---|
.py |
Python |
.js, .mjs, .cjs |
JavaScript |
.ts |
TypeScript |
.tsx |
TypeScript + JSX |
Simple (raw content)
These files are read as plain text and stored as a single document chunk each.
| Extension / Filename | Label |
|---|---|
.md, .markdown |
markdown |
.txt |
text |
.rst |
restructuredtext |
.json |
json |
.toml |
toml |
.yaml, .yml |
yaml |
.env, .ini, .cfg, .conf |
env / config |
Dockerfile, Makefile |
dockerfile / makefile |
.gitignore, .dockerignore |
gitignore |
Embedding Model
AskRepo uses all-MiniLM-L6-v2 from sentence-transformers.
| Property | Value |
|---|---|
| Parameters | 22.7 million |
| Vector dimensions | 384 |
| Max input tokens | 256 |
| Runs on | CPU (no GPU required) |
| Similarity metric | Cosine similarity |
| Local cache | ~/.cache/huggingface/ |
The model is downloaded once on first use and cached locally. All subsequent runs load from disk — no internet connection required after the initial download.
Why verbal descriptions are embedded, not raw code
Embedding raw source code produces vectors that are heavily influenced by syntax — language keywords, punctuation, indentation — rather than the semantic purpose of the code. An LLM-generated description strips the syntactic noise and represents what the code does in plain language, which maps much more faithfully to the kind of natural language questions users ask.
For example, a question like "how is the user session cleaned up?" will match the description "removes expired sessions from the in-memory store on a timed schedule" far more reliably than it would match the raw Python source of that function.
The model is lazy-loaded: it is only initialised when a command actually needs embeddings (index, query). Commands like list and count are instant.
To change the embedding model, update EMBEDDING_MODEL in config.py. Any model from the sentence-transformers model hub can be used.
Ollama Setup
Install Ollama and pull the model:
# Install from https://ollama.com
ollama pull gemma:2b
ollama serve # Ollama usually auto-starts; only needed if not running
Verify it is reachable:
curl http://localhost:11434/api/tags
The base URL and model name are configurable in config.py under OLLAMA_BASE_URL and OLLAMA_MODEL. Any Ollama-compatible model can be used.
Skipped Files and Directories
The following are automatically excluded during indexing to avoid token waste and retrieval noise:
Directories: venv, .venv, __pycache__, node_modules, dist, build, .git, vendor, third_party, site-packages, docs, doc, documentation, examples, example, benchmarks, bench
File name patterns:
- Prefix:
test_,spec_ - Suffix:
_test.py,_test.js,_test.ts,.test.js,.test.ts,.spec.js,.spec.ts,_spec.rb
All of these are configurable via SKIP_DIRS, SKIP_FILE_PREFIXES, and SKIP_FILE_SUFFIXES in config.py.
Limitations
- Groq free tier — 100,000 tokens per day. Indexing a large repository with many files can exhaust this quickly. Use Ollama for indexing (
DESCRIBER_BACKEND = "ollama") and reserve Groq tokens for queries. - Query quality with small models —
gemma:2bis capable but noticeably weaker thanllama-3.3-70b-versatileon complex reasoning. For best answer quality, use Groq for queries. - No incremental re-indexing — Re-running
indexorindex-repoon an already-indexed path will upsert (overwrite) existing chunks. This is safe but re-runs all LLM description calls. - No cross-collection search — All indexed sources share a single ChromaDB collection. Run
clearif you want to start fresh.
Dependencies
| Package | Purpose |
|---|---|
chromadb |
Local vector database |
sentence-transformers |
Embedding model (all-MiniLM-L6-v2) |
tree-sitter |
AST parsing core |
tree-sitter-python |
Python grammar |
tree-sitter-javascript |
JavaScript grammar |
tree-sitter-typescript |
TypeScript / TSX grammar |
groq |
Groq SDK for cloud LLM calls |
python-dotenv |
.env file loading |
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file askrepo-1.3.0.tar.gz.
File metadata
- Download URL: askrepo-1.3.0.tar.gz
- Upload date:
- Size: 39.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2312ec6e0db73d1791080659e9c92d38b63145767b3bc44d321fb272ba25a0c2
|
|
| MD5 |
1e316c3f29bff33e3f405869f046c355
|
|
| BLAKE2b-256 |
76b28bda9412e5baedc94a055946d84e2494cd6c5894a03563e86f1a9b7bf284
|
File details
Details for the file askrepo-1.3.0-py3-none-any.whl.
File metadata
- Download URL: askrepo-1.3.0-py3-none-any.whl
- Upload date:
- Size: 34.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3c2c5a75ea96b0de63f682287de1a220d123b56afc27963441921bc612a32221
|
|
| MD5 |
954d1fb0508e38f59a766d650bb3f7b4
|
|
| BLAKE2b-256 |
f288ae695922ae1221525b53a82bf053af15260e0f2acc7e5b5a81631e0bdf77
|