AI-powered repository assistant — ask questions about any codebase
RepoAsk
Ask questions about any codebase using AI. RepoAsk indexes your repository with a RAG pipeline: it chunks code by AST symbols, embeds the chunks (locally by default), and answers questions with citations to exact file paths and line numbers.
No server, no database setup. Everything is stored as local files inside your project.
Requirements
- Python 3.10+
- An API key for at least one LLM provider (Groq, OpenAI, or Anthropic), or a local Ollama installation
Installation
RepoAsk is a CLI tool — install it with pipx so it gets its own isolated environment and is available system-wide.
# Install pipx if you don't have it (macOS with Homebrew shown)
brew install pipx
pipx ensurepath
# Install repoask
pipx install repoask
macOS / Homebrew users: Do not use pip install directly. Homebrew marks its Python as externally managed (PEP 668), so pip refuses to install packages into it; pipx is the correct tool for installing Python CLI applications.
Alternative — manual venv (for development or local builds):
python3 -m venv ~/.venvs/repoask
source ~/.venvs/repoask/bin/activate
pip install repoask
Quick Start
1. Configure your providers and API keys (once, globally)
repoask init
This walks you through choosing an embedding provider and an LLM provider, then saves your config to ~/.repoask/config.toml.
2. Index a repository
cd /path/to/your/repo
repoask index
RepoAsk scans all Python, JavaScript, and TypeScript files, extracts symbols (functions, classes, interfaces) using tree-sitter, generates embeddings, and stores everything locally under .repoask/.
3. Ask a question
repoask ask "How does authentication work?"
Or start an interactive session:
repoask chat
Commands
repoask init
Interactive setup wizard. Sets your embedding provider, LLM provider, models, and API keys. Saves to ~/.repoask/config.toml.
repoask index
Scans and indexes the current repository. Only re-indexes files that have changed since the last run (incremental by default).
repoask index # incremental (default)
repoask index --full # force a complete re-index
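Incremental indexing of this kind is usually driven by content hashes. The sketch below is an illustration under assumptions, not RepoAsk's code: it assumes a hypothetical files(path, sha256) table in tracker.db and a hypothetical changed_files helper. It hashes each file with SHA-256, compares the digest with the stored one, and treats full=True like --full by clearing the table first.

```python
import hashlib
import sqlite3
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash file contents, so edits are detected even if timestamps are unchanged."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_files(db_path: str, files: list[Path], full: bool = False) -> list[Path]:
    """Return files whose content hash differs from the stored one (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
        )
        if full:
            # Like --full: forget every stored hash so all files count as changed.
            conn.execute("DELETE FROM files")
        changed = []
        for path in files:
            digest = file_sha256(path)
            row = conn.execute(
                "SELECT sha256 FROM files WHERE path = ?", (str(path),)
            ).fetchone()
            if row is None or row[0] != digest:
                changed.append(path)
                # Record the new hash so the next run skips this file.
                conn.execute(
                    "INSERT OR REPLACE INTO files (path, sha256) VALUES (?, ?)",
                    (str(path), digest),
                )
        conn.commit()
        return changed
    finally:
        conn.close()
```

A real indexer would record a file's new hash only after its chunks are re-embedded successfully, and would also drop entries (and vectors) for deleted files; both are left out here for brevity.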
repoask ask "<question>"
One-shot question. Retrieves relevant code chunks, builds context, and streams an answer.
repoask ask "Where is the database connection initialized?"
repoask ask "What does the UserService class do?"
repoask ask "How are API errors handled?" --top-k 15
repoask ask "Show me all route handlers" --lang typescript
Options:
- --top-k N: number of chunks to retrieve (default: 10)
- --lang LANG: filter results to a specific language (python, javascript, typescript)
repoask chat
Interactive multi-turn session. Maintains conversation context within the session and persists history to disk between sessions.
Commands inside chat:
/clear clear the current session context
/quit exit
Options:
- --top-k N: number of chunks to retrieve per turn (default: 10)
- --no-history: skip persisting this session to disk
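One way to combine in-session context with on-disk history is to keep turns in memory and write each one through to SQLite unless history is disabled. The ChatHistory class, its turns table, and the saved_turns helper below are hypothetical; in particular, this sketch assumes /clear only resets the in-memory context and leaves saved history in place.

```python
import sqlite3


class ChatHistory:
    """In-memory context for one chat session, written through to SQLite."""

    def __init__(self, db_path: str, session_id: str, persist: bool = True):
        self.session_id = session_id
        self.turns: list[tuple[str, str]] = []  # (role, content) pairs sent as context
        # persist=False plays the role of --no-history: nothing touches the disk.
        self.conn = sqlite3.connect(db_path) if persist else None
        if self.conn is not None:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS turns (id INTEGER PRIMARY KEY AUTOINCREMENT,"
                " session TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
            )

    def add(self, role: str, content: str) -> None:
        self.turns.append((role, content))
        if self.conn is not None:
            self.conn.execute(
                "INSERT INTO turns (session, role, content) VALUES (?, ?, ?)",
                (self.session_id, role, content),
            )
            self.conn.commit()

    def clear(self) -> None:
        """Like /clear (as assumed here): reset the context, keep saved history."""
        self.turns = []

    def close(self) -> None:
        if self.conn is not None:
            self.conn.close()


def saved_turns(db_path: str, session_id: str) -> list[tuple[str, str]]:
    """Read a session's persisted turns back, e.g. when a new chat starts."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT role, content FROM turns WHERE session = ? ORDER BY id",
            (session_id,),
        ).fetchall()
    finally:
        conn.close()
```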
repoask status
Shows index statistics for the current repository.
Repository /path/to/your/repo
Index path /path/to/your/repo/.repoask
Embedding huggingface / sentence-transformers/all-MiniLM-L6-v2
LLM groq / llama3-8b-8192
Chunks 1,842
Files indexed 74
Last indexed 2026-06-22 14:30:01
repoask config
Displays the current global configuration with API keys masked.
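Masking usually keeps only a short suffix, so keys can be told apart without being exposed. The format below (four asterisks plus the last four characters) and the mask_key and masked_config helpers are assumptions for illustration, not RepoAsk's documented output.

```python
def mask_key(key: str, visible: int = 4) -> str:
    """Hide a secret, keeping only its last `visible` characters."""
    if not key:
        return "(not set)"
    if len(key) <= visible:
        return "****"  # too short to reveal any suffix safely
    return "****" + key[-visible:]


def masked_config(config: dict) -> dict:
    """Return a copy of a nested config dict with every api_key value masked."""
    out = {}
    for name, value in config.items():
        if isinstance(value, dict):
            out[name] = masked_config(value)  # recurse into TOML tables
        elif name == "api_key":
            out[name] = mask_key(value)
        else:
            out[name] = value
    return out
```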
Supported Providers
Embedding
| Provider | Models | Notes |
|---|---|---|
| huggingface | sentence-transformers/all-MiniLM-L6-v2 (default) | Runs locally, no API key needed |
| openai | text-embedding-3-small, text-embedding-3-large | Requires OpenAI API key |
LLM
| Provider | Example Models | Notes |
|---|---|---|
| groq | llama-3.1-8b-instant, llama-3.3-70b-versatile | Fast, free tier available |
| openai | gpt-4o-mini, gpt-4o | Requires OpenAI API key |
| anthropic | claude-haiku-4-5-20251001, claude-sonnet-4-6 | Requires Anthropic API key |
| ollama | llama3.2, mistral, codellama | Fully local, no API key; requires Ollama to be installed |
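Because only some providers need a key, a configuration check can be read straight off the tables above. The missing_keys helper below is hypothetical and assumes the api_key field in each config section is the only place a key comes from; RepoAsk may also accept keys from other sources.

```python
# Which (section, provider) pairs need an API key, taken from the tables above.
NEEDS_API_KEY = {
    ("embedding", "huggingface"): False,
    ("embedding", "openai"): True,
    ("llm", "groq"): True,
    ("llm", "openai"): True,
    ("llm", "anthropic"): True,
    ("llm", "ollama"): False,
}


def missing_keys(config: dict) -> list[str]:
    """List configuration problems: unknown providers or absent required keys."""
    problems = []
    for section in ("embedding", "llm"):
        table = config.get(section, {})
        provider = table.get("provider")
        key = table.get("api_key", "")
        if (section, provider) not in NEEDS_API_KEY:
            problems.append(f"[{section}] unknown provider: {provider!r}")
        elif NEEDS_API_KEY[(section, provider)] and not key:
            problems.append(f"[{section}] provider {provider!r} requires api_key")
    return problems
```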
Configuration
Config is stored in TOML format. RepoAsk merges two config files in order:
- Global: ~/.repoask/config.toml (API keys and provider defaults, shared across all repos)
- Per-repo: .repoask/config.toml (overrides for a specific project, optional)
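The override behaviour amounts to a recursive dictionary merge in which per-repo values win. The sketch below is an assumption about how such a merge could work, using the standard-library tomllib (Python 3.11+) with the tomli backport as a fallback on 3.10; deep_merge and load_config_files are hypothetical names.

```python
from pathlib import Path

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # Python 3.10: pip install tomli
    import tomli as tomllib


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: override's values win, nested tables merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config_files(global_path: Path, repo_path: Path) -> dict:
    """Load the global config, then layer the optional per-repo config on top."""
    config: dict = {}
    for path in (global_path, repo_path):  # later files take precedence
        if path.exists():
            with path.open("rb") as f:  # tomllib requires a binary file
                config = deep_merge(config, tomllib.load(f))
    return config
```

Called as load_config_files(Path.home() / ".repoask" / "config.toml", Path(".repoask") / "config.toml"), nested tables merge key by key, but lists such as ignore_patterns are replaced wholesale by the per-repo value; whether RepoAsk concatenates lists instead is not stated.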
Full config reference
[embedding]
provider = "huggingface"
model = "sentence-transformers/all-MiniLM-L6-v2"
api_key = ""
[llm]
provider = "groq"
model = "llama-3.1-8b-instant"
api_key = "gsk_..."
temperature = 0.2
max_tokens = 2048
[indexing]
ignore_patterns = [
".git", "node_modules", "__pycache__", "dist", "build",
".venv", "venv", "*.lock", "*.min.js", "*.min.css",
"*.pyc", ".DS_Store", "coverage", ".next", ".nuxt",
]
max_file_size_kb = 500
languages = ["python", "javascript", "typescript"]
[store]
path = ".repoask"
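The [indexing] settings translate naturally into a file filter: skip anything with a path component matching an ignore pattern, anything larger than max_file_size_kb, and anything whose language is not listed in languages. The sketch below assumes glob-style matching against each path component and a hypothetical extension-to-language map; it does not parse .gitignore, which the real scanner also respects.

```python
import fnmatch
import os
from pathlib import Path

# Hypothetical extension-to-language map; RepoAsk's real mapping may differ.
EXTENSIONS = {
    ".py": "python",
    ".js": "javascript",
    ".jsx": "javascript",
    ".ts": "typescript",
    ".tsx": "typescript",
}


def is_ignored(rel_path: Path, patterns: list[str]) -> bool:
    """True if any component of the path (directory or file name) matches a pattern."""
    return any(
        fnmatch.fnmatchcase(part, pattern)
        for part in rel_path.parts
        for pattern in patterns
    )


def select_files(
    root: Path, ignore_patterns: list[str], max_file_size_kb: int, languages: list[str]
) -> list[tuple[str, str]]:
    """Return sorted (relative POSIX path, language) pairs that should be indexed."""
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = Path(dirpath).relative_to(root)
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not is_ignored(Path(d), ignore_patterns)]
        for name in filenames:
            rel = rel_dir / name
            language = EXTENSIONS.get(Path(name).suffix)
            if language not in languages or is_ignored(rel, ignore_patterns):
                continue
            if (root / rel).stat().st_size > max_file_size_kb * 1024:
                continue  # larger than max_file_size_kb
            selected.append((rel.as_posix(), language))
    return sorted(selected)
```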
Local Index Files
When you run repoask index, a .repoask/ directory is created inside your repository:
.repoask/
├── chroma/ # vector store (ChromaDB, file-based)
├── tracker.db # file hash registry for incremental indexing (SQLite)
└── history.db # chat session history (SQLite)
No external database or server is required. Add .repoask/ to your .gitignore to avoid committing the index.
echo ".repoask/" >> .gitignore
How It Works
- Scan — traverses the repository, respecting .gitignore and configured ignore patterns
- Chunk — parses each file with tree-sitter and extracts symbols (functions, classes, interfaces) as individual chunks instead of fixed token windows
- Embed — converts each chunk to a vector embedding using your configured provider
- Store — saves chunks + embeddings + metadata (file path, symbol name, line numbers) to a local ChromaDB collection
- Retrieve — when you ask a question, it embeds the question and finds the most similar chunks via cosine similarity
- Build context — enriches the top-K hits with import-referenced definitions and assembles a structured context block
- Answer — sends the context + question to your LLM with a prompt that requires citations and instructs the model not to answer beyond the retrieved code
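The Chunk and Build context steps can be illustrated for Python files with the standard-library ast module, swapped in here for tree-sitter. The sketch emits one chunk per top-level function or class, keeps decorators, and records line ranges that can be cited as path:start-end. SymbolChunk, chunk_python_source, and build_context are hypothetical names, and the import-based enrichment step is not shown.

```python
import ast
from dataclasses import dataclass


@dataclass
class SymbolChunk:
    file_path: str
    symbol: str
    kind: str
    start_line: int
    end_line: int
    text: str

    def citation(self) -> str:
        return f"{self.file_path}:{self.start_line}-{self.end_line}"


def chunk_python_source(source: str, file_path: str) -> list[SymbolChunk]:
    """Split a Python file into one chunk per top-level function or class."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            # Start at the first decorator so the full definition is kept.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno  # 1-based and inclusive
            text = "\n".join(lines[start - 1:end])
            chunks.append(SymbolChunk(file_path, node.name, kind, start, end, text))
    return chunks


def build_context(chunks: list[SymbolChunk]) -> str:
    """Assemble a context block where every snippet is labelled with its citation."""
    return "\n\n".join(
        f"# {c.citation()} ({c.kind} {c.symbol})\n{c.text}" for c in chunks
    )
```

Module-level statements outside any function or class (imports, constants) produce no chunk in this sketch, and methods stay inside their class's chunk.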
Tips
- Run repoask index after pulling new changes; it only re-embeds what changed.
- Use --lang to narrow answers when you know which part of the stack is relevant.
- Increase --top-k for broad architectural questions; lower it for specific function lookups.
- HuggingFace embedding (all-MiniLM-L6-v2) is free and fast enough for most repos. Switch to text-embedding-3-small if you want higher retrieval quality on large codebases, then run repoask index --full so every chunk is re-embedded with the new model.
- Groq is the recommended LLM provider for speed and cost; the free tier is sufficient for most usage.
Download files
Source Distribution: repoask-0.2.0.tar.gz
Built Distribution: repoask-0.2.0-py3-none-any.whl
File details
Details for the file repoask-0.2.0.tar.gz.
File metadata
- Download URL: repoask-0.2.0.tar.gz
- Upload date:
- Size: 37.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7b689624313f38fde01c9228091d2b9b783dd752fdf9de8e35f068214e645b2b |
| MD5 | a0849f535e9cf09cd63e1da23f5483af |
| BLAKE2b-256 | 2d33fdee56a8d52d3ded82080fd0a6a089864bbe3d4c1fa144ca6d43f546bb56 |
File details
Details for the file repoask-0.2.0-py3-none-any.whl.
File metadata
- Download URL: repoask-0.2.0-py3-none-any.whl
- Upload date:
- Size: 45.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0aa24480068bb206b29be0000caa296075879c6afe3fbbf576821e34d7cb6c14 |
| MD5 | 530d9aa29bbcda12bea72f29783e3a42 |
| BLAKE2b-256 | 81e3ca460efaa486bd00281a7153968cd6a7fe36becb9999e7c0669f3f7b006d |