Local-first codebase context engine: ask plain English questions about any Python, JavaScript, or TypeScript codebase

repolix

Ask plain English questions about any Python, JavaScript, or TypeScript codebase. Get answers with exact file and line citations. Runs entirely on your machine.

Preview

$ repolix index ./myrepo
Indexing /path/to/myrepo
Indexing  100% ████████████████████ 24/24
╭──────── Index Complete ─────────╮
│ Files found:    24              │
│ Files indexed:  22              │
│ Files skipped:  2 (unchanged)   │
│ Chunks stored:  183             │
╰─────────────────────────────────╯

$ repolix query "how does authentication work"
Searching...
Generating answer...
╭──────────────────────── Answer ──────────────────────────╮
│ authenticate_user() in auth/validators.py validates       │
│ credentials by calling validate_token() [1], which checks │
│ expiry and signature. On success it creates a session via │
│ SessionService.create() [2].                              │
╰───────────────────────────────────────────────────────────╯
──────────────────────── Citations ────────────────────────
  [1] auth/validators.py:14-28  (validate_token)
  [2] auth/session.py:45-67     (SessionService.create)

confidence: high

The index is built and stored on your machine. No server to run. No accounts beyond an OpenAI API key, which is used for embeddings and answer generation.


Quickstart

Requirements

Node.js is not required for end users. The web UI is pre-built and bundled inside the package.

Install

pip install repolix

Set your API key

export OPENAI_API_KEY=sk-your-key-here
# or add it to a .env file in your working directory

Index a repo

repolix index ./path/to/repo

Ask a question

repolix query "how does authentication work"

# Raw chunks without LLM (useful for debugging retrieval)
repolix query "where is UserService defined" --no-llm

# Force re-index all files, not just changed ones
repolix index ./path/to/repo --force

Web UI

uvicorn repolix.api:app --port 8000
# Open http://localhost:8000

Why repolix

Getting dropped into an unfamiliar codebase is painful. Documentation is outdated. Grep finds strings, not meaning. LLM chatbots hallucinate file names and function signatures because they have no access to your actual code.

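repolix indexes your code locally using Tree-sitter AST chunking for .py, .ts, .tsx, .js, and .jsx files, so every retrieved chunk is a whole function or class rather than an arbitrary line slice (functions over the 300-token chunk cap are truncated). Answers are generated from those retrieved chunks, so citations point at real files and line ranges. The index itself stays on your machine.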
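To make chunking at function and class boundaries concrete, here is a minimal sketch. It is not repolix's implementation: it swaps in Python's standard-library `ast` module for Tree-sitter so it runs with no dependencies, and it handles Python only. The function name `chunk_python_source` and the chunk dictionary layout are assumptions made for illustration.

```python
import ast


def chunk_python_source(source: str, path: str = "<memory>"):
    """Return one chunk per top-level function, class, and method.

    Illustrative only: uses the stdlib `ast` module, not Tree-sitter.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []

    def add(node, parent=None):
        # lineno/end_lineno are 1-based and inclusive (Python 3.8+).
        # Decorators above the def line are not included in this simple slice.
        text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
        chunks.append(
            {
                "path": path,
                # Methods keep their parent class name for disambiguation.
                "name": f"{parent}.{node.name}" if parent else node.name,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                "text": text,
            }
        )

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            add(node)
        elif isinstance(node, ast.ClassDef):
            add(node)
            for child in node.body:
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    add(child, parent=node.name)
    return chunks


SAMPLE = (
    "import os\n"
    "\n"
    "def validate_token(t):\n"
    "    return bool(t)\n"
    "\n"
    "class SessionService:\n"
    "    def create(self, user):\n"
    "        return {'user': user}\n"
)

for chunk in chunk_python_source(SAMPLE, "auth.py"):
    print(chunk["name"], chunk["start_line"], chunk["end_line"])
```

Because each chunk carries its start and end line, a citation such as auth/validators.py:14-28 can be produced directly from the chunk metadata.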


How it works

1. AST chunking: Tree-sitter parses each file into a syntax tree (Python and JavaScript/TypeScript grammars). repolix splits only at function and class boundaries, so every chunk is semantically complete. Methods are tracked with their parent class for disambiguation.

2. Hybrid search: Queries run against OpenAI embeddings (vector search) and exact token matching (keyword search) simultaneously. Results are merged using Reciprocal Rank Fusion, a ranking algorithm that rewards consistency across search methods over dominance in just one.

3. Call graph expansion: After initial retrieval, repolix inspects each chunk's call graph and fetches called functions that didn't rank highly on their own. This surfaces implementation details that live one function call away from the entry point.

4. Metadata re-ranking: Retrieved chunks are re-ranked using function names, file paths, docstrings, and call graph signals before being sent to the LLM.

5. Cited answers: The top chunks go to the LLM with instructions to answer directly and cite every claim. Citations map back to exact file paths and line numbers.
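The Reciprocal Rank Fusion merge in step 2 can be sketched in a few lines. This is the textbook form of the algorithm, not code taken from repolix. The function name, the chunk identifiers, and the constant k=60 (the value commonly used in the RRF literature) are assumptions, and repolix may use different ones.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk ids into one ranking.

    Each list contributes 1 / (k + rank) per item, with ranks starting at 1,
    so an item that places well in several lists beats an item that tops
    only one of them.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by id to keep the order stable.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from the two search methods for one query.
vector_hits = ["validate_token", "authenticate_user", "hash_password"]
keyword_hits = ["authenticate_user", "SessionService.create", "validate_token"]

merged = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(merged)
```

Here authenticate_user ranks first because it places second and first in the two lists (1/62 + 1/61), narrowly ahead of validate_token at first and third (1/61 + 1/63). Chunks found by only one method fall to the bottom.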


Output format

Each query produces:

  • A prose answer with inline citations [1], [2], etc.
  • A citations section with exact file paths and line ranges. Citations marked [truncated] mean the function exceeded the 300-token chunk cap.
  • A confidence label (high / medium / low) based on how strongly the retrieved chunks matched the query across function names, file paths, docstrings, and call graph signals.
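If you want to script against the CLI output, a citation line can be parsed with a small regular expression. This sketch assumes the line layout shown in the preview above, `[n] path:start-end  (symbol)`. The `Citation` type and `parse_citation` helper are hypothetical and not part of repolix, and the pattern does not attempt to handle the [truncated] marker, whose exact placement is not shown here.

```python
import re
from typing import NamedTuple, Optional


class Citation(NamedTuple):
    index: int
    path: str
    start_line: int
    end_line: int
    symbol: str


# Matches lines such as "  [1] auth/validators.py:14-28  (validate_token)".
_CITATION_RE = re.compile(r"^\s*\[(\d+)\]\s+(.+?):(\d+)-(\d+)\s+\((.+?)\)")


def parse_citation(line: str) -> Optional[Citation]:
    """Return a Citation for a citation line, or None for any other line."""
    match = _CITATION_RE.match(line)
    if match is None:
        return None
    index, path, start, end, symbol = match.groups()
    return Citation(int(index), path, int(start), int(end), symbol)


print(parse_citation("  [1] auth/validators.py:14-28  (validate_token)"))
```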

Cost

Action                          Approximate cost
Index a 30k-line repo           ~$0.02 (one-time)
Re-index after a small change   ~$0.001 (changed files only)
Each query                      ~$0.001

Incremental indexing means only changed files are re-embedded on subsequent runs.
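One common way to detect changed files is to compare content hashes against a manifest saved by the previous run. The sketch below shows that general idea under stated assumptions and is not how repolix is known to do it: the JSON manifest, the `changed_files` helper, and the choice of SHA-256 are all illustrative.

```python
import hashlib
import json
from pathlib import Path

# File extensions the README lists as supported.
SUFFIXES = (".py", ".ts", ".tsx", ".js", ".jsx")


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes; any content change yields a new digest."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(repo: Path, manifest_path: Path, suffixes=SUFFIXES):
    """Return repo-relative paths that are new or modified since the last run.

    The manifest (a hypothetical JSON file mapping path -> digest) is
    rewritten on every call, so deleted files drop out of it automatically.
    """
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new, changed = {}, []
    for path in sorted(repo.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel = path.relative_to(repo).as_posix()
            new[rel] = file_digest(path)
            if old.get(rel) != new[rel]:
                changed.append(rel)
    manifest_path.write_text(json.dumps(new, indent=2))
    return changed
```

Only the paths returned by `changed_files` would need to be re-chunked and re-embedded, which is why a small change costs far less than the first full index.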


Stack

Layer          Choice
AST parsing    Tree-sitter
Embeddings     text-embedding-3-small
Vector store   ChromaDB (local, no server needed)
LLM            gpt-5.4-mini
Backend        FastAPI
Frontend       React + TypeScript
CLI            Click + Rich

Install from source

git clone https://github.com/TheAsianFish/repolix
cd repolix
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

For frontend development (requires Node.js 18+):

cd frontend && npm install && cd ..
bash start.sh
# Backend: http://localhost:8000  |  Frontend: http://localhost:3000

Limitations

  • JSDoc is not extracted into chunk text yet (JavaScript/TypeScript chunks use source and identifiers only)
  • Best on repos up to ~30k lines
  • Deeply nested functions are included in their parent chunk
  • Large functions (>300 tokens) are truncated at the chunk cap
  • Complex cross-file reasoning may require rephrasing the query

Roadmap

Next in V2

  • repolix tour — proactive orientation briefing for unfamiliar repos
  • repolix trace — call graph traversal for any named function
  • Local model support via Ollama (zero API cost, fully air-gapped)
  • Persistent query sessions across terminal restarts

Considering for V3

  • VS Code extension
  • Multi-repo support

Contributing

Bug reports and pull requests are welcome. Please open an issue before submitting a large change so we can discuss the approach.


License

MIT © 2026 Patrick Chung
