Vectorless RAG for Code Repositories - Navigate your codebase with LLM reasoning
CodeTree
Vectorless RAG for Code Repositories
Navigate your codebase like a human expert, using LLM reasoning instead of vector similarity.
The Problem
Traditional RAG (Retrieval-Augmented Generation) for code has fundamental limitations:
| Problem | Description |
|---|---|
| Vector similarity ≠ code relevance | `login` and `logout` can have very similar embeddings, yet they do opposite things |
| Chunking destroys structure | Splitting a class across chunks loses critical context |
| Can't follow call chains | "Who calls this function?" is hard to answer from embeddings alone |
| No architecture understanding | Vectors don't know that auth/ is for authentication |
The Solution
CodeTree takes a different approach. It builds a hierarchical tree index of your codebase and uses LLM reasoning to navigate it, just like a human developer would:
- AST-based parsing preserves code structure
- The LLM reasons about which files are relevant
- The index captures module relationships and dependencies
- Function calls can be traced across files
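To make the AST-based parsing idea concrete, here is a minimal sketch that uses Python's standard `ast` module to pull functions, classes, and imports out of a source string. The `extract_symbols` helper and its return shape are illustrative assumptions, not CodeTree's actual parser API.

```python
import ast


def extract_symbols(source: str) -> dict:
    """Collect top-level functions, classes, and imports from Python source.

    Hypothetical helper for illustration; not part of the CodeTree API.
    """
    tree = ast.parse(source)
    symbols = {"functions": [], "classes": [], "imports": []}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols["functions"].append((node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            # Record method names so the class is indexed as one unit.
            methods = [
                n.name for n in node.body
                if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
            symbols["classes"].append((node.name, node.lineno, methods))
        elif isinstance(node, ast.Import):
            symbols["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            symbols["imports"].append(node.module or ".")
    return symbols


SAMPLE = '''
import os
from auth.login import authenticate

class UserService:
    def get_user(self, user_id):
        return authenticate(user_id)

def main():
    return UserService()
'''

symbols = extract_symbols(SAMPLE)
print(symbols)
```

Because the class and its methods are recorded together, nothing is split across arbitrary chunk boundaries, which is the property the list above refers to.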
Features
| Feature | Description |
|---|---|
| No Vector Database | Uses code structure + LLM reasoning instead of embedding similarity |
| AST-Based Indexing | Parses actual code structure: functions, classes, imports, dependencies |
| Cross-File Intelligence | Tracks imports, function calls, and dependencies across your entire codebase |
| Reasoning-Based Retrieval | LLM navigates the code tree like a human expert |
| Natural Language Queries | Ask questions in plain English |
| Privacy-First | Works with local models (Ollama); with a local model, your code never leaves your machine |
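As a rough illustration of cross-file call tracking, the sketch below walks several Python sources with the standard `ast` module and records which function calls which name. The `find_callers` function and the in-memory `FILES` mapping are assumptions made for this example; how CodeTree stores call information is not described here.

```python
import ast
from collections import defaultdict


def find_callers(files: dict) -> dict:
    """Map each called name to the 'file:function' locations that call it.

    `files` maps a relative path to its source text (hypothetical input shape).
    """
    callers = defaultdict(list)
    for path, source in files.items():
        for func in ast.walk(ast.parse(source)):
            if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            for node in ast.walk(func):
                if isinstance(node, ast.Call):
                    target = node.func
                    # Plain calls (f()) expose .id, attribute calls (obj.f()) expose .attr.
                    name = getattr(target, "id", None) or getattr(target, "attr", None)
                    if name:
                        callers[name].append(f"{path}:{func.name}")
    return dict(callers)


FILES = {
    "api/routes.py": (
        "from auth import validate_user\n\n"
        "def login(req):\n"
        "    return validate_user(req)\n"
    ),
    "jobs/sync.py": (
        "import auth\n\n"
        "def sync_all(users):\n"
        "    return [auth.validate_user(u) for u in users]\n"
    ),
}

callers = find_callers(FILES)
print(callers["validate_user"])
```

This name-based matching is deliberately simple: it does not resolve imports or distinguish two functions that share a name, so treat it as a starting point rather than a full call graph.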
Comparison: Vector RAG vs CodeTree
| Feature | Vector RAG | CodeTree |
|---|---|---|
| Understands code structure | ❌ | ✅ |
| Cross-file references | ❌ | ✅ |
| "Who calls this function?" | ❌ | ✅ |
| No chunking headaches | ❌ | ✅ |
| Explainable retrieval | ❌ | ✅ |
| Works offline | ⚠️ | ✅ |
| No vector DB needed | ❌ | ✅ |
Quick Start
Installation
git clone https://github.com/toller892/Oh-Code-Rag.git
cd Oh-Code-Rag
pip install -e .
Configuration
Set your LLM API key:
export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-ant-..."
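Before indexing, it can help to check which key is actually visible to the process. The following is a small sketch of such a check; the `detect_provider` function, its precedence order, and the fallback to `ollama` are assumptions for illustration and may not match how CodeTree selects a provider.

```python
import os


def detect_provider(env=None) -> str:
    """Guess a provider name from whichever API key is set.

    Hypothetical helper: the precedence (OpenAI, then Anthropic, then a
    local Ollama model) is an assumption, not documented CodeTree behavior.
    """
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    # No hosted key found: assume a local model is intended.
    return "ollama"


print(detect_provider())
```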
Basic Usage
from codetree import CodeTree
# Index your repository
tree = CodeTree("/path/to/your/repo")
tree.build_index()
# Ask questions about the code
answer = tree.query("How does the authentication system work?")
print(answer)
CLI Usage
# Index a repository
codetree index /path/to/repo
# Query the codebase
codetree query "Where is database connection handled?"
# Interactive chat mode
codetree chat
# Show code structure
codetree tree
# Find symbol references
codetree find "UserService"
Use Cases
For Developers
Onboarding to New Codebases:
- "What's the overall architecture of this project?"
- "How do requests flow from API to database?"
- "Where should I add a new payment method?"
Code Review & Understanding:
- "What does the processOrder function do?"
- "Who calls the validateUser method?"
- "What happens if authentication fails?"
Industry Applications
| Industry | Use Case | Example Query |
|---|---|---|
| FinTech | Audit & Compliance | "How is user data encrypted?" |
| Healthcare | Security Review | "Where is patient data accessed?" |
| E-commerce | Feature Development | "How does the cart system work?" |
| DevOps | Incident Response | "What services depend on Redis?" |
| Education | Code Learning | "Explain the MVC pattern in this app" |
Research & Analysis
- Legacy Code Migration: Understand old systems before rewriting
- Security Auditing: Find all database queries, API endpoints
- Documentation Generation: Auto-generate architecture docs
- Dependency Analysis: Map out service dependencies
Real-World Examples
Example 1: Understanding Project Architecture
Query:
from codetree import CodeTree
tree = CodeTree("./my-project")
tree.build_index()
answer = tree.query("What's the overall architecture? What are the core modules?")
print(answer)
Output:
## Project Architecture
This project follows a modular architecture with these core components:
1. **CodeTree (core.py)** - Main entry point
   - `build_index()`: Builds the code tree
   - `query()`: Natural language queries
   - `find()`: Symbol search
2. **CodeIndexer (indexer.py)** - Index construction
   - Recursively parses directories
   - Builds TreeNode hierarchy
3. **CodeParser (parser.py)** - AST parsing
   - Supports Python, JS, Go, Rust, Java
   - Extracts functions, classes, imports
4. **CodeRetriever (retriever.py)** - LLM-based retrieval
   - Two-stage: retrieve → answer
   - Uses reasoning prompts
## Data Flow
User Query → CodeTree → Retriever → LLM Reasoning → File Selection → Answer
Example 2: Finding Function Usage
Query:
refs = tree.find("authenticate")
print(refs)
Output:
Found 5 references to 'authenticate':
[function] src/auth/login.py:45 → authenticate
[function] src/auth/oauth.py:78 → authenticate_oauth
[import] src/api/middleware.py → from auth import authenticate
[import] src/api/routes.py → from auth.login import authenticate
[class] src/auth/base.py:12 → Authenticator
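One way to picture a symbol search like this is a filter over indexed symbol records. The sketch below uses a hypothetical `SymbolRecord` type and a case-insensitive substring match; both are assumptions, and CodeTree's real matching rule is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SymbolRecord:
    """One indexed symbol (hypothetical record shape for illustration)."""
    kind: str             # "function", "class", or "import"
    path: str
    line: Optional[int]   # imports in the sample output carry no line number
    text: str


def find_symbol(records, query):
    """Return records whose text contains the query, ignoring case."""
    q = query.lower()
    return [r for r in records if q in r.text.lower()]


def format_record(r):
    location = f"{r.path}:{r.line}" if r.line is not None else r.path
    return f"[{r.kind}] {location} → {r.text}"


RECORDS = [
    SymbolRecord("function", "src/auth/login.py", 45, "authenticate"),
    SymbolRecord("function", "src/auth/oauth.py", 78, "authenticate_oauth"),
    SymbolRecord("import", "src/api/middleware.py", None, "from auth import authenticate"),
    SymbolRecord("import", "src/api/routes.py", None, "from auth.login import authenticate"),
    SymbolRecord("class", "src/auth/base.py", 12, "Authenticator"),
    SymbolRecord("function", "src/db/users.py", 30, "get_user_by_email"),
]

hits = find_symbol(RECORDS, "authenticate")
for hit in hits:
    print(format_record(hit))
```

Plain substring matching returns four of the five lines shown above: `Authenticator` does not contain the string `authenticate`, so the real matcher presumably uses a looser rule such as prefix or fuzzy matching.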
Example 3: Tracing Code Flow
Query:
answer = tree.query("How does a user login request flow through the system?")
print(answer)
Output:
## Login Request Flow
1. **Entry Point**: `src/api/routes.py`
   - `@app.post("/login")` routes to `auth_service.authenticate()`
2. **Authentication**: `src/auth/service.py`
   - Validates credentials against database
   - Generates JWT token on success
3. **Database**: `src/db/users.py`
   - `get_user_by_email()` fetches user record
   - `verify_password()` checks hash
4. **Response**: Returns JWT token or 401 error
How It Works
Architecture Overview
+--------------------------------------------------------------+
|                           CodeTree                           |
+--------------------------------------------------------------+
|                                                              |
|  CodeParser  ----->  CodeIndexer  ----->  CodeIndex (JSON)   |
|  (AST Parse)         (Build Tree)         (Store)            |
|                                                  |           |
|                                                  v           |
|  Answer  <---------  Retrieve  <--------  CodeRetriever      |
|  (Markdown)          (Read Files)         (LLM Reasoning)    |
|                                                              |
+--------------------------------------------------------------+
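The indexing half of this pipeline (parse, build a tree, store as JSON) can be sketched in a few lines. The `TreeNode` fields and the `build_tree` function below are assumptions chosen for illustration; the actual node schema and on-disk JSON layout used by CodeTree may differ.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TreeNode:
    """A directory or file in the index (hypothetical schema)."""
    name: str
    kind: str                                    # "dir" or "file"
    symbols: list = field(default_factory=list)  # symbol names found in a file
    children: list = field(default_factory=list)


def build_tree(paths: dict, root_name: str = ".") -> TreeNode:
    """Build a hierarchy from {relative_path: [symbol names]}."""
    root = TreeNode(root_name, "dir")
    for path in sorted(paths):
        node = root
        parts = path.split("/")
        for part in parts[:-1]:
            # Reuse an existing directory node, or create it on first sight.
            child = next(
                (c for c in node.children if c.name == part and c.kind == "dir"),
                None,
            )
            if child is None:
                child = TreeNode(part, "dir")
                node.children.append(child)
            node = child
        node.children.append(TreeNode(parts[-1], "file", symbols=paths[path]))
    return root


index = build_tree({
    "src/auth/login.py": ["authenticate"],
    "src/auth/oauth.py": ["authenticate_oauth"],
    "src/db/users.py": ["get_user_by_email", "verify_password"],
})
print(json.dumps(asdict(index), indent=2))
```

A compact outline like this, rather than the full file contents, is what a model would reason over when deciding which files to open.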
Two-Stage Retrieval Process
Stage 1: Reasoning-Based Navigation
User: "How does authentication work?"
        |
        v
+----------------------------------------------+
| LLM analyzes code tree structure:            |
|                                              |
| "Authentication relates to auth module...    |
|  Let me check src/auth/ directory...         |
|  login.py and oauth.py look relevant...      |
|  Also need to check who imports these..."    |
+----------------------------------------------+
        |
        v
Selected Files: [src/auth/login.py, src/auth/oauth.py, ...]
Stage 2: Answer Generation
Read selected files → Generate a comprehensive answer with code snippets
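The two stages can be sketched as one function with a pluggable model call. Everything named here (`two_stage_query`, the prompt wording, the `llm` and `read_file` callables) is a hypothetical stand-in, and the stubbed `fake_llm` exists only so the example runs without a real provider.

```python
from typing import Callable


def two_stage_query(
    question: str,
    tree_outline: str,
    read_file: Callable[[str], str],
    llm: Callable[[str], str],
) -> str:
    """Stage 1 selects files from the outline; stage 2 answers from their contents."""
    # Stage 1: the model sees only the tree outline, not file contents.
    selection_prompt = (
        "Code tree:\n" + tree_outline
        + "\n\nQuestion: " + question
        + "\nList the relevant file paths, one per line."
    )
    candidates = [ln.strip() for ln in llm(selection_prompt).splitlines() if ln.strip()]
    # Keep only paths that appear in the outline, guarding against invented names.
    known = {ln.strip() for ln in tree_outline.splitlines()}
    selected = [p for p in candidates if p in known]
    # Stage 2: read the selected files and ask for the final answer.
    context = "\n\n".join(f"# {p}\n{read_file(p)}" for p in selected)
    return llm(f"{context}\n\nQuestion: {question}\nAnswer using the code above.")


FILES = {
    "src/auth/login.py": "def authenticate(): ...",
    "src/db/users.py": "def get_user_by_email(): ...",
}
outline = "\n".join(FILES)


def fake_llm(prompt: str) -> str:
    # Stub model: picks one real and one nonexistent file, then echoes its context.
    if prompt.startswith("Code tree:"):
        return "src/auth/login.py\nsrc/missing.py"
    return "ANSWER based on:\n" + prompt.split("\n\nQuestion:")[0]


answer = two_stage_query(
    "How does authentication work?", outline, FILES.__getitem__, fake_llm
)
print(answer)
```

Filtering the model's selection against the known paths keeps stage 2 from trying to read files that do not exist, and the list of selected files doubles as an explanation of where the answer came from.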
Supported Languages
| Language | Extensions | Status |
|---|---|---|
| Python | .py, .pyi | Full |
| JavaScript | .js, .jsx, .mjs | Full |
| TypeScript | .ts, .tsx | Full |
| Go | .go | Full |
| Rust | .rs | Full |
| Java | .java | Full |
| C/C++ | .c, .cpp, .h | Coming Soon |
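The table above amounts to a lookup from file extension to language. A minimal sketch of that lookup follows; the `EXTENSION_TO_LANGUAGE` mapping and `detect_language` function are illustrative names, not identifiers taken from the CodeTree source.

```python
from pathlib import Path

# Extensions listed as fully supported in the table above.
# C/C++ is omitted because it is marked "Coming Soon".
EXTENSION_TO_LANGUAGE = {
    ".py": "python", ".pyi": "python",
    ".js": "javascript", ".jsx": "javascript", ".mjs": "javascript",
    ".ts": "typescript", ".tsx": "typescript",
    ".go": "go",
    ".rs": "rust",
    ".java": "java",
}


def detect_language(path: str):
    """Return the language for a path, or None if the extension is unsupported."""
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())


print(detect_language("src/components/App.tsx"))
print(detect_language("native/main.c"))
```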
Configuration
Create .codetree.yaml in your project:
# LLM Configuration
llm:
  provider: openai  # openai, anthropic, ollama
  model: gpt-4o
  temperature: 0.0
  max_tokens: 4096
# For local/private deployment
# llm:
#   provider: ollama
#   model: llama3
#   base_url: http://localhost:11434
# Index Settings
index:
  languages:
    - python
    - javascript
    - typescript
    - go
  exclude:
    - node_modules
    - __pycache__
    - .git
    - venv
    - dist
  max_file_size: 100000  # Skip files larger than 100KB
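To show how the `exclude` and `max_file_size` settings might be applied while walking a repository, here is a small sketch. The `should_index` function is hypothetical, and matching `exclude` entries against whole path components (rather than substrings or glob patterns) is an assumption about the intended semantics.

```python
from pathlib import PurePosixPath

# Values mirror the sample configuration above.
EXCLUDE = {"node_modules", "__pycache__", ".git", "venv", "dist"}
MAX_FILE_SIZE = 100_000  # bytes


def should_index(rel_path: str, size_bytes: int) -> bool:
    """Decide whether a file is indexed (hypothetical helper).

    Assumes an exclude entry matches any whole path component.
    """
    parts = PurePosixPath(rel_path).parts
    if any(part in EXCLUDE for part in parts):
        return False
    # Files larger than the limit are skipped.
    return size_bytes <= MAX_FILE_SIZE


print(should_index("src/app.py", 1_200))
print(should_index("node_modules/lib/index.js", 10))
```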
Performance
| Metric | Small Repo (<100 files) | Medium Repo (<1000 files) | Large Repo (<10000 files) |
|---|---|---|---|
| Index Time | < 5s | < 30s | < 5min |
| Index Size | < 100KB | < 1MB | < 10MB |
| Query Time | 2-5s | 3-8s | 5-15s |
Times depend on LLM provider latency
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Areas to contribute:
- Add language parsers (C++, Ruby, PHP, etc.)
- Improve test coverage
- Documentation and examples
- Performance optimizations
- CLI improvements
MCP Server (Claude Desktop & More)
CodeTree works as an MCP (Model Context Protocol) server, compatible with Claude Desktop, Cline, Continue, and other MCP clients.
Setup for Claude Desktop
Add to your Claude Desktop config:
{
  "mcpServers": {
    "codetree": {
      "command": "python",
      "args": ["/path/to/Oh-Code-Rag/mcp/server.py"],
      "env": {
        "OPENAI_API_KEY": "sk-your-key-here"
      }
    }
  }
}
MCP Tools
| Tool | Description |
|---|---|
| codetree_index | Index a repository |
| codetree_query | Ask questions about code |
| codetree_tree | Show code structure |
| codetree_find | Find symbol references |
| codetree_stats | Get repo statistics |
See mcp/README.md for full documentation.
Clawdbot Skill
CodeTree also comes as a Clawdbot skill for AI assistant integration.
Install Skill
Copy the skill/ folder to your Clawdbot skills directory:
cp -r skill/ ~/.clawdbot/skills/codetree/
Skill Commands
# Index a repo
./scripts/codetree.sh index /path/to/repo
# Query code
./scripts/codetree.sh query /path/to/repo "How does auth work?"
# Show structure
./scripts/codetree.sh tree /path/to/repo
# Find symbol
./scripts/codetree.sh find /path/to/repo "UserService"
See skill/SKILL.md for full documentation.
License
MIT License - see LICENSE for details.
Acknowledgments
Inspired by PageIndex, a vectorless RAG approach for documents.
Star History
If you find CodeTree useful, please give us a star!