Qwen RAG - Repository Retrieval Augmented Generation
A powerful RAG system for querying code repositories using tree-sitter parsing, LanceDB vector storage, and Qwen models for embedding and reranking.
🚀 Features
- 🔍 Semantic Code Search: Find code by meaning, not just keywords
- 🌐 Multi-Language Support: Python, JavaScript, TypeScript, Java, C/C++, Rust, Go, C#, and more
- 🌳 Tree-sitter Parsing: Intelligent code chunking preserving semantic structure
- ⚡ Function-Level Indexing: Automatically extracts and indexes functions, classes, and methods
- 🤖 Qwen Model Integration: Uses Qwen3-Embedding-4B and Qwen3-Reranker-4B models
- 📍 Precise Location Tracking: File paths, line numbers, and character positions
- 💾 Vector Database: Powered by LanceDB for fast similarity search
- 🖥️ CLI Interface: Easy-to-use command-line tool
- ⚙️ Configurable: Flexible configuration via environment variables or files
- 📦 Multi-Repository Support: Index and search across multiple code repositories
📋 Requirements
- Python 3.9+
- 4GB+ RAM recommended
- Qwen embedding and reranking models accessible via OpenAI-compatible API
- Tree-sitter language parsers (installed automatically)
🛠️ Installation
From PyPI (Recommended)
pip install qwen-rag
From Source
git clone https://github.com/yourusername/QwenRag.git
cd QwenRag
pip install -r requirements.txt
pip install -e .
Verify Installation
qwen-rag --help
# or
python -m code_rag.cli --help
🤖 Model Setup
Qwen RAG works with any OpenAI-compatible API serving Qwen models. Here are the most popular options:
Option 1: LM Studio (Recommended for Beginners)
- Download LM Studio: https://lmstudio.ai/
- Download Models:
  - Search and download: text-embedding-qwen3-embedding-4b
  - Search and download: qwen.qwen3-reranker-4b
- Start Local Server:
  - Load the embedding model
  - Go to the "Local Server" tab
  - Start the server on http://localhost:1234
- Configure Qwen RAG: use the default settings (already configured for localhost:1234)
Option 2: Ollama
# Install Ollama: https://ollama.ai/
ollama pull qwen:embedding # For embeddings
ollama pull qwen:reranker # For reranking
# Start Ollama server
ollama serve
Option 3: vLLM or Other OpenAI-Compatible Servers
# Example with vLLM
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3-Embedding-4B \
    --port 1234
Option 4: Remote API Services
Configure your API endpoint in the configuration file or environment variables.
⚙️ Configuration
Quick Start (Using Defaults)
The system works out-of-the-box with LM Studio running on localhost:1234:
# Index your code repository
qwen-rag index /path/to/your/code
# Search your code
qwen-rag search "authentication function"
Environment Variables Configuration
# API Configuration
export RAG_API_BASE="http://localhost:1234/v1"
export RAG_API_KEY="dummy"
export RAG_EMBEDDING_MODEL="text-embedding-qwen3-embedding-4b"
export RAG_RERANKING_MODEL="qwen.qwen3-reranker-4b"
# Optional: Context Window Sizes
export RAG_EMBEDDING_MAX_TOKENS="8192"
export RAG_RERANKING_MAX_TOKENS="32768"
# Optional: Database and Processing
export RAG_DB_PATH="./rag_db"
export RAG_CHUNK_SIZE="1000"
export RAG_DISABLE_RERANKING="false"
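As an illustration, the variables above can be read with plain `os.getenv` and the documented defaults. This is only a sketch of the precedence rules, not the package's actual configuration loader:

```python
import os

def load_env_config() -> dict:
    """Read the documented RAG_* variables, falling back to the defaults above."""
    return {
        "api_base": os.getenv("RAG_API_BASE", "http://localhost:1234/v1"),
        "api_key": os.getenv("RAG_API_KEY", "dummy"),
        "embedding_model": os.getenv(
            "RAG_EMBEDDING_MODEL", "text-embedding-qwen3-embedding-4b"
        ),
        "reranking_model": os.getenv("RAG_RERANKING_MODEL", "qwen.qwen3-reranker-4b"),
        "embedding_max_tokens": int(os.getenv("RAG_EMBEDDING_MAX_TOKENS", "8192")),
        "reranking_max_tokens": int(os.getenv("RAG_RERANKING_MAX_TOKENS", "32768")),
        "db_path": os.getenv("RAG_DB_PATH", "./rag_db"),
        "chunk_size": int(os.getenv("RAG_CHUNK_SIZE", "1000")),
        "disable_reranking": os.getenv("RAG_DISABLE_RERANKING", "false").lower() == "true",
    }

print(load_env_config()["api_base"])
```

Environment variables take precedence over the built-in defaults; anything you do not set falls back to the values shown above.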
Configuration File
Create config.yaml:
# API Configuration
api:
  base_url: "http://localhost:1234/v1"
  api_key: "dummy"

  # Qwen Model Configuration
  embedding_model: "text-embedding-qwen3-embedding-4b"
  embedding_max_tokens: 8192    # 8k context window
  reranking_model: "qwen.qwen3-reranker-4b"
  reranking_max_tokens: 32768   # 32k context window

  # Request settings
  timeout: 300    # seconds
  max_retries: 3

# Database Configuration
database:
  path: "./rag_db"
  table_name: "code_chunks"

# Chunking Configuration
chunking:
  max_tokens: 1000          # Maximum tokens per chunk
  prefer_functions: true    # Prefer function-level chunking
  include_comments: true    # Include comments in chunks

# Search Configuration
search:
  use_reranking: true    # Enable reranking for better results
  top_k_initial: 20      # Initial number of results to retrieve
  top_k_final: 5         # Final number of results after reranking
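To picture what the chunking max_tokens setting controls, here is a minimal splitter. Whitespace-separated words stand in for real tokenizer tokens, so this is only a sketch of the idea, not the package's tree-sitter-aware chunker:

```python
def split_into_chunks(text: str, max_tokens: int = 1000) -> list[str]:
    # Whitespace "tokens" are a crude stand-in for a real tokenizer.
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

# 2500 "tokens" with a 1000-token limit yield chunks of 1000, 1000, and 500.
print(len(split_into_chunks("word " * 2500, max_tokens=1000)))  # 3
```

Lowering max_tokens produces more, smaller chunks: retrieval gets more precise hits, at the cost of more embeddings to compute and store.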
🎯 Quick Start
1. Index a Repository
# Index current directory
qwen-rag index .
# Index specific repository
qwen-rag index /path/to/repo
# Index with custom chunk size
qwen-rag index . --chunk-size 500
# Force reindex existing repository
qwen-rag index . --force
2. Search Code
# Basic search with reranking
qwen-rag search "function that handles authentication"
# Fast search without reranking
qwen-rag search "database connection" --no-reranking
# Limit results
qwen-rag search "error handling" --top-k 3
# Search only Python files
qwen-rag search "async function" --file-type .py
# Search only functions (when filtering is available)
qwen-rag search "validation logic" --chunk-type function
3. Interactive Mode
qwen-rag interactive
📖 Usage Examples
Repository Management
# Index multiple repositories
qwen-rag index /path/to/frontend
qwen-rag index /path/to/backend
qwen-rag index /path/to/scripts
# View database statistics
qwen-rag stats
# Show current configuration
qwen-rag config-show
# Delete repository from index
qwen-rag delete /path/to/repo
Advanced Search Examples
# Find authentication code
qwen-rag search "user authentication login password"
# Look for error handling patterns
qwen-rag search "try catch exception handling error"
# Find database operations
qwen-rag search "database query insert update delete"
# Search for API endpoints
qwen-rag search "REST API endpoint route handler"
# Find specific algorithms
qwen-rag search "sorting algorithm implementation"
# Look for configuration management
qwen-rag search "config settings environment variables"
Using Configuration Files
# Use custom config file
qwen-rag --config-file my-config.yaml index /path/to/repo
# Override settings via CLI
qwen-rag --api-base "http://localhost:8000" search "query"
🏗️ Architecture
Components
- Tree-sitter Manager: Handles parsing of 13+ programming languages
- Code Chunker: Intelligently splits code into semantic chunks (functions, classes)
- Embedding Service: Generates embeddings using Qwen3-Embedding-4B (2560 dimensions)
- Reranking Service: Reranks results using Qwen3-Reranker-4B for better precision
- Database Manager: Manages LanceDB operations and multi-repository support
- Search Service: Orchestrates search and ranking across all repositories
Data Flow
Indexing: Repository → Tree-sitter → Semantic Chunks → Embeddings → LanceDB
Search: Query → Embedding → Vector Search → Reranking → Results
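The search path can be walked through with toy data: embed the query, rank chunks by cosine similarity, then rerank the top hits. The vectors and rerank scores below are hand-made stand-ins for Qwen3-Embedding-4B and Qwen3-Reranker-4B outputs:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

chunks = {
    "auth.py:login": [0.9, 0.1, 0.0],
    "db.py:connect": [0.1, 0.9, 0.1],
    "utils.py:slugify": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.0]  # pretend embedding of "authentication function"

# Stage 1: vector search (top_k_initial = 2)
initial = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)[:2]

# Stage 2: reranking (top_k_final; a real system calls the reranker model here)
rerank_scores = {"auth.py:login": 0.95, "db.py:connect": 0.30}
final = sorted(initial, key=lambda c: rerank_scores.get(c, 0.0), reverse=True)
print(final[0])  # auth.py:login
```

The two-stage design is why reranked searches are slower but more precise: the cheap vector pass narrows the candidates, and the expensive reranker only scores that short list.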
Supported Languages
Tree-sitter parsing for: Python, JavaScript, TypeScript, Java, C/C++, Rust, Go, C#, PHP, Ruby, Swift, Kotlin, Scala
Fallback text processing for: Shell, SQL, Markdown, YAML, JSON, HTML, CSS, and more
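The routing between tree-sitter parsing and the text fallback amounts to a lookup on the file extension. The mapping below is an assumption for the sketch, not the package's actual language table:

```python
import os

# Assumed extension → tree-sitter grammar mapping (illustrative subset).
TREE_SITTER_LANGS = {
    ".py": "python", ".js": "javascript", ".ts": "typescript",
    ".java": "java", ".c": "c", ".cpp": "cpp", ".rs": "rust", ".go": "go",
}

def pick_parser(path: str) -> str:
    """Route a file to a tree-sitter parser, or fall back to text chunking."""
    ext = os.path.splitext(path)[1].lower()
    return TREE_SITTER_LANGS.get(ext, "text-fallback")

print(pick_parser("src/main.rs"))  # rust
print(pick_parser("notes.md"))     # text-fallback
```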
🎨 Semantic Chunking
The system uses tree-sitter to create intelligent, semantically meaningful chunks:
Function-Level Chunking
def authenticate_user(username, password):
    """Authenticate user credentials."""
    # ... function body ...
Class Overview
class UserService:
    def __init__(self, database_url): ...
    def authenticate_user(self, username, password): ...
    def get_user_profile(self, user_id): ...
Smart Collapsing
Large functions show signature + collapsed body for better overview.
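A toy heuristic shows the idea: keep the signature, replace a long body with an ellipsis. The real system does this on the tree-sitter syntax tree; this line-based version is only an illustration:

```python
def collapse_function(source: str, max_lines: int = 5) -> str:
    """Keep the signature, replace a long body with an ellipsis (toy heuristic)."""
    lines = source.splitlines()
    if len(lines) <= max_lines:
        return source  # short functions are kept whole
    return lines[0] + "\n    ..."

long_fn = "\n".join(
    ["def process(data):"]
    + [f"    step_{i} = {i}" for i in range(20)]
    + ["    return data"]
)
print(collapse_function(long_fn))
```

Collapsed chunks keep class and module overviews compact while the full bodies remain indexed at function level.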
🔧 Programmatic Usage
import asyncio
from code_rag.config import load_config
from code_rag.indexer import RepositoryIndexer
from code_rag.search import SearchService

async def main():
    # Load configuration
    config = load_config()

    # Index repository
    indexer = RepositoryIndexer(config)
    await indexer.index_repository("./my_repo")

    # Search
    search_service = SearchService(config)
    results = await search_service.search("authentication function")

    for result in results.results:
        print(f"{result.chunk.file_path}:{result.chunk.start_line}")
        print(f"Score: {result.score}")
        print(result.chunk.content[:200])
        print("-" * 50)

    await search_service.close()

if __name__ == "__main__":
    asyncio.run(main())
📊 Performance
Typical Performance Metrics
- Indexing: ~1000 chunks/minute (depends on file complexity)
- Embedding Search: 100-500ms (without reranking)
- With Reranking: 1-3 seconds (includes embedding + reranking)
- Memory Usage: ~100-500MB (scales with repository size)
- Context Windows: 8k tokens (embedding), 32k tokens (reranking)
Optimization Tips
- Use `--no-reranking` for faster searches during development
- Reduce `--chunk-size` for memory efficiency
- Use file type filters (`--file-type .py`) to narrow search scope
- Index frequently used repositories locally
🤝 Contributing
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Add tests if applicable
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Tree-sitter for excellent code parsing capabilities
- LanceDB for high-performance vector storage
- Qwen Team for powerful embedding and reranking models
- OpenAI for the API interface standard
🐛 Troubleshooting
Common Issues
Tree-sitter parsing errors: Some language parsers may not initialize. The system automatically falls back to text chunking.
API connection issues:
- Ensure your model server is running on the correct port
- Check that the model names match your server configuration
- Verify the API endpoint is accessible
Memory issues:
- Reduce chunk size: `qwen-rag index . --chunk-size 500`
- Process smaller repositories or use file type filters
- Ensure you have sufficient RAM (4GB+ recommended)
Slow performance:
- Use `--no-reranking` for faster searches
- Check your model server performance
- Consider using GPU acceleration for your models
Getting Help
- Check `qwen-rag --help` for all available commands
- Run `python test_setup.py` to verify installation
- Use `qwen-rag stats` to check database status
- Visit our GitHub Issues for support
🔗 Related Projects
- LM Studio: https://lmstudio.ai/ - Easy local model hosting
- Ollama: https://ollama.ai/ - Run LLMs locally
- Qwen Models: https://github.com/QwenLM - State-of-the-art language models
Made with ❤️ for developers who love intelligent code search
File details
Details for the file qwen_rag-0.1.0.tar.gz.
File metadata
- Download URL: qwen_rag-0.1.0.tar.gz
- Upload date:
- Size: 28.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f1657619d2cb8e9509b8b4d71040e992d442d892d2038931d99788e2adf37d75` |
| MD5 | `a1a6b004c295aba7c851cade3d83c49b` |
| BLAKE2b-256 | `c0a99dd2859f533a6c7da6034a7a1f80cc15ea99e6acdd5ba852631c110194b0` |
File details
Details for the file qwen_rag-0.1.0-py3-none-any.whl.
File metadata
- Download URL: qwen_rag-0.1.0-py3-none-any.whl
- Upload date:
- Size: 26.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `3c6854540e75f225901ca866485e1fd9266f766f6efa14bbcfae5557b5595605` |
| MD5 | `6c33ced3c70aaf6f4d77089f1167cf79` |
| BLAKE2b-256 | `0b16bb32af3022cac40aea8ba6d33ae78be129974fe15c2d6973e5546f4bbbdb` |