Qwen RAG - Repository Retrieval Augmented Generation
A powerful RAG system for querying code repositories using tree-sitter parsing, LanceDB vector storage, and Qwen models for embedding and reranking.
🚀 Features
- 🔍 Semantic Code Search: Find code by meaning, not just keywords
- 🌐 Multi-Language Support: Python, JavaScript, TypeScript, Java, C/C++, Rust, Go, C#, and more
- 🌳 Tree-sitter Parsing: Intelligent code chunking preserving semantic structure
- ⚡ Function-Level Indexing: Automatically extracts and indexes functions, classes, and methods
- 🤖 Qwen Model Integration: Uses Qwen3-Embedding-4B and Qwen3-Reranker-4B models
- 📍 Precise Location Tracking: File paths, line numbers, and character positions
- 💾 Vector Database: Powered by LanceDB for fast similarity search
- 🖥️ CLI Interface: Easy-to-use command-line tool
- ⚙️ Configurable: Flexible configuration via environment variables or files
- 📦 Multi-Repository Support: Index and search across multiple code repositories
📋 Requirements
- Python 3.9+
- 4GB+ RAM recommended
- Qwen embedding and reranking models accessible via OpenAI-compatible API
- Tree-sitter language parsers (installed automatically)
🛠️ Installation
From PyPI (Recommended)
pip install qwen-rag
From Source
git clone https://github.com/yourusername/QwenRag.git
cd QwenRag
pip install -r requirements.txt
pip install -e .
Verify Installation
qwen-rag --help
# or
python -m code_rag.cli --help
🤖 Model Setup
Qwen RAG works with any OpenAI-compatible API serving Qwen models. Here are the most popular options:
Option 1: LM Studio (Recommended for Beginners)
- Download LM Studio: https://lmstudio.ai/
- Download Models:
  - Search and download: text-embedding-qwen3-embedding-4b
  - Search and download: qwen.qwen3-reranker-4b
- Start Local Server:
  - Load the embedding model
  - Go to the "Local Server" tab
  - Start the server on http://localhost:1234
- Configure Qwen RAG: use the default settings (already configured for localhost:1234)
Option 2: Ollama
# Install Ollama: https://ollama.ai/
ollama pull qwen:embedding # For embeddings
ollama pull qwen:reranker # For reranking
# Start Ollama server
ollama serve
Option 3: vLLM or Other OpenAI-Compatible Servers
# Example with vLLM
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3-Embedding-4B \
    --port 1234
Option 4: Remote API Services
Configure your API endpoint in the configuration file or environment variables.
⚙️ Configuration
Quick Start (Using Defaults)
The system works out-of-the-box with LM Studio running on localhost:1234:
# Index your code repository
qwen-rag index /path/to/your/code
# Search your code
qwen-rag search "authentication function"
Environment Variables Configuration
# API Configuration
export RAG_API_BASE="http://localhost:1234/v1"
export RAG_API_KEY="dummy"
export RAG_EMBEDDING_MODEL="text-embedding-qwen3-embedding-4b"
export RAG_RERANKING_MODEL="qwen.qwen3-reranker-4b"
# Optional: Context Window Sizes
export RAG_EMBEDDING_MAX_TOKENS="8192"
export RAG_RERANKING_MAX_TOKENS="32768"
# Optional: Database and Processing
export RAG_DB_PATH="./rag_db"
export RAG_CHUNK_SIZE="1000"
export RAG_DISABLE_RERANKING="false"
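As an illustration, the variables above can be read with plain `os.getenv` and the documented defaults. This is only a sketch of the precedence rules, not the package's actual configuration loader:

```python
import os

def load_env_config() -> dict:
    """Read the documented RAG_* variables, falling back to the defaults above."""
    return {
        "api_base": os.getenv("RAG_API_BASE", "http://localhost:1234/v1"),
        "api_key": os.getenv("RAG_API_KEY", "dummy"),
        "embedding_model": os.getenv(
            "RAG_EMBEDDING_MODEL", "text-embedding-qwen3-embedding-4b"
        ),
        "reranking_model": os.getenv("RAG_RERANKING_MODEL", "qwen.qwen3-reranker-4b"),
        "embedding_max_tokens": int(os.getenv("RAG_EMBEDDING_MAX_TOKENS", "8192")),
        "reranking_max_tokens": int(os.getenv("RAG_RERANKING_MAX_TOKENS", "32768")),
        "db_path": os.getenv("RAG_DB_PATH", "./rag_db"),
        "chunk_size": int(os.getenv("RAG_CHUNK_SIZE", "1000")),
        "disable_reranking": os.getenv("RAG_DISABLE_RERANKING", "false").lower() == "true",
    }

print(load_env_config()["api_base"])
```

Environment variables take precedence over the built-in defaults; anything you do not set falls back to the values shown above.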
Configuration File
Create config.yaml:
# API Configuration
api:
  base_url: "http://localhost:1234/v1"
  api_key: "dummy"

  # Qwen Model Configuration
  embedding_model: "text-embedding-qwen3-embedding-4b"
  embedding_max_tokens: 8192    # 8k context window
  reranking_model: "qwen.qwen3-reranker-4b"
  reranking_max_tokens: 32768   # 32k context window

  # Request settings
  timeout: 300    # seconds
  max_retries: 3

# Database Configuration
database:
  path: "./rag_db"
  table_name: "code_chunks"

# Chunking Configuration
chunking:
  max_tokens: 1000          # Maximum tokens per chunk
  prefer_functions: true    # Prefer function-level chunking
  include_comments: true    # Include comments in chunks

# Search Configuration
search:
  use_reranking: true    # Enable reranking for better results
  top_k_initial: 20      # Initial number of results to retrieve
  top_k_final: 5         # Final number of results after reranking
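To picture what the chunking max_tokens setting controls, here is a minimal splitter. Whitespace-separated words stand in for real tokenizer tokens, so this is only a sketch of the idea, not the package's tree-sitter-aware chunker:

```python
def split_into_chunks(text: str, max_tokens: int = 1000) -> list[str]:
    # Whitespace "tokens" are a crude stand-in for a real tokenizer.
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

# 2500 "tokens" with a 1000-token limit yield chunks of 1000, 1000, and 500.
print(len(split_into_chunks("word " * 2500, max_tokens=1000)))  # 3
```

Lowering max_tokens produces more, smaller chunks: retrieval gets more precise hits, at the cost of more embeddings to compute and store.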
🎯 Quick Start
1. Index a Repository
# Index current directory
qwen-rag index .
# Index specific repository
qwen-rag index /path/to/repo
# Index with custom chunk size
qwen-rag index . --chunk-size 500
# Force reindex existing repository
qwen-rag index . --force
2. Search Code
# Basic search with reranking
qwen-rag search "function that handles authentication"
# Fast search without reranking
qwen-rag search "database connection" --no-reranking
# Limit results
qwen-rag search "error handling" --top-k 3
# Search only Python files
qwen-rag search "async function" --file-type .py
# Search only functions (when filtering is available)
qwen-rag search "validation logic" --chunk-type function
3. Interactive Mode
qwen-rag interactive
📖 Usage Examples
Repository Management
# Index multiple repositories
qwen-rag index /path/to/frontend
qwen-rag index /path/to/backend
qwen-rag index /path/to/scripts
# View database statistics
qwen-rag stats
# Show current configuration
qwen-rag config-show
# Delete repository from index
qwen-rag delete /path/to/repo
Advanced Search Examples
# Find authentication code
qwen-rag search "user authentication login password"
# Look for error handling patterns
qwen-rag search "try catch exception handling error"
# Find database operations
qwen-rag search "database query insert update delete"
# Search for API endpoints
qwen-rag search "REST API endpoint route handler"
# Find specific algorithms
qwen-rag search "sorting algorithm implementation"
# Look for configuration management
qwen-rag search "config settings environment variables"
Using Configuration Files
# Use custom config file
qwen-rag --config-file my-config.yaml index /path/to/repo
# Override settings via CLI
qwen-rag --api-base "http://localhost:8000" search "query"
🏗️ Architecture
Components
- Tree-sitter Manager: Handles parsing of 13+ programming languages
- Code Chunker: Intelligently splits code into semantic chunks (functions, classes)
- Embedding Service: Generates embeddings using Qwen3-Embedding-4B (2560 dimensions)
- Reranking Service: Reranks results using Qwen3-Reranker-4B for better precision
- Database Manager: Manages LanceDB operations and multi-repository support
- Search Service: Orchestrates search and ranking across all repositories
Data Flow
Indexing: Repository → Tree-sitter → Semantic Chunks → Embeddings → LanceDB
Search: Query → Embedding → Vector Search → Reranking → Results
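The search path can be walked through with toy data: embed the query, rank chunks by cosine similarity, then rerank the top hits. The vectors and rerank scores below are hand-made stand-ins for Qwen3-Embedding-4B and Qwen3-Reranker-4B outputs:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

chunks = {
    "auth.py:login": [0.9, 0.1, 0.0],
    "db.py:connect": [0.1, 0.9, 0.1],
    "utils.py:slugify": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.0]  # pretend embedding of "authentication function"

# Stage 1: vector search (top_k_initial = 2)
initial = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)[:2]

# Stage 2: reranking (top_k_final; a real system calls the reranker model here)
rerank_scores = {"auth.py:login": 0.95, "db.py:connect": 0.30}
final = sorted(initial, key=lambda c: rerank_scores.get(c, 0.0), reverse=True)
print(final[0])  # auth.py:login
```

The two-stage design is why reranked searches are slower but more precise: the cheap vector pass narrows the candidates, and the expensive reranker only scores that short list.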
Supported Languages
Tree-sitter parsing for: Python, JavaScript, TypeScript, Java, C/C++, Rust, Go, C#, PHP, Ruby, Swift, Kotlin, Scala
Fallback text processing for: Shell, SQL, Markdown, YAML, JSON, HTML, CSS, and more
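The routing between tree-sitter parsing and the text fallback amounts to a lookup on the file extension. The mapping below is an assumption for the sketch, not the package's actual language table:

```python
import os

# Assumed extension → tree-sitter grammar mapping (illustrative subset).
TREE_SITTER_LANGS = {
    ".py": "python", ".js": "javascript", ".ts": "typescript",
    ".java": "java", ".c": "c", ".cpp": "cpp", ".rs": "rust", ".go": "go",
}

def pick_parser(path: str) -> str:
    """Route a file to a tree-sitter parser, or fall back to text chunking."""
    ext = os.path.splitext(path)[1].lower()
    return TREE_SITTER_LANGS.get(ext, "text-fallback")

print(pick_parser("src/main.rs"))  # rust
print(pick_parser("notes.md"))     # text-fallback
```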
🎨 Semantic Chunking
The system uses tree-sitter to create intelligent, semantically meaningful chunks:
Function-Level Chunking
def authenticate_user(username, password):
    """Authenticate user credentials."""
    # ... function body ...
Class Overview
class UserService:
    def __init__(self, database_url): ...
    def authenticate_user(self, username, password): ...
    def get_user_profile(self, user_id): ...
Smart Collapsing
Large functions show signature + collapsed body for better overview.
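A toy heuristic shows the idea: keep the signature, replace a long body with an ellipsis. The real system does this on the tree-sitter syntax tree; this line-based version is only an illustration:

```python
def collapse_function(source: str, max_lines: int = 5) -> str:
    """Keep the signature, replace a long body with an ellipsis (toy heuristic)."""
    lines = source.splitlines()
    if len(lines) <= max_lines:
        return source  # short functions are kept whole
    return lines[0] + "\n    ..."

long_fn = "\n".join(
    ["def process(data):"]
    + [f"    step_{i} = {i}" for i in range(20)]
    + ["    return data"]
)
print(collapse_function(long_fn))
```

Collapsed chunks keep class and module overviews compact while the full bodies remain indexed at function level.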
🔧 Programmatic Usage
import asyncio
from code_rag.config import load_config
from code_rag.indexer import RepositoryIndexer
from code_rag.search import SearchService

async def main():
    # Load configuration
    config = load_config()

    # Index repository
    indexer = RepositoryIndexer(config)
    await indexer.index_repository("./my_repo")

    # Search
    search_service = SearchService(config)
    results = await search_service.search("authentication function")

    for result in results.results:
        print(f"{result.chunk.file_path}:{result.chunk.start_line}")
        print(f"Score: {result.score}")
        print(result.chunk.content[:200])
        print("-" * 50)

    await search_service.close()

if __name__ == "__main__":
    asyncio.run(main())
📊 Performance
Typical Performance Metrics
- Indexing: ~1000 chunks/minute (depends on file complexity)
- Embedding Search: 100-500ms (without reranking)
- With Reranking: 1-3 seconds (includes embedding + reranking)
- Memory Usage: ~100-500MB (scales with repository size)
- Context Windows: 8k tokens (embedding), 32k tokens (reranking)
Optimization Tips
- Use `--no-reranking` for faster searches during development
- Reduce `--chunk-size` for memory efficiency
- Use file type filters (`--file-type .py`) to narrow search scope
- Index frequently used repositories locally
🤝 Contributing
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Add tests if applicable
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Tree-sitter for excellent code parsing capabilities
- LanceDB for high-performance vector storage
- Qwen Team for powerful embedding and reranking models
- OpenAI for the API interface standard
🐛 Troubleshooting
Common Issues
Tree-sitter parsing errors: Some language parsers may not initialize. The system automatically falls back to text chunking.
API connection issues:
- Ensure your model server is running on the correct port
- Check that the model names match your server configuration
- Verify the API endpoint is accessible
Memory issues:
- Reduce chunk size: `qwen-rag index . --chunk-size 500`
- Process smaller repositories or use file type filters
- Ensure you have sufficient RAM (4GB+ recommended)
Slow performance:
- Use `--no-reranking` for faster searches
- Check your model server performance
- Consider using GPU acceleration for your models
Getting Help
- Check `qwen-rag --help` for all available commands
- Run `python test_setup.py` to verify installation
- Use `qwen-rag stats` to check database status
- Visit our GitHub Issues for support
🔗 Related Projects
- LM Studio: https://lmstudio.ai/ - Easy local model hosting
- Ollama: https://ollama.ai/ - Run LLMs locally
- Qwen Models: https://github.com/QwenLM - State-of-the-art language models
Made with ❤️ for developers who love intelligent code search
File details
Details for the file qwen_rag-0.1.0.tar.gz.
File metadata
- Download URL: qwen_rag-0.1.0.tar.gz
- Upload date:
- Size: 28.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f1657619d2cb8e9509b8b4d71040e992d442d892d2038931d99788e2adf37d75` |
| MD5 | `a1a6b004c295aba7c851cade3d83c49b` |
| BLAKE2b-256 | `c0a99dd2859f533a6c7da6034a7a1f80cc15ea99e6acdd5ba852631c110194b0` |
File details
Details for the file qwen_rag-0.1.0-py3-none-any.whl.
File metadata
- Download URL: qwen_rag-0.1.0-py3-none-any.whl
- Upload date:
- Size: 26.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `3c6854540e75f225901ca866485e1fd9266f766f6efa14bbcfae5557b5595605` |
| MD5 | `6c33ced3c70aaf6f4d77089f1167cf79` |
| BLAKE2b-256 | `0b16bb32af3022cac40aea8ba6d33ae78be129974fe15c2d6973e5546f4bbbdb` |