Phoenix RAG
RAG-enabled agent for code refactoring assistance
A Retrieval-Augmented Generation (RAG) system for code refactoring assistance. Phoenix helps developers identify code smells, suggests refactoring patterns, and provides best practices grounded in established software engineering principles.
Features
- Knowledge Retrieval: Search a curated knowledge base of refactoring patterns, code smells, and best practices
- Code Analysis: Analyze Python code for structural issues, complexity metrics, and code smells
- ReAct-Style Reasoning: Agent uses a think-act-observe loop to gather information before responding (sketched after this list)
- Groundedness Verification: Responses are verified against retrieved sources to reduce hallucination
- Hybrid Chunking: Intelligent document chunking that preserves semantic meaning and code structure
- Multiple LLM Support: Works with Ollama (local), Groq, Anthropic, and OpenAI
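The think-act-observe loop mentioned above can be pictured roughly as follows. This is an illustrative outline only: the real orchestration lives in agent.py, and the prompt format, `parse_action` helper, and tool-calling convention shown here are invented for the sketch.

```python
# Rough outline of a ReAct-style loop (illustrative; not Phoenix's agent.py).
# `llm` is any callable mapping a prompt string to a completion string;
# `tools` maps tool names to callables.
def parse_action(thought: str) -> tuple[str, str]:
    # Assumes the model emits lines like "ACTION: knowledge_retrieval :: extract method"
    _, rest = thought.split("ACTION:", 1)
    name, tool_input = rest.split("::", 1)
    return name.strip(), tool_input.strip()

def react_loop(question: str, llm, tools: dict, max_iterations: int = 10) -> str:
    observations: list[str] = []
    for _ in range(max_iterations):
        thought = llm(                                   # think
            f"Question: {question}\nObservations so far: {observations}\n"
            "Reply with either 'FINAL: <answer>' or 'ACTION: <tool> :: <input>'."
        )
        if thought.startswith("FINAL:"):
            return thought.removeprefix("FINAL:").strip()
        name, tool_input = parse_action(thought)         # act
        observations.append(tools[name](tool_input))     # observe
    return llm(f"Answer using only these observations: {observations}")
```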
Architecture
phoenix-rag/
├── src/phoenix_rag/
│   ├── agent.py                     # Main ReAct agent orchestrator
│   ├── config.py                    # Configuration management
│   ├── retrieval/
│   │   ├── module.py                # ChromaDB vector store integration
│   │   ├── chunking.py              # Semantic and code-aware chunking
│   │   └── ingestion.py             # Document ingestion pipeline
│   ├── tools/
│   │   ├── registry.py              # Tool management
│   │   ├── code_analyzer.py         # Code smell detection
│   │   ├── complexity_calculator.py # Cyclomatic complexity metrics
│   │   └── retrieval_tool.py        # Knowledge base search
│   └── verification/
│       ├── groundedness.py          # Response verification
│       └── self_evaluation.py       # Self-correction
├── data/documents/                  # Knowledge base documents
├── app.py                           # Streamlit web interface
└── demo.py                          # CLI demonstration
Installation
Quick Install (PyPI)
pip install phoenix-rag
With Groq support (recommended for cloud):
pip install phoenix-rag[groq]
Prerequisites
- Python 3.10+
Development Setup
- Clone the repository:
git clone https://github.com/kkipngenokoech/phoenix-rag.git
cd phoenix-rag
- Create and activate the conda environment:
conda env create -f environment.yml
conda activate phoenix-rag
- Install the package in development mode:
pip install -e .
- Configure your environment:
cp .env.example .env
# Edit .env with your settings
LLM Configuration
Phoenix supports multiple LLM providers. Set LLM_PROVIDER in your .env file:
| Provider | Value | Requirements |
|---|---|---|
| Auto (recommended) | auto | Tries Ollama first, falls back to Groq |
| Ollama (local) | ollama | Ollama running locally |
| Groq | groq | GROQ_API_KEY |
| Anthropic | anthropic | ANTHROPIC_API_KEY |
| OpenAI | openai | OPENAI_API_KEY |
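For example, a minimal .env for the default auto setup might look like this (the API key is a placeholder; only set keys for the providers you actually use):

```
LLM_PROVIDER=auto
LLM_MODEL=llama3.2
GROQ_API_KEY=gsk_your_key_here
```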
Using Ollama (Local, Free)
- Install Ollama: https://ollama.ai
- Pull a model:
ollama pull llama3.2
- Start Ollama:
ollama serve
Using Groq (Cloud, Free Tier)
- Get an API key from https://console.groq.com
- Add to .env:
GROQ_API_KEY=gsk_your_key_here
Usage
Web Interface
Run the Streamlit app:
streamlit run app.py
The web interface provides:
- Chat interface for asking questions about refactoring
- Code analysis tab for pasting and analyzing code
- Agent trace viewer showing reasoning steps
- Quick tools for direct code smell detection and complexity metrics
CLI Demo
Run the demonstration script:
python demo.py
This demonstrates:
- Document ingestion with hybrid chunking
- Knowledge retrieval queries
- Code analysis with refactoring suggestions
- Groundedness verification
Programmatic Usage
from phoenix_rag.agent import PhoenixAgent
from phoenix_rag.config import PhoenixConfig
# Initialize
config = PhoenixConfig()
agent = PhoenixAgent(config)
# Ingest knowledge base
agent.ingest_knowledge("data/documents")
# Ask a question
response, trace = agent.run("What is the Extract Method refactoring pattern?")
print(response)
# Analyze code
code = '''
def process_data(a, b, c, d, e, f):
    # Long method with many parameters
    result = a + b + c + d + e + f
    return result
'''
response, trace = agent.run("Analyze this code for code smells", code=code)
print(response)
Tools
Phoenix provides three built-in tools:
knowledge_retrieval
Searches the vector database for relevant refactoring knowledge.
Parameters:
- query: Search query string
- doc_type: Filter by document type (refactoring_pattern, code_smell, best_practice, style_guide, all)
- num_results: Number of results to return (1-10)
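Retrieval is backed by ChromaDB (see retrieval/module.py). The snippet below is a stand-alone illustration of the kind of lookup this tool performs, using the chromadb client directly; the collection name and metadata field are assumptions, not Phoenix's actual internals.

```python
import chromadb

# Stand-alone illustration of a vector-store lookup similar to knowledge_retrieval.
# Collection name "phoenix_knowledge" and the "doc_type" metadata key are guesses.
client = chromadb.PersistentClient(path="./data/chroma_db")
collection = client.get_or_create_collection("phoenix_knowledge")

results = collection.query(
    query_texts=["How do I apply the Extract Method refactoring?"],
    n_results=3,
    where={"doc_type": "refactoring_pattern"},  # hypothetical metadata filter
)
for doc in results["documents"][0]:
    print(doc[:80])
```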
code_analyzer
Analyzes Python code for structure and code smells.
Parameters:
- code: Python code to analyze
- analysis_type: Type of analysis (full, smells, structure, complexity)
Detects code smells:
- Long methods
- Long parameter lists
- God classes
- Deep nesting
- Complex conditionals
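As a rough illustration of how this kind of detection works, the stand-alone snippet below flags one of these smells (long parameter lists) with Python's ast module; Phoenix's code_analyzer applies more checks and may use different thresholds.

```python
import ast

# Minimal stand-alone check for the "long parameter list" smell.
# The threshold of 5 is arbitrary and for illustration only.
def find_long_parameter_lists(source: str, max_params: int = 5) -> list[tuple[str, int]]:
    tree = ast.parse(source)
    offenders = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            a = node.args
            count = len(a.posonlyargs) + len(a.args) + len(a.kwonlyargs)
            if count > max_params:
                offenders.append((node.name, count))
    return offenders

print(find_long_parameter_lists("def process_data(a, b, c, d, e, f):\n    return a + b"))
# [('process_data', 6)]
```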
complexity_calculator
Calculates detailed code complexity metrics.
Parameters:
- code: Python code to analyze
- metrics: List of metrics (cyclomatic, maintainability, halstead, raw, all)
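For reference, cyclomatic complexity is essentially one plus the number of independent branch points in the code. The simplified sketch below counts a common subset of branch nodes with ast; the actual complexity_calculator may count additional constructs and computes the other metrics differently.

```python
import ast

# Simplified cyclomatic complexity: 1 + number of branch points.
# Counts a common subset of decision nodes; real tools also count
# boolean operators, comprehension guards, etc.
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)

def cyclomatic_complexity(source: str) -> int:
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))

print(cyclomatic_complexity("""
def grade(score):
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    return "C"
"""))  # 3
```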
Knowledge Base
The knowledge base is stored in data/documents/ with the following structure:
data/documents/
├── refactoring_patterns/ # Extract Method, Extract Class, etc.
├── code_smells/ # Long Method, God Class, etc.
├── best_practices/ # SOLID principles, etc.
└── style_guides/ # Python style guidelines
Adding Custom Documents
- Create a markdown file in the appropriate subdirectory
- Run ingestion:
from pathlib import Path

agent.retrieval.ingest_from_directory(
    Path("data/documents/your_folder"),
    doc_type="your_type"
)
Deployment
Streamlit Cloud
- Push your code to GitHub
- Go to https://share.streamlit.io and connect your repository
- Add secrets in the Streamlit Cloud dashboard:
GROQ_API_KEY = "gsk_your_key_here"
- Deploy
The app automatically detects if Ollama is unavailable and falls back to Groq.
Environment Variables
| Variable | Description | Default |
|---|---|---|
| LLM_PROVIDER | LLM provider to use | auto |
| LLM_MODEL | Model name | llama3.2 |
| GROQ_API_KEY | Groq API key | - |
| ANTHROPIC_API_KEY | Anthropic API key | - |
| OPENAI_API_KEY | OpenAI API key | - |
| OLLAMA_BASE_URL | Ollama server URL | http://localhost:11434 |
| EMBEDDING_MODEL | Sentence transformer model | all-MiniLM-L6-v2 |
| CHROMA_PERSIST_DIRECTORY | ChromaDB storage path | ./data/chroma_db |
| MAX_ITERATIONS | Max agent reasoning iterations | 10 |
| GROUNDEDNESS_THRESHOLD | Minimum groundedness score | 0.7 |
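GROUNDEDNESS_THRESHOLD is the minimum similarity the verifier requires between a response and its retrieved sources. The snippet below is one plausible way to compute such a score with sentence-transformers; it is illustrative only and not necessarily what verification/groundedness.py does.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative groundedness score: each response sentence is matched against
# its best-supporting source chunk; the mean of those best matches is the score.
model = SentenceTransformer("all-MiniLM-L6-v2")

def groundedness_score(response_sentences: list[str], source_chunks: list[str]) -> float:
    resp = model.encode(response_sentences, convert_to_tensor=True)
    src = model.encode(source_chunks, convert_to_tensor=True)
    best_per_sentence = util.cos_sim(resp, src).max(dim=1).values
    return float(best_per_sentence.mean())

score = groundedness_score(
    ["Extract Method moves a code fragment into its own method."],
    ["Extract Method: turn a fragment into a method whose name explains its purpose."],
)
print(score >= 0.7)  # compare against GROUNDEDNESS_THRESHOLD
```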
Configuration
PhoenixConfig
The main configuration class with nested configs:
- LLMConfig: LLM provider settings
- EmbeddingConfig: Embedding model settings
- VectorDBConfig: ChromaDB settings
- ChunkingConfig: Document chunking parameters
- AgentConfig: Agent behavior settings
Chunking Strategy
Phoenix uses a hybrid chunking strategy:
- SemanticChunker: Preserves paragraph and section boundaries for documentation
- CodeAwareChunker: Keeps code blocks intact, respects function/class boundaries
- HybridChunker: Automatically detects content type and applies the appropriate strategy (see the sketch below)
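The routing idea behind the HybridChunker can be sketched as follows. This is a simplified stand-alone example; the heuristics and chunk sizes used in retrieval/chunking.py will differ.

```python
import re

# Simplified hybrid chunking: detect code-like content and pick a splitter
# that keeps functions/classes intact; otherwise split on paragraphs.
CODE_HINTS = re.compile(r"^\s*(def |class |import |from \S+ import )", re.MULTILINE)

def hybrid_chunks(text: str, max_chars: int = 1200) -> list[str]:
    if CODE_HINTS.search(text):
        pieces = re.split(r"\n\s*\n(?=def |class )", text)  # code-aware boundaries
    else:
        pieces = re.split(r"\n\s*\n", text)                  # paragraph boundaries
    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = piece
        else:
            current = f"{current}\n\n{piece}" if current else piece
    if current:
        chunks.append(current)
    return chunks
```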
Development
Running Tests
pip install -e ".[dev]"
pytest
Code Formatting
black src/
ruff check src/
Type Checking
mypy src/
License
MIT License
Acknowledgments
- Built with LangChain, ChromaDB, and Sentence Transformers
- Refactoring patterns based on Martin Fowler's catalog
- SOLID principles documentation from Robert C. Martin