DocRAG Kit
Universal RAG (Retrieval-Augmented Generation) system for project documentation. Quickly add AI-powered semantic search to any project.
Update: If you already have DocRAG Kit installed, see UPDATE.md for quick upgrade instructions.
Features
- Quick Setup - Initialize RAG system in any project with one command
- Universal - Works with any documentation (Markdown, code, configs)
- MCP Integration - Seamless integration with Kiro AI via Model Context Protocol
- Multilingual - Supports Russian and English questions and answers
- Project Templates - Predefined templates for Symfony, iOS, and general projects
- Secure - API keys stored safely in .env files
Installation
Updating existing projects: see UPDATE.md for quick upgrade instructions.
Requirements
- Python >= 3.10 (required for MCP library)
- pip >= 21.0
From PyPI
pip install docrag-kit
From Source
git clone https://github.com/dexiusprime-oss/docrag-kit.git
cd docrag-kit
pip install -e .
Troubleshooting Installation
If you encounter dependency conflicts with onnxruntime or pulsar-client:
# Use Python 3.10+
python3.10 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install docrag-kit
For more solutions, see docs/TROUBLESHOOTING.md
Quick Start
1. Initialize RAG System
Navigate to your project directory and run:
docrag init
This will:
- Start an interactive configuration wizard
- Ask for your LLM provider (OpenAI or Gemini)
- Request your API key
- Configure directories and file types to index
- Create a .docrag/ directory with configuration
2. Index Your Documentation
docrag index
This will:
- Scan configured directories for documentation
- Split documents into chunks
- Create embeddings using your chosen LLM provider
- Store vectors in local ChromaDB database
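The first indexing step, scanning the configured directories, can be pictured with a short sketch. This is not DocRAG Kit's code: the function name `find_documents` is hypothetical, and treating each exclude pattern as a plain substring of the project-relative path is a simplifying assumption.

```python
from pathlib import Path


def find_documents(root, directories, extensions, exclude_patterns):
    """Return sorted project-relative paths of files selected for indexing."""
    root = Path(root)
    wanted = {ext.lower() for ext in extensions}
    selected = []
    for directory in directories:
        base = root / directory
        if not base.is_dir():
            continue  # skip configured directories that do not exist
        for path in base.rglob("*"):
            if not path.is_file() or path.suffix.lower() not in wanted:
                continue
            relative = path.relative_to(root).as_posix()
            # Assumption: an exclude pattern is a plain substring match.
            if any(pattern in relative for pattern in exclude_patterns):
                continue
            selected.append(relative)
    return sorted(set(selected))
```

Called with the directories, extensions, and exclude patterns from config.yaml, it returns the candidate files that the later chunking and embedding steps would process.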
3. Connect to Kiro AI
docrag mcp-config
This will display the MCP server configuration to add to Kiro.
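The exact output of docrag mcp-config is not reproduced in this README. As a rough picture only, the sketch below builds a configuration in the `mcpServers` / `command` / `args` layout that many MCP clients accept. That layout, the server name `docrag`, and the function `build_mcp_config` are all assumptions; use the command's real output when configuring Kiro.

```python
import json
import sys
from pathlib import Path


def build_mcp_config(project_root, server_name="docrag"):
    """Build a hypothetical MCP client entry pointing at .docrag/mcp_server.py."""
    root = Path(project_root).resolve()
    return {
        "mcpServers": {
            server_name: {
                # Run the server with the same interpreter that runs this script.
                "command": sys.executable,
                # Absolute path, so the entry works from any working directory.
                "args": [str(root / ".docrag" / "mcp_server.py")],
            }
        }
    }


print(json.dumps(build_mcp_config("."), indent=2))
```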
4. Start Searching
Once configured in Kiro, you can ask questions about your project:
- "What is the architecture of this project?"
- "How do I configure the database?"
- "What APIs are available?"
Configuration
After initialization, your project will have:
your-project/
├── .docrag/
│ ├── config.yaml # Configuration file
│ ├── mcp_server.py # MCP server for Kiro
│ ├── vectordb/ # Vector database (gitignored)
│ └── .gitignore # Excludes vectordb and .env
└── .env # API keys (gitignored)
Configuration File
.docrag/config.yaml contains all settings:
project:
  name: "my-project"
  type: "symfony" # symfony, ios, general, custom
llm:
  provider: "openai" # openai, gemini
  embedding_model: "text-embedding-3-small"
  llm_model: "gpt-4o-mini"
  temperature: 0.3
indexing:
  directories:
    - "docs/"
    - "src/"
  extensions:
    - ".md"
    - ".txt"
    - ".py"
  exclude_patterns:
    - "node_modules/"
    - ".git/"
chunking:
  chunk_size: 1000
  chunk_overlap: 200
retrieval:
  top_k: 5
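To see how chunk_size and chunk_overlap interact, here is a minimal character-based splitter. It is a sketch under stated assumptions, not the splitter DocRAG Kit uses: the function name `split_into_chunks` is hypothetical, and the real tool may split on separators or tokens rather than raw characters.

```python
def split_into_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With the values above, each window advances by 800 characters, so a 2,500-character document yields three chunks, and the last 200 characters of one chunk repeat at the start of the next. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.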
Commands
docrag init
Initialize DocRAG in current project with interactive wizard.
docrag index
Index project documents and create vector database.
docrag reindex
Rebuild vector database from scratch (useful after documentation changes).
docrag config
Display current configuration.
docrag config --edit
Open configuration file in default editor.
docrag mcp-config
Display MCP server configuration for Kiro integration.
docrag doctor
Diagnose installation and configuration issues. Checks:
- DocRAG initialization
- Configuration files
- API keys
- Vector database
- Python environment
- Required packages
- MCP configuration
docrag fix-prompt
Fix prompt template to include required placeholders ({context} and {question}).
Use this command if answer_question tool returns only sources without AI-generated answer.
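A check for the two required placeholders could look like the sketch below. It assumes the template uses Python `str.format`-style fields; the helper name `missing_placeholders` is hypothetical, and this is not necessarily how docrag fix-prompt validates templates.

```python
import string

REQUIRED_PLACEHOLDERS = {"context", "question"}


def missing_placeholders(template):
    """Return the required placeholder names that the template does not contain."""
    present = {
        field_name
        for _, field_name, _, _ in string.Formatter().parse(template)
        if field_name  # literal-only segments have no field name
    }
    return REQUIRED_PLACEHOLDERS - present
```

An empty result means the template is usable; a result containing `context` means retrieved fragments would never reach the model.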
docrag --version
Display version information.
docrag update
Update DocRAG configuration and MCP server for existing projects. Use this after upgrading the package to get new features.
docrag fix-database
Fix database permission and corruption issues. Use this when encountering "readonly database" errors or other database problems.
docrag --help
Display help information.
Supported File Types
- Markdown: .md
- Text: .txt
- Python: .py
- PHP: .php
- Swift: .swift
- JSON: .json
- YAML: .yaml, .yml
- Config: .conf, .config, .ini
LLM Providers
OpenAI
- Embeddings: text-embedding-3-small
- LLM: gpt-4o-mini
- Get API key: https://platform.openai.com/api-keys
Google Gemini
- Embeddings: models/embedding-001
- LLM: gemini-1.5-flash
- Get API key: https://makersuite.google.com/app/apikey
Project Templates
Symfony
Optimized for Symfony PHP framework projects with expert knowledge of:
- Symfony components and bundles
- Doctrine ORM
- Twig templates
- PHP best practices
iOS
Optimized for iOS development projects with expert knowledge of:
- Swift programming language
- UIKit and SwiftUI
- iOS SDK and frameworks
- Xcode and development tools
General Documentation
General-purpose template for any project type.
Custom
Provide your own custom prompt template.
Security
CRITICAL WARNING: Never commit your .env file to git!
Your .env file contains sensitive API keys that provide access to paid services. If exposed, they can be used by others, potentially costing you money or compromising your accounts.
Automatic Security Features
DocRAG Kit automatically protects your API keys by:
- Creating .docrag/.gitignore to exclude sensitive files (vectordb/, .env, *.pyc)
- Checking if .env is in your root .gitignore
- Offering to add .env to .gitignore if missing
- Creating .env.example template without real keys
- Displaying security warnings after initialization
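The .gitignore check in the list above can be approximated in a few lines. This sketch is an assumption about the idea, not DocRAG Kit's implementation: the function name `gitignore_lists_env` is hypothetical, and it only recognizes the literal entries `.env` and `/.env`, not broader glob patterns that would also cover the file.

```python
from pathlib import Path


def gitignore_lists_env(gitignore_path):
    """Return True if the .gitignore file has a literal entry for .env."""
    path = Path(gitignore_path)
    if not path.is_file():
        return False  # no .gitignore means .env is not ignored
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line in {".env", "/.env"}:
            return True
    return False
```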
Best Practices
- Always keep .env in .gitignore
  - DocRAG Kit checks this during initialization
  - Verify with: grep .env .gitignore
- Use .env.example as a template
  - Share .env.example with your team (no real keys)
  - Team members copy it to .env and add their own keys
- Never share API keys
  - Don't paste them in public issues or forums
  - Don't commit them to public repositories
  - Don't share them in chat or email
- Rotate keys if exposed
  - If you accidentally commit .env, revoke keys immediately
  - Generate new keys from provider dashboard
  - Update your .env file
What to Do If You Accidentally Commit API Keys
If you accidentally commit your .env file or API keys to git:
- Revoke the exposed keys immediately:
  - OpenAI: https://platform.openai.com/api-keys
  - Google Gemini: https://makersuite.google.com/app/apikey
- Generate new API keys from the provider dashboard
- Update your .env file with the new keys
- Remove keys from git history:
# Using git filter-branch (for small repos)
git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch .env" \
  --prune-empty --tag-name-filter cat -- --all
# Or use BFG Repo-Cleaner (recommended for large repos)
# https://rtyley.github.io/bfg-repo-cleaner/
bfg --delete-files .env
- Force push to remote (if already pushed):
git push origin --force --all
git push origin --force --tags
- Notify team members to re-clone the repository
Pre-commit Hook (Recommended)
Add a pre-commit hook to prevent accidentally committing .env:
# Create .git/hooks/pre-commit
cat > .git/hooks/pre-commit << 'EOF'
#!/bin/bash
if git diff --cached --name-only | grep -q "^\.env$"; then
  echo "ERROR: Attempting to commit .env file!"
  echo "This file contains sensitive API keys."
  echo "Add .env to .gitignore and try again."
  exit 1
fi
EOF
# Make it executable
chmod +x .git/hooks/pre-commit
Security Checklist
Before pushing to git, verify:
- .env is in .gitignore
- .env is not staged for commit (git status)
- .env.example exists (without real keys)
- .docrag/.gitignore excludes vectordb/ and .env
- No API keys in configuration files
- Pre-commit hook is installed (optional but recommended)
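The "No API keys in configuration files" item can be partly automated. The sketch below flags lines that look like they hold a key. The patterns are assumptions, not an exhaustive detector: an `sk-` prefixed token, as in the OpenAI example elsewhere in this README, or a long value assigned to OPENAI_API_KEY or GOOGLE_API_KEY. The function name `find_suspected_keys` is hypothetical.

```python
import re

# Assumed patterns; real keys may follow other formats.
KEY_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    re.compile(r"\b(?:OPENAI|GOOGLE)_API_KEY\s*[:=]\s*['\"]?[A-Za-z0-9_-]{20,}"),
]


def find_suspected_keys(text):
    """Return 1-based numbers of lines that appear to contain an API key."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if any(pattern.search(line) for pattern in KEY_PATTERNS)
    ]
```

Only line numbers are returned, so running the check never echoes a secret to the terminal or to logs.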
MCP Integration
DocRAG Kit provides four MCP tools for Kiro AI:
search_docs - Fast Fragment Search
Returns relevant document fragments with source files. Best for quick lookups.
Parameters:
- question (string, required): Search query or topic
- max_results (integer, optional): Number of results (1-10, default: 3)
Performance: ~1 second, no LLM tokens used
Example:
Question: "database configuration"
Response:
🔍 Found 2 relevant document(s):
--- Result 1 ---
📄 Source: docs/config.md
Database settings in .env:
DB_HOST=localhost
DB_PORT=5432
...
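Conceptually, fragment search ranks stored chunks by how close their embedding is to the query embedding. The toy sketch below shows that ranking with cosine similarity over in-memory vectors. It is an illustration only: in DocRAG Kit the lookup is delegated to ChromaDB, the similarity metric is an assumption, and the names `cosine_similarity` and `search` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vector, indexed_chunks, max_results=3):
    """Return the (source, text) pairs of the best-matching chunks.

    indexed_chunks is a list of (source, text, vector) tuples.
    """
    max_results = max(1, min(10, max_results))  # clamp to the documented 1-10 range
    ranked = sorted(
        indexed_chunks,
        key=lambda chunk: cosine_similarity(query_vector, chunk[2]),
        reverse=True,
    )
    return [(source, text) for source, text, _ in ranked[:max_results]]
```

The only model call on this path is the one that embeds the query, which is why no LLM generation tokens are spent.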
answer_question - AI-Generated Answer
Returns comprehensive AI-generated answer synthesized from documentation. Best for complex questions.
Parameters:
- question (string, required): Question to answer
- include_sources (boolean, optional): Include source files (default: true)
Performance: ~3-5 seconds, uses LLM tokens
Example:
Question: "How do I configure the database?"
Response: "To configure the database, edit the .env file and set DB_HOST, DB_PORT, and DB_NAME..."
Sources:
• docs/config.md
• README.md
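The step between retrieval and the final response can be sketched as two small helpers: one fills the prompt template with the retrieved fragments and the question, the other appends a de-duplicated source list. Both function names are hypothetical, the LLM call itself is omitted, and the output layout merely mirrors the example above.

```python
def build_prompt(template, fragments, question):
    """Fill a template containing {context} and {question} placeholders.

    fragments is a list of (source, text) pairs returned by retrieval.
    """
    context = "\n\n".join(text for _, text in fragments)
    return template.format(context=context, question=question)


def format_answer(answer, fragments, include_sources=True):
    """Append a de-duplicated source list to the model's answer."""
    if not include_sources:
        return answer
    sources = []
    for source, _ in fragments:
        if source not in sources:  # keep first-seen order, drop repeats
            sources.append(source)
    return "\n".join([answer, "Sources:"] + [f"• {source}" for source in sources])
```

If the template lacks `{context}`, the fragments never reach the model, which is consistent with the symptom that docrag fix-prompt addresses.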
list_indexed_docs
List all indexed documents in the project.
Returns: List of all source files in the vector database.
reindex_docs - Smart Reindexing
Automatically detects document changes and performs intelligent reindexing. Best for keeping documentation up-to-date.
Parameters:
- force (boolean, optional): Force full reindexing even if no changes detected (default: false)
- check_only (boolean, optional): Only check if reindexing is needed without performing it (default: false)
Performance: Variable - fast check (~1s), full reindex depends on document count
Example:
# Check if reindexing is needed
reindex_docs(check_only=True)
Response: "Changes detected in 3 file(s): docs/api.md, README.md, src/config.py"
# Perform smart reindexing
reindex_docs()
Response: "Reindexing completed! Files processed: 15, Chunks created: 127"
# Force full reindexing
reindex_docs(force=True)
Response: "Force reindexing completed! Reason: Force reindexing requested"
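One common way to detect document changes is to compare content hashes against a manifest saved at the previous indexing run. The sketch below shows that idea. How reindex_docs actually detects changes is not described here, so the approach and the names `file_digest` and `detect_changes` are assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def detect_changes(previous, current):
    """Compare two {path: digest} manifests.

    Returns the sorted paths that were added, modified, or removed.
    """
    added_or_modified = {
        path for path, digest in current.items() if previous.get(path) != digest
    }
    removed = set(previous) - set(current)
    return sorted(added_or_modified | removed)
```

An empty list corresponds to the "no changes detected" case, where only force=True would trigger a rebuild.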
Tool Selection Guide:
- Use search_docs for quick lookups (faster, no LLM tokens used)
- Use answer_question for complex questions (slower, uses tokens)
- Use reindex_docs when documents have been updated
- Use list_indexed_docs to see what's currently indexed
- See docs/AGENT_QUICK_START.md for detailed guide
Documentation
Quick Links
- docs/AGENT_QUICK_START.md - Quick start guide for AI agents
- docs/SECURITY.md - Complete security guide (read this first!)
- docs/EXAMPLES.md - Detailed usage examples for different project types
- docs/MCP_INTEGRATION.md - Complete guide for Kiro AI integration
- docs/TROUBLESHOOTING.md - Solutions for common issues
- docs/API_REFERENCE.md - Complete CLI and configuration reference
Examples
See docs/EXAMPLES.md for detailed usage examples including:
- Symfony project setup
- iOS project setup
- General documentation project
- Example questions and answers
- Configuration examples
MCP Integration
See docs/MCP_INTEGRATION.md for complete integration guide:
- Getting MCP configuration
- Manual and automatic setup
- Testing MCP server
- Troubleshooting connection issues
Troubleshooting
See docs/TROUBLESHOOTING.md for detailed solutions to:
- Installation issues
- API key problems
- Indexing errors
- MCP connection issues
- Performance optimization
For database-specific issues, see docs/DATABASE_TROUBLESHOOTING.md:
- "Readonly database" errors
- Database corruption
- Permission issues
- Lock file problems
Quick fixes:
Database Issues (Readonly Database, Corruption)
# Automatic fix for most database problems
docrag fix-database
# Manual fix if needed
rm -rf .docrag/vectordb && docrag index
Database Not Found
docrag index
API Key Errors
# Check .env file
cat .env
# Should show: OPENAI_API_KEY=sk-... or GOOGLE_API_KEY=...
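Because cat .env prints the full secret to the terminal, a masked check can be safer on shared screens or in recorded sessions. The following sketch parses simple NAME=value lines and reports each expected key as masked or missing. The parser is deliberately minimal (no `export` prefixes or multi-line values), and the function names are hypothetical.

```python
def read_env_values(env_text):
    """Parse simple NAME=value lines, ignoring blanks and comments."""
    values = {}
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip("'\"")
    return values


def mask(value):
    """Show only the first 3 and last 4 characters of a secret."""
    if len(value) <= 8:
        return "*" * len(value)
    return value[:3] + "..." + value[-4:]


def describe_api_keys(env_text, names=("OPENAI_API_KEY", "GOOGLE_API_KEY")):
    """Map each expected key name to a masked value or the word 'missing'."""
    values = read_env_values(env_text)
    return {
        name: mask(values[name]) if values.get(name) else "missing"
        for name in names
    }
```

Only the key for the provider selected in config.yaml needs to be present, so one "missing" entry is expected in a single-provider setup.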
MCP Connection Issues
# Verify MCP server exists
ls .docrag/mcp_server.py
# Test manually
python .docrag/mcp_server.py
Development
Setup Development Environment
git clone https://github.com/yourusername/docrag-kit.git
cd docrag-kit
pip install -e ".[dev]"
Run Tests
pytest
Run Property-Based Tests
pytest tests/property/
Code Formatting
black src/ tests/
Type Checking
mypy src/
Requirements
- Python >= 3.10 (required for MCP library)
- OpenAI API key or Google Gemini API key
- 100MB+ disk space for vector database
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
Documentation
- README.md - Main documentation
- docs/ - Complete documentation
- docs/SECURITY.md - Security best practices
- docs/EXAMPLES.md - Usage examples
- docs/MCP_INTEGRATION.md - MCP setup guide
- docs/TROUBLESHOOTING.md - Troubleshooting guide
- docs/API_REFERENCE.md - Complete API reference
Community
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Changelog
0.1.6 (2024-12-14)
- NEW: Added reindex_docs MCP tool for smart reindexing with automatic change detection
- NEW: Added docrag debug-mcp command to diagnose CLI vs MCP synchronization issues
- NEW: Added docrag fix-database command for database permission and corruption issues
- NEW: Added docrag update command for upgrading existing projects
- FIX: Fixed critical CLI/MCP synchronization issue where different databases were accessed
- IMPROVEMENT: Enhanced MCP configuration with correct working directory paths
- IMPROVEMENT: Added comprehensive documentation for troubleshooting
- IMPROVEMENT: Added debug logging to MCP server for path diagnostics
- This resolves issues where CLI shows many documents but MCP shows only few
0.1.5 (2024-12-09)
- FIX: Added docrag fix-prompt command to fix prompt templates missing required placeholders
- FIX: Added validation for prompt template placeholders ({context} and {question})
- IMPROVEMENT: Better error messages when prompt template is invalid
- This fixes the issue where answer_question tool returns only sources without AI-generated answer
0.1.4 (2024-12-09)
- NEW: Added answer_question MCP tool for AI-generated comprehensive answers
- Split search_docs into two distinct tools:
  - search_docs: Fast semantic search returning document fragments (no LLM, ~1s)
  - answer_question: AI-generated comprehensive answers (uses LLM, ~3-5s)
- All three MCP tools now available: search_docs, answer_question, list_indexed_docs
- Improved tool descriptions and parameter schemas
0.1.3 (2024-12-09)
- Skipped due to packaging issue
0.1.2 (2024-12-09)
- Skipped due to packaging issue
0.1.1 (2024-12-09)
- Fixed GitHub Actions permissions for automated releases
- Updated artifact actions to v4
- Improved CI/CD pipeline
0.1.0 (2024-12-09)
- Initial release with core functionality
- Support for OpenAI and Gemini providers
- MCP integration for Kiro AI
- Interactive setup wizard
- Project templates (Symfony, iOS, General)
- Doctor command for diagnostics
- Automatic project structure detection
File details
Details for the file docrag_kit-0.1.6.tar.gz.
File metadata
- Download URL: docrag_kit-0.1.6.tar.gz
- Upload date:
- Size: 82.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ff6d4c9a2163065b377f19567a330971c05179457c0f571a5fe53e2cc72672f8 |
| MD5 | 628dfc2a7d8a2a3875080d1e99528634 |
| BLAKE2b-256 | ae822531d5afe34295113bcfad501bd0396f2dee62601f2d4629acedd2c3ef82 |
File details
Details for the file docrag_kit-0.1.6-py3-none-any.whl.
File metadata
- Download URL: docrag_kit-0.1.6-py3-none-any.whl
- Upload date:
- Size: 44.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b8974cfd1e52940f6c8995eaba3ca3fce20553ed18b7fcc57dd5634b9c273f99 |
| MD5 | 09ca51ae89467866522c8569120c020d |
| BLAKE2b-256 | 6c2f6f3adc7ec4375ed4a3810b7fd63786d8f81e9f592b470274a3bc35183213 |