Universal RAG system for project documentation

DocRAG Kit

Universal RAG (Retrieval-Augmented Generation) system for project documentation. Quickly add AI-powered semantic search to any project.

Update: If you already have DocRAG Kit installed, see UPDATE.md for a quick upgrade.

Features

Quick Setup - Initialize RAG system in any project with one command
Universal - Works with any documentation (Markdown, code, configs)
MCP Integration - Seamless integration with Kiro AI via Model Context Protocol
Multilingual - Supports Russian and English questions and answers
Project Templates - Predefined templates for Symfony, iOS, and general projects
Secure - API keys stored safely in .env files

Installation

Updating existing projects: See UPDATE.md for a quick upgrade.

Requirements

Python >= 3.10 (required for MCP library)
pip >= 21.0

From PyPI

pip install docrag-kit

From Source

git clone https://github.com/dexiusprime-oss/docrag-kit.git
cd docrag-kit
pip install -e .

Troubleshooting Installation

If you encounter dependency conflicts with onnxruntime or pulsar-client:

# Use Python 3.10+
python3.10 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install docrag-kit

For more solutions, see docs/TROUBLESHOOTING.md

Quick Start

1. Initialize RAG System

Navigate to your project directory and run:

docrag init

This will:

Start an interactive configuration wizard
Ask for your LLM provider (OpenAI or Gemini)
Request your API key
Configure directories and file types to index
Create .docrag/ directory with configuration

2. Index Your Documentation

docrag index

This will:

Scan configured directories for documentation
Split documents into chunks
Create embeddings using your chosen LLM provider
Store vectors in local ChromaDB database
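
The chunking step can be pictured as a sliding window over each document. The sketch below is an illustration only: it splits on raw character counts, and the function name split_into_chunks is hypothetical. The splitter DocRAG Kit actually uses is not described here and may respect sentence or paragraph boundaries.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; not part of the docrag API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

With the default values, consecutive chunks share 200 characters, so a sentence cut at a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.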

3. Connect to Kiro AI

docrag mcp-config

This will display the MCP server configuration to add to Kiro.
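
The exact output of docrag mcp-config is not reproduced in this README. As a rough sketch only, many MCP clients accept a JSON entry under an mcpServers key that names a command and its arguments. The key layout, the server name "docrag", and the use of the current Python interpreter below are all assumptions, so prefer whatever the command actually prints.

```python
import json
import sys
from pathlib import Path


def build_mcp_entry(project_root: str, server_name: str = "docrag") -> dict:
    """Build an MCP client entry pointing at the project's mcp_server.py.

    Assumption: the client reads the widely used "mcpServers" layout.
    """
    root = Path(project_root).resolve()
    return {
        "mcpServers": {
            server_name: {
                "command": sys.executable,  # interpreter that has docrag-kit installed
                "args": [str(root / ".docrag" / "mcp_server.py")],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mcp_entry("."), indent=2))
```

Using an absolute path to mcp_server.py avoids depending on the working directory the client starts the server in.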

4. Start Searching

Once configured in Kiro, you can ask questions about your project:

"What is the architecture of this project?"
"How do I configure the database?"
"What APIs are available?"

Configuration

After initialization, your project will have:

your-project/
├── .docrag/
│   ├── config.yaml      # Configuration file
│   ├── mcp_server.py    # MCP server for Kiro
│   ├── vectordb/        # Vector database (gitignored)
│   └── .gitignore       # Excludes vectordb and .env
└── .env                 # API keys (gitignored)

Configuration File

.docrag/config.yaml contains all settings:

project:
  name: "my-project"
  type: "symfony"  # symfony, ios, general, custom

llm:
  provider: "openai"  # openai, gemini
  embedding_model: "text-embedding-3-small"
  llm_model: "gpt-4o-mini"
  temperature: 0.3

indexing:
  directories:
    - "docs/"
    - "src/"
  extensions:
    - ".md"
    - ".txt"
    - ".py"
  exclude_patterns:
    - "node_modules/"
    - ".git/"

chunking:
  chunk_size: 1000
  chunk_overlap: 200

retrieval:
  top_k: 5
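
To make the retrieval.top_k setting concrete: a query is embedded, compared with every stored chunk vector, and the top_k closest chunks are returned. The sketch below uses cosine similarity in plain Python as an assumption for illustration. In DocRAG Kit the search is performed by ChromaDB, whose distance metric depends on the collection settings, and the names here are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id) for chunk_id, vec in indexed.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [chunk_id for _, chunk_id in scored[:k]]
```

A larger top_k gives the LLM more context per question at the cost of more prompt tokens.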

Commands

`docrag init`

Initialize DocRAG in the current project with an interactive wizard.

`docrag index`

Index project documents and create vector database.

`docrag reindex`

Rebuild vector database from scratch (useful after documentation changes).

`docrag config`

Display current configuration.

`docrag config --edit`

Open configuration file in default editor.

`docrag mcp-config`

Display MCP server configuration for Kiro integration.

`docrag doctor`

Diagnose installation and configuration issues. Checks:

DocRAG initialization
Configuration files
API keys
Vector database
Python environment
Required packages
MCP configuration

`docrag fix-prompt`

Fix prompt template to include required placeholders ({context} and {question}).

Use this command if the answer_question tool returns only sources without an AI-generated answer.
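
A custom prompt template can be checked by hand before running the command. The sketch below assumes the template uses Python str.format-style placeholders. How docrag fix-prompt validates templates internally is not documented here, and the helper name is hypothetical.

```python
from string import Formatter

REQUIRED_PLACEHOLDERS = {"context", "question"}


def missing_placeholders(template: str) -> set[str]:
    """Return the required placeholder names that do not appear in the template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    found = {field for _, field, _, _ in Formatter().parse(template) if field}
    return REQUIRED_PLACEHOLDERS - found
```

An empty result means both {context} and {question} are present. Note that Formatter.parse raises ValueError on unbalanced braces, so literal braces in a template need to be doubled.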

`docrag --version`

Display version information.

`docrag update`

Update DocRAG configuration and MCP server for existing projects. Use this after upgrading the package to get new features.

`docrag fix-database`

Fix database permission and corruption issues. Use this when encountering "readonly database" errors or other database problems.

`docrag --help`

Display help information.

Supported File Types

Markdown: .md
Text: .txt
Python: .py
PHP: .php
Swift: .swift
JSON: .json
YAML: .yaml, .yml
Config: .conf, .config, .ini
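
File selection combines these extensions with the directories and exclude_patterns settings from config.yaml. The sketch below is one plausible reading of those settings, assuming exclude patterns are matched as plain substrings of the relative path. The real matching rules may differ, and select_files is a hypothetical name.

```python
from pathlib import Path


def select_files(root, directories, extensions, exclude_patterns):
    """Return relative paths of files that would be indexed under this reading of the config."""
    root_path = Path(root)
    wanted = {ext.lower() for ext in extensions}
    selected = []
    for directory in directories:
        base = root_path / directory
        if not base.is_dir():
            continue  # a configured directory that does not exist is skipped
        for path in sorted(base.rglob("*")):
            if not path.is_file() or path.suffix.lower() not in wanted:
                continue
            relative = path.relative_to(root_path).as_posix()
            # Assumption: a pattern such as "node_modules/" excludes any path containing it.
            if any(pattern in relative for pattern in exclude_patterns):
                continue
            selected.append(relative)
    return selected
```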

LLM Providers

OpenAI

Embeddings: text-embedding-3-small
LLM: gpt-4o-mini
Get API key: https://platform.openai.com/api-keys

Google Gemini

Embeddings: models/embedding-001
LLM: gemini-1.5-flash
Get API key: https://makersuite.google.com/app/apikey

Project Templates

Symfony

Optimized for Symfony PHP framework projects with expert knowledge of:

Symfony components and bundles
Doctrine ORM
Twig templates
PHP best practices

iOS

Optimized for iOS development projects with expert knowledge of:

Swift programming language
UIKit and SwiftUI
iOS SDK and frameworks
Xcode and development tools

General Documentation

General-purpose template for any project type.

Custom

Provide your own custom prompt template.

Security

CRITICAL WARNING: Never commit your .env file to git!

Your .env file contains sensitive API keys that provide access to paid services. If exposed, they can be used by others, potentially costing you money or compromising your accounts.

Automatic Security Features

DocRAG Kit automatically protects your API keys by:

Creating .docrag/.gitignore to exclude sensitive files (vectordb/, .env, *.pyc)
Checking if .env is in your root .gitignore
Offering to add .env to .gitignore if missing
Creating .env.example template without real keys
Displaying security warnings after initialization

Best Practices

Always keep .env in .gitignore
- DocRAG Kit checks this during initialization
- Verify with: grep .env .gitignore
Use .env.example as a template
- Share .env.example with your team (no real keys)
- Team members copy it to .env and add their own keys
Never share API keys
- Don't paste them in public issues or forums
- Don't commit them to public repositories
- Don't share them in chat or email
Rotate keys if exposed
- If you accidentally commit .env, revoke keys immediately
- Generate new keys from provider dashboard
- Update your .env file

What to Do If You Accidentally Commit API Keys

If you accidentally commit your .env file or API keys to git:

Revoke the exposed keys immediately:
- OpenAI: https://platform.openai.com/api-keys
- Google Gemini: https://makersuite.google.com/app/apikey
Generate new API keys from the provider dashboard
Update your .env file with the new keys

Remove keys from git history:

# Using git filter-branch (for small repos)
git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch .env" \
  --prune-empty --tag-name-filter cat -- --all

# Or use BFG Repo-Cleaner (recommended for large repos)
# https://rtyley.github.io/bfg-repo-cleaner/
bfg --delete-files .env

Force push to remote (if already pushed):

git push origin --force --all
git push origin --force --tags

Notify team members to re-clone the repository

Pre-commit Hook (Recommended)

Add a pre-commit hook to prevent accidentally committing .env:

# Create .git/hooks/pre-commit
cat > .git/hooks/pre-commit << 'EOF'
#!/bin/bash
if git diff --cached --name-only | grep -q "^\.env$"; then
    echo "ERROR: Attempting to commit .env file!"
    echo "This file contains sensitive API keys."
    echo "Add .env to .gitignore and try again."
    exit 1
fi
EOF

# Make it executable
chmod +x .git/hooks/pre-commit

Security Checklist

Before pushing to git, verify:

.env is in .gitignore
.env is not staged for commit (git status)
.env.example exists (without real keys)
.docrag/.gitignore excludes vectordb/ and .env
No API keys in configuration files
Pre-commit hook is installed (optional but recommended)
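
The first checklist item can be checked more strictly than with a loose pattern search, which would also match entries such as venv/ or .env.example. The sketch below only looks for the literal lines .env and /.env and their negations. It does not implement full gitignore semantics (wildcards, nested .gitignore files), and the function name is hypothetical.

```python
def gitignore_lists_env(gitignore_text: str) -> bool:
    """Return True if the last literal .env rule in the text ignores the file."""
    ignored = False
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if line in (".env", "/.env"):
            ignored = True
        elif line in ("!.env", "!/.env"):
            ignored = False  # a later negation re-includes the file
    return ignored
```

For an authoritative answer inside a repository, git check-ignore .env applies the complete rule set.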

MCP Integration

DocRAG Kit provides four MCP tools for Kiro AI:

`search_docs` - Fast Fragment Search

Returns relevant document fragments with source files. Best for quick lookups.

Parameters:

question (string, required): Search query or topic
max_results (integer, optional): Number of results (1-10, default: 3)

Performance: ~1 second, no LLM tokens used

Example:

Question: "database configuration"
Response: 
🔍 Found 2 relevant document(s):

--- Result 1 ---
📄 Source: docs/config.md
Database settings in .env:
DB_HOST=localhost
DB_PORT=5432
...
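
Kiro builds the tool call itself, but it can help to see what travels over the wire. MCP tool invocations are JSON-RPC 2.0 requests with the method tools/call. The sketch below assembles such a request for search_docs and enforces the documented 1-10 range on the client side. Whether the server rejects or clamps out-of-range values is not stated here, and the helper name is hypothetical.

```python
def build_search_request(question: str, max_results: int = 3, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 tools/call request for the search_docs tool."""
    if not question.strip():
        raise ValueError("question is required")
    if not 1 <= max_results <= 10:
        raise ValueError("max_results must be between 1 and 10")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search_docs",
            "arguments": {"question": question, "max_results": max_results},
        },
    }
```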

`answer_question` - AI-Generated Answer

Returns comprehensive AI-generated answer synthesized from documentation. Best for complex questions.

Parameters:

question (string, required): Question to answer
include_sources (boolean, optional): Include source files (default: true)

Performance: ~3-5 seconds, uses LLM tokens

Example:

Question: "How do I configure the database?"
Response: "To configure the database, edit the .env file and set DB_HOST, DB_PORT, and DB_NAME..."

Sources:
  • docs/config.md
  • README.md

`list_indexed_docs`

List all indexed documents in the project.

Returns: List of all source files in the vector database.

`reindex_docs` - Smart Reindexing

Automatically detects document changes and performs intelligent reindexing. Best for keeping documentation up-to-date.

Parameters:

force (boolean, optional): Force full reindexing even if no changes detected (default: false)
check_only (boolean, optional): Only check if reindexing is needed without performing it (default: false)

Performance: Variable - fast check (~1s), full reindex depends on document count

Example:

# Check if reindexing is needed
reindex_docs(check_only=True)
Response: "Changes detected in 3 file(s): docs/api.md, README.md, src/config.py"

# Perform smart reindexing
reindex_docs()
Response: "Reindexing completed! Files processed: 15, Chunks created: 127"

# Force full reindexing
reindex_docs(force=True)
Response: "Force reindexing completed! Reason: Force reindexing requested"
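
How reindex_docs detects changes is not described in this README. One common approach, shown here purely as an illustration, is to keep a manifest of content hashes from the last indexing run and compare it with the files on disk. All names below are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def detect_changes(paths, previous: dict[str, str]):
    """Compare current files with the previous manifest.

    Returns (changes, manifest): changes lists added, modified, and removed
    paths; manifest is the new path-to-digest mapping to store for next time.
    """
    current = {str(p): file_digest(Path(p)) for p in paths}
    changes = {
        "added": sorted(set(current) - set(previous)),
        "modified": sorted(p for p in current if p in previous and current[p] != previous[p]),
        "removed": sorted(set(previous) - set(current)),
    }
    return changes, current
```

If all three lists are empty, a check_only call could report that no reindexing is needed without touching the vector database.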

Tool Selection Guide:

Use search_docs for quick lookups (faster, free)
Use answer_question for complex questions (slower, uses tokens)
Use reindex_docs when documents have been updated
Use list_indexed_docs to see what's currently indexed
See docs/AGENT_QUICK_START.md for detailed guide

Documentation

Quick Links

docs/AGENT_QUICK_START.md - Quick start guide for AI agents
docs/SECURITY.md - Complete security guide (read this first!)
docs/EXAMPLES.md - Detailed usage examples for different project types
docs/MCP_INTEGRATION.md - Complete guide for Kiro AI integration
docs/TROUBLESHOOTING.md - Solutions for common issues
docs/API_REFERENCE.md - Complete CLI and configuration reference

Examples

See docs/EXAMPLES.md for detailed usage examples including:

Symfony project setup
iOS project setup
General documentation project
Example questions and answers
Configuration examples

MCP Integration

See docs/MCP_INTEGRATION.md for complete integration guide:

Getting MCP configuration
Manual and automatic setup
Testing MCP server
Troubleshooting connection issues

Troubleshooting

See docs/TROUBLESHOOTING.md for detailed solutions to:

Installation issues
API key problems
Indexing errors
MCP connection issues
Performance optimization

For database-specific issues, see docs/DATABASE_TROUBLESHOOTING.md:

"Readonly database" errors
Database corruption
Permission issues
Lock file problems

Quick fixes:

Database Issues (Readonly Database, Corruption)

# Automatic fix for most database problems
docrag fix-database

# Manual fix if needed
rm -rf .docrag/vectordb && docrag index

Database Not Found

docrag index

API Key Errors

# Check .env file
cat .env
# Should show: OPENAI_API_KEY=sk-... or GOOGLE_API_KEY=...
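
Because cat .env prints the keys to the terminal, a check that reports only whether each key is set can be safer on shared screens. The sketch below is a minimal parser for simple NAME=value lines and is an assumption about the file layout. It does not handle export prefixes or multi-line values, and the helper names are hypothetical.

```python
from pathlib import Path

KNOWN_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY")


def read_env(path) -> dict[str, str]:
    """Parse simple NAME=value lines, skipping blanks and comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip("'\"")
    return values


def key_status(path=".env") -> dict[str, str]:
    """Report whether each known key is set, without revealing its value."""
    values = read_env(path)
    status = {}
    for name in KNOWN_KEYS:
        value = values.get(name, "")
        status[name] = f"set ({len(value)} characters)" if value else "missing"
    return status
```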

MCP Connection Issues

# Verify MCP server exists
ls .docrag/mcp_server.py

# Test manually
python .docrag/mcp_server.py

Development

Setup Development Environment

git clone https://github.com/dexiusprime-oss/docrag-kit.git
cd docrag-kit
pip install -e ".[dev]"

Run Tests

pytest

Run Property-Based Tests

pytest tests/property/

Code Formatting

black src/ tests/

Type Checking

mypy src/

Requirements

Python >= 3.10 (required for MCP library)
OpenAI API key or Google Gemini API key
100MB+ disk space for vector database

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

Documentation

README.md - Main documentation
docs/ - Complete documentation
docs/SECURITY.md - Security best practices
docs/EXAMPLES.md - Usage examples
docs/MCP_INTEGRATION.md - MCP setup guide
docs/TROUBLESHOOTING.md - Troubleshooting guide
docs/API_REFERENCE.md - Complete API reference

Community

Issues: GitHub Issues
Discussions: GitHub Discussions

Changelog

0.2.0 (2024-12-22) - Architectural Solution for MCP Reindexing

MAJOR: Implemented isolated subprocess reindexing architecture for MCP compatibility
NEW: Added docrag.mcp_reindex_worker module for process-isolated database operations
NEW: Added docrag test-isolated-reindex command for testing new architecture
IMPROVEMENT: Multi-strategy reindexing (subprocess → in-process → fallback)
IMPROVEMENT: Enhanced VectorDBManager with force deletion capabilities
IMPROVEMENT: Better process isolation and connection cleanup
FIX: Resolved MCP reindexing database lock errors through architectural changes
STATUS: ✅ IMPLEMENTED - Isolated subprocess architecture successfully resolves the issue
ARCHITECTURE: Subprocess isolation eliminates ChromaDB/SQLite WAL locking conflicts
COMPATIBILITY: Maintains full backward compatibility with existing functionality
This resolves the persistent MCP reindexing issues reported in v0.1.8-0.1.9
TESTING: Use docrag test-isolated-reindex to verify the new architecture works

0.1.9 (2024-12-22) - Hotfix for MCP Reindexing

INVESTIGATION: Added comprehensive diagnostics for persistent MCP reindexing issues
NEW: Added docrag test-mcp-reindex command for detailed MCP reindexing diagnostics
IMPROVEMENT: Enhanced database deletion with 5 different strategies and aggressive cleanup
IMPROVEMENT: Better error messages acknowledging the ChromaDB/SQLite WAL locking issue
TRANSPARENCY: Clear documentation that this is a known ChromaDB limitation in MCP context
WORKAROUND: Documented hybrid workflow (MCP for search/answers, CLI for reindexing)
NOTE: Read operations (search_docs, answer_question) work perfectly in MCP
NOTE: Write operations (reindex_docs) require CLI workaround due to SQLite WAL locking

0.1.8 (2024-12-22)

NEW: Enhanced reindex_docs MCP tool with improved database lock handling
NEW: Added comprehensive database repair mechanisms for MCP compatibility
NEW: Added docrag fix-database command with automatic lock file removal and permission fixes
NEW: Added docrag debug-mcp command with detailed CLI/MCP synchronization diagnostics
FIX: CRITICAL: Fixed MCP reindexing database lock errors that prevented automated reindexing
FIX: Enhanced database deletion with retry mechanisms and connection cleanup
FIX: Improved MCP server error handling with helpful user guidance
IMPROVEMENT: Added automatic staleness warnings when documents may be outdated
IMPROVEMENT: Enhanced database operations with MCP-safe file handling
IMPROVEMENT: Added comprehensive upgrade documentation and troubleshooting guides
This resolves the critical issue where MCP reindexing failed with "unable to open database file" errors

0.1.7 (2024-12-22)

Internal development version with database improvements

0.1.6 (2024-12-14)

NEW: Added reindex_docs MCP tool for smart reindexing with automatic change detection
NEW: Added docrag debug-mcp command to diagnose CLI vs MCP synchronization issues
NEW: Added docrag fix-database command for database permission and corruption issues
NEW: Added docrag update command for upgrading existing projects
FIX: Fixed critical CLI/MCP synchronization issue where different databases were accessed
IMPROVEMENT: Enhanced MCP configuration with correct working directory paths
IMPROVEMENT: Added comprehensive documentation for troubleshooting
IMPROVEMENT: Added debug logging to MCP server for path diagnostics
This resolves issues where CLI shows many documents but MCP shows only few

0.1.5 (2024-12-09)

FIX: Added docrag fix-prompt command to fix prompt templates missing required placeholders
FIX: Added validation for prompt template placeholders ({context} and {question})
IMPROVEMENT: Better error messages when prompt template is invalid
This fixes the issue where answer_question tool returns only sources without AI-generated answer

0.1.4 (2024-12-09)

NEW: Added answer_question MCP tool for AI-generated comprehensive answers
Split search_docs into two distinct tools:
- search_docs: Fast semantic search returning document fragments (no LLM, ~1s)
- answer_question: AI-generated comprehensive answers (uses LLM, ~3-5s)
All three MCP tools now available: search_docs, answer_question, list_indexed_docs
Improved tool descriptions and parameter schemas

0.1.3 (2024-12-09)

Skipped due to packaging issue

0.1.2 (2024-12-09)

Skipped due to packaging issue

0.1.1 (2024-12-09)

Fixed GitHub Actions permissions for automated releases
Updated artifact actions to v4
Improved CI/CD pipeline

0.1.0 (2024-12-09)

Initial release with core functionality
Support for OpenAI and Gemini providers
MCP integration for Kiro AI
Interactive setup wizard
Project templates (Symfony, iOS, General)
Doctor command for diagnostics
Automatic project structure detection

Release history

0.2.0 (this version) - Dec 22, 2025
0.1.9 - Dec 22, 2025
0.1.8 - Dec 22, 2025
0.1.7 - Dec 14, 2025
0.1.6 - Dec 14, 2025
0.1.5 - Dec 9, 2025
0.1.4 - Dec 9, 2025
0.1.3 - Dec 9, 2025
0.1.2 - Dec 9, 2025
0.1.1 - Dec 9, 2025
0.1.0 - Dec 9, 2025

Download files

Source Distribution

docrag_kit-0.2.0.tar.gz (95.3 kB)

Uploaded Dec 22, 2025 Source

Built Distribution

docrag_kit-0.2.0-py3-none-any.whl (54.4 kB)

Uploaded Dec 22, 2025 Python 3

File details

Details for the file docrag_kit-0.2.0.tar.gz.

File metadata

Download URL: docrag_kit-0.2.0.tar.gz
Upload date: Dec 22, 2025
Size: 95.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for docrag_kit-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`c92a2928081d27a8d8a4feaa6bfc3339263e116dce014a97e1e4cf3a9823c19f`
MD5	`5d996f9158ea537e2beeb196b67e0b98`
BLAKE2b-256	`818da208b0149236797ae27cfc7439b3d46f602e5bdd1b336352450bb2187c88`

File details

Details for the file docrag_kit-0.2.0-py3-none-any.whl.

File metadata

Download URL: docrag_kit-0.2.0-py3-none-any.whl
Upload date: Dec 22, 2025
Size: 54.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for docrag_kit-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`26185a6fcdfa3dcb4ab228a7832f9252517d838b235a9b7e495651356e58feac`
MD5	`34c45801c73e067e51f67d3ba763dffb`
BLAKE2b-256	`017b736de3322a54cde27200746c4510a0cee2f0ebcfe0f91c7104b7412f190a`
