DocRAG Kit

Universal RAG (Retrieval-Augmented Generation) system for project documentation. Quickly add AI-powered semantic search to any project.

Features

  • 🚀 Quick Setup - Initialize RAG system in any project with one command
  • 📚 Universal - Works with any documentation (Markdown, code, configs)
  • 🔌 MCP Integration - Seamless integration with Kiro AI via Model Context Protocol
  • 🌍 Multilingual - Supports Russian and English questions and answers
  • 🎯 Project Templates - Predefined templates for Symfony, iOS, and general projects
  • 🔒 Secure - API keys stored safely in .env files

Installation

Requirements

  • Python >= 3.10 (3.11 recommended)
  • pip >= 21.0

From PyPI

pip install docrag-kit

From Source

git clone https://github.com/dexiusprime-oss/docrag-kit.git
cd docrag-kit
pip install -e .

Troubleshooting Installation

If you encounter dependency conflicts with onnxruntime or pulsar-client:

# Use Python 3.11
python3.11 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install docrag-kit

For more solutions, see docs/TROUBLESHOOTING.md

Quick Start

1. Initialize RAG System

Navigate to your project directory and run:

docrag init

This will:

  • Start an interactive configuration wizard
  • Ask for your LLM provider (OpenAI or Gemini)
  • Request your API key
  • Configure directories and file types to index
  • Create .docrag/ directory with configuration

2. Index Your Documentation

docrag index

This will:

  • Scan configured directories for documentation
  • Split documents into chunks
  • Create embeddings using your chosen LLM provider
  • Store vectors in local ChromaDB database
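The scanning step can be pictured with a short Python sketch. It is an illustration only, not DocRAG Kit's implementation: the function name find_documents is hypothetical, and treating exclude patterns as plain substrings of the relative path is an assumption.

```python
from pathlib import Path


def find_documents(root, directories, extensions, exclude_patterns):
    """Return sorted relative paths of files that should be indexed.

    Hypothetical helper: the real scanner may match patterns differently.
    """
    root = Path(root)
    matches = []
    for directory in directories:
        base = root / directory
        if not base.is_dir():
            continue  # a configured directory may not exist in every project
        for path in base.rglob("*"):
            if not path.is_file():
                continue
            if path.suffix.lower() not in extensions:
                continue
            relative = path.relative_to(root).as_posix()
            # Assumption: a pattern excludes a file if it occurs anywhere in the path.
            if any(pattern in relative for pattern in exclude_patterns):
                continue
            matches.append(relative)
    return sorted(set(matches))


if __name__ == "__main__":
    for name in find_documents(".", ["docs/", "src/"], {".md", ".txt", ".py"},
                               ["node_modules/", ".git/"]):
        print(name)
```

Run from a project root, the script prints the files that this simplified filter would select for chunking and embedding.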

3. Connect to Kiro AI

docrag mcp-config

This will display the MCP server configuration to add to Kiro.

4. Start Searching

Once configured in Kiro, you can ask questions about your project:

  • "What is the architecture of this project?"
  • "How do I configure the database?"
  • "What APIs are available?"

Configuration

After initialization, your project will have:

your-project/
├── .docrag/
│   ├── config.yaml      # Configuration file
│   ├── mcp_server.py    # MCP server for Kiro
│   ├── vectordb/        # Vector database (gitignored)
│   └── .gitignore       # Excludes vectordb and .env
└── .env                 # API keys (gitignored)

Configuration File

.docrag/config.yaml contains all settings:

project:
  name: "my-project"
  type: "symfony"  # symfony, ios, general, custom

llm:
  provider: "openai"  # openai, gemini
  embedding_model: "text-embedding-3-small"
  llm_model: "gpt-4o-mini"
  temperature: 0.3

indexing:
  directories:
    - "docs/"
    - "src/"
  extensions:
    - ".md"
    - ".txt"
    - ".py"
  exclude_patterns:
    - "node_modules/"
    - ".git/"

chunking:
  chunk_size: 1000
  chunk_overlap: 200

retrieval:
  top_k: 5
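To see how chunk_size and chunk_overlap interact, here is a minimal sliding-window sketch. It assumes both values are measured in characters and that chunks are cut at fixed offsets; the splitter DocRAG Kit actually uses may break on separators instead, and split_into_chunks is a hypothetical name.

```python
def split_into_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into windows of chunk_size characters.

    Each window starts (chunk_size - chunk_overlap) characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(2500))
    print([len(chunk) for chunk in split_into_chunks(sample)])
```

With the default values, a 2500-character document yields three chunks that start at offsets 0, 800, and 1600. A larger overlap keeps more context across chunk boundaries at the cost of more embeddings to create and store.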

Commands

docrag init

Initialize DocRAG in current project with interactive wizard.

docrag index

Index project documents and create vector database.

docrag reindex

Rebuild vector database from scratch (useful after documentation changes).

docrag config

Display current configuration.

docrag config --edit

Open configuration file in default editor.

docrag mcp-config

Display MCP server configuration for Kiro integration.

docrag doctor

Diagnose installation and configuration issues. Checks:

  • DocRAG initialization
  • Configuration files
  • API keys
  • Vector database
  • Python environment
  • Required packages
  • MCP configuration

docrag --version

Display version information.

docrag --help

Display help information.

Supported File Types

  • Markdown: .md
  • Text: .txt
  • Python: .py
  • PHP: .php
  • Swift: .swift
  • JSON: .json
  • YAML: .yaml, .yml
  • Config: .conf, .config, .ini

LLM Providers

OpenAI

Google Gemini

Project Templates

Symfony

Optimized for Symfony PHP framework projects with expert knowledge of:

  • Symfony components and bundles
  • Doctrine ORM
  • Twig templates
  • PHP best practices

iOS

Optimized for iOS development projects with expert knowledge of:

  • Swift programming language
  • UIKit and SwiftUI
  • iOS SDK and frameworks
  • Xcode and development tools

General Documentation

General-purpose template for any project type.

Custom

Provide your own custom prompt template.

Security

⚠️ CRITICAL WARNING: Never commit your .env file to git!

Your .env file contains sensitive API keys that provide access to paid services. If exposed, they can be used by others, potentially costing you money or compromising your accounts.

Automatic Security Features

DocRAG Kit automatically protects your API keys by:

  • Creating .docrag/.gitignore to exclude sensitive files (vectordb/, .env, *.pyc)
  • Checking if .env is in your root .gitignore
  • Offering to add .env to .gitignore if missing
  • Creating .env.example template without real keys
  • Displaying security warnings after initialization
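The .gitignore check described above can be sketched as follows. This is a simplified illustration, not the code DocRAG Kit runs: both function names are hypothetical, and the check only recognizes a literal .env or /.env entry, so broader patterns such as .env* are not detected.

```python
from pathlib import Path


def gitignore_lists_env(gitignore_path=".gitignore"):
    """Return True if the file has a line that is exactly `.env` or `/.env`."""
    path = Path(gitignore_path)
    if not path.is_file():
        return False
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip() in {".env", "/.env"}:
            return True
    return False


def ensure_env_ignored(gitignore_path=".gitignore"):
    """Append `.env` if it is missing. Return True if the file was changed."""
    if gitignore_lists_env(gitignore_path):
        return False
    path = Path(gitignore_path)
    existing = path.read_text(encoding="utf-8") if path.is_file() else ""
    # Keep the new entry on its own line even if the file lacks a final newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    path.write_text(existing + separator + ".env\n", encoding="utf-8")
    return True
```

A wizard could call gitignore_lists_env first and only call ensure_env_ignored after the user agrees, which mirrors the "offering to add" behaviour listed above.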

Best Practices

  1. Always keep .env in .gitignore

    • DocRAG Kit checks this during initialization
    • Verify with: grep -F .env .gitignore
  2. Use .env.example as a template

    • Share .env.example with your team (no real keys)
    • Team members copy it to .env and add their own keys
  3. Never share API keys

    • Don't paste them in public issues or forums
    • Don't commit them to public repositories
    • Don't share them in chat or email
  4. Rotate keys if exposed

    • If you accidentally commit .env, revoke keys immediately
    • Generate new keys from provider dashboard
    • Update your .env file

What to Do If You Accidentally Commit API Keys

If you accidentally commit your .env file or API keys to git:

  1. Revoke the exposed keys immediately in your provider's dashboard

  2. Generate new API keys from the provider dashboard

  3. Update your .env file with the new keys

  4. Remove keys from git history:

    # Using git filter-branch (for small repos)
    git filter-branch --force --index-filter \
      "git rm --cached --ignore-unmatch .env" \
      --prune-empty --tag-name-filter cat -- --all
    
    # Or use BFG Repo-Cleaner (recommended for large repos)
    # https://rtyley.github.io/bfg-repo-cleaner/
    bfg --delete-files .env
    
  5. Force push to remote (if already pushed):

    git push origin --force --all
    git push origin --force --tags
    
  6. Notify team members to re-clone the repository

Pre-commit Hook (Recommended)

Add a pre-commit hook to prevent accidentally committing .env:

# Create .git/hooks/pre-commit
cat > .git/hooks/pre-commit << 'EOF'
#!/bin/bash
if git diff --cached --name-only | grep -q "^\.env$"; then
    echo "ERROR: Attempting to commit .env file!"
    echo "This file contains sensitive API keys."
    echo "Add .env to .gitignore and try again."
    exit 1
fi
EOF

# Make it executable
chmod +x .git/hooks/pre-commit

Security Checklist

Before pushing to git, verify:

  • .env is in .gitignore
  • .env is not staged for commit (git status)
  • .env.example exists (without real keys)
  • .docrag/.gitignore excludes vectordb/ and .env
  • No API keys in configuration files
  • Pre-commit hook is installed (optional but recommended)
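The "No API keys in configuration files" item can be partly automated. The sketch below scans text for key-like strings and reports only a masked prefix. The patterns are assumptions: the sk- prefix follows the OpenAI example shown elsewhere in this README, the AIza prefix is assumed for Google keys, and real key formats may differ. The function name is hypothetical.

```python
import re

# Assumed key shapes; adjust to the formats your providers actually issue.
KEY_PATTERNS = [
    re.compile(r"(?<![A-Za-z0-9])sk-[A-Za-z0-9_-]{20,}"),
    re.compile(r"(?<![A-Za-z0-9])AIza[A-Za-z0-9_-]{30,}"),
]


def find_key_like_strings(text):
    """Return (line_number, masked_prefix) pairs for strings that look like keys."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for pattern in KEY_PATTERNS:
            for match in pattern.finditer(line):
                # Report only the first four characters so the key is not echoed.
                findings.append((number, match.group(0)[:4] + "..."))
    return findings


if __name__ == "__main__":
    from pathlib import Path

    config = Path(".docrag/config.yaml")
    if config.is_file():
        for number, masked in find_key_like_strings(config.read_text(encoding="utf-8")):
            print(f"{config}:{number}: possible API key ({masked})")
```

An empty result does not prove a file is clean, since keys in other formats are not matched. Treat it as a quick first pass before pushing.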

MCP Integration

DocRAG Kit provides three MCP tools for Kiro AI:

search_docs - Fast Fragment Search

Returns relevant document fragments with source files. Best for quick lookups.

Parameters:

  • question (string, required): Search query or topic
  • max_results (integer, optional): Number of results (1-10, default: 3)

Performance: ~1 second, no LLM tokens used

Example:

Question: "database configuration"
Response: 
🔍 Found 2 relevant document(s):

--- Result 1 ---
📄 Source: docs/config.md
Database settings in .env:
DB_HOST=localhost
DB_PORT=5432
...
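Conceptually, a fragment search ranks stored chunk vectors by similarity to the query vector and returns the best few. The sketch below shows that idea in plain Python. It is not the tool's code: in DocRAG Kit the lookup is handled by ChromaDB, the cosine metric is an assumption, and the function names and the in-memory index layout are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_fragments(query_vector, index, max_results=3):
    """Return the max_results most similar (source, text) pairs.

    index is a list of (source, text, vector) tuples.
    """
    if not 1 <= max_results <= 10:
        raise ValueError("max_results must be between 1 and 10")
    scored = [
        (cosine_similarity(query_vector, vector), source, text)
        for source, text, vector in index
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(source, text) for _, source, text in scored[:max_results]]
```

Because this step only compares vectors, it needs no chat-model call, which is why search_docs is the cheaper of the two query tools.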

answer_question - AI-Generated Answer

Returns a comprehensive AI-generated answer synthesized from the documentation. Best for complex questions.

Parameters:

  • question (string, required): Question to answer
  • include_sources (boolean, optional): Include source files (default: true)

Performance: ~3-5 seconds, uses LLM tokens

Example:

Question: "How do I configure the database?"
Response: "To configure the database, edit the .env file and set DB_HOST, DB_PORT, and DB_NAME..."

📚 Sources:
  • docs/config.md
  • README.md
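The effect of include_sources can be illustrated with a small formatting helper. This is a sketch of the output shape shown in the example above, not the server's code, and format_answer is a hypothetical name.

```python
def format_answer(answer, sources, include_sources=True):
    """Append a de-duplicated source list to the answer when requested."""
    if not include_sources or not sources:
        return answer
    unique_sources = list(dict.fromkeys(sources))  # drop repeats, keep first-seen order
    lines = [answer, "", "📚 Sources:"]
    lines.extend(f"  • {source}" for source in unique_sources)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_answer(
        "To configure the database, edit the .env file.",
        ["docs/config.md", "README.md", "docs/config.md"],
    ))
```

Several retrieved chunks often come from the same file, so de-duplicating keeps the source list short.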

list_indexed_docs

List all indexed documents in the project.

Returns: List of all source files in the vector database.

Tool Selection Guide:

  • Use search_docs for quick lookups (faster, no LLM tokens used)
  • Use answer_question for complex questions (slower, uses tokens)
  • See docs/AGENT_QUICK_START.md for detailed guide

Documentation

Examples

See docs/EXAMPLES.md for detailed usage examples including:

  • Symfony project setup
  • iOS project setup
  • General documentation project
  • Example questions and answers
  • Configuration examples

MCP Integration

See docs/MCP_INTEGRATION.md for complete integration guide:

  • Getting MCP configuration
  • Manual and automatic setup
  • Testing MCP server
  • Troubleshooting connection issues

Troubleshooting

See docs/TROUBLESHOOTING.md for detailed solutions to:

  • Installation issues
  • API key problems
  • Indexing errors
  • MCP connection issues
  • Performance optimization

Quick fixes:

Database Not Found

docrag index

API Key Errors

# Check .env file
cat .env
# Should show: OPENAI_API_KEY=sk-... or GOOGLE_API_KEY=...
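Printing .env with cat shows the key on screen. If you would rather confirm that a key is set without displaying it, a small Python check works too. This is a sketch under assumptions: it handles only simple NAME=value lines, and the two variable names are taken from the comment above.

```python
from pathlib import Path

EXPECTED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY")


def parse_env(text):
    """Parse simple NAME=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip('"').strip("'")
    return values


def key_status(env_path=".env"):
    """Map each expected key name to True if it has a non-empty value."""
    path = Path(env_path)
    values = parse_env(path.read_text(encoding="utf-8")) if path.is_file() else {}
    return {name: bool(values.get(name)) for name in EXPECTED_KEYS}


if __name__ == "__main__":
    for name, present in key_status().items():
        print(f"{name}: {'set' if present else 'missing'}")
```

Only one of the two keys needs to be set, depending on the provider chosen in .docrag/config.yaml.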

MCP Connection Issues

# Verify MCP server exists
ls .docrag/mcp_server.py

# Test manually
python .docrag/mcp_server.py

Development

Setup Development Environment

git clone https://github.com/dexiusprime-oss/docrag-kit.git
cd docrag-kit
pip install -e ".[dev]"

Run Tests

pytest

Run Property-Based Tests

pytest tests/property/

Code Formatting

black src/ tests/

Type Checking

mypy src/

Requirements

  • Python >= 3.10 (3.11 recommended)
  • OpenAI API key or Google Gemini API key
  • 100MB+ disk space for vector database

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Changelog

0.1.3 (2024-12-09)

  • NEW: Added answer_question MCP tool for AI-generated comprehensive answers
  • All three MCP tools now available: search_docs, answer_question, list_indexed_docs
  • Fixed: v0.1.2 was missing the answer_question tool

0.1.2 (2024-12-09)

  • Skipped due to packaging issue

0.1.1 (2024-12-09)

  • Fixed GitHub Actions permissions for automated releases
  • Updated artifact actions to v4
  • Improved CI/CD pipeline

0.1.0 (2024-12-09)

  • Initial release with core functionality
  • Support for OpenAI and Gemini providers
  • MCP integration for Kiro AI
  • Interactive setup wizard
  • Project templates (Symfony, iOS, General)
  • Doctor command for diagnostics
  • Automatic project structure detection
