Ceylon AI RAG Framework

A powerful, modular, and extensible Retrieval-Augmented Generation (RAG) framework built with Python, supporting multiple LLM providers, embedders, and document types.

🌟 Features

  • Multiple Document Types: Support for various document formats including:

    • Text files (with extensive format support)
    • PDF documents
    • Images (with OCR capabilities)
    • Source code files
  • Flexible Architecture:

    • Modular component design
    • Pluggable LLM providers (OpenAI, Ollama)
    • Extensible embedding providers
    • Vector store integration (LanceDB)
  • Advanced RAG Capabilities:

    • Intelligent document chunking
    • Context-aware searching
    • Query expansion and reranking
    • Metadata enrichment
    • Source attribution
  • Specialized RAG Implementations:

    • FolderRAG: Process and analyze entire directory structures
    • CodeAnalysisRAG: Specialized for source code understanding
    • SimpleRAG: Basic RAG implementation for text data
    • Support for custom RAG implementations

🚀 Getting Started

Installation

# Install via pip
pip install ceylon-rag

# Or install from source
git clone https://github.com/ceylonai/ceylon-rag.git
cd ceylon-rag
pip install -e .

Basic Usage

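The example below configures SimpleRAG with OpenAI models and a LanceDB vector store, processes a folder of documents, and runs a query. It reads the API key from the OPENAI_API_KEY environment variable, which load_dotenv() can populate from a .env file: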

import asyncio
import os

from dotenv import load_dotenv
from ceylon_rag import SimpleRAG

async def main():
    # Load environment variables
    load_dotenv()

    # Configure the RAG system
    config = {
        "llm": {
            "type": "openai",
            "model_name": "gpt-4",
            "api_key": os.getenv("OPENAI_API_KEY")
        },
        "embedder": {
            "type": "openai",
            "model_name": "text-embedding-3-small",
            "api_key": os.getenv("OPENAI_API_KEY")
        },
        "vector_store": {
            "type": "lancedb",
            "db_path": "./data/lancedb",
            "table_name": "documents"
        }
    }

    # Initialize RAG
    rag = SimpleRAG(config)
    await rag.initialize()

    try:
        # Process your documents
        documents = await rag.process_documents("path/to/documents")
        
        # Query the system
        result = await rag.query("What are the main topics in these documents?")
        print(result.response)
        
    finally:
        await rag.close()

if __name__ == "__main__":
    asyncio.run(main())
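
The example above reads the API key with os.getenv, which silently returns None when the variable is missing. A small helper can catch that before any provider is initialized. This is a minimal sketch: build_config is a hypothetical name and is not part of ceylon_rag, and the dictionary keys are simply copied from the example above.

```python
import os


def build_config(db_path: str = "./data/lancedb", table_name: str = "documents") -> dict:
    """Build a config dict shaped like the one in the Basic Usage example.

    Hypothetical helper (not part of ceylon_rag). It fails early when
    OPENAI_API_KEY is missing instead of passing None to a provider.
    """
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it or add it to .env")

    return {
        "llm": {"type": "openai", "model_name": "gpt-4", "api_key": api_key},
        "embedder": {
            "type": "openai",
            "model_name": "text-embedding-3-small",
            "api_key": api_key,
        },
        "vector_store": {
            "type": "lancedb",
            "db_path": db_path,
            "table_name": table_name,
        },
    }
```

With this in place, the call in main() would become rag = SimpleRAG(build_config()), placed after load_dotenv() so that values from .env are visible.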

🏗️ Architecture

Core Components

  1. Document Loaders

    • TextLoader: Handles text-based files
    • PDFLoader: Processes PDF documents
    • ImageLoader: Handles images with OCR
    • Extensible base class for custom loaders
  2. Embedders

    • OpenAI embeddings support
    • Ollama embeddings support
    • Modular design for adding new providers
  3. LLM Providers

    • OpenAI integration
    • Ollama integration
    • Async interface for all providers
  4. Vector Store

    • LanceDB integration
    • Efficient vector similarity search
    • Metadata storage and retrieval
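
To make the relationship between embedders and the vector store concrete, here is a toy, in-memory sketch of the same idea. Every name in it (Embedder, CharCountEmbedder, InMemoryVectorStore) is hypothetical and chosen for illustration; none of it is the framework's actual interface, and a real setup would use an OpenAI or Ollama embedder with LanceDB.

```python
import math
from typing import Protocol


class Embedder(Protocol):
    """Hypothetical embedder interface; not the framework's real base class."""

    def embed(self, text: str) -> list[float]: ...


class CharCountEmbedder:
    """Toy embedder: one dimension per letter a-z. Stands in for a real model."""

    def embed(self, text: str) -> list[float]:
        lowered = text.lower()
        return [float(lowered.count(chr(ord("a") + i))) for i in range(26)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Stores (vector, text, metadata) rows and ranks them by cosine similarity."""

    def __init__(self, embedder: Embedder) -> None:
        self._embedder = embedder
        self._rows: list[tuple[list[float], str, dict]] = []

    def add(self, text: str, metadata: dict) -> None:
        self._rows.append((self._embedder.embed(text), text, metadata))

    def search(self, query: str, limit: int = 3) -> list[tuple[float, str, dict]]:
        query_vec = self._embedder.embed(query)
        scored = [
            (cosine_similarity(query_vec, vec), text, meta)
            for vec, text, meta in self._rows
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:limit]
```

Because the store depends only on the embed method, swapping the toy embedder for another provider does not change the store. The metadata returned with each hit is what makes source attribution possible.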

Document Processing

The example below processes a code repository with CodeAnalysisRAG: it loads the source files under ./src, indexes them, and asks a question about the codebase:

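# Example: Processing a code repository
# Assumes CodeAnalysisRAG is exported from ceylon_rag, like SimpleRAG above.
from ceylon_rag import CodeAnalysisRAG

async def analyze_codebase():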
    config = {
        "llm": {
            "type": "openai",
            "model_name": "gpt-4"
        },
        "embedder": {
            "type": "openai",
            "model_name": "text-embedding-3-small"
        },
        "vector_store": {
            "type": "lancedb",
            "db_path": "./data/lancedb",
            "table_name": "code_documents"
        },
        "chunk_size": 1000,
        "chunk_overlap": 200
    }

    rag = CodeAnalysisRAG(config)
    await rag.initialize()
    
    documents = await rag.process_codebase("./src")
    await rag.index_code(documents)
    
    result = await rag.analyze_code(
        "Explain the main architecture of this codebase"
    )
    print(result.response)

🔧 Advanced Configuration

File Exclusions

Exclude directories, individual files, and file extensions from processing, or point to an ignore file:

config = {
    # ... other config options ...
    "excluded_dirs": [
        "venv",
        "node_modules",
        ".git",
        "__pycache__"
    ],
    "excluded_files": [
        ".env",
        "package-lock.json"
    ],
    "excluded_extensions": [
        ".pyc",
        ".pyo",
        ".pyd"
    ],
    "ignore_file": ".ragignore"  # Similar to .gitignore
}
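
How the framework applies these settings internally is not shown here. As an illustration only, the following sketch filters a directory tree using the same three lists. The function name iter_included_files is hypothetical, matching is by exact name or extension, and .ragignore handling is left out.

```python
import os
from typing import Iterable, Iterator


def iter_included_files(
    root: str,
    excluded_dirs: Iterable[str],
    excluded_files: Iterable[str],
    excluded_extensions: Iterable[str],
) -> Iterator[str]:
    """Yield paths under root that survive the exclusion lists.

    Hypothetical helper, not the framework's implementation.
    """
    skip_dirs = set(excluded_dirs)
    skip_files = set(excluded_files)
    skip_exts = set(excluded_extensions)

    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in sorted(filenames):
            if name in skip_files:
                continue
            if os.path.splitext(name)[1] in skip_exts:
                continue
            yield os.path.join(dirpath, name)
```

Pruning dirnames in place matters for large trees such as node_modules or venv: the walk skips them entirely rather than visiting every file and discarding it afterwards.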

Chunking Configuration

Control how documents are split into chunks before embedding:

config = {
    # ... other config options ...
    "chunk_size": 1000,  # Characters per chunk
    "chunk_overlap": 200,  # Overlap between chunks
}
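
To show what these two numbers mean in practice, here is a minimal sketch of character-based chunking with overlap. It is not the framework's chunker (described above as "intelligent", so it may split on different boundaries). The function chunk_text is a hypothetical name; only the parameter names and defaults come from the config above.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Each chunk starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            # The current chunk already reaches the end of the text.
            break
        start += step
    return chunks
```

With the defaults, a 2,500-character document yields three chunks covering characters 0-999, 800-1799, and 1600-2499. A larger overlap preserves more context across chunk boundaries at the cost of storing and embedding more text.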

🤝 Contributing

Contributions are welcome! Please feel free to submit pull requests. For major changes, please open an issue first to discuss what you would like to change.

📄 License

MIT License

🙏 Acknowledgments

  • OpenAI for GPT and embedding models
  • Ollama for local LLM support
  • LanceDB team for vector storage
  • All contributors and users of the framework

📚 API Documentation

For detailed API documentation, please visit our API Documentation page.


