A simple, clean Python library for Retrieval-Augmented Generation (RAG)

🧾 Ragify

A simple, clean Python library that abstracts away the complexity of Retrieval-Augmented Generation (RAG): developers can create, store, and retrieve embeddings with minimal setup.

Python 3.8+ | License: MIT

🚀 Quick Start

Method 1: Initialize with Configuration Dictionary

from ragify import KaliRAG

# Initialize with your configuration
config = {
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "db_config": {
        "api_key": "your-quadrant-api-key",
        "host": "https://api.quadrant.io",
        "collection": "my_docs"
    }
}

rag = KaliRAG(**config)

# Create and store embeddings
text = "Your long document text here..."
result = rag.create_store_embedding(text)

# Retrieve relevant chunks
response = rag.retrieve_embedding("What is the main topic?", top_k=3)
print(response)
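
To keep the API key out of source code, the configuration dictionary above can be assembled from environment variables. The following is a minimal sketch, not part of Ragify: the function name build_config and the variable names RAGIFY_API_KEY, RAGIFY_HOST, RAGIFY_COLLECTION, and RAGIFY_EMBEDDING_MODEL are hypothetical. Only the shape of the dictionary is taken from the example above.

```python
import os


def build_config(env=None):
    """Build a config dict shaped like the Method 1 example.

    The environment variable names are illustrative assumptions,
    not names defined by Ragify.
    """
    env = os.environ if env is None else env
    api_key = env.get("RAGIFY_API_KEY")
    if not api_key:
        raise ValueError("RAGIFY_API_KEY is not set")
    return {
        "embedding_model": env.get(
            "RAGIFY_EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
        ),
        "db_config": {
            "api_key": api_key,
            "host": env.get("RAGIFY_HOST", "https://api.quadrant.io"),
            "collection": env.get("RAGIFY_COLLECTION", "my_docs"),
        },
    }
```

The result could then be passed on as KaliRAG(**build_config()), in the same way as the literal dictionary in Method 1.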

Method 2: Configure Separately (Recommended)

from ragify import KaliRAG

# Initialize with defaults
rag = KaliRAG()

# Configure database with separate parameters
rag.configure_database(
    api_key="your-quadrant-api-key",
    host="https://api.quadrant.io",
    port=443,  # Optional
    collection="my_docs"
)

# Configure embedding model
rag.configure_embedding_model("sentence-transformers/all-MiniLM-L6-v2")

# Configure chunking parameters
rag.configure_chunking(chunk_size=512, chunk_overlap=50)

# Now use the configured RAG system
result = rag.create_store_embedding("Your text here...")

📦 Installation

pip install ragify-lib

Or install from source:

git clone https://github.com/ragify/ragify.git
cd ragify
pip install -e .

🎯 Features

✅ Core Features

  • Simple API: Two main methods, create_store_embedding() and retrieve_embedding()
  • Smart Chunking: Automatic text chunking with configurable size and overlap
  • Multiple Embedding Models: Support for HuggingFace SentenceTransformers
  • Vector Database Integration: Native support for Quadrant (with mock mode for testing)
  • Configurable: Easy customization of chunking, embedding, and retrieval parameters

🔧 Advanced Features

  • Recursive Chunking: Automatically handles very long documents
  • Similarity Thresholds: Filter results by similarity score
  • Comprehensive Logging: Built-in logging for debugging and monitoring
  • Error Handling: Robust error handling with detailed error messages
  • Mock Mode: Works without external dependencies for testing

📚 Usage Examples

Basic Usage

from ragify import KaliRAG

# Initialize with defaults
rag = KaliRAG()

# Add your documents
text = """
RAG stands for Retrieval-Augmented Generation. It's a technique that combines 
large language models with external knowledge retrieval to provide more accurate 
and contextually relevant responses.
"""

# Create embeddings and store them
result = rag.create_store_embedding(text)
print(f"Created {result['chunks_created']} chunks")

# Query your knowledge base
response = rag.retrieve_embedding("What is RAG?", top_k=3)
# Use a separate loop variable so the earlier `result` is not overwritten
for hit in response['results']:
    print(f"Score: {hit['similarity_score']:.3f}")
    print(f"Text: {hit['text']}")
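
If the response has the shape used above (a results list whose items carry similarity_score and text), the retrieved chunks can be joined into a context string for a language-model prompt. This is a sketch under that assumption. The helper build_context and the sample response are hypothetical and not part of Ragify.

```python
def build_context(response, max_chars=1000):
    """Join retrieved chunk texts, best score first, within a character budget."""
    hits = sorted(
        response.get("results", []),
        key=lambda hit: hit["similarity_score"],
        reverse=True,
    )
    parts, used = [], 0
    for hit in hits:
        text = hit["text"].strip()
        if used + len(text) > max_chars:
            break  # stop once the next chunk would exceed the budget
        parts.append(text)
        used += len(text)
    return "\n\n".join(parts)


# Hand-written sample in the assumed response shape
sample = {
    "results": [
        {"similarity_score": 0.71, "text": "RAG retrieves external knowledge."},
        {"similarity_score": 0.93, "text": "RAG stands for Retrieval-Augmented Generation."},
    ]
}
print(build_context(sample))
```

Sorting before joining keeps the highest-scoring chunk at the top of the prompt even if the results arrive unordered.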

Advanced Configuration

from ragify import KaliRAG

# Custom configuration
config = {
    "embedding_model": "sentence-transformers/all-mpnet-base-v2",
    "chunk_size": 256,
    "chunk_overlap": 25,
    "db_config": {
        "api_key": "your-api-key",
        "collection": "custom_collection"
    }
}

rag = KaliRAG(**config)

# Use recursive chunking for very long documents
long_text = "..." * 1000  # Very long text
result = rag.create_store_embedding(
    long_text,
    use_recursive_chunking=True,
    max_recursion_depth=3
)

# Retrieve with custom parameters
response = rag.retrieve_embedding(
    "Your query",
    top_k=5,
    similarity_threshold=0.8
)
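
The README does not describe how recursive chunking works internally. As a rough illustration of the general technique, the sketch below splits on coarse separators first (paragraphs), then finer ones (sentences, words), and stops recursing at a depth limit. The function recursive_split, its separators, and the character-based size are assumptions for illustration, not Ragify's implementation.

```python
def recursive_split(text, chunk_size=256, separators=("\n\n", ". ", " "),
                    depth=0, max_depth=3):
    """Split text into pieces of at most chunk_size characters.

    Coarse separators are tried first; oversized pieces are split again
    with the next separator. When separators or the depth budget run out,
    the text is cut into fixed-width slices.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if depth >= max_depth or depth >= len(separators):
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks = []
    for piece in text.split(separators[depth]):
        chunks.extend(
            recursive_split(piece, chunk_size, separators, depth + 1, max_depth)
        )
    return chunks


doc = "First paragraph. It has two sentences.\n\n" + "word " * 100
pieces = recursive_split(doc, chunk_size=50)
print(len(pieces), max(len(p) for p in pieces))
```

This simplified version drops the separators and does not merge small pieces back together, so a production splitter would usually add a merging pass.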

System Information

from ragify import KaliRAG

rag = KaliRAG()

# Get system configuration
info = rag.get_info()
print(f"Embedding model: {info['embedding_model']}")
print(f"Embedding dimension: {info['embedding_dimension']}")
print(f"Chunk size: {info['chunk_size']}")

🏗️ Architecture

Ragify is built with a modular architecture:

ragify/
├── core/
│   ├── ragify.py          # Main KaliRAG class
│   ├── chunker.py         # Text chunking logic
│   ├── embedder.py        # Embedding model management
│   └── db_quadrant.py     # Vector database integration
├── utils/
│   └── logger.py          # Logging utilities
├── config/
│   └── defaults.py        # Default configurations
└── examples/
    └── basic_usage.py     # Usage examples

🔧 Configuration

Embedding Models

Supported models (via HuggingFace SentenceTransformers):

  • sentence-transformers/all-MiniLM-L6-v2 (default)
  • sentence-transformers/all-mpnet-base-v2
  • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

Chunking Parameters

  • chunk_size: Maximum size of each chunk (default: 512)
  • chunk_overlap: Overlap between consecutive chunks (default: 50)
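
To make the interaction of the two parameters concrete, here is a minimal sliding-window sketch. The README does not say whether sizes are measured in characters or tokens, so this example assumes characters. The function chunk_text is hypothetical and is not Ragify's chunker.

```python
def chunk_text(text, chunk_size=512, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each window starts chunk_size - chunk_overlap characters after the previous one, so with the defaults (512 and 50) a 1000-character text yields three chunks starting at offsets 0, 462, and 924.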

Retrieval Parameters

  • top_k: Number of top results to return (default: 3)
  • similarity_threshold: Minimum similarity score (default: 0.7)
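
The two retrieval parameters can be pictured as a filter followed by a cut-off. The sketch below assumes cosine similarity as the score, which the README does not confirm. The names cosine_similarity and retrieve and the toy two-dimensional vectors are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, stored, top_k=3, similarity_threshold=0.7):
    """Return up to top_k (score, text) pairs scoring at least the threshold.

    stored is a list of (vector, text) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for vec, text in stored]
    kept = [pair for pair in scored if pair[0] >= similarity_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]


stored = [([1.0, 0.0], "a"), ([0.0, 1.0], "b"), ([1.0, 1.0], "c")]
for score, text in retrieve([1.0, 0.0], stored):
    print(f"{score:.3f} {text}")
```

With the default threshold of 0.7, the orthogonal vector "b" is dropped, while "c" (similarity about 0.707) is kept. Raising the threshold to 0.8 would leave only "a".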

🧪 Testing

Run the test suite:

python -m pytest tests/

Run with coverage:

python -m pytest tests/ --cov=ragify

🖥️ Command Line Interface

Ragify includes a CLI for easy configuration and usage:

Configure Settings

# Configure database
ragify config --api-key "your-key" --host "https://api.quadrant.io" --collection "my_docs"

# Configure with port
ragify config --api-key "your-key" --host "https://api.quadrant.io" --port 443 --collection "my_docs"

# Configure embedding model
ragify config --model "sentence-transformers/all-mpnet-base-v2"

# Configure chunking
ragify config --chunk-size 256 --chunk-overlap 25

# Configure everything at once
ragify config --api-key "your-key" --model "sentence-transformers/all-MiniLM-L6-v2" --chunk-size 512

Get System Information

ragify info

Create Embeddings from File

ragify create --input document.txt --output results.json

Query the Knowledge Base

ragify query "What is RAG?" --top-k 5 --threshold 0.8

Reset Database

ragify reset

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

git clone https://github.com/ragify/ragify.git
cd ragify
pip install -e ".[dev]"

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • HuggingFace for the excellent SentenceTransformers library
  • Quadrant for the vector database integration
  • The open-source AI community for inspiration and feedback

📞 Support

🚀 Roadmap

  • OpenAI embedding support
  • Additional vector databases (Chroma, FAISS, Pinecone)
  • File loaders (PDF, CSV, DOCX)
  • CLI interface
  • Web UI
  • FastAPI wrapper
  • Batch processing
  • Advanced chunking strategies

Made with ❤️ by the Ragify Team

Download files

Source Distribution

ragify_lib-0.1.0.tar.gz (21.8 kB)

Uploaded Source

Built Distribution

ragify_lib-0.1.0-py3-none-any.whl (19.8 kB)

Uploaded Python 3

File details

Details for the file ragify_lib-0.1.0.tar.gz.

File metadata

  • Download URL: ragify_lib-0.1.0.tar.gz
  • Upload date:
  • Size: 21.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.0

File hashes

Hashes for ragify_lib-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       3eab6ed507b39eff362668176269a1e79f730d4858e54a1e23e52395969d1e86
MD5          942d948d1ccb7f3ec663e64b1ff8a72f
BLAKE2b-256  8f2c279cba1a66452e80f2200ff464d71454237ceef77d34bdbe6e5f437afb3a

File details

Details for the file ragify_lib-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ragify_lib-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 19.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.0

File hashes

Hashes for ragify_lib-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       2df6b5dac7d28d5a54b70594757e472531473f7105c490b2e20eff3bc8a748d9
MD5          430250e4a94151ef6ce70a910d8deaa7
BLAKE2b-256  dc6b6217e16f2ad872fcf1a2d453999cba629f674fe27658fb57cc398fc83bd0
