Skip to main content

RecallAI is a cutting-edge Retrieval-Augmented Generation (RAG) framework designed for Large Language Models (LLMs). It enhances LLM responses by integrating real-time knowledge retrieval from structured and unstructured data sources.

Project description

RecallAIsh: A Retrieval-Augmented Generation (RAG) Framework

RecallAIsh is a comprehensive Python package designed to easily add Retrieval-Augmented Generation capabilities to your applications. It seamlessly integrates real-time knowledge retrieval with Large Language Model (LLM) responses to deliver context-aware, accurate, and dynamic results.

Table of Contents

Overview

RecallAIsh leverages the power of state-of-the-art retrieval methods combined with LLMs, allowing you to enrich your text generation workflows with up-to-date and contextually relevant information. With built-in support for various document sources, dynamic web content scraping, and flexible vector storage solutions such as Qdrant, Pinecone, and MongoDB, this package is ideal for projects ranging from smart document QA systems to advanced conversational agents.

Features

  • Retrieval-Augmented Generation (RAG): Combine real-time data retrieval with LLM responses for informed outputs.
  • Plug-and-Play Integration: Easily integrate with GPT-based models and other LLMs for powerful natural language understanding.
  • Vector Storage Solutions: Built-in support for Qdrant, Pinecone, and MongoDB for efficient document embedding storage and retrieval.
  • Multi-Source Ingestion: Ingest content from PDFs, web pages via integrated web scrapers, and additional document sources.
  • Custom Prompt Management: Create tailored prompts with context-rich information to steer LLM responses.
  • Modular Pipeline: Extend or modify components according to your project requirements.

Installation

Prerequisites

  • Python 3.10 or higher
  • pip package manager
  • A vector database: Qdrant, Pinecone, or MongoDB
  • An OpenAI API key

Install via PyPI

Install RecallAI using the Python Package Index:

pip install RecallAIsh

If you plan to use MongoDB for storing vectors, install the optional dependencies:

pip install RecallAIsh[mongodb]

Manual Installation

  1. Clone the repository:
    git clone https://github.com/AshishChandpa/RecallAI.git
    cd RecallAI
    
  2. Install the dependencies:
    pip install -r requirements.txt
    
  3. Create a .env file in the project root and add your OpenAI API key:
    OPENAI_API_KEY=your_openai_api_key
    
  4. Ensure your vector database is up and running. For example, start Qdrant:
    docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
    
    Or, set up your MongoDB instance accordingly.

Usage

Running an Example

See RecallAI in action with the provided example script:

python examples/example.py

Integrating RecallAIsh into Your Project

Below is a detailed example highlighting primary components, including the new MongoDB vector store and web scraper integration:

import os

from RecallAIsh.document_loaders.web_loader import WebDocumentLoader
from RecallAIsh.prompt_manager import PromptManager
from RecallAIsh.rag_system import RAGSystem
from RecallAIsh.vector_store.mongodb_store import MongoDBVectorStore
from RecallAIsh.vector_store.qdrant_store import QdrantVectorStore

# Example: Connecting using Qdrant
qdrant_store = QdrantVectorStore(
    url="http://localhost:6333",
    collection_name="my_rag_collection",
    vector_size=1536,  # Adjust to match your chosen embedding dimension
)

# Example: Connecting using MongoDB
mongodb_store = MongoDBVectorStore(
    uri="<MongoAtlasURL>",
    database="recallai_db",
    collection="vector_store",
    vector_size=1536,  # Adjust to match your embedding dimension
)

# Initialize the Retrieval-Augmented Generation system with your preferred vector store
rag_system = RAGSystem(
    vector_store=mongodb_store,  # or qdrant_store if preferred
    vector_namespace="default_namespace",
    openai_api_key=os.getenv("OPENAI_API_KEY"),
)

# Retrieve relevant documents based on the user's query
user_query = "Summarize the latest news on technology."
# First, use the web scraper to fetch dynamic content from the web
doc = WebDocumentLoader().load(url="https://news.example.com/technology")
# Store processed web content into the vector store as needed
rag_system.ingestion_pipeline([doc])

# Retrieve documents including the freshly scraped web content
context = rag_system.retrieve_documents(user_query, source="all")

# Define custom instructions and generate the full prompt
instructions = "You are an expert assistant tasked with summarizing complex technical news."
prompt_manager = PromptManager(instructions=instructions)
full_prompt = prompt_manager.create_prompt(context, user_query)

# Generate answer using the RAG system
response = rag_system.chat(full_prompt, model="gpt-4o-mini")
print("Answer:", response)

Configuration

  • Customize vector store parameters such as collection name and embedding dimensions as needed.
  • Extend the ingestion pipeline to incorporate additional document formats, web scraping, or data sources.
  • Adjust the prompt management module to refine how context and instructions are combined for your specific application.

Web Scraper Integration

RecallAI now includes a web scraper module which leverages standard libraries like BeautifulSoup and requests. This allows you to dynamically ingest web content:

  • Configure the scraper with custom parameters such as user-agent, timeout, and parsing criteria.
  • Automatically process and clean HTML content before storing it in your chosen vector store.

Contributing

Contributions are welcome! To contribute:

  1. Open an issue for discussion or report a bug.
  2. Submit a pull request with your improvements.
  3. Follow the coding standards and ensure tests pass before submission.

License

RecallAI is available under the MIT License. See the LICENSE file for more details.

Contact

For further questions or feedback, please contact via email: chandpa.ashish007@gmail.com.

Happy coding with RecallAI!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

recallaish-0.2.1.tar.gz (13.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

recallaish-0.2.1-py3-none-any.whl (22.6 kB view details)

Uploaded Python 3

File details

Details for the file recallaish-0.2.1.tar.gz.

File metadata

  • Download URL: recallaish-0.2.1.tar.gz
  • Upload date:
  • Size: 13.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for recallaish-0.2.1.tar.gz
Algorithm Hash digest
SHA256 1f64556eba0b0a6e3f1bb8f36d2de0256a7bac40a98e452c17d83385a1e72846
MD5 6c3f8c6949e933ab6f81f2d76c0cbbc4
BLAKE2b-256 aa13ac8b15bbccce3977a6b120c7ae880e236156946797467e4a41e518a87859

See more details on using hashes here.

File details

Details for the file recallaish-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: recallaish-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 22.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for recallaish-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 060703e4f15c6978cb1b2e5977fb7e99cb43ab145447115b38c226dd22ed0059
MD5 d8039e088258cd4632f10a30949f8c47
BLAKE2b-256 748b30c2c0f42cb17498f798f089462d7d4dd479d2d0c63b56a505ff3150d6d7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page