RecallAIsh is a Retrieval-Augmented Generation (RAG) framework for Large Language Models (LLMs). It augments LLM responses with knowledge retrieved at query time from structured and unstructured data sources.

RecallAIsh: A Retrieval-Augmented Generation (RAG) Framework

RecallAIsh is a Python package for adding Retrieval-Augmented Generation to your applications. It combines real-time knowledge retrieval with Large Language Model (LLM) responses to produce context-aware answers grounded in the retrieved data.

Overview

RecallAIsh combines document retrieval with LLMs so that text generation can draw on up-to-date, contextually relevant information. It supports multiple document sources, dynamic web scraping, and vector storage backends (Qdrant, Pinecone, and MongoDB), which makes it suitable for projects ranging from document QA systems to conversational agents.

Features

  • Retrieval-Augmented Generation (RAG): Combine real-time data retrieval with LLM responses for informed outputs.
  • Plug-and-Play Integration: Easily integrate with GPT-based models and other LLMs for powerful natural language understanding.
  • Vector Storage Solutions: Built-in support for Qdrant, Pinecone, and MongoDB for efficient document embedding storage and retrieval.
  • Multi-Source Ingestion: Ingest content from PDFs, web pages via integrated web scrapers, and additional document sources.
  • Custom Prompt Management: Create tailored prompts with context-rich information to steer LLM responses.
  • Modular Pipeline: Extend or modify components according to your project requirements.
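
To make the retrieval step concrete, here is a minimal sketch of what a vector store does conceptually: rank stored documents by cosine similarity to a query vector and return the top matches. It is not RecallAIsh's implementation. The function names (`cosine_similarity`, `top_k`) and the toy 3-dimensional vectors are assumptions for illustration; real embeddings come from an embedding model and are much larger (the usage example in this README uses 1536 dimensions).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], documents: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k documents most similar to query_vec."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical toy data: (text, embedding) pairs with made-up 3-d vectors.
docs = [
    ("Qdrant setup notes", [0.9, 0.1, 0.0]),
    ("Pasta recipe", [0.0, 0.2, 0.9]),
    ("MongoDB vector search", [0.8, 0.3, 0.1]),
]

print(top_k([1.0, 0.0, 0.0], docs, k=2))
```

The retrieved texts are what later gets passed to the LLM as context. A dedicated vector database performs the same ranking with approximate indexes so that it scales beyond a linear scan.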

Installation

Prerequisites

  • Python 3.10 or higher
  • pip package manager
  • A vector database: Qdrant, Pinecone, or MongoDB
  • An OpenAI API key

Install via PyPI

Install RecallAIsh using the Python Package Index:

pip install RecallAIsh

If you plan to use MongoDB for storing vectors, install the optional dependencies:

pip install "RecallAIsh[mongodb]"

Manual Installation

  1. Clone the repository:
    git clone https://github.com/AshishChandpa/RecallAIsh.git
    cd RecallAIsh
    
  2. Install the dependencies:
    pip install -r requirements.txt
    
  3. Create a .env file in the project root and add your OpenAI API key:
    OPENAI_API_KEY=your_openai_api_key
    
  4. Ensure your vector database is up and running. For example, start Qdrant:
    docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
    
    Or, set up your MongoDB instance accordingly.
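
Before running anything, it can help to confirm that the API key from step 3 is actually visible to Python. The sketch below is an assumption-laden helper, not part of RecallAIsh: `load_env_file` and `require_api_key` are hypothetical names, and the parser handles only plain `KEY=VALUE` lines using the standard library. Whether the package loads the `.env` file itself is not stated here.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(path: str = ".env") -> str:
    """Return OPENAI_API_KEY from the environment or the .env file, else raise."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it or add it to .env")
    return key
```

Calling `require_api_key()` at startup fails fast with a clear message instead of surfacing an authentication error later, in the middle of a request.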

Usage

Running an Example

See RecallAIsh in action with the provided example script:

python examples/example.py

Integrating RecallAIsh into Your Project

Below is a detailed example highlighting primary components, including the new MongoDB vector store and web scraper integration:

import os

from RecallAIsh.document_loaders.web_loader import WebDocumentLoader
from RecallAIsh.prompt_manager import PromptManager
from RecallAIsh.rag_system import RAGSystem
from RecallAIsh.vector_store.mongodb_store import MongoDBVectorStore
from RecallAIsh.vector_store.qdrant_store import QdrantVectorStore

# Example: Connecting using Qdrant
qdrant_store = QdrantVectorStore(
    url="http://localhost:6333",
    collection_name="my_rag_collection",
    vector_size=1536,  # Adjust to match your chosen embedding dimension
)

# Example: Connecting using MongoDB
mongodb_store = MongoDBVectorStore(
    uri="<MongoAtlasURL>",
    database="recallai_db",
    collection="vector_store",
    vector_size=1536,  # Adjust to match your embedding dimension
)

# Initialize the Retrieval-Augmented Generation system with your preferred vector store
rag_system = RAGSystem(
    vector_store=mongodb_store,  # or qdrant_store if preferred
    vector_namespace="default_namespace",
    openai_api_key=os.getenv("OPENAI_API_KEY"),
)

# Define the user's query
user_query = "Summarize the latest news on technology."

# Fetch dynamic content with the web scraper, then ingest it into the vector store
doc = WebDocumentLoader().load(url="https://news.example.com/technology")
rag_system.ingestion_pipeline([doc])

# Retrieve documents relevant to the query, including the freshly scraped web content
context = rag_system.retrieve_documents(user_query, source="all")

# Define custom instructions and generate the full prompt
instructions = "You are an expert assistant tasked with summarizing complex technical news."
prompt_manager = PromptManager(instructions=instructions)
full_prompt = prompt_manager.create_prompt(context, user_query)

# Generate answer using the RAG system
response = rag_system.chat(full_prompt, model="gpt-4o-mini")
print("Answer:", response)

Configuration

  • Customize vector store parameters such as collection name and embedding dimensions as needed.
  • Extend the ingestion pipeline to incorporate additional document formats, web scraping, or data sources.
  • Adjust the prompt management module to refine how context and instructions are combined for your specific application.
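
As an illustration of the last point, the sketch below shows one plausible way to combine instructions, retrieved context, and the user's question into a single prompt. It is not the `PromptManager` implementation. `build_prompt`, the prompt layout, and the sample context strings are all assumptions made up for this example.

```python
def build_prompt(instructions: str, context_chunks: list[str], question: str) -> str:
    """Join instructions, numbered context chunks, and the question into one prompt."""
    if context_chunks:
        context = "\n".join(
            f"[{i}] {chunk.strip()}" for i, chunk in enumerate(context_chunks, start=1)
        )
    else:
        # Make the absence of context explicit so the model does not guess.
        context = "(no context retrieved)"
    return (
        f"{instructions.strip()}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer using only the context above."
    )


# Placeholder context chunks; in practice these come from the retrieval step.
prompt = build_prompt(
    "You are an expert assistant tasked with summarizing complex technical news.",
    ["Chip maker announces a new GPU.", "Browser vendor ships a security patch."],
    "Summarize the latest news on technology.",
)
print(prompt)
```

Numbering the chunks makes it easy to ask the model to cite which piece of context supports each statement, which is one common refinement.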

Web Scraper Integration

RecallAIsh now includes a web scraper module built on the widely used third-party libraries BeautifulSoup and requests. This allows you to dynamically ingest web content:

  • Configure the scraper with custom parameters such as user-agent, timeout, and parsing criteria.
  • Automatically process and clean HTML content before storing it in your chosen vector store.
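
The cleaning step can be pictured with a small, dependency-free sketch. The package's scraper is described as using BeautifulSoup and requests. The example below deliberately swaps in the standard library's `html.parser` so it runs without extra installs, and `TextExtractor` and `clean_html` are hypothetical names, not RecallAIsh APIs. It keeps visible text and drops `<script>` and `<style>` contents.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace left over from the markup layout.
        return " ".join(" ".join(self._parts).split())


def clean_html(html: str) -> str:
    """Return the visible text of an HTML document as a single line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Tech News</h1><script>track()</script>"
    "<p>New chip released.</p></body></html>"
)
print(clean_html(sample))
```

Text cleaned this way is what you would typically split into chunks and embed before storing it in the vector store.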

Contributing

Contributions are welcome! To contribute:

  1. Open an issue for discussion or report a bug.
  2. Submit a pull request with your improvements.
  3. Follow the coding standards and ensure tests pass before submission.

License

RecallAIsh is available under the MIT License. See the LICENSE file for more details.

Contact

For further questions or feedback, please email chandpa.ashish007@gmail.com.

Happy coding with RecallAIsh!
