RecallAI is a cutting-edge Retrieval-Augmented Generation (RAG) framework designed for Large Language Models (LLMs). It enhances LLM responses by integrating real-time knowledge retrieval from structured and unstructured data sources.
RecallAIsh: A Retrieval-Augmented Generation (RAG) Framework
RecallAIsh is a comprehensive Python package designed to easily add Retrieval-Augmented Generation capabilities to your applications. It seamlessly integrates real-time knowledge retrieval with Large Language Model (LLM) responses to deliver context-aware, accurate, and dynamic results.
Table of Contents
- Overview
- Features
- Installation
- Usage
- Configuration
- Web Scraper Integration
- Contributing
- License
- Contact
Overview
RecallAIsh combines document retrieval with LLM generation so that your text generation workflows can draw on up-to-date, contextually relevant information. It supports multiple document sources, dynamic web content scraping, and vector storage in Qdrant, Pinecone, or MongoDB, which suits projects ranging from document QA systems to conversational agents.
Features
- Retrieval-Augmented Generation (RAG): Combine real-time data retrieval with LLM responses for informed outputs.
- Plug-and-Play Integration: Easily integrate with GPT-based models and other LLMs for powerful natural language understanding.
- Vector Storage Solutions: Built-in support for Qdrant, Pinecone, and MongoDB for efficient document embedding storage and retrieval.
- Multi-Source Ingestion: Ingest content from PDFs, web pages via integrated web scrapers, and additional document sources.
- Custom Prompt Management: Create tailored prompts with context-rich information to steer LLM responses.
- Modular Pipeline: Extend or modify components according to your project requirements.
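To make the retrieval step behind these features concrete, here is a minimal in-memory sketch of similarity search over embeddings. It is not RecallAIsh's implementation: the function names `cosine_similarity` and `top_k` and the three-dimensional toy vectors are assumptions made for illustration, and a real deployment would delegate this work to Qdrant, Pinecone, or MongoDB.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=2):
    """Return the k (text, score) pairs most similar to the query.

    `documents` is a list of (text, vector) tuples (hypothetical layout).
    """
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real ones would come from an embedding model.
docs = [
    ("Qdrant setup notes", [1.0, 0.0, 0.0]),
    ("MongoDB setup notes", [0.0, 1.0, 0.0]),
    ("Qdrant and MongoDB comparison", [1.0, 1.0, 0.0]),
]
results = top_k([1.0, 0.1, 0.0], docs, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The retrieved texts are what later gets passed to the LLM as context.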
Installation
Prerequisites
- Python 3.10 or higher
- pip package manager
- A vector database: Qdrant, Pinecone, or MongoDB
- An OpenAI API key
Install via PyPI
Install RecallAIsh using the Python Package Index:
pip install RecallAIsh
If you plan to use MongoDB for storing vectors, install the optional dependencies:
pip install "RecallAIsh[mongodb]"
Manual Installation
- Clone the repository:
git clone https://github.com/AshishChandpa/RecallAIsh.git
cd RecallAIsh
- Install the dependencies:
pip install -r requirements.txt
- Create a .env file in the project root and add your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key
- Ensure your vector database is up and running. For example, start Qdrant:
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
Or, set up your MongoDB instance accordingly.
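The usage example reads the key with `os.getenv("OPENAI_API_KEY")`, so the value in `.env` has to reach the process environment first. Whether RecallAIsh loads the file for you is not stated here, so the sketch below shows one dependency-free way to do it yourself. The helper names `parse_env_text` and `load_env_file` are hypothetical; a dedicated package such as python-dotenv is the more usual choice.

```python
import os


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env"):
    """Copy values from a .env file into os.environ without overriding existing ones."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        values = parse_env_text(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


sample = parse_env_text("# secrets\nOPENAI_API_KEY=your_openai_api_key\n")
print(sample)
```

Calling `load_env_file()` once at startup, before constructing the RAG system, is enough for `os.getenv` to see the key.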
Usage
Running an Example
See RecallAIsh in action with the provided example script:
python examples/example.py
Integrating RecallAIsh into Your Project
Below is a detailed example highlighting primary components, including the new MongoDB vector store and web scraper integration:
import os
from RecallAIsh.document_loaders.web_loader import WebDocumentLoader
from RecallAIsh.prompt_manager import PromptManager
from RecallAIsh.rag_system import RAGSystem
from RecallAIsh.vector_store.mongodb_store import MongoDBVectorStore
from RecallAIsh.vector_store.qdrant_store import QdrantVectorStore
# Example: Connecting using Qdrant
qdrant_store = QdrantVectorStore(
    url="http://localhost:6333",
    collection_name="my_rag_collection",
    vector_size=1536,  # Adjust to match your chosen embedding dimension
)
# Example: Connecting using MongoDB
mongodb_store = MongoDBVectorStore(
    uri="<MongoAtlasURL>",
    database="recallai_db",
    collection="vector_store",
    vector_size=1536,  # Adjust to match your embedding dimension
)
# Initialize the Retrieval-Augmented Generation system with your preferred vector store
rag_system = RAGSystem(
    vector_store=mongodb_store,  # or qdrant_store if preferred
    vector_namespace="default_namespace",
    openai_api_key=os.getenv("OPENAI_API_KEY"),
)
# The user's query, used further down to retrieve relevant documents
user_query = "Summarize the latest news on technology."
# First, use the web scraper to fetch dynamic content from the web
doc = WebDocumentLoader().load(url="https://news.example.com/technology")
# Store processed web content into the vector store as needed
rag_system.ingestion_pipeline([doc])
# Retrieve documents including the freshly scraped web content
context = rag_system.retrieve_documents(user_query, source="all")
# Define custom instructions and generate the full prompt
instructions = "You are an expert assistant tasked with summarizing complex technical news."
prompt_manager = PromptManager(instructions=instructions)
full_prompt = prompt_manager.create_prompt(context, user_query)
# Generate answer using the RAG system
response = rag_system.chat(full_prompt, model="gpt-4o-mini")
print("Answer:", response)
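The `vector_size` comments in the example ask you to match the embedding dimension of your model. A mismatch usually only surfaces when the vector database rejects a write, so a small guard before ingestion can help. The sketch below is an assumption about how you might check this yourself; `validate_embeddings` is a hypothetical helper and is not part of RecallAIsh.

```python
def validate_embeddings(embeddings, expected_size=1536):
    """Raise ValueError if any embedding's length differs from expected_size.

    Returns the number of embeddings checked.
    """
    for index, vector in enumerate(embeddings):
        if len(vector) != expected_size:
            raise ValueError(
                f"embedding {index} has {len(vector)} dimensions, "
                f"expected {expected_size}"
            )
    return len(embeddings)


# Toy 4-dimensional vectors; real embeddings would come from your model.
checked = validate_embeddings([[0.0] * 4, [0.1] * 4], expected_size=4)
print(f"{checked} embeddings match the configured vector size")
```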
Configuration
- Customize vector store parameters such as collection name and embedding dimensions as needed.
- Extend the ingestion pipeline to incorporate additional document formats, web scraping, or data sources.
- Adjust the prompt management module to refine how context and instructions are combined for your specific application.
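As an illustration of the last point, the sketch below shows one way instructions, retrieved context, and the user query could be combined into a single prompt under a size budget. It is a hypothetical stand-in, not the behaviour of `PromptManager`: the function `build_prompt`, its layout, and the `max_context_chars` parameter are all assumptions.

```python
def build_prompt(instructions, context_chunks, query, max_context_chars=2000):
    """Join instructions, numbered context chunks, and the query into one prompt.

    Chunks are added in order until the character budget would be exceeded.
    """
    selected = []
    used = 0
    for chunk in context_chunks:
        if used + len(chunk) > max_context_chars:
            break
        selected.append(chunk)
        used += len(chunk)
    context_block = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(selected, start=1)
    )
    return f"{instructions}\n\nContext:\n{context_block}\n\nQuestion: {query}"


prompt = build_prompt(
    "Answer using only the context.",
    ["Qdrant listens on port 6333.", "MongoDB support is optional."],
    "Which port does Qdrant use?",
)
print(prompt)
```

Numbering the chunks makes it easier to ask the model to cite which piece of context it relied on.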
Web Scraper Integration
RecallAIsh now includes a web scraper module built on the widely used third-party libraries BeautifulSoup and requests. This allows you to dynamically ingest web content:
- Configure the scraper with custom parameters such as user-agent, timeout, and parsing criteria.
- Automatically process and clean HTML content before storing it in your chosen vector store.
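The two steps above, fetching with a custom user-agent and timeout and then cleaning the HTML, can be sketched as follows. To stay dependency-free, this sketch uses the standard library (`urllib.request` and `html.parser`) rather than requests and BeautifulSoup, so it is an illustration of the idea and not the RecallAIsh scraper itself. The names `fetch_html`, `TextExtractor`, and `clean_html`, and the default user-agent string, are assumptions.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url, user_agent="recallaish-example/0.1", timeout=10):
    """Download a page with an explicit User-Agent header and timeout."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self):
        return " ".join(self._parts)


def clean_html(html):
    """Return the visible text of an HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


cleaned = clean_html(
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Tech news</h1><script>track()</script>"
    "<p>Chips got faster.</p></body></html>"
)
print(cleaned)
```

In practice you would call `clean_html(fetch_html(url))` and pass the resulting text to the ingestion pipeline.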
Contributing
Contributions are welcome! To contribute:
- Open an issue for discussion or report a bug.
- Submit a pull request with your improvements.
- Follow the coding standards and ensure tests pass before submission.
License
RecallAIsh is available under the MIT License. See the LICENSE file for more details.
Contact
For further questions or feedback, please contact via email: chandpa.ashish007@gmail.com.
Happy coding with RecallAIsh!