An open-source Python RAG library for Contextual Retrieval
Contextual Retrieval
An open-source Python library for Contextual Retrieval, designed to significantly improve the retrieval step in Retrieval-Augmented Generation (RAG) systems.
Features
- Easy to Use: Get started with just a few lines of code.
- Modular Design: Choose between different retrieval modes:
  - Contextual Embedding
  - Contextual Embedding + Contextual BM25
  - Reranked Contextual Embedding + Contextual BM25
- Model Agnostic: Use your preferred models for context generation, embeddings, and reranking.
- Customizable: Override prompts and configurations to suit your use case.
- Beginner Friendly: Sensible defaults work out of the box.
- Efficient: Uses FAISS for fast similarity search.
- Flexible: Supports both CPU and GPU acceleration.
Installation
Install the library using pip:
pip install contextual-retrieval
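To confirm that the install succeeded, one option is to look up the installed distribution with the standard library. This is a minimal sketch, not part of the library: `installed_version` is a hypothetical helper name, and the distribution name is taken from the pip command above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("contextual-retrieval"))
```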
Quick Start
The following example indexes a handful of documents and queries them using the full retrieval mode:
from contextual_retrieval import ContextualRetrieval
# Initialize the retriever with the full mode
retriever = ContextualRetrieval(mode='rerank')
# Index some documents
documents = [
    "Artificial Intelligence is transforming various industries.",
    "Machine Learning is a subset of AI focused on data-driven algorithms.",
    "Natural Language Processing enables computers to understand human language.",
    "Deep Learning models, like neural networks, are inspired by the human brain.",
    "Computer Vision allows machines to interpret and make decisions based on visual data.",
]
retriever.index_documents(documents)
# Query the system
query = "What are the main areas of AI?"
results = retriever.query(query, top_k=3)
print(f"Query: {query}\n")
print("Top Results:")
for i, (doc, score) in enumerate(results, 1):
    print(f"{i}. (Score: {score:.4f}) {doc}")
This example uses the full Reranked Contextual Embedding + Contextual BM25 mode, selected with a single argument (mode='rerank'). The library automatically generates context for chunks, retrieves with both embeddings and BM25, and reranks the candidates to return the most relevant results.
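To illustrate how a keyword ranking and an embedding ranking can be combined, here is a small dependency-free sketch. It is not the library's implementation: `tokenize`, `bm25_scores`, and `reciprocal_rank_fusion` are hypothetical helper names, the BM25 parameters (`k1=1.5`, `b=0.75`) are commonly used defaults assumed here, and reciprocal rank fusion is only one common way to merge rankings, which may differ from what the library does internally. The embedding ranking is a hard-coded placeholder.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?()\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several rankings (lists of doc indices, best first) into one."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


docs = [
    "Machine Learning is a subset of AI focused on data-driven algorithms.",
    "Natural Language Processing enables computers to understand human language.",
    "Computer Vision allows machines to interpret visual data.",
]
scores = bm25_scores("human language", docs)
bm25_ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
embedding_ranking = [1, 0, 2]  # placeholder standing in for an embedding search
print(reciprocal_rank_fusion([bm25_ranking, embedding_ranking]))
```

A reranking step would then rescore only the top few fused candidates with a slower, more accurate model.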
Learn More
For more information about the Contextual Retrieval technique and its benefits, check out the original article by Anthropic: "Introducing Contextual Retrieval".
Advanced Usage
For more advanced usage, including custom models and configurations, check out the advanced example.
Components
- EmbeddingModel: Handles text embedding using various models.
- ContextGenerator: Generates context for chunks using language models.
- BM25Retriever: Implements BM25 retrieval functionality.
- Reranker: Reranks retrieved chunks using transformer-based models.
- VectorStore: Manages the vector database for embeddings.
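The core idea behind the context generation step is to prepend a short, document-aware description to each chunk before it is embedded and indexed for BM25. The sketch below shows that idea only; it does not use the library's classes, and `contextualize_chunks` and `first_sentence_context` are hypothetical names. In practice the `generate_context` callable would wrap a language model, whereas here a toy stand-in reuses the document's first sentence.

```python
from typing import Callable, List


def contextualize_chunks(
    document: str,
    chunks: List[str],
    generate_context: Callable[[str, str], str],
) -> List[str]:
    """Prepend a generated context to each chunk before indexing."""
    contextualized = []
    for chunk in chunks:
        context = generate_context(document, chunk).strip()
        # Fall back to the bare chunk if no context was produced.
        contextualized.append(f"{context}\n\n{chunk}" if context else chunk)
    return contextualized


def first_sentence_context(document: str, chunk: str) -> str:
    """Toy stand-in for a language model: reuse the document's first sentence."""
    first = document.split(".")[0].strip()
    return f"From a document that begins: {first}."


document = (
    "Natural Language Processing overview. "
    "Tokenization splits text into smaller units."
)
chunks = ["Tokenization splits text into smaller units."]
print(contextualize_chunks(document, chunks, first_sentence_context)[0])
```

The contextualized strings, rather than the raw chunks, are what would be passed to the embedding model and the BM25 index.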
Contributing
We welcome contributions!
License
This project is licensed under the MIT License.
Citation
If you use this library in your research, please cite:
@software{contextual_retrieval,
  title = {Contextual Retrieval: An Open-Source Library for Improved RAG Systems},
  author = {Hamada Fadil Mahdi},
  year = {2024},
  url = {https://github.com/HamadaFMahdi/contextual-retrieval}
}
Download files
File details
Details for the file contextual_retrieval-0.1.1.tar.gz.
File metadata
- Download URL: contextual_retrieval-0.1.1.tar.gz
- Upload date:
- Size: 12.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 98594e45ec0019a14b37e485ace62f3eaeccc0cd7ed766c6616fbfb84b40efa5
MD5 | 6a1dcb0290610993dd9f4c297bf00161
BLAKE2b-256 | 6385514ac12326e2e7aa1c8135a1f0c1316a2b96ce12fc32e79d2e9170a5ca5a
File details
Details for the file contextual_retrieval-0.1.1-py3-none-any.whl.
File metadata
- Download URL: contextual_retrieval-0.1.1-py3-none-any.whl
- Upload date:
- Size: 14.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 000ab5768e24b170fc84775fc3c63ccfc02d231e5a89a2facb3f48842a0ad37c
MD5 | 1e9212b6896b2a89368caf8781a6e6a9
BLAKE2b-256 | a4424bc480961d7f8307c6cc089a4c4a6d2c600cc7a6980eafecddd74f249c94