An open-source Python RAG library for Contextual Retrieval

Contextual Retrieval

An open-source Python library for Contextual Retrieval, a technique that improves the retrieval step in Retrieval-Augmented Generation (RAG) systems by generating context for each chunk before it is indexed.

Features

  • Easy to Use: Get started with just a few lines of code.
  • Modular Design: Choose between different retrieval modes:
    • Contextual Embedding
    • Contextual Embedding + Contextual BM25
    • Reranked Contextual Embedding + Contextual BM25
  • Model Agnostic: Use your preferred models for context generation, embeddings, and reranking.
  • Customizable: Override prompts and configurations to suit your use case.
  • Beginner Friendly: Sensible defaults make it easy for beginners.
  • Efficient: Uses FAISS for fast similarity search.
  • Flexible: Runs on CPU or with GPU acceleration.

Installation

Install the library using pip:

pip install contextual-retrieval

Quick Start

The example below indexes five short documents and queries them using the reranking mode:

from contextual_retrieval import ContextualRetrieval

# Initialize the retriever in 'rerank' mode (contextual embedding + contextual BM25 + reranking)
retriever = ContextualRetrieval(mode='rerank')

# Index some documents
documents = [
    "Artificial Intelligence is transforming various industries.",
    "Machine Learning is a subset of AI focused on data-driven algorithms.",
    "Natural Language Processing enables computers to understand human language.",
    "Deep Learning models, like neural networks, are inspired by the human brain.",
    "Computer Vision allows machines to interpret and make decisions based on visual data."
]
retriever.index_documents(documents)

# Query the system
query = "What are the main areas of AI?"
results = retriever.query(query, top_k=3)

print(f"Query: {query}\n")
print("Top Results:")
for i, (doc, score) in enumerate(results, 1):
    print(f"{i}. (Score: {score:.4f}) {doc}")

This example uses the full Reranked Contextual Embedding + Contextual BM25 mode, selected with the single mode='rerank' argument. The retriever generates context for each chunk, retrieves candidates with both embeddings and BM25, and reranks them before returning the most relevant results.
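To make the "embedding + BM25" step more concrete, here is a minimal sketch of one common way to combine a lexical ranking with an embedding ranking: Okapi BM25 scoring followed by reciprocal rank fusion. It is an illustration only and is not taken from this library's source; the function names (tokenize, bm25_scores, ranking, reciprocal_rank_fusion), the parameter values, and the hard-coded embedding ranking are all assumptions made for the example, and the library may fuse results differently.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; real systems use something richer."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def ranking(scores):
    """Return document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several rankings: each list adds 1 / (k + rank) per document."""
    fused = Counter()
    for order in rankings:
        for rank, doc_id in enumerate(order, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


docs = ["the cat sat", "dogs bark loudly", "a cat and a dog"]
bm25_order = ranking(bm25_scores("cat", docs))
# Hypothetical ranking from an embedding model, hard-coded for illustration.
embedding_order = [2, 0, 1]
print(reciprocal_rank_fusion([bm25_order, embedding_order]))
```

Fusing by rank rather than by raw score sidesteps the fact that BM25 scores and cosine similarities live on different scales. In a pipeline like the one described above, a reranker would then rescore the fused candidates.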

Learn More

For more information about the Contextual Retrieval technique and its benefits, see the original article by Anthropic, "Introducing Contextual Retrieval".

Advanced Usage

For custom models and configurations, see the advanced example.

Components

  • EmbeddingModel: Handles text embedding using various models.
  • ContextGenerator: Generates context for chunks using language models.
  • BM25Retriever: Implements BM25 retrieval functionality.
  • Reranker: Reranks retrieved chunks using transformer-based models.
  • VectorStore: Manages the vector database for embeddings.
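
As a rough illustration of the idea behind the ContextGenerator component, the sketch below prepends a generated context string to each chunk so that the embedding model and BM25 index both see the contextualized text. This is not the library's implementation: contextualize_chunks and stub_context are hypothetical names, and stub_context stands in for a real language-model call.

```python
from typing import Callable, List


def contextualize_chunks(
    document: str,
    chunks: List[str],
    generate_context: Callable[[str, str], str],
) -> List[str]:
    """Prepend a generated context string to each chunk before indexing."""
    contextualized = []
    for chunk in chunks:
        context = generate_context(document, chunk).strip()
        # Fall back to the bare chunk if no context was produced.
        contextualized.append(f"{context}\n\n{chunk}" if context else chunk)
    return contextualized


def stub_context(document: str, chunk: str) -> str:
    """Stand-in for a language-model call: reuse the document's first line."""
    lines = document.strip().splitlines()
    title = lines[0] if lines else "untitled"
    return f"This chunk is from the document titled '{title}'."


document = "Q2 Report\nRevenue grew 3%.\nCosts fell."
chunks = ["Revenue grew 3%.", "Costs fell."]
for text in contextualize_chunks(document, chunks, stub_context):
    print(text)
    print("---")
```

Passing the context generator in as a plain callable keeps the sketch model agnostic: any function that takes the full document and a chunk and returns a string can be swapped in.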

Contributing

We welcome contributions!

License

This project is licensed under the MIT License.

Citation

If you use this library in your research, please cite:

@software{contextual_retrieval,
  title = {Contextual Retrieval: An Open-Source Library for Improved RAG Systems},
  author = {Hamada Fadil Mahdi},
  year = {2024},
  url = {https://github.com/HamadaFMahdi/contextual-retrieval}
}
