
An open-source Python RAG library for Contextual Retrieval

Project description

Contextual Retrieval

An open-source Python library for Contextual Retrieval, a technique that improves the retrieval step in Retrieval-Augmented Generation (RAG) systems by adding chunk-specific context to each chunk before it is embedded and indexed.

Features

  • Easy to Use: Get started with just a few lines of code.
  • Modular Design: Choose between different retrieval modes:
    • Contextual Embedding
    • Contextual Embedding + Contextual BM25
    • Reranked Contextual Embedding + Contextual BM25
  • Model Agnostic: Use your preferred models for context generation, embeddings, and reranking.
  • Customizable: Override prompts and configurations to suit your use case.
  • Beginner Friendly: Sensible defaults let you get started without tuning anything.
  • Efficient: Uses FAISS for fast vector similarity search.
  • Flexible: Runs on either CPU or GPU.
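FAISS makes nearest-neighbour search fast, but what it computes is simple to state. The pure-Python sketch below does exact cosine-similarity search by brute force. On L2-normalized vectors, FAISS's flat inner-product index (`IndexFlatIP`) produces the same ranking, only much faster. `BruteForceVectorStore` is a hypothetical illustration, not this library's `VectorStore`, and the toy 2-D vectors stand in for real embeddings from whichever model you choose.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class BruteForceVectorStore:
    """Hypothetical exact-search store: cosine similarity via dot products of unit vectors."""

    def __init__(self) -> None:
        self.texts: List[str] = []
        self.vectors: List[List[float]] = []

    def add(self, text: str, embedding: Sequence[float]) -> None:
        # Normalize once at insert time so search is a plain dot product.
        self.texts.append(text)
        self.vectors.append(normalize(embedding))

    def search(self, query_embedding: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        q = normalize(query_embedding)
        scored = [
            (text, sum(a * b for a, b in zip(q, vec)))
            for text, vec in zip(self.texts, self.vectors)
        ]
        # Highest cosine similarity first.
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = BruteForceVectorStore()
    store.add("about cats", [1.0, 0.0])  # toy embeddings, not real model output
    store.add("about cars", [0.0, 1.0])
    store.add("about both", [1.0, 1.0])
    print(store.search([1.0, 0.1], k=2))
```

Brute force costs one dot product per stored vector per query, which is why an index such as FAISS matters once the corpus grows.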

Installation

Install the library using pip:

pip install contextual-retrieval

Quick Start

Here's a minimal example that runs the full Contextual Retrieval pipeline:

from contextual_retrieval import ContextualRetrieval

# Initialize the retriever in 'rerank' mode (the full pipeline)
retriever = ContextualRetrieval(mode='rerank')

# Index some documents
documents = [
    "Artificial Intelligence is transforming various industries.",
    "Machine Learning is a subset of AI focused on data-driven algorithms.",
    "Natural Language Processing enables computers to understand human language.",
    "Deep Learning models, like neural networks, are inspired by the human brain.",
    "Computer Vision allows machines to interpret and make decisions based on visual data."
]
retriever.index_documents(documents)

# Query the system
query = "What are the main areas of AI?"
results = retriever.query(query, top_k=3)

print(f"Query: {query}\n")
print("Top Results:")
for i, (doc, score) in enumerate(results, 1):
    print(f"{i}. (Score: {score:.4f}) {doc}")

This example uses the full Reranked Contextual Embedding + Contextual BM25 mode with just one line of initialization. The retriever automatically generates context for each chunk, retrieves candidates with both embeddings and BM25, and reranks them to return the most relevant results.
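How the embedding and BM25 results are merged is not spelled out here. Anthropic's article describes combining and deduplicating the two result lists with rank fusion before reranking. The sketch below uses reciprocal rank fusion (RRF), one common choice, and is not necessarily what this library does internally. `rerank` is a hypothetical callable standing in for a reranking model (for example a cross-encoder) that scores a (query, chunk) pair; `k=60` is the constant commonly used with RRF, and `candidates` is an assumed cap on how many fused results get reranked.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Merge ranked lists of chunk IDs: each appearance adds 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk found by both retrievers accumulates both contributions,
            # which also deduplicates it.
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def hybrid_retrieve(
    query: str,
    dense_ranking: List[str],
    bm25_ranking: List[str],
    texts: Dict[str, str],
    top_k: int = 3,
    rerank: Optional[Callable[[str, str], float]] = None,
    candidates: int = 20,
) -> List[Tuple[str, float]]:
    """Fuse embedding and BM25 rankings, then optionally rerank the best candidates."""
    fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])[:candidates]
    if rerank is None:
        return fused[:top_k]
    # Hypothetical reranker: scores each (query, chunk text) pair directly.
    rescored = [(cid, rerank(query, texts[cid])) for cid, _ in fused]
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored[:top_k]


if __name__ == "__main__":
    dense = ["ml", "nlp", "cv"]  # IDs ranked by embedding similarity (toy data)
    bm25 = ["nlp", "dl", "ml"]   # IDs ranked by BM25 (toy data)
    print(reciprocal_rank_fusion([dense, bm25]))
```

RRF only looks at ranks, so it needs no calibration between cosine similarities and BM25 scores, which live on very different scales.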

Learn More

For more about the Contextual Retrieval technique and its benefits, see the original article by Anthropic, "Introducing Contextual Retrieval".
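The core step that article describes is prepending a short, chunk-specific context to each chunk, generated by an LLM that sees the whole document, before the chunk is embedded and indexed for BM25. The sketch below builds that prompt (adapted from the article; this library's default prompt may differ) and applies it through `generate`, a hypothetical callable standing in for whatever model you use.

```python
from typing import Callable, List

# Prompt adapted from Anthropic's "Introducing Contextual Retrieval" article;
# this library's default prompt may differ.
CONTEXT_PROMPT = (
    "<document>\n{document}\n</document>\n"
    "Here is the chunk we want to situate within the whole document\n"
    "<chunk>\n{chunk}\n</chunk>\n"
    "Please give a short succinct context to situate this chunk within the "
    "overall document for the purposes of improving search retrieval of the "
    "chunk. Answer only with the succinct context and nothing else."
)


def contextualize_chunks(
    document: str, chunks: List[str], generate: Callable[[str], str]
) -> List[str]:
    """Return each chunk with its LLM-generated context prepended.

    `generate` is a hypothetical stand-in for any text-generation model:
    it receives a prompt string and returns the model's completion.
    """
    contextualized = []
    for chunk in chunks:
        prompt = CONTEXT_PROMPT.format(document=document, chunk=chunk)
        context = generate(prompt).strip()
        # This combined text, not the raw chunk, is what gets embedded
        # and added to the BM25 index.
        contextualized.append(f"{context}\n\n{chunk}")
    return contextualized


if __name__ == "__main__":
    # A fake model so the sketch runs without any API access.
    def fake_llm(prompt: str) -> str:
        return "This chunk is from an overview of AI subfields."

    doc = "An overview of AI. Machine Learning is a subset of AI."
    print(contextualize_chunks(doc, ["Machine Learning is a subset of AI."], fake_llm)[0])
```

Because the whole document is sent once per chunk, this step dominates indexing cost; the article suggests prompt caching to reduce it.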

Advanced Usage

For more advanced usage, including custom models and configurations, check out the advanced example.

Components

  • EmbeddingModel: Handles text embedding using various models.
  • ContextGenerator: Generates context for chunks using language models.
  • BM25Retriever: Implements BM25 retrieval functionality.
  • Reranker: Reranks retrieved chunks using transformer-based models.
  • VectorStore: Manages the vector database for embeddings.
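The internals of `BM25Retriever` are not documented here, but Okapi BM25 itself is small enough to sketch in pure Python. In Contextual BM25 the corpus would be the contextualized chunks rather than the raw ones. `SimpleBM25` is a hypothetical name, the tokenizer is deliberately naive, and `k1=1.5`, `b=0.75` are common defaults, not values taken from this library.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase word tokens; real systems may also stem or drop stop words.
    return re.findall(r"\w+", text.lower())


class SimpleBM25:
    """Hypothetical Okapi BM25 over an in-memory, non-empty corpus."""

    def __init__(self, corpus: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(doc) for doc in corpus]
        self.term_freqs = [Counter(doc) for doc in self.docs]
        self.doc_lens = [len(doc) for doc in self.docs]
        self.avgdl = sum(self.doc_lens) / len(self.docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for doc in self.docs for term in set(doc))
        n = len(self.docs)
        # Lucene-style IDF, which stays non-negative even for very common terms.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, index: int) -> float:
        tf = self.term_freqs[index]
        # Length normalization: longer-than-average documents are penalized.
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[index] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            if term in tf:
                f = tf[term]
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query: str, k: int = 3) -> List[Tuple[int, float]]:
        scored = [(i, self.score(query, i)) for i in range(len(self.docs))]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    corpus = [
        "Natural Language Processing enables computers to understand human language.",
        "Deep Learning models, like neural networks, are inspired by the human brain.",
    ]
    print(SimpleBM25(corpus).top_k("human language", k=2))
```

BM25 rewards exact term matches (identifiers, error codes, names) that embeddings can blur, which is why the hybrid modes pair it with embedding search.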

Contributing

We welcome contributions!

License

This project is licensed under the MIT License.

Citation

If you use this library in your research, please cite:

@software{contextual_retrieval,
  title = {Contextual Retrieval: An Open-Source Library for Improved RAG Systems},
  author = {Hamada Fadil Mahdi},
  year = {2024},
  url = {https://github.com/HamadaFMahdi/contextual-retrieval}
}

Download files


Source Distribution

contextual_retrieval-0.1.0.tar.gz (12.6 kB)

Built Distribution

contextual_retrieval-0.1.0-py3-none-any.whl (14.1 kB)

File details

Details for the file contextual_retrieval-0.1.0.tar.gz.

File metadata

  • Download URL: contextual_retrieval-0.1.0.tar.gz
  • Upload date:
  • Size: 12.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for contextual_retrieval-0.1.0.tar.gz
  • SHA256: 14498bd1ebf5f850219c27e303c936b09cb3650892cb931e2782a4b2f4491e34
  • MD5: 23c062800f893846f113496570defd85
  • BLAKE2b-256: 72d549cff27f3b29d8eb6103524eeedbb16fc69bbfab481541df99d3d45e4f7c


File details

Details for the file contextual_retrieval-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for contextual_retrieval-0.1.0-py3-none-any.whl
  • SHA256: 63a751b6968ee13b5467e3b1ee57034c8c6275477cc0bc5027d620a714cc85b7
  • MD5: 7865f215b78da16afd568483e9b617e7
  • BLAKE2b-256: d63e0216e1930be799fa0dfcb3a88bf0f9f42835df0e452ac4459b329adc3784
