A persistent implementation of BM25 retrieval.

bm25_retriever

bm25_retriever is a persistent BM25 retriever for use with LangChain, built on top of rank_bm25.

Features

  • Save a BM25 retriever to disk and load it back later
  • Works directly with the LangChain Document format
  • Simple API for creating, persisting, and querying retrievers
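
To make the save/load feature concrete, here is a minimal sketch of what persisting an index to a directory and restoring it can look like. It is not this package's implementation: the helper names (`build_index`, `save_index`, `load_index`), the `index.json` file name, and the choice of JSON as the on-disk format are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

INDEX_FILE = "index.json"  # hypothetical file name, not the package's layout


def build_index(texts):
    # Lowercased whitespace tokenization; an assumption, not the package's tokenizer.
    return {"texts": list(texts), "tokens": [t.lower().split() for t in texts]}


def save_index(index, save_dir):
    # Create the directory if needed and write the whole index as one JSON file.
    path = Path(save_dir)
    path.mkdir(parents=True, exist_ok=True)
    target = path / INDEX_FILE
    target.write_text(json.dumps(index), encoding="utf-8")
    return target


def load_index(save_dir):
    # Read the JSON file back into the same dict structure.
    return json.loads((Path(save_dir) / INDEX_FILE).read_text(encoding="utf-8"))


# Round trip through a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    original = build_index(["A fox fled from danger", "Dogs are loyal companions"])
    save_index(original, tmp)
    restored = load_index(tmp)

print(restored == original)
```

Storing the tokenized corpus next to the raw texts means a loader can rebuild the scoring statistics without tokenizing again, which is the main benefit of persisting a BM25 index.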

Installation

pip install bm25_retriever

Usage

from langchain_core.documents import Document
from bm25_retriever.retriever import PersistentBM25Retriever

# Sample documents
docs = [
    Document(page_content="The quick brown fox jumps over the lazy dog"),
    Document(page_content="A fox fled from danger"),
    Document(page_content="Dogs are loyal companions"),
    Document(page_content="Foxes are cunning and agile"),
]

# Step 1: Create a retriever with a specified save directory
save_directory = "bm25_storage"
retriever = PersistentBM25Retriever(documents=docs, save_dir=save_directory, persist=True)

# Step 2: Persist the retriever to the specified directory.
# The explicit call is left commented out here because persist=True is already passed above.
# retriever.persist()

# Step 3: Load the retriever from the directory with a custom k value
loaded_retriever = PersistentBM25Retriever.from_persist_dir(save_dir=save_directory, k=3)

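# Step 4: Use the loaded retriever to retrieve documents
# Note: recent LangChain releases deprecate get_relevant_documents in favor of invoke;
# if PersistentBM25Retriever subclasses BaseRetriever, loaded_retriever.invoke(query) should be equivalent.
query = "fox"
results = loaded_retriever.get_relevant_documents(query)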

# Print the retrieved documents
print(f"Retrieved {len(results)} documents for query '{query}':")
for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content}")
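
For readers curious how a BM25 ranking for the query above comes about, the following is a small, standard-library-only sketch of BM25 scoring over the same four sample texts. It is not the code used by bm25_retriever or rank_bm25. The whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF `log(1 + (N - n + 0.5) / (n + 0.5))` are assumptions chosen for illustration, so the actual scores returned by the package may differ.

```python
import math
from collections import Counter

texts = [
    "The quick brown fox jumps over the lazy dog",
    "A fox fled from danger",
    "Dogs are loyal companions",
    "Foxes are cunning and agile",
]


def tokenize(text):
    # Lowercased whitespace split; no stemming, so "foxes" and "fox" are different terms.
    return text.lower().split()


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    tokenized = [tokenize(doc) for doc in corpus]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF variant that stays positive even for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


scores = bm25_scores("fox", texts)
ranked = sorted(zip(scores, texts), key=lambda pair: pair[0], reverse=True)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

In this sketch the shorter text "A fox fled from danger" ranks above the longer "The quick brown fox jumps over the lazy dog" because of length normalization. "Foxes are cunning and agile" scores zero because the plain whitespace tokenizer does not reduce "foxes" to "fox".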


