Yet Another Best Matching 25 (YABM25) algorithm

Project description

YaBM25

Yet Another BM25 Implementation - A fast, scalable BM25 search engine with both in-memory and disk-based indexing.

Features

  • 🚀 Compatible with rank_bm25 API
  • 📦 Optional disk-based indexing for large datasets
  • 🔧 Multiple BM25 variants (BM25L, BM25Adpt)
  • 🛠 Production-ready with proper resource management
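
For readers new to the scoring function behind these features, here is a minimal sketch of the classic Okapi BM25 formula in plain Python. It is not YaBM25's implementation: the function name bm25_scores, the defaults k1=1.5 and b=0.75, and the non-negative IDF form are all assumptions chosen for illustration, so the numbers it produces need not match the library's output.

```python
import math
from collections import Counter


def bm25_scores(query_terms, tokenized_corpus, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25 (illustrative only)."""
    n_docs = len(tokenized_corpus)
    avgdl = sum(len(doc) for doc in tokenized_corpus) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in tokenized_corpus:
        df.update(set(doc))

    scores = []
    for doc in tokenized_corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Assumed IDF variant: the "+1" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    "Hello there good man!",
    "It is quite windy in London",
    "How is the weather today?",
]
scores = bm25_scores("windy London".split(" "), [d.split(" ") for d in corpus])
```

Only the second document contains either query term, so it is the only one with a non-zero score. Variants such as BM25L change the term-frequency normalization rather than this overall structure.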

Installation

pip install yabm25

Quick Start

Simple In-Memory Usage

from yabm25 import BM25Indexer

# Initialize with corpus
corpus = [
    "Hello there good man!",
    "It is quite windy in London",
    "How is the weather today?"
]
tokenized_corpus = [doc.split(" ") for doc in corpus]
bm25 = BM25Indexer(tokenized_corpus)

# Search
query = "windy London"
doc_scores = bm25.get_scores(query.split(" "))
print(doc_scores)  # array([0., 0.93729472, 0.])

# Get top document
top_docs = bm25.get_top_n(query.split(" "), corpus, n=1)
print(top_docs)  # ['It is quite windy in London']
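
The whitespace split used above keeps case and punctuation, so "man!" and "today?" are indexed with the punctuation attached, and a query for "london" would not match "London". A minimal sketch of a normalizing tokenizer follows; tokenize is a hypothetical helper and not part of yabm25.

```python
import re


def tokenize(text):
    """Lowercase the text and keep only runs of word characters."""
    return re.findall(r"\w+", text.lower())


tokens = tokenize("How is the weather today?")
```

Whatever tokenizer is chosen, the same function should be applied to both the corpus and the query so that the terms line up.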

Large-Scale Usage

from yabm25 import BM25Indexer, BM25Config

# Configure disk-based index
config = BM25Config(
    index_dir="my_index",
    doc_chunk_size=500_000,
    compression="ZSTD"
)

# Build index (large_corpus is a placeholder for your own corpus;
# it is not defined in this snippet)
indexer = BM25Indexer(config)
indexer.build_index(large_corpus)

# Search
results = indexer.query(["term1", "term2"])
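
For a corpus too large to hold in memory, one option is to produce tokenized documents lazily. The sketch below assumes one document per line; iter_tokenized is a hypothetical helper, and whether build_index accepts a generator rather than a list is not confirmed by this page, so check before relying on it.

```python
def iter_tokenized(lines):
    """Yield one token list per non-empty line, without materializing the corpus."""
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            yield tokens


# Any iterable of strings works, including an open text file.
sample_lines = ["It is quite windy in London\n", "\n", "How is the weather today?\n"]
docs = list(iter_tokenized(sample_lines))
```

With a real file, passing the open file object (opened with an explicit encoding) in place of sample_lines streams the documents one line at a time.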

Documentation

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

License

MIT License. See LICENSE for details.

Download files

Source Distribution

yabm25-0.1.0.tar.gz (7.6 kB)

Uploaded Source

Built Distribution

yabm25-0.1.0-py3-none-any.whl (7.8 kB)

Uploaded Python 3

File details

Details for the file yabm25-0.1.0.tar.gz.

File metadata

  • Download URL: yabm25-0.1.0.tar.gz
  • Upload date:
  • Size: 7.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for yabm25-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f16d0cebbd2d4eb38421fb2ea2e1c926a83be075393841ff2aaad250615499c2
MD5 77c7e9719202f2259ca7cb3e3c56011b
BLAKE2b-256 4d6109fa22bb35b0d0db197118838538a4e1503a5bf8a1284eb17144e17a10d5
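
A downloaded archive can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch: sha256_of_stream and verify_file are hypothetical helper names, and the local file path is whatever you saved the download as.

```python
import hashlib

# SHA256 digest listed on this page for yabm25-0.1.0.tar.gz.
EXPECTED_SHA256 = "f16d0cebbd2d4eb38421fb2ea2e1c926a83be075393841ff2aaad250615499c2"


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks so large files are never fully loaded."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` has the expected SHA256 digest."""
    with open(path, "rb") as f:
        return sha256_of_stream(f) == expected
```

Calling verify_file("yabm25-0.1.0.tar.gz") on the downloaded archive should return True if the file is intact. pip can perform the same check itself when a requirements file pins hashes and is installed with --require-hashes.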

File details

Details for the file yabm25-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: yabm25-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for yabm25-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 17a2055bf30addef102e2264de5ba3758d2b9e04e8179a45bbd7a6668bd136cc
MD5 3e56f85b8f6b9a8be89bea9f84f821c7
BLAKE2b-256 5ddd6ce7852c2603b6e2f540f23831e2d5130b53adcfcf7441cd59e360d5c207
