Yet Another Best Matching 25 (YABM25) algorithm
YaBM25
Yet Another BM25 Implementation - A fast, scalable BM25 search engine with both in-memory and disk-based indexing.
Features
- 🚀 Compatible with rank_bm25 API
- 📦 Optional disk-based indexing for large datasets
- 🔧 Multiple BM25 variants (BM25L, BM25Adpt)
- 🛠 Production-ready with proper resource management
Installation
pip install yabm25
Quick Start
Simple In-Memory Usage
from yabm25 import BM25Indexer

# Initialize with corpus
corpus = [
    "Hello there good man!",
    "It is quite windy in London",
    "How is the weather today?",
]
tokenized_corpus = [doc.split(" ") for doc in corpus]
bm25 = BM25Indexer(tokenized_corpus)

# Search
query = "windy London"
doc_scores = bm25.get_scores(query.split(" "))
print(doc_scores)  # array([0., 0.93729472, 0.])

# Get top document
top_docs = bm25.get_top_n(query.split(" "), corpus, n=1)
print(top_docs)  # ['It is quite windy in London']
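To show where a score like the one above comes from, here is a minimal pure-Python sketch of the classic Okapi BM25 formula. It is not YaBM25's own code: the function name `bm25_scores` is hypothetical, the parameters `k1=1.5` and `b=0.75` are assumed (they are the rank_bm25 defaults), and negative IDF values are left unadjusted. Under those assumptions it gives the same score for the example query.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, tokenized_corpus, k1=1.5, b=0.75):
    """Score every document against the query with the Okapi BM25 formula.

    Hypothetical helper for illustration only; not part of YaBM25.
    """
    n_docs = len(tokenized_corpus)
    avgdl = sum(len(doc) for doc in tokenized_corpus) / n_docs

    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for doc in tokenized_corpus for term in set(doc))

    scores = []
    for doc in tokenized_corpus:
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        length_norm = 1 - b + b * len(doc) / avgdl
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


corpus = [
    "Hello there good man!",
    "It is quite windy in London",
    "How is the weather today?",
]
tokenized_corpus = [doc.split(" ") for doc in corpus]
scores = bm25_scores("windy London".split(" "), tokenized_corpus)
print([round(s, 4) for s in scores])  # [0.0, 0.9373, 0.0]
```

Only the second document contains either query term, so it is the only one with a non-zero score. Both terms occur in one of three documents and contribute equally.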
Large-Scale Usage
from yabm25 import BM25Indexer, BM25Config

# Configure disk-based index
config = BM25Config(
    index_dir="my_index",
    doc_chunk_size=500_000,
    compression="ZSTD",
)

# Build index (large_corpus is a placeholder for your own documents, defined elsewhere)
indexer = BM25Indexer(config)
indexer.build_index(large_corpus)

# Search
results = indexer.query(["term1", "term2"])
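The example above does not define `large_corpus`. For a corpus too large to hold in memory, one common approach is to tokenize documents lazily and process them in fixed-size chunks. The sketch below shows that general idea using only the standard library. The helpers `stream_tokenized` and `chunked` are hypothetical and not part of YaBM25. Whether `build_index` accepts a generator, and how `doc_chunk_size` is applied internally, is not stated here and should be checked against the package itself.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def stream_tokenized(lines: Iterable[str]) -> Iterator[List[str]]:
    """Lazily tokenize one document per line, skipping blank lines."""
    for line in lines:
        tokens = line.split()
        if tokens:
            yield tokens


def chunked(docs: Iterable[List[str]], chunk_size: int) -> Iterator[List[List[str]]]:
    """Group a document stream into lists of at most chunk_size documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


# Any iterable of strings works here, including an open text file.
lines = ["first document here", "", "second document", "third one"]
for chunk in chunked(stream_tokenized(lines), chunk_size=2):
    print(len(chunk), chunk[0])
```

Because both helpers are generators, only one chunk of documents is held in memory at a time.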
Documentation
Contributing
Contributions welcome! See CONTRIBUTING.md for guidelines.
License
MIT License. See LICENSE for details.
File details
Details for the file yabm25-0.1.0.tar.gz.
File metadata
- Download URL: yabm25-0.1.0.tar.gz
- Upload date:
- Size: 7.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f16d0cebbd2d4eb38421fb2ea2e1c926a83be075393841ff2aaad250615499c2 |
| MD5 | 77c7e9719202f2259ca7cb3e3c56011b |
| BLAKE2b-256 | 4d6109fa22bb35b0d0db197118838538a4e1503a5bf8a1284eb17144e17a10d5 |
File details
Details for the file yabm25-0.1.0-py3-none-any.whl.
File metadata
- Download URL: yabm25-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 17a2055bf30addef102e2264de5ba3758d2b9e04e8179a45bbd7a6668bd136cc |
| MD5 | 3e56f85b8f6b9a8be89bea9f84f821c7 |
| BLAKE2b-256 | 5ddd6ce7852c2603b6e2f540f23831e2d5130b53adcfcf7441cd59e360d5c207 |