A simple high-level API and CLI for BM25.
BM25
The easiest way to add powerful search to your Python projects or command line.
BM25 is a well-known ranking algorithm used by search engines (like Elasticsearch) to find the most relevant documents for a given search query. It matches query keywords against each document and scores the document by how often those words appear, giving more weight to rarer words and adjusting for document length.
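To make the scoring idea concrete, here is a minimal pure-Python sketch of a textbook BM25 formula. It is an illustration only: the helper names `tokenize` and `bm25_scores` are hypothetical, the tokenizer is a bare whitespace split, and the exact variant and defaults used by bm25s may differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stopword removal)."""
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with a textbook BM25 formula."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in tokenize(query):
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency (IDF).
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term frequency saturates (k1) and is normalized by length (b).
            norm = tf + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


documents = [
    "python is a popular programming language",
    "search engines rank documents by relevance",
    "learn python with short practical tutorials",
]
scores = bm25_scores("learn python", documents)
best = max(range(len(documents)), key=scores.__getitem__)
print(documents[best])
```

The third document ranks first because it contains both query words, and "learn" is rarer in this tiny corpus than "python", so it contributes more to the score.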
This package provides a dead-simple, beginner-friendly way to use BM25 in Python. Under the hood, it is powered by bm25s, an ultra-fast, highly optimized library. By installing BM25, you get all the performance benefits of bm25s (including speedups and stemming) with a streamlined, 1-line API and a beautiful command-line interface.
🛠️ Installation
Get started in seconds with pip:
pip install BM25
This automatically installs the optimized bm25s backend, along with necessary dependencies for better search quality (PyStemmer) and a colorful terminal experience (rich).
🐍 Python API: 1-Line Search
If you want to quickly build a search engine over a local file or a list of texts, the BM25 module makes it incredibly easy.
import BM25
# 1. Load your documents (supports .csv, .json, .jsonl, .txt)
# For csv/jsonl, you can specify which column/key holds the text
corpus = BM25.load("documents.csv", document_column="text")
# 2. Build the search index
retriever = BM25.index(corpus)
# 3. Search!
queries = ["how to learn python", "best search algorithms"]
results = retriever.search(queries, k=5)  # Get top 5 results
# Print the top results for the first query
for result in results[0]:
    print(f"Score: {result['score']:.2f} | Document: {result['document']}")
The load function handles reading your files, while index automatically takes care of text processing (tokenization, stemming) and creating the searchable index.
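For intuition about what a loader like this has to do, here is a rough standard-library sketch. It is not the package's implementation: `load_documents` is a hypothetical helper, and the handling of `.json` files (an array of strings or of objects) is an assumption. Only the `document_column` parameter name is taken from the example above.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_documents(path, document_column="text"):
    """Return a list of document strings from a .txt, .csv, .json, or .jsonl file."""
    path = Path(path)
    suffix = path.suffix.lower()
    with path.open(newline="", encoding="utf-8") as f:
        if suffix == ".txt":
            # One document per non-empty line.
            return [line.strip() for line in f if line.strip()]
        if suffix == ".csv":
            return [row[document_column] for row in csv.DictReader(f)]
        if suffix == ".jsonl":
            return [json.loads(line)[document_column] for line in f if line.strip()]
        if suffix == ".json":
            # Assumption: a JSON array of strings, or of objects keyed by document_column.
            items = json.load(f)
            return [i if isinstance(i, str) else i[document_column] for i in items]
    raise ValueError(f"Unsupported file type: {suffix}")


# Write a small CSV to a temporary directory and read its "text" column back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "documents.csv"
    sample.write_text("id,text\n1,hello world\n2,search with BM25\n", encoding="utf-8")
    corpus = load_documents(sample, document_column="text")

print(corpus)
```

Whatever the file type, the result is a plain list of strings, which is the shape the indexing step works with.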
💻 Command-Line Interface (CLI)
Don't want to write code? The BM25 package comes with a built-in terminal app for instant indexing and searching.
Step 1: Index your documents
Turn any text, CSV, or JSON file into a search index.
# Index a simple text file (one document per line)
bm25 index documents.txt -o my_index
# Index a CSV file using a specific column for the text
bm25 index documents.csv -o my_index -c text
Step 2: Search
Query your newly created index directly from the terminal.
# Basic search (returns top 10 results)
bm25 search -i my_index "what is machine learning?"
# Return more results and save them to a file
bm25 search -i my_index "your query here" -k 20 -s results.json
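If you want to drive the CLI from a script, one option is to assemble the same command with `subprocess`. The sketch below uses only the flags shown above (`-i`, `-k`, `-s`). The helper names `build_search_command` and `run_search` are hypothetical, and the command is only built here, not executed.

```python
import shutil
import subprocess


def build_search_command(index, query, k=None, save_path=None):
    """Assemble the argv list for `bm25 search` using the flags shown above."""
    command = ["bm25", "search", "-i", index, query]
    if k is not None:
        command += ["-k", str(k)]
    if save_path is not None:
        command += ["-s", save_path]
    return command


def run_search(index, query, **options):
    """Run the search, failing early if the bm25 executable is not installed."""
    if shutil.which("bm25") is None:
        raise RuntimeError("the bm25 executable was not found on PATH")
    return subprocess.run(build_search_command(index, query, **options), check=True)


command = build_search_command(
    "my_index", "your query here", k=20, save_path="results.json"
)
print(command)
```

Passing the arguments as a list avoids shell quoting problems when a query contains spaces or punctuation.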
🌟 Pro-tip: The User Directory
You can save indices to a central user directory (~/.bm25s/indices/) so you can search them from anywhere on your computer without remembering file paths.
# Save to the central directory using the -u flag
bm25 index documents.csv -u -o my_docs
# Search interactively! Just type this, and a menu will let you pick your index:
bm25 search -u "what is AI?"
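To see which indices are already saved without opening the interactive menu, you could inspect the directory yourself. This sketch assumes each saved index is a subdirectory of `~/.bm25s/indices/` named after the `-o` value, which the text above does not confirm, and `list_saved_indices` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def list_saved_indices(base_dir=None):
    """Return the sorted names of subdirectories in the user index directory."""
    # Default location taken from the text above.
    base = Path(base_dir) if base_dir else Path.home() / ".bm25s" / "indices"
    if not base.is_dir():
        return []
    # Assumption: one subdirectory per saved index.
    return sorted(entry.name for entry in base.iterdir() if entry.is_dir())


# Demonstrate against a temporary directory instead of the real home directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "my_docs").mkdir()
    (Path(tmp) / "notes").mkdir()
    names = list_saved_indices(tmp)

print(names)
```

Calling `list_saved_indices()` with no argument checks the default location and returns an empty list if nothing has been saved there yet.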
🚀 Going Further
The BM25 package is designed to be simple and get out of your way. If you find yourself needing more advanced features, such as saving and loading models, integrating with Hugging Face, tweaking the math behind the algorithm, or handling datasets with millions of documents, you already have the tools!
You can drop down to the underlying bm25s library anytime. Check out the bm25s documentation for full details on advanced usage.
File details
Details for the file bm25-0.3.3.tar.gz.
File metadata
- Download URL: bm25-0.3.3.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 26bed357c4935e124606a78179d32d1c2c1d4260ada02f3be438232d730cc4d4 |
| MD5 | c76941196bff5443b24d391310efc820 |
| BLAKE2b-256 | 6dd25166359c0eca77703c6e0f30c754256eac6eb7cc31749f9a80fd7bc58eb9 |
Provenance
The following attestation bundles were made for bm25-0.3.3.tar.gz:
Publisher: publish-python.yaml on xhluca/bm25s
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bm25-0.3.3.tar.gz
- Subject digest: 26bed357c4935e124606a78179d32d1c2c1d4260ada02f3be438232d730cc4d4
- Sigstore transparency entry: 1187647628
- Sigstore integration time:
- Permalink: xhluca/bm25s@bd21d227daf3f1fb9f82d1f9ccc5ff5ae009c04b
- Branch / Tag: refs/tags/0.3.3
- Owner: https://github.com/xhluca
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-python.yaml@bd21d227daf3f1fb9f82d1f9ccc5ff5ae009c04b
- Trigger Event: release
File details
Details for the file bm25-0.3.3-py3-none-any.whl.
File metadata
- Download URL: bm25-0.3.3-py3-none-any.whl
- Upload date:
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 25936964eb3b4cf5f6cd0bafc779dfe8dac3cd88b4be9c3922646cd4214ff1a7 |
| MD5 | b6cfa95f05583f7b051c799e0eb6b63f |
| BLAKE2b-256 | 4ba31eb7084adad09f586342155a0c8598b6042f822f0deed622dee7b23a37d5 |
Provenance
The following attestation bundles were made for bm25-0.3.3-py3-none-any.whl:
Publisher: publish-python.yaml on xhluca/bm25s
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bm25-0.3.3-py3-none-any.whl
- Subject digest: 25936964eb3b4cf5f6cd0bafc779dfe8dac3cd88b4be9c3922646cd4214ff1a7
- Sigstore transparency entry: 1187647632
- Sigstore integration time:
- Permalink: xhluca/bm25s@bd21d227daf3f1fb9f82d1f9ccc5ff5ae009c04b
- Branch / Tag: refs/tags/0.3.3
- Owner: https://github.com/xhluca
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-python.yaml@bd21d227daf3f1fb9f82d1f9ccc5ff5ae009c04b
- Trigger Event: release