BM25 NLP model
BagModels
BagModels is a repository of bag-of-words (BoW) algorithms for machine learning. It currently includes Okapi BM25, with more to come.
BM25 is a text retrieval function that can find similar documents or rank search results over a set of documents. It scores each document by the query terms that appear in it, irrespective of how close those terms are to each other. It is an improved and more general version of the TF-IDF algorithm in NLP.
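To make the scoring idea concrete, here is a minimal pure-Python sketch of the standard Okapi BM25 formula. It is not the BagModels implementation: the function name `bm25_scores`, the whitespace tokenization, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (illustrative sketch)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # k1 saturates term frequency, b normalizes by document length
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Assumed toy corpus, tokenized by lowercasing and splitting on whitespace
docs = [doc.lower().split() for doc in ["Yo I love NLP model", "I like algorithms", "I love ML"]]
scores = bm25_scores("love nlp".split(), docs)
```

In this sketch the first document ranks highest because it contains both query terms, the third matches only "love", and the second scores zero. Word order and proximity play no role, which is the bag-of-words property described above.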
Installation
It can be installed using pip:
pip install bagmodels
Getting started
Basic usage
import re
from bagmodels import BM25
# Load corpus
corpus = [
    "Yo, I love NLP model",
    "I like algorithms",
    "I love ML!",
]
# Clean manually if needed or pass custom tokenizer to BM25
corpus = [re.sub(r",|!", " ", doc).strip() for doc in corpus]
# Initialize model
model = BM25(corpus=corpus)
# Similarity
model.similarity("I love NLP model", "I like NLP model") # 0.775
model.similarity("I love blah", "I love algorithms") # 0.446
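The comment in the example above mentions passing a custom tokenizer to BM25. The keyword BagModels expects for it is not shown here, so the sketch below covers only the tokenizer itself. The name `simple_tokenizer` and its lowercasing behaviour are assumptions, not part of the library.

```python
import re


def simple_tokenizer(text):
    """Lowercase the text and return its alphanumeric tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9]+", text.lower())


tokens = simple_tokenizer("Yo, I love NLP model")
```

A function like this drops punctuation and normalizes case in one step, which would replace the manual `re.sub` cleaning pass if the model accepts a callable tokenizer.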
Save and reuse models
# libraries imported and corpus already loaded above
model = BM25(corpus=corpus)
# write to save path
model.save("output/bm25_v1.jbl")
# load again
model = BM25.load("output/bm25_v1.jbl")
# add documents if required
model.resume(corpus=additional_corpus)
# predict / search / find / retrieve like
model.similarity(doc_a, doc_b)
Coming soon
Please feel free to open an issue to request a feature or discuss any changes. Pull requests are most welcome.
I am actively working on adding the following:
- OkapiBM25
- BM25 variations
- MultiThreading