
BM25 NLP model


BagModels


BagModels is a collection of bag-of-words (BoW) algorithms for machine learning. It currently includes Okapi BM25, with more to come.

BM25 is a ranking function for text retrieval. It scores how well each document in a collection matches a query, based on the query terms that appear in the document and regardless of how close those terms are to each other, so it can be used to rank search results or to find similar documents. It extends the TF-IDF weighting scheme with term-frequency saturation and document-length normalisation.
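
To make the scoring idea concrete, here is a minimal pure-Python sketch of the standard Okapi BM25 formula. It is an illustration only and not necessarily how bagmodels implements it: the function name `bm25_scores`, the whitespace tokenization, the smoothed IDF variant, and the defaults `k1=1.5` and `b=0.75` are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` against `query` with Okapi BM25.

    Hypothetical helper for illustration; not part of the bagmodels API.
    """
    # Lowercased whitespace tokenization (an assumption for this sketch).
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1.0" keeps the weight positive.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation (k1) and length normalisation (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = ["Yo I love NLP model", "I like algorithms", "I love ML"]
scores = bm25_scores("love NLP", corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
print(corpus[best])
```

The first document contains both query terms and so ranks highest; the second shares no term with the query and scores zero. Word order and proximity play no role, which is what makes this a bag-of-words method.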

Installation

It can be installed using pip:

pip install bagmodels

Getting started

Basic usage

import re
from bagmodels import BM25

# Load corpus
corpus = [
    "Yo, I love NLP model",
    "I like algorithms",
    "I love ML!",
]

# Clean manually if needed, or pass a custom tokenizer to BM25
corpus = [re.sub(r",|!", " ", doc).strip() for doc in corpus]

# Initialize model
model = BM25(corpus=corpus)

# Similarity
model.similarity("I love NLP model", "I like NLP model") # 0.775
model.similarity("I love blah", "I love algorithms") # 0.446
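
The manual cleaning step above only handles commas and exclamation marks, and it can leave double spaces behind. A slightly more general version could look like the sketch below. The `clean` helper is hypothetical and uses only the standard library; whether extra whitespace or other punctuation matters depends on the tokenizer BM25 uses, which is not specified here.

```python
import re


def clean(doc):
    """Replace punctuation with spaces and collapse repeated whitespace.

    Hypothetical helper for illustration; not part of the bagmodels API.
    """
    # [^\w\s] matches any character that is neither a word character
    # nor whitespace, i.e. punctuation and symbols.
    no_punct = re.sub(r"[^\w\s]", " ", doc)
    return re.sub(r"\s+", " ", no_punct).strip()


corpus = ["Yo, I love NLP model", "I like algorithms", "I love ML!"]
cleaned = [clean(doc) for doc in corpus]
print(cleaned)
```

The cleaned list can then be passed as `corpus` in the same way as in the example above.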

Save and reuse models

# libraries imported and corpus loaded as in the previous example
model = BM25(corpus=corpus)

# write to save path
model.save("output/bm25_v1.jbl")

# load again
model = BM25.load("output/bm25_v1.jbl")

# add documents if required
model.resume(corpus=additional_corpus)

# predict / search / find / retrieve as before
model.similarity(doc_a, doc_b)
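
The save example writes into an `output/` directory. It is not stated whether `save` creates missing directories, so, assuming it does not, the directory can be created beforehand. The `prepare_save_path` helper below is hypothetical and relies only on `pathlib`.

```python
from pathlib import Path


def prepare_save_path(path):
    """Create the parent directory of `path` if missing; return path as str.

    Hypothetical helper for illustration; not part of the bagmodels API.
    """
    target = Path(path)
    # parents=True creates intermediate directories;
    # exist_ok=True makes repeated calls harmless.
    target.parent.mkdir(parents=True, exist_ok=True)
    return str(target)


save_path = prepare_save_path("output/bm25_v1.jbl")
# model.save(save_path)  # then save as in the snippet above
```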

Coming soon

Please feel free to open an issue to request a feature or discuss any changes. Pull requests are most welcome.

I am actively working on the following:

  • OkapiBM25
  • BM25 variations
  • MultiThreading
