
Python library for the implementations of general and weighted naive Bayes (WNB) classifiers.

General and weighted naive Bayes classifiers
Scikit-learn-compatible


Introduction

Naive Bayes is one of the most popular classification algorithms in machine learning. This package extends it with implementations for more general and weighted settings.

General naive Bayes

The well-known implementations of the naive Bayes algorithm (such as the ones in the sklearn.naive_bayes module) assume a single distribution for the likelihoods of all features. This limits anyone who needs a naive Bayes model with different likelihood distributions for different features. Enter the WNB library! It lets you customize your naive Bayes model by specifying the likelihood distribution of each feature separately. You can choose from a range of continuous and discrete probability distributions to design your classifier.
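To make the idea of per-feature likelihoods concrete, here is a minimal sketch in plain Python. It is not how the WNB library is implemented; the helper names and all distribution parameters are made up for illustration. It only shows that the naive Bayes score is a sum of per-feature log-likelihoods, so each feature can use its own distribution.

```python
import math

# Illustrative helpers (hypothetical names, not part of the wnb API).
def normal_logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def exponential_logpdf(x, rate):
    return math.log(rate) - rate * x

def categorical_logpmf(x, probs):
    return math.log(probs[x])

def log_joint(x, log_prior, feature_logpdfs):
    """log P(c) + sum_i log P(x_i | c), with one likelihood function per feature."""
    return log_prior + sum(f(xi) for f, xi in zip(feature_logpdfs, x))

# Two classes and three features (normal, categorical, exponential).
# All parameters below are invented for the example, not fitted to data.
classes = {
    "a": (math.log(0.5), [
        lambda v: normal_logpdf(v, 0.0, 1.0),
        lambda v: categorical_logpmf(v, {"red": 0.8, "blue": 0.2}),
        lambda v: exponential_logpdf(v, 2.0),
    ]),
    "b": (math.log(0.5), [
        lambda v: normal_logpdf(v, 3.0, 1.0),
        lambda v: categorical_logpmf(v, {"red": 0.3, "blue": 0.7}),
        lambda v: exponential_logpdf(v, 0.5),
    ]),
}

def predict(x):
    # Pick the class with the highest log joint probability.
    return max(classes, key=lambda c: log_joint(x, *classes[c]))

prediction_a = predict([0.1, "red", 0.2])
prediction_b = predict([2.9, "blue", 3.0])
```

In this toy setup the first sample is assigned to class "a" and the second to class "b".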

Weighted naive Bayes

Although naive Bayes has many advantages, such as simplicity and interpretability, its conditional independence assumption rarely holds in real-world applications. To relax this assumption, many attribute-weighting naive Bayes (WNB) approaches have been proposed. Most of them involve computationally demanding optimization problems and offer no way to control the model's bias due to class imbalance. Minimum Log-likelihood Difference WNB (MLD-WNB) is a novel weighting approach that optimizes the weights according to the Bayes optimal decision rule and includes hyperparameters for controlling the model's bias. The WNB library provides an efficient implementation of Gaussian MLD-WNB.
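As a rough illustration of attribute weighting in general, the sketch below scales each feature's Gaussian log-likelihood by a weight before summing. This is not the MLD-WNB optimization itself: the weights and class parameters are set by hand, and all names are hypothetical. With all weights equal to 1, the score reduces to ordinary Gaussian naive Bayes.

```python
import math

def gaussian_logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def weighted_log_score(x, log_prior, means, stds, weights):
    """log P(c) + sum_i w_i * log N(x_i; mean_i, std_i)."""
    return log_prior + sum(
        w * gaussian_logpdf(xi, m, s)
        for xi, m, s, w in zip(x, means, stds, weights)
    )

# Hand-picked parameters for two classes and two features (not fitted to data).
params = {
    0: {"log_prior": math.log(0.5), "means": [0.0, 0.0], "stds": [1.0, 1.0]},
    1: {"log_prior": math.log(0.5), "means": [2.0, -2.0], "stds": [1.0, 1.0]},
}

def predict(x, weights):
    # Choose the class with the highest weighted log score.
    return max(params, key=lambda c: weighted_log_score(x, weights=weights, **params[c]))

x = [1.5, -0.2]
unweighted = predict(x, weights=[1.0, 1.0])  # plain naive Bayes
reweighted = predict(x, weights=[1.0, 0.1])  # second feature down-weighted
```

Here down-weighting the second feature flips the decision from class 0 to class 1, which shows how weights change the influence of individual features on the prediction.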

Installation

This library is shipped as an all-in-one module implementation with minimal dependencies and requirements. Furthermore, it fully adheres to the Scikit-learn API ❤️.

Prerequisites

Ensure that Python 3.8 or higher is installed on your machine before installing WNB.

PyPI

pip install wnb

Poetry

poetry add wnb

GitHub

# Clone the repository
git clone https://github.com/msamsami/wnb.git

# Navigate into the project directory
cd wnb

# Install the package
pip install .

Getting started ⚡️

Here, we show how you can use the library to train general and weighted naive Bayes classifiers.

General naive Bayes

A general naive Bayes model can be set up and used in four simple steps:

  1. Import the GeneralNB class as well as the Distribution enum class
from wnb import GeneralNB, Distribution as D
  2. Initialize a classifier and specify the likelihood distributions
gnb = GeneralNB(distributions=[D.NORMAL, D.CATEGORICAL, D.EXPONENTIAL])
  3. Fit the classifier to a training set (with three features)
gnb.fit(X, y)
  4. Predict on test data
gnb.predict(X_test)

Weighted naive Bayes

An MLD-WNB model can be set up and used in four simple steps:

  1. Import the GaussianWNB class
from wnb import GaussianWNB
  2. Initialize a classifier
wnb = GaussianWNB(max_iter=25, step_size=1e-2, penalty="l2")
  3. Fit the classifier to a training set
wnb.fit(X, y)
  4. Predict on test data
wnb.predict(X_test)

Compatibility with Scikit-learn 🤝

The wnb library fully adheres to the Scikit-learn API, ensuring seamless integration with other Scikit-learn components and workflows. This means that users familiar with Scikit-learn will find the WNB classifiers intuitive to use.

Both Scikit-learn classifiers and WNB classifiers share these well-known methods:

  • fit(X, y)
  • predict(X)
  • predict_proba(X)
  • predict_log_proba(X)
  • score(X, y)
  • get_params()
  • set_params(**params)
  • etc.

By maintaining this consistency, WNB classifiers can be easily incorporated into existing machine learning pipelines and processes.
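One practical consequence of a shared interface is that evaluation code does not need to know which classifier it receives. The sketch below uses only Scikit-learn, with GaussianNB as a stand-in; the helper name mean_cv_accuracy is hypothetical. Passing a WNB classifier instead is assumed to work based on the compatibility described above and has not been verified here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def mean_cv_accuracy(classifier, X, y, folds=5):
    """Cross-validated accuracy of any classifier that follows the Scikit-learn API."""
    pipeline = make_pipeline(StandardScaler(), classifier)
    # For classifiers, cross_val_score defaults to the estimator's score(), i.e. accuracy.
    return cross_val_score(pipeline, X, y, cv=folds).mean()

X, y = load_breast_cancer(return_X_y=True)
baseline_accuracy = mean_cv_accuracy(GaussianNB(), X, y)

# Assumption: a WNB classifier can be dropped in the same way, for example
# mean_cv_accuracy(GaussianWNB(max_iter=25, step_size=1e-2, penalty="l2"), X, y)
```

The exact accuracy depends on the Scikit-learn version and the cross-validation splits, so no expected value is shown.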

Benchmarks 📊

We conducted benchmarks on three datasets, Breast Cancer, Digits, and Wine, to evaluate the performance of WNB classifiers and compare them with their Scikit-learn counterpart, GaussianNB. On all three datasets, the WNB classifier achieved higher accuracy than GaussianNB.

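| Dataset       | Scikit-learn Classifier | Accuracy | WNB Classifier | Accuracy |
|---------------|-------------------------|----------|----------------|----------|
| Breast Cancer | GaussianNB              | 0.939    | GaussianWNB    | 0.951    |
| Digits        | GaussianNB              | 0.838    | GeneralNB      | 0.889    |
| Wine          | GaussianNB              | 0.974    | GeneralNB      | 0.981    |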

These benchmarks highlight the potential of WNB classifiers to provide better performance in certain scenarios by allowing more flexibility in the choice of distributions and incorporating weighting strategies.

The benchmark scripts used to obtain these results can be found under the tests/benchmarks/ directory.

Tests

To run the tests, make sure to clone the repository and install the development requirements in addition to base requirements:

pip install -r requirements.txt
pip install -r requirements_dev.txt

Then, run pytest:

pytest

Support us 💡

You can support the project in the following ways:

⭐ Star WNB on GitHub (click the star button in the top right corner)

💡 Provide your feedback or propose ideas in the Issues section

📰 Post about WNB on LinkedIn or other platforms

Citation 📚

If you utilize this repository, please consider citing it with:

@misc{wnb,
  author = {Mohammad Mehdi Samsami},
  title = {WNB: General and weighted naive Bayes classifiers},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/msamsami/wnb}},
}
