
A Python library implementing general and weighted naive Bayes (WNB) classifiers.

General and weighted naive Bayes classifiers
Scikit-learn-compatible

Introduction

Naive Bayes is a widely used classification algorithm known for its simplicity and efficiency. This package takes naive Bayes to a higher level by providing more flexible and weighted variants, making it suitable for a broader range of applications.

General naive Bayes

Most standard implementations, such as those in sklearn.naive_bayes, assume a single distribution type for all feature likelihoods. This can be restrictive when dealing with mixed data types. WNB overcomes this limitation by allowing users to specify different probability distributions for each feature individually. You can select from a variety of continuous and discrete distributions, enabling greater customization and improved model performance.
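
To make the idea of per-feature likelihoods concrete, here is a minimal pure-Python sketch of a naive Bayes classifier that models one feature as Gaussian and another as categorical. It is an illustration only, not the WNB implementation: the class name `MixedNaiveBayes`, the helper functions, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def fit_gaussian(values):
    """Estimate the mean and variance of a continuous feature."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return {"mean": mean, "var": max(var, 1e-9)}  # floor avoids division by zero


def log_gaussian(params, x):
    var = params["var"]
    return -0.5 * math.log(2 * math.pi * var) - (x - params["mean"]) ** 2 / (2 * var)


def fit_categorical(values, levels, alpha=1.0):
    """Estimate smoothed log-probabilities for every known level."""
    counts = Counter(values)
    total = len(values) + alpha * len(levels)
    return {level: math.log((counts[level] + alpha) / total) for level in levels}


def log_categorical(params, x):
    return params[x]  # raises KeyError for a level never seen during fit


class MixedNaiveBayes:
    """Hypothetical sketch: one likelihood family per feature column."""

    def __init__(self, kinds):
        self.kinds = kinds  # e.g. ["gaussian", "categorical"]

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.log_prior_ = {c: math.log(y.count(c) / len(y)) for c in self.classes_}
        self.params_ = {}
        for j, kind in enumerate(self.kinds):
            levels = {row[j] for row in X} if kind == "categorical" else None
            for c in self.classes_:
                column = [row[j] for row, label in zip(X, y) if label == c]
                if kind == "gaussian":
                    self.params_[c, j] = fit_gaussian(column)
                else:
                    self.params_[c, j] = fit_categorical(column, levels)
        return self

    def _joint_log_prob(self, row, c):
        # log P(c) + sum over features of log p(x_j | c)
        total = self.log_prior_[c]
        for j, kind in enumerate(self.kinds):
            log_pdf = log_gaussian if kind == "gaussian" else log_categorical
            total += log_pdf(self.params_[c, j], row[j])
        return total

    def predict(self, X):
        return [max(self.classes_, key=lambda c: self._joint_log_prob(row, c)) for row in X]


# Toy data: a numeric measurement and a colour label per sample.
X = [[1.0, "red"], [1.2, "red"], [0.8, "blue"], [3.0, "blue"], [3.2, "blue"], [2.9, "red"]]
y = [0, 0, 0, 1, 1, 1]

model = MixedNaiveBayes(["gaussian", "categorical"]).fit(X, y)
predictions = model.predict([[1.1, "red"], [3.1, "blue"]])
```

Because each column carries its own likelihood function, adding another distribution family only requires another fit/log-density pair; the class-level logic stays the same.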

Weighted naive Bayes

While naive Bayes is simple and interpretable, its conditional independence assumption often fails in real-world scenarios. To address this, various attribute-weighted naive Bayes methods exist, but most are computationally expensive and lack mechanisms for handling class imbalance.

The WNB package provides an optimized implementation of Minimum Log-likelihood Difference Weighted Naive Bayes (MLD-WNB), a novel approach that optimizes feature weights using the Bayes optimal decision rule. It also introduces hyperparameters for controlling model bias, making it more robust for imbalanced classification.
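
As a rough illustration of what attribute weighting does, the sketch below scores each class with a weighted sum of per-feature log-likelihoods, so that a down-weighted feature contributes less to the decision. This is the generic weighting idea only: the function name, the likelihood values, and the hand-picked weights are assumptions made for illustration, and the sketch does not reproduce how MLD-WNB learns its weights.

```python
import math


def weighted_log_posterior(log_prior, log_likelihoods, weights):
    """Return log P(y) + sum_i w_i * log p(x_i | y) (unnormalized)."""
    if len(log_likelihoods) != len(weights):
        raise ValueError("one weight is required per feature")
    return log_prior + sum(w * ll for w, ll in zip(weights, log_likelihoods))


# Hypothetical per-feature likelihoods p(x_i | class) for a single sample.
likelihoods = {
    "a": [0.6, 0.1],
    "b": [0.4, 0.3],
}
log_prior = math.log(0.5)  # equal class priors


def predict(weights):
    scores = {
        label: weighted_log_posterior(log_prior, [math.log(p) for p in probs], weights)
        for label, probs in likelihoods.items()
    }
    return max(scores, key=scores.get)


unweighted = predict([1.0, 1.0])   # every feature counts fully
downweighted = predict([1.0, 0.2])  # second feature is trusted less
```

With equal weights the second feature dominates the decision; shrinking its weight lets the first feature decide instead, which is the kind of correction attribute weighting aims for when features violate the independence assumption.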

Installation

This library ships as a single, all-in-one module with minimal dependencies. It also fully adheres to the Scikit-learn API ❤️.

Prerequisites

Ensure that Python 3.8 or higher is installed on your machine before installing WNB.

PyPI

pip install wnb

uv

uv add wnb

Getting started ⚡️

Here, we show how you can use the library to train general (mixed) and weighted naive Bayes classifiers.

General naive Bayes

A general naive Bayes model can be set up and used in four simple steps:

  1. Import the GeneralNB class as well as the Distribution enum class

     from wnb import GeneralNB, Distribution as D

  2. Initialize a classifier with the likelihood distributions specified

     clf = GeneralNB(distributions=[D.NORMAL, D.CATEGORICAL, D.EXPONENTIAL, D.EXPONENTIAL])

     or

     # Columns not explicitly specified will default to Gaussian (normal) distribution
     clf = GeneralNB(
         distributions=[
             (D.CATEGORICAL, [1]),
             (D.EXPONENTIAL, ["col3", "col4"]),
         ],
     )

  3. Fit the classifier to a training set (with four features)

     clf.fit(X_train, y_train)

  4. Predict on test data

     clf.predict(X_test)

Weighted naive Bayes

An MLD-WNB model can be set up and used in four simple steps:

  1. Import the GaussianWNB class

     from wnb import GaussianWNB

  2. Initialize a classifier

     clf = GaussianWNB(max_iter=25, step_size=1e-2, penalty="l2")

  3. Fit the classifier to a training set

     clf.fit(X_train, y_train)

  4. Predict on test data

     clf.predict(X_test)

Compatibility with Scikit-learn 🤝

The wnb library fully adheres to the Scikit-learn API, ensuring seamless integration with other Scikit-learn components and workflows. This means that users familiar with Scikit-learn will find the WNB classifiers intuitive to use.

Both Scikit-learn classifiers and WNB classifiers share these well-known methods:

  • fit(X, y)
  • predict(X)
  • predict_proba(X)
  • predict_log_proba(X)
  • predict_joint_log_proba(X)
  • score(X, y)
  • get_params()
  • set_params(**params)
  • etc.

By maintaining this consistency, WNB classifiers can be easily incorporated into existing machine learning pipelines and processes.

Benchmarks 📊

We conducted benchmarks on four datasets (Wine, Iris, Digits, and Breast Cancer) to evaluate the performance of WNB classifiers and compare them with their Scikit-learn counterpart, GaussianNB. In these benchmarks, the WNB classifier achieved higher accuracy than GaussianNB on all four datasets.

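| Dataset       | Scikit-learn Classifier | Accuracy | WNB Classifier | Accuracy |
|---------------|-------------------------|----------|----------------|----------|
| Wine          | GaussianNB              | 0.9749   | GeneralNB      | 0.9812   |
| Iris          | GaussianNB              | 0.9556   | GeneralNB      | 0.9602   |
| Digits        | GaussianNB              | 0.8372   | GeneralNB      | 0.8905   |
| Breast Cancer | GaussianNB              | 0.9389   | GaussianWNB    | 0.9519   |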

These benchmarks highlight the potential of WNB classifiers to provide better performance in certain scenarios by allowing more flexibility in the choice of distributions and incorporating weighting strategies.

The scripts used to generate these benchmark results are available in the tests/benchmarks/ directory.

Support us 💡

You can support the project in the following ways:

⭐ Star WNB on GitHub (click the star button in the top right corner)

💡 Provide your feedback or propose ideas in the Issues section

📰 Post about WNB on LinkedIn or other platforms

Citation 📚

If you utilize this repository, please consider citing it with:

@misc{wnb,
  author = {Mohammd Mehdi Samsami},
  title = {WNB: General and weighted naive Bayes classifiers},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/msamsami/wnb}},
}
