A Python package for imbalanced learning with ensemble learning.

These details have not been verified by PyPI

Project links

Project description

Feature - Imbalanced Datasets GitHub last commit GitHub issues GitHub stars Python Version Read the Docs

A rich documentation is available at Read the Docs

Shape Penalized Decision Forests

Shape Penalized Decision Forests, for training ensemble classifiers tailored for imbalanced datasets. This package provides two primary implementations:

SPBoDF (Shape Penalized Boosting Decision Forest): A boosting ensemble method that builds multiple trees sequentially, adjusting the weights of samples to focus on harder-to-classify instances.
SPBaDF (Shape Penalized Bagging Decision Forest): A bagging ensemble method that builds multiple trees independently on bootstrap samples of the dataset, improving robustness and reducing overfitting.

Both implementations use the concept of Surface-to-Volume Regularization (SVR) to penalize irregular decision boundaries, thus improving generalization and addressing challenges associated with imbalanced datasets.

Key Features

Boosting and Bagging: Two ensemble approaches tailored for classification tasks.
Shape Penalization: Incorporates a novel regularization technique to control decision boundary complexity.
Imbalanced Data Handling: Designed with class imbalance in mind, using weighting and bootstrapping techniques.
Scikit-learn Compatible: Implements BaseEstimator and ClassifierMixin, making it seamlessly integrable with the Scikit-learn ecosystem.
Customizability: Hyperparameters such as the number of trees, shape penalty, and maximal leaves are configurable for fine-tuning.

Installation

Downloading Locally and Installing

git clone https://www.github.com/yuvrajiro/imbalanced-spdf.git
cd imbalance-spdf

Install dependencies:
```
pip install -r requirements.txt
```
Install the package:
```
python install -e .
```

Using pip install from GitHub

pip install git+https://www.github.com/yuvrajiro/imbalanced-spdf.git

Using pip install from PyPi
```
 pip install imbalanced-spdf
```

Usage

1. SPBoDF (Boosting)

Example

import numpy as np
from imbalanced_spdf.ensemble import SPBoDF

# Generate synthetic data
X_train = np.random.rand(100, 5)
y_train = np.random.randint(0, 2, 100)

# Initialize and fit SPBoDF
boosting_model = SPBoDF(n_trees=50, weight=2, pen=1.0, random_state=42)
boosting_model.fit(X_train, y_train)

# Predict
X_test = np.random.rand(20, 5)
y_pred = boosting_model.predict(X_test)
print("Predictions:", y_pred)

2. SPBaDF (Bagging)

Example

from imbalanced_spdf.ensemble import SPBaDF

# Generate synthetic data
X_train = np.random.rand(100, 5)
y_train = np.random.randint(0, 2, 100)

# Initialize and fit SPBaDF
bagging_model = SPBaDF(n_trees=50, weight=2, pen=1.0, random_state=42)
bagging_model.fit(X_train, y_train)

# Predict
X_test = np.random.rand(20, 5)
y_pred = bagging_model.predict(X_test)
print("Predictions:", y_pred)

API Reference

`SPBoDF`

A boosting ensemble classifier that uses SVR-regularized trees to handle imbalanced datasets.

Parameters:

n_trees (int): Number of trees in the ensemble (default: 40).
weight (float): Weight for the minority class to address imbalance (default: 1).
pen (float): Regularization penalty controlling decision boundary complexity (default: 0).
maximal_leaves (int or float, optional): Maximum leaves per tree. Defaults to 2 * sqrt(n_samples) * 0.3333.
random_state (int): Random seed for reproducibility (default: 23).

Methods:

fit(X, y): Fits the ensemble on the training data.
predict(X): Predicts the labels of the test data.

`SPBaDF`

A bagging ensemble classifier that uses SVR-regularized trees to improve robustness.

Parameters:

n_trees (int): Number of trees in the ensemble (default: 40).
weight (float): Weight for the minority class to address imbalance (default: 1).
pen (float): Regularization penalty controlling decision boundary complexity (default: 0).
maximal_leaves (int or float, optional): Maximum leaves per tree. Defaults to 2 * sqrt(n_samples) * 0.3333.
random_state (int): Random seed for reproducibility (default: 23).

Methods:

fit(X, y): Fits the ensemble on the training data.
predict(X): Predicts the labels of the test data.

How It Works

SPBoDF (Boosting):
- Trees are built sequentially, with sample weights updated after each iteration to focus on misclassified samples.
- Regularization (SVR) penalizes irregular decision boundaries to avoid overfitting.
SPBaDF (Bagging):
- Trees are built independently on bootstrap samples of the training data.
- Each tree focuses on non-constant feature subsets, improving robustness and generalization.

Dataset Details

Dataset	Available at	Comments (if any)
Appendicitis	https://github.com/ZixiaoShen/Datasets/blob/master/UCI/C2_F7_S106_Appendicitis/Appendicitis.csv	-----------
Data User Modelling	http://archive.ics.uci.edu/ml/machine-learning-databases/00257/Data_User_Modeling_Dataset_Hamdi%20Tolga%20KAHRAMAN.xls	-------------
Ecoli	https://raw.githubusercontent.com/jbrownlee/Datasets/master/ecoli.csv	'pp' is considered as class 1 and cp, im, om, omL, imL, imU as class 0
Ecoli-0-6-7-vs-5	https://github.com/w4k2/umce/blob/master/datasets/imb_IRhigherThan9p2/ecoli-0-6-7_vs_5/ecoli-0-6-7_vs_5.dat	------
Estate	https://github.com/MKLab-ITI/Posterior-Rebalancing/blob/1a0b561e6418e9df25a75006206598bff2babe2c/data/hddt/imbalanced/estate.data#L4	------
Fertility Diagonosis	https://archive.ics.uci.edu/ml/machine-learning-databases/00244/fertility_Diagnosis.txt	------
Imbalance-scale	https://archive.ics.uci.edu/ml/machine-learning-databases/balance-scale/balance-scale.data	'B' is conisdered as 1, otherwise 0
Oil	https://github.com/MKLab-ITI/Posterior-Rebalancing/blob/1a0b561e6418e9df25a75006206598bff2babe2c/data/hddt/imbalanced/oil.data	------
Page-blocks0	https://github.com/w4k2/DSE/blob/ac0e824d3a7507fe9d57356150cef0def5c4a36d/streams/real/page-blocks0.arff#L4	------
Winequality-red	https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv	-------
Yeast-0-3-5-9-vs-7-8	https://github.com/w4k2/DSE/blob/ac0e824d3a7507fe9d57356150cef0def5c4a36d/streams/real/yeast-0-3-5-9_vs_7-8.arff	------
car-vgood	https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data	0 if 'negative' otherwise 1
cleveland_0_vs_4	https://github.com/Jaga7/Metody-Sztucznej-Inteligencji/blob/d46ae0c897b5524d5e0c9b9b800e190b9727fd52/PROJEKT/datasets/cleveland_0_vs_4.csv#L4	------
haberman	https://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data
led7digit-0-2-4-5-6-7-8-9_vs_1	https://github.com/ikurek/PWr-Uczenie-Maszyn/blob/f9561a959c49229f22e489b17ccb23b52e99d2a2/data/led7digit-0-2-4-5-6-7-8-9_vs_1.dat#L4	------
new-thyroid1	https://github.com/jamesrobertlloyd/dataset-space/blob/d195fd8748ba8def627ae2e727395aee608952ec/data/class/raw/keel/new-thyroid1.dat#L4	------
page-blocks-1-3_vs_4	https://github.com/ikurek/PWr-Uczenie-Maszyn/blob/f9561a959c49229f22e489b17ccb23b52e99d2a2/data/page-blocks-1-3_vs_4.dat	------
shuttle-c0-vs-c4	https://github.com/ikurek/PWr-Uczenie-Maszyn/blob/f9561a959c49229f22e489b17ccb23b52e99d2a2/data/shuttle-c0-vs-c4.dat	------
vehicle3	https://github.com/jamesrobertlloyd/dataset-space/blob/d195fd8748ba8def627ae2e727395aee608952ec/data/class/raw/keel/vehicle3.dat	------
yeast-2_vs_8	https://github.com/jamesrobertlloyd/dataset-space/blob/d195fd8748ba8def627ae2e727395aee608952ec/data/class/raw/keel/yeast-2_vs_8.dat	------

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

If you are using this package in your research, please consider citing the following paper:

Shape Penalized Decision Forests for Imbalanced Data Classification : Rahul Goswami, Aindrila Garai, Payel Sadhukhan, Palash Ghosh, Tanujit Chakraborty

References

Zhu, Y., Li, C., & Dunson, D. B. (2023). "Classification Trees for Imbalanced Data: Surface-to-Volume Regularization." Journal of the American Statistical Association.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.0.18

Jan 31, 2025

0.0.17

Jan 31, 2025

0.0.16

Jan 31, 2025

0.0.15

Jan 31, 2025

0.0.14

Jan 31, 2025

0.0.13

Jan 31, 2025

0.0.12

Jan 31, 2025

0.0.11

Jan 31, 2025

0.0.1

Jan 29, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

imbalanced_spdf-0.0.18.tar.gz (20.5 kB view details)

Uploaded Jan 31, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

imbalanced_spdf-0.0.18-py3-none-any.whl (17.6 kB view details)

Uploaded Jan 31, 2025 Python 3

File details

Details for the file imbalanced_spdf-0.0.18.tar.gz.

File metadata

Download URL: imbalanced_spdf-0.0.18.tar.gz
Upload date: Jan 31, 2025
Size: 20.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.13

File hashes

Hashes for imbalanced_spdf-0.0.18.tar.gz
Algorithm	Hash digest
SHA256	`e90ca8c1bf7160b79ff0fd5ae227e9bb09c023f70b70674c6e9be01e636203b3`
MD5	`a5082d0b51fe1b7f850c13fe5b4b77ef`
BLAKE2b-256	`01e6022380d1b0153b4a5e7d61f0ab26d100b799707db0e93e9320825c35702e`

See more details on using hashes here.

File details

Details for the file imbalanced_spdf-0.0.18-py3-none-any.whl.

File metadata

Download URL: imbalanced_spdf-0.0.18-py3-none-any.whl
Upload date: Jan 31, 2025
Size: 17.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.13

File hashes

Hashes for imbalanced_spdf-0.0.18-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6ed7170083d4d21c3a133df42b1970d4be847783cca1fcc257d128afa8374e52`
MD5	`8eca055eb60bffcd39e92a613bf330fa`
BLAKE2b-256	`6bf9ea67e92983039eb7bdfc88460dd6a555227d656ccb57b045e103f3158264`

See more details on using hashes here.

imbalanced-spdf 0.0.18

Navigation

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Project description

Shape Penalized Decision Forests

Key Features

Installation

Usage

1. SPBoDF (Boosting)

Example

2. SPBaDF (Bagging)

Example

API Reference

SPBoDF

Parameters:

Methods:

SPBaDF

Parameters:

Methods:

How It Works

Dataset Details

License

Citation

References

Project details

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`SPBoDF`

`SPBaDF`