Instance-complexity-aware cost-sensitive learning for imbalanced classification
iCost
iCost is a scikit-learn-compatible Python package for instance-complexity-aware cost-sensitive learning in imbalanced classification.
Traditional cost-sensitive learning usually assigns the same penalty to every minority-class sample. However, minority samples are not equally difficult to classify. Some are clearly separable, some are safe but near the class boundary, some lie in overlapping regions, and some may be noisy or outlier-like. iCost addresses this limitation by assigning adaptive penalties to minority-class instances according to their estimated learning difficulty.
The package implements two main variants:
- Neighbor-iCost: estimates minority-instance complexity using local neighborhood composition.
- Gini-iCost: estimates minority-instance complexity using Gini-impurity-based feature-space partitioning with a shallow decision-tree probe.
The framework works with standard classifiers that support sample_weight, including Logistic Regression, SVM, Decision Tree, Random Forest, and XGBoost.
Installation
Install from PyPI:
pip install icost
Or install the latest version from GitHub:
pip install git+https://github.com/newaz-aa/iCost.git
Key Features
- Instance-complexity-aware cost-sensitive learning
- Compatible with the scikit-learn estimator interface
- Works with classifiers that support sample_weight
- Supports binary imbalanced classification
- Can be extended to multiclass problems using one-vs-rest decomposition
- Does not generate synthetic samples or remove existing samples
- Provides adaptive penalties for minority-class instances based on learning difficulty
Supported Training Modes
| Mode | Description |
|---|---|
| ncs | Non-cost-sensitive baseline. All samples receive equal weight. |
| cs | Conventional cost-sensitive learning. All minority samples receive the same IR-based weight. |
| neighbor | Neighbor-iCost. Minority samples are categorized using local neighborhood composition. |
| tree | Gini-iCost. Minority samples are weighted using Gini-impurity-based feature-space partitioning. |
| gini | Alias for tree. |
Default Cost Hierarchy
The default iCost penalties are defined relative to the imbalance ratio, IR:
| Minority-instance type | Symbol | Default penalty |
|---|---|---|
| Outlier-like/noisy | cfo | 0.10 × IR |
| Pure/easy | cfp | 0.30 × IR |
| Safe | cfs | 0.75 × IR |
| Border/overlapping | cfb | 1.00 × IR |
Thus, the intended hierarchy is:
cfo < cfp < cfs < cfb = IR
This allows iCost to emphasize informative boundary samples while reducing unnecessary over-penalization of easy or potentially noisy minority samples.
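As a rough illustration of the hierarchy, the sketch below scales the default multipliers by an imbalance ratio computed as the majority count divided by the minority count. That definition of IR, and the names DEFAULT_MULTIPLIERS, imbalance_ratio, and default_penalties, are assumptions made for this example and are not part of the icost API.

```python
from collections import Counter

# Default multipliers copied from the table above.
# The dictionary name is illustrative, not an icost attribute.
DEFAULT_MULTIPLIERS = {"cfo": 0.10, "cfp": 0.30, "cfs": 0.75, "cfb": 1.00}


def imbalance_ratio(y):
    """Return majority count / minority count (assumed definition of IR)."""
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    return max(counts.values()) / min(counts.values())


def default_penalties(y):
    """Scale each default multiplier by the imbalance ratio of y."""
    ir = imbalance_ratio(y)
    return {name: mult * ir for name, mult in DEFAULT_MULTIPLIERS.items()}


# 90 majority samples and 10 minority samples give IR = 9.
y = [0] * 90 + [1] * 10
penalties = default_penalties(y)
for name, value in penalties.items():
    print(f"{name}: {value:.2f}")
```

With these labels, border samples would carry the full IR-based penalty, while outlier-like samples would carry a tenth of it.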
Basic Usage
Neighbor-iCost
from icost import iCost
from sklearn.linear_model import LogisticRegression

model = iCost(
    base_classifier=LogisticRegression(max_iter=1000),
    method="neighbor"
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Gini-iCost
from icost import iCost
from xgboost import XGBClassifier

model = iCost(
    base_classifier=XGBClassifier(),
    method="tree",
    tree_max_depth=3,
    tree_min_samples_leaf=5,
    tau_pure=0.80,
    tau_out=0.20
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Conventional Cost-Sensitive Learning Baseline
To use standard class-level cost-sensitive learning:
from icost import iCost
from sklearn.svm import SVC

model = iCost(
    base_classifier=SVC(),
    method="cs"
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
In this mode, all minority-class samples receive the same imbalance-ratio-based penalty.
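For intuition, class-level weighting of this kind can be sketched in a few lines. The function below is a hypothetical helper, not the package's internal code, and it assumes that IR is the majority count divided by the minority count and that majority samples keep a weight of 1.

```python
def class_level_weights(y, minority_label=1):
    """Give every minority sample the same IR-based weight (assumed scheme)."""
    n_min = sum(1 for label in y if label == minority_label)
    n_maj = len(y) - n_min
    if n_min == 0:
        raise ValueError("no minority samples found")
    ir = n_maj / n_min
    # Majority samples keep weight 1.0; all minority samples share weight IR.
    return [ir if label == minority_label else 1.0 for label in y]


y = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]   # 8 majority, 2 minority
weights = class_level_weights(y)
print(weights)
```

A list like this is what scikit-learn estimators accept through the sample_weight argument of fit.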
Custom Cost Values
You can manually set the penalty multipliers:
from icost import iCost
from sklearn.linear_model import LogisticRegression

model = iCost(
    base_classifier=LogisticRegression(max_iter=1000),
    method="neighbor",
    cfo=0.10,
    cfp=0.30,
    cfs=0.75,
    cfb=1.25,
    scale_costs_by_ir=True
)
When scale_costs_by_ir=True, the values are multiplied by the imbalance ratio.
Example: Multiclass Extension
iCost can be used for multiclass imbalanced classification through one-vs-rest decomposition:
from icost import iCost
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

base_model = iCost(
    base_classifier=LogisticRegression(max_iter=1000),
    method="neighbor"
)
model = OneVsRestClassifier(base_model)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
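OneVsRestClassifier fits one binary estimator per class, so each copy of the base model sees a different binary problem with its own imbalance ratio. The sketch below only illustrates that decomposition. The helper name one_vs_rest_views is hypothetical, and how iCost identifies the minority side inside each binary subproblem is not covered here.

```python
def one_vs_rest_views(y):
    """Map each class to (binary labels, imbalance ratio of that binary view)."""
    classes = sorted(set(y))
    if len(classes) < 2:
        raise ValueError("need at least two classes")
    views = {}
    for cls in classes:
        # The current class becomes 1, every other class becomes 0.
        binary = [1 if label == cls else 0 for label in y]
        n_pos = sum(binary)
        n_neg = len(binary) - n_pos
        views[cls] = (binary, max(n_pos, n_neg) / min(n_pos, n_neg))
    return views


y = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
views = one_vs_rest_views(y)
for cls, (_, ir) in views.items():
    print(f"class {cls}: IR = {ir:.2f}")
```

In this toy example the rarest class produces the most imbalanced binary view, which is where instance-level penalties would matter most.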
Requirements
- Python >= 3.8
- numpy
- pandas
- scikit-learn
Project Structure
icost/
├── __init__.py
├── __version__.py
├── icost.py # Main iCost estimator
└── categorize_minority_v2.py # Helper functions for minority-instance analysis
Method Overview
Neighbor-iCost
Neighbor-iCost estimates minority-instance complexity using local neighborhood composition. Minority samples are categorized as pure, safe, border, or outlier-like based on the number of majority-class samples among their nearest neighbors.
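The idea can be sketched with a brute-force nearest-neighbor search. The thresholds in categorize are hypothetical and chosen only for illustration; the package's actual cut-offs, distance metric, and value of k may differ.

```python
import math


def count_majority_neighbors(X, y, index, k=5, minority_label=1):
    """Count majority-class samples among the k nearest neighbors of X[index]."""
    distances = sorted(
        (math.dist(X[index], X[j]), j) for j in range(len(X)) if j != index
    )
    nearest = [j for _, j in distances[:k]]
    return sum(1 for j in nearest if y[j] != minority_label)


def categorize(n_majority, k=5):
    """Assumed rule: more majority neighbors means a harder minority sample."""
    if n_majority == 0:
        return "pure"
    if n_majority == k:
        return "outlier"
    if n_majority <= k // 2:
        return "safe"
    return "border"


# Four clustered minority points, four majority points, and one minority
# point sitting in the middle of the majority cluster.
X = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
     (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.05, 5.05)]
y = [1, 1, 1, 1, 0, 0, 0, 0, 1]

for i in (0, 8):
    n_maj = count_majority_neighbors(X, y, i, k=3)
    print(i, n_maj, categorize(n_maj, k=3))
```

Each category would then be mapped to its penalty (cfp, cfs, cfb, or cfo) when building the sample weights.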
Gini-iCost
Gini-iCost uses a shallow decision-tree probe to partition the feature space. The class distribution and Gini impurity of the leaf containing each minority sample are then used to estimate regional ambiguity and assign adaptive penalties.
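A minimal sketch of the leaf-level quantities follows. The Gini impurity formula is standard, but the rule in leaf_category is an assumption: it reads tau_pure and tau_out as thresholds on the minority fraction of a leaf and collapses the intermediate cases into a single label, which may not match how the package uses these parameters.

```python
def gini_impurity(n_minority, n_majority):
    """Binary Gini impurity of a leaf: 1 - p^2 - (1 - p)^2."""
    total = n_minority + n_majority
    if total == 0:
        raise ValueError("empty leaf")
    p = n_minority / total
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def leaf_category(n_minority, n_majority, tau_pure=0.80, tau_out=0.20):
    """Hypothetical rule based on the minority fraction of the leaf."""
    p = n_minority / (n_minority + n_majority)
    if p >= tau_pure:
        return "pure"        # leaf dominated by the minority class
    if p <= tau_out:
        return "outlier"     # minority sample isolated among majority samples
    return "border"          # mixed leaf, high regional ambiguity


# (minority count, majority count) for three example leaves.
for leaf in [(9, 1), (5, 5), (1, 9)]:
    print(leaf, round(gini_impurity(*leaf), 3), leaf_category(*leaf))
```

A balanced leaf has the highest impurity (0.5 in the binary case), which is why mixed leaves are treated as the most ambiguous regions.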
Research Paper
This package implements the framework described in the following manuscript:
iCost: A Novel Instance-Complexity-Based Cost-Sensitive Learning Framework
The manuscript has been submitted to Machine Learning with Applications.
BibTeX Citation
If you use this package, please cite the paper:
@misc{newaz2024icostnovelinstancecomplexity,
  title={iCost: A Novel Instance Complexity Based Cost-Sensitive Learning Framework for Imbalanced Classification},
  author={Asif Newaz and Asif Ur Rahman Adib and Taskeed Jabid},
  year={2024},
  eprint={2409.13007},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  doi={10.48550/arXiv.2409.13007},
  url={https://arxiv.org/abs/2409.13007},
}
License
This project is licensed under the MIT License.
Author
Asif Newaz
Assistant Professor
Department of Electrical and Electronic Engineering
Islamic University of Technology (IUT), Gazipur, Bangladesh
Email: eee.asifnewaz@iut-dhaka.edu
Download files
File details
Details for the file icost-0.2.0.tar.gz.
File metadata
- Download URL: icost-0.2.0.tar.gz
- Upload date:
- Size: 12.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4fba641ce6df5b97e3f1a982d2826ff0f25157b818285e5843299ddb8481a411 |
| MD5 | 7972dd706d13494351f18970689011fe |
| BLAKE2b-256 | 9509eaee2564f74bc906456af46748ef4afcab3e12b386514d0fe21187c389e8 |
File details
Details for the file icost-0.2.0-py3-none-any.whl.
File metadata
- Download URL: icost-0.2.0-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5ed1599b79784a857f7dd8a44402b843d39f5daf4615f0f6447e141b6a275660 |
| MD5 | c94d9e025b5cffe8b77b7e0f83ab99da |
| BLAKE2b-256 | 37dc46a540115b52b04fa2a83ffea05214a7074b613808f8060eafe80a4affa2 |