Instance-complexity based cost-sensitive learning
iCost
iCost is a Python library for instance-level cost-sensitive learning, fully compatible with scikit-learn. It extends traditional cost-sensitive classification by dynamically adjusting sample costs based on instance complexity. Multiple strategies have been incorporated into the algorithm, and it works with any scikit-learn classifier that supports sample_weight.
Requirements:
Key Features:
- Support for any scikit-learn compatible classifier as the base model.
- Multiple strategies for cost-sensitive learning:
-- ncs → no cost (baseline).
-- org → original sklearn-style cost-sensitive (all minority weighted by imbalance ratio).
-- mst → MST-based linked vs. pure minority categorization.
-- neighbor → neighbor-based categorization with three sub-modes.
- Neighbor-based categorization (5-NN):
-- Mode 1 → safe, pure, border.
-- Mode 2 → safe, border, outlier.
-- Mode 3 → fine-grained categories g1–g6 with user-defined penalties.
- Utility function: categorize_minority_class for direct analysis of minority-class samples.
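The neighbor-based idea in the list above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the function names count_opposite_neighbors and to_category are hypothetical, and the thresholds that turn a neighbor count into safe, border, or outlier are assumptions chosen only for this example. The sketch counts how many of a minority sample's 5 nearest neighbors belong to the majority class and maps that count to a category.

```python
import math
from collections import Counter


def count_opposite_neighbors(X, y, minority_label, k=5):
    """Map each minority sample index to the number of majority-class
    points among its k nearest neighbors (Euclidean distance)."""
    counts = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue
        # Distance from sample i to every other sample, closest first.
        by_distance = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        nearest = [j for _, j in by_distance[:k]]
        counts[i] = sum(1 for j in nearest if y[j] != minority_label)
    return counts


def to_category(n_opposite, k=5):
    """Hypothetical thresholds, chosen only for this illustration."""
    if n_opposite == k:
        return "outlier"  # every neighbor belongs to the majority class
    if n_opposite <= 1:
        return "safe"     # neighborhood is dominated by the minority class
    return "border"       # mixed neighborhood


# Toy 1-D data: a minority cluster, one minority point between the classes,
# a majority cluster, and one minority point inside the majority cluster.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (7.5,),
     (9.5,), (10.5,), (11.5,), (12.5,), (13.5,), (14.5,), (12.0,)]
y = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = count_opposite_neighbors(X, y, minority_label=1)
categories = {i: to_category(n) for i, n in counts.items()}
summary = Counter(categories.values())
print(dict(summary))  # safe: 6, border: 1, outlier: 1
```

The brute-force distance search keeps the sketch short; for real datasets a neighbor index such as the ones provided by scikit-learn would be the more practical choice.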
Synopsis
In imbalanced classification tasks, the standard weighted classifier applies the same increased misclassification cost to every minority-class sample. scikit-learn supports this approach directly, for example through the class_weight parameter available on many of its classifiers.
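As a minimal sketch of this uniform weighting, the snippet below gives every minority sample a weight equal to the imbalance ratio and every majority sample a weight of 1.0. The helper name imbalance_ratio_weights is hypothetical, and the binary setting and the definition of the ratio as majority count divided by minority count are assumptions of this example.

```python
from collections import Counter


def imbalance_ratio_weights(y, minority_label):
    """Return one weight per sample: the imbalance ratio for minority
    samples and 1.0 for all other samples."""
    counts = Counter(y)
    n_min = counts[minority_label]
    if n_min == 0:
        raise ValueError("minority_label does not occur in y")
    n_maj = len(y) - n_min
    ratio = n_maj / n_min  # imbalance ratio (majority : minority)
    return [ratio if label == minority_label else 1.0 for label in y]


y = [0, 0, 0, 0, 0, 0, 1, 1]
weights = imbalance_ratio_weights(y, minority_label=1)
print(weights)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0]
```

A list like this could be passed as sample_weight to the fit method of an estimator that accepts it.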
However, there is an issue. Should the same weight be applied to all minority-class samples indiscriminately? Some minority-class samples lie close to the decision boundary and are difficult to identify, while others are far away from the border and easy to classify. Some instances are noisy, completely surrounded by instances from the majority class. Applying the same higher misclassification cost to all minority-class samples is therefore hard to justify: it can distort the decision boundary significantly, resulting in more misclassifications.
The proposed solution is to apply the cost to only certain samples or apply different costs depending on their level of difficulty. This improves the prediction performance in different imbalanced scenarios.
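The difference from uniform weighting can be sketched as a lookup from category to cost. This is an illustration only: instance_level_weights is a hypothetical helper, and the category names and cost values below are made-up assumptions, not defaults of the library.

```python
def instance_level_weights(y, minority_label, categories, costs):
    """Build per-sample weights from minority-sample categories.

    categories maps a sample index to its category name (minority samples only).
    costs maps a category name to the weight for that category.
    Majority samples keep a weight of 1.0.
    """
    weights = []
    for i, label in enumerate(y):
        if label == minority_label:
            weights.append(float(costs[categories[i]]))
        else:
            weights.append(1.0)
    return weights


y = [0, 0, 0, 0, 1, 1, 1]
categories = {4: "safe", 5: "border", 6: "outlier"}   # assumed categorization
costs = {"safe": 1.0, "border": 3.0, "outlier": 1.0}  # assumed cost values
weights = instance_level_weights(y, 1, categories, costs)
print(weights)  # [1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0]
```

Here only the border sample receives a raised cost, while the safe and outlier samples are treated like majority samples.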
For more information, please refer to the following paper:
Paper
arXiv: https://doi.org/10.48550/arXiv.2409.13007
The paper is currently under review.
Installation
pip install icost
Usage Example
from icost import iCost, categorize_minority_class
from sklearn.svm import SVC

# X_train, y_train, X_test, y_test are assumed to come from your own train/test split.

# Example with neighbor-mode cost assignment
clf = iCost(
    base_classifier=SVC(kernel="rbf", probability=True),
    method="neighbor",
    neighbor_mode=2,  # Mode 1, 2, or 3
)
clf.fit(X_train, y_train)
print("Test Accuracy:", clf.score(X_test, y_test))

# Example with mode=3 (custom penalties for g1..g6)
clf3 = iCost(
    base_classifier=SVC(),
    method="neighbor",
    neighbor_mode=3,
    neighbor_costs=[1.0, 2.0, 5.0, 5.0, 3.0, 1.0],  # g1..g6
)
clf3.fit(X_train, y_train)
Helper Function
You can analyze minority samples directly with:
import pandas as pd
from icost import categorize_minority_class

df = pd.read_csv("your_dataset.csv")
min_idx, groups, opp_counts = categorize_minority_class(
    df,
    minority_label=1,
    mode=1,
    show_summary=True,
)
Output:
Category summary (minority samples):
safe: 45
pure: 28
border: 62
Structure
icost/
├── __init__.py # Makes icost a package; exposes iCost and helpers
├── __version__.py # Stores the package version (e.g., 0.1.0)
├── icost.py # Main iCost class (methods: ncs, org, mst, neighbor)
├── mst_linked_ind.py # MST-based helper:
│ # - Identifies 'linked' vs 'pure' minority samples
│ # - Used for MST variant of iCost
└── categorize_minority_v2.py # Neighbor-based helper:
# - Categorizes minority samples with 5-NN
# - Supports modes (safe, pure, border, outlier, g1–g6)
# - Provides summary statistics
Other files in the repo
- README.md → Documentation and usage instructions.
- LICENSE → Project license (MIT by default).
- pyproject.toml → Build configuration for packaging and PyPI upload.
- icost_usage_example → tests to check functionality.
BibTeX Citation
If you plan to use this module, please cite the paper:
@misc{newaz2024icostnovelinstancecomplexity,
title={iCost: A Novel Instance Complexity Based Cost-Sensitive Learning Framework for Imbalanced Classification},
author={Asif Newaz and Asif Ur Rahman Adib and Taskeed Jabid},
year={2024},
eprint={2409.13007},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2409.13007},
}
License
This project is licensed under the MIT License.
Note
The library is still being updated, and additional features are planned.