Instance-complexity based cost-sensitive learning

iCost

iCost is a Python library for instance-level cost-sensitive learning, fully compatible with scikit-learn. It extends traditional cost-sensitive classification by adjusting the misclassification cost of each sample according to its instance complexity, instead of applying one cost to the whole minority class. Several cost-assignment strategies are implemented, and iCost works with any scikit-learn classifier that supports sample_weight.
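Whether a particular estimator qualifies as a base model can be checked with scikit-learn's has_fit_parameter utility, which inspects the signature of the estimator's fit method. The snippet below is a small illustration and not part of the iCost package; supports_sample_weight is a hypothetical helper name.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.utils.validation import has_fit_parameter


def supports_sample_weight(estimator) -> bool:
    """Hypothetical helper: True if estimator.fit accepts a sample_weight argument."""
    return has_fit_parameter(estimator, "sample_weight")


for est in (SVC(), LogisticRegression(), KNeighborsClassifier()):
    # SVC and LogisticRegression accept sample_weight; KNeighborsClassifier does not
    print(type(est).__name__, supports_sample_weight(est))
```

A classifier for which this returns False cannot receive per-sample costs through fit, so it would not be usable as a base model for cost-sensitive weighting.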

Requirements:

  • Python
  • scikit-learn
  • numpy
  • pandas
  • seaborn
  • matplotlib

Key Features:

  • Support for any scikit-learn compatible classifier as the base model.

  • Multiple strategies for cost-sensitive learning:

    -- ncs → no cost (baseline).

    -- org → original sklearn-style cost-sensitive learning (all minority samples weighted by the imbalance ratio).

    -- mst → MST-based linked vs. pure minority categorization.

    -- neighbor → neighbor-based categorization with three sub-modes.

  • Neighbor-based categorization (5-NN):

    -- Mode 1 → safe, pure, border.

    -- Mode 2 → safe, border, outlier.

    -- Mode 3 → fine-grained categories g1–g6 with user-defined penalties.

  • Utility function: categorize_minority_class for direct analysis of minority-class samples.
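To make the org baseline concrete, the sketch below builds the equivalent sample_weight vector by hand: every minority sample receives the imbalance ratio IR = n_majority / n_minority and every majority sample receives 1. It assumes a binary problem with minority label 1, and imbalance_ratio_weights is a hypothetical name; this is an illustration, not iCost's internal code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC


def imbalance_ratio_weights(y, minority_label=1):
    """Binary case only: minority samples get IR = n_majority / n_minority, majority samples get 1."""
    y = np.asarray(y)
    n_min = np.sum(y == minority_label)
    n_maj = len(y) - n_min  # assumes exactly two classes
    ir = n_maj / n_min
    return np.where(y == minority_label, ir, 1.0)


# Toy imbalanced data: roughly 10% of samples in class 1
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
w = imbalance_ratio_weights(y)
clf = SVC().fit(X, y, sample_weight=w)
```

Every minority sample gets the same weight here, which is exactly the uniform treatment the Synopsis below argues against.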

Synopsis

The standard weighted classifier applies an increased weight to all minority-class misclassifications in imbalanced classification tasks. This approach is available in scikit-learn through the class_weight parameter of many classifiers (for example, class_weight="balanced").

However, this raises a question: should the same weight be applied to all minority-class samples indiscriminately? Some minority samples lie close to the decision boundary and are difficult to classify, while others lie far away from it and are easy to classify. Still others are noisy, completely surrounded by majority-class instances. Applying the same higher misclassification cost to all of these samples is hard to justify: it can distort the decision boundary significantly and result in more misclassifications.
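One common way to quantify how difficult a minority sample is, is to count how many of its k nearest neighbours belong to the other class. The sketch below does this with scikit-learn's NearestNeighbors (k = 5, matching the 5-NN mentioned in the feature list) and assigns safe, border, or outlier labels. The thresholds (0 to 1 opposite neighbours: safe, all 5: outlier, otherwise border) are an assumption taken from common practice in the imbalanced-learning literature, not necessarily the rules iCost applies, and the function names are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def opposite_neighbor_counts(X, y, minority_label=1, k=5):
    """For each minority sample, count how many of its k nearest neighbours are from another class."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    min_idx = np.flatnonzero(y == minority_label)
    # Query k + 1 neighbours: each point is returned as its own nearest neighbour
    # (assumes no duplicate points, so the self-match is always first)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, ind = nn.kneighbors(X[min_idx])
    neighbour_labels = y[ind[:, 1:]]
    return min_idx, np.sum(neighbour_labels != minority_label, axis=1)


def categorize(opp_counts, k=5):
    """Hypothetical thresholds: 0-1 opposite neighbours -> safe, all k -> outlier, otherwise border."""
    opp_counts = np.asarray(opp_counts)
    return np.where(opp_counts <= 1, "safe",
                    np.where(opp_counts == k, "outlier", "border"))


# 1-D toy data: a compact minority cluster, one minority point near the majority,
# and one minority point embedded inside the majority cluster
X = np.array([0, 1, 2, 3, 4, 5, 19.5, 22.5, 20, 21, 22, 23, 24, 25], dtype=float).reshape(-1, 1)
y = np.array([1] * 8 + [0] * 6)
min_idx, opp = opposite_neighbor_counts(X, y)
print(dict(zip(min_idx.tolist(), categorize(opp).tolist())))
```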

The proposed solution is to apply the cost only to certain samples, or to apply different costs depending on each sample's level of difficulty. According to the paper, this improves prediction performance across a range of imbalanced scenarios.
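Applying the cost only to certain samples amounts to building a sample_weight vector from per-category costs. In this minimal sketch, category_weights is a hypothetical helper, the category names follow the neighbor-mode labels, and the cost of 5.0 for border samples is an arbitrary illustrative value.

```python
import numpy as np


def category_weights(y, categories, costs, minority_label=1):
    """Majority samples get 1.0; each minority sample gets costs[its category].

    categories lists one category per minority sample, in the order they appear in y.
    Categories missing from costs default to 1.0 (no extra cost).
    """
    y = np.asarray(y)
    weights = np.ones(len(y), dtype=float)
    min_idx = np.flatnonzero(y == minority_label)
    if len(categories) != len(min_idx):
        raise ValueError("need exactly one category per minority sample")
    weights[min_idx] = [costs.get(c, 1.0) for c in categories]
    return weights


y = np.array([0, 0, 1, 0, 1, 1])
cats = ["safe", "border", "outlier"]  # one entry per minority sample
# Extra cost only for border samples; safe samples and outliers keep weight 1
w = category_weights(y, cats, {"border": 5.0})
print(w)
```

The resulting vector can be passed as sample_weight to any classifier whose fit method accepts it.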

For more information, please refer to the following paper:

Paper

arXiv: https://doi.org/10.48550/arXiv.2409.13007

The paper is currently under review.

Installation

pip install icost

Usage Example

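from icost import iCost
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced binary data (about 10% minority, label 1) for illustration
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Example with neighbor-mode cost assignment
clf = iCost(
    base_classifier=SVC(kernel="rbf", probability=True),
    method="neighbor",
    neighbor_mode=2          # Mode 1, 2, or 3
)

clf.fit(X_train, y_train)
print("Test Accuracy:", clf.score(X_test, y_test))

# Example with mode=3 (custom penalties for g1..g6)
clf3 = iCost(
    base_classifier=SVC(),
    method="neighbor",
    neighbor_mode=3,
    neighbor_costs=[1.0, 2.0, 5.0, 5.0, 3.0, 1.0]  # g1..g6
)
clf3.fit(X_train, y_train)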
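The mode-3 penalties above are given as six values for g1..g6. One plausible reading, which is an assumption rather than something stated here, is that group g(k+1) contains minority samples with exactly k opposite-class neighbours among their 5 nearest neighbours (k = 0..5). Under that assumption, the sketch below shows how the neighbor_costs list would translate into per-sample costs; group_costs is a hypothetical helper.

```python
import numpy as np

# Same penalties as in the mode-3 example, one per group g1..g6
neighbor_costs = [1.0, 2.0, 5.0, 5.0, 3.0, 1.0]


def group_costs(opp_counts, costs):
    """ASSUMPTION: group g(k+1) holds minority samples with k opposite-class neighbours (k = 0..5)."""
    opp_counts = np.asarray(opp_counts)
    if len(costs) != 6 or opp_counts.min() < 0 or opp_counts.max() > 5:
        raise ValueError("expected 6 costs and counts in the range 0..5")
    groups = np.array([f"g{k + 1}" for k in opp_counts])
    return groups, np.asarray(costs, dtype=float)[opp_counts]


groups, w_min = group_costs([0, 2, 3, 5], neighbor_costs)
print(groups, w_min)
```

Under this reading, the example penalties put the highest cost on samples with two or three opposite-class neighbours (near the boundary) and keep both very safe samples and likely noise at weight 1.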

Helper Function

You can analyze minority samples directly with:

import pandas as pd
from icost import categorize_minority_class

df = pd.read_csv("your_dataset.csv")
min_idx, groups, opp_counts = categorize_minority_class(
    df,
    minority_label=1,
    mode=1,
    show_summary=True
)

Example output:

Category summary (minority samples):
  safe: 45
  pure: 28
  border: 62

Structure

icost/
├── __init__.py               # Makes icost a package; exposes iCost and helpers
├── __version__.py            # Stores the package version (e.g., 0.1.0)
├── icost.py                  # Main iCost class (methods: ncs, org, mst, neighbor)
├── mst_linked_ind.py         # MST-based helper:
│                             #   - Identifies 'linked' vs 'pure' minority samples
│                             #   - Used for MST variant of iCost
└── categorize_minority_v2.py # Neighbor-based helper:
                              #   - Categorizes minority samples with 5-NN
                              #   - Supports modes (safe, pure, border, outlier, g1–g6)
                              #   - Provides summary statistics
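mst_linked_ind.py is described only as separating 'linked' from 'pure' minority samples. A natural interpretation, assumed here and not confirmed by the package description, is that a minority sample is linked when the Euclidean minimum spanning tree (MST) over all samples connects it directly to a sample of another class. The sketch below implements that interpretation with SciPy; mst_linked_minority is a hypothetical name, and SciPy is not in the listed requirements (it is, however, a dependency of scikit-learn).

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_linked_minority(X, y, minority_label=1):
    """ASSUMPTION: a minority sample is 'linked' if the Euclidean MST over all samples
    connects it directly to at least one sample of another class, otherwise 'pure'."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    # Dense pairwise distances; zero entries (the diagonal, or duplicate points) are treated as "no edge"
    dist = squareform(pdist(X))
    mst = minimum_spanning_tree(dist).toarray()
    adj = (mst + mst.T) > 0  # each MST edge is stored once; symmetrize to get adjacency
    min_idx = np.flatnonzero(y == minority_label)
    linked = np.array([np.any(y[adj[i]] != minority_label) for i in min_idx])
    return min_idx, np.where(linked, "linked", "pure")


# 1-D toy data: on a line the MST simply joins consecutive points
idx, status = mst_linked_minority([[0], [1], [2], [3.5], [5], [6], [7]], [1, 1, 1, 1, 0, 0, 0])
print(dict(zip(idx.tolist(), status.tolist())))
```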

Other files in the repo

  • README.md → Documentation and usage instructions.
  • LICENSE → Project license (MIT).
  • pyproject.toml → Build configuration for packaging and PyPI upload.
  • icost_usage_example → tests to check functionality.

BibTex Citation

If you use this module, please cite the paper:

@misc{newaz2024icostnovelinstancecomplexity,
      title={iCost: A Novel Instance Complexity Based Cost-Sensitive Learning Framework for Imbalanced Classification}, 
      author={Asif Newaz and Asif Ur Rahman Adib and Taskeed Jabid},
      year={2024},
      eprint={2409.13007},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2409.13007}, 
}

License

This project is licensed under the MIT License.

Note

The work is currently being updated to include additional features, which I plan to incorporate soon.

Download files

Source Distribution

icost-0.1.1.tar.gz (13.1 kB)

Uploaded Source

Built Distribution

icost-0.1.1-py3-none-any.whl (11.6 kB)

Uploaded Python 3

File details

Details for the file icost-0.1.1.tar.gz.

File metadata

  • Download URL: icost-0.1.1.tar.gz
  • Upload date:
  • Size: 13.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for icost-0.1.1.tar.gz
Algorithm Hash digest
SHA256 a45a7bad8676e2d2630f34d2baaf21c02a1bfe6aa6d592a257dd8dad2a365bbd
MD5 c3c0d61c7a38f35cedf772700c61883f
BLAKE2b-256 4c5b8e948b64560d9173637fdb5146c71e8ac5fc326541e054f104a6b9f0473d

File details

Details for the file icost-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: icost-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 11.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for icost-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 2b19cbd94b77f8f6da454916db9057ea893af865c88d802b5350e37d7da0cb14
MD5 7494a14650d016635bf0cdc287314a81
BLAKE2b-256 755b093ec1dd14764f03e83f74bdc6a54c305896fcb297b0d51c697161f0b05d
