Label Noise Correction Methods

This Python package implements label noise correction algorithms proposed in the literature. These algorithms mitigate the effects of label noise in supervised learning by correcting the noisy labels before a model is trained on them. The methods are implemented for binary classification tasks.

Installation

You can install the package using pip:

pip install label_noise_correction

Algorithms

The package currently includes the following label noise correction algorithms:

  • Bayesian Entropy Noise Correction (BE) [1]
  • Polishing Labels (PL) [2]
  • Self-Training Correction (STC) [2]
  • Clustering-Based Correction (CC) [2]
  • Ordering-Based Noise Correction (OBNC) [3]
  • Hybrid Label Noise Correction (HLNC) [4]
  • Fair Ordering-Based Noise Correction (Fair-OBNC) [5]
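To give a feel for what a correction step does, here is a minimal sketch of a voting-based relabeling scheme, loosely in the spirit of Polishing Labels. It is not this package's implementation: the function name `majority_vote_relabel`, its parameters, and the choice of logistic regression are all assumptions made for illustration. Each model is trained on a different cross-validation split, and every instance receives the label that most models predict.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold


def majority_vote_relabel(X, y, n_models=10, seed=0):
    """Hypothetical helper (not part of the package): relabel each instance
    by majority vote of models trained on different CV splits.
    Assumes binary labels encoded as 0/1."""
    X = np.asarray(X)
    y = np.asarray(y)
    votes = np.zeros(len(y), dtype=int)
    folds = KFold(n_splits=n_models, shuffle=True, random_state=seed)
    for train_idx, _ in folds.split(X):
        model = LogisticRegression(max_iter=1000)
        model.fit(X[train_idx], y[train_idx])
        # Each model votes on every instance, including its own training rows.
        votes += model.predict(X)
    # Label 1 wins only with a strict majority; ties fall back to label 0.
    return (votes * 2 > n_models).astype(int)


# Synthetic data with roughly 10% of the labels flipped.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
flip = rng.random(len(y)) < 0.1
y_noisy = np.where(flip, 1 - y, y)

y_relabelled = majority_vote_relabel(X, y_noisy)
print("Disagreement with clean labels before:", np.mean(y_noisy != y))
print("Disagreement with clean labels after: ", np.mean(y_relabelled != y))
```

Whether relabeling helps depends on the data and the base classifier, so the two printed rates are worth comparing rather than assuming an improvement.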

Usage

Here's an example of how to use the package to apply label noise correction:

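import random

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# PolishingLabels is the correction class used below. The top-level import
# path is an assumption; adjust it if your installed version exposes the
# class from a submodule.
from label_noise_correction import PolishingLabels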

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X = pd.DataFrame(X)
y = pd.Series(y)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Introduce label noise to create a noisy dataset (flip about 10% of labels)
random.seed(42)  # fixed seed so the injected noise is reproducible
y_noisy = y_train.copy()
for i in y_noisy.index:
    if random.random() < 0.1:
        y_noisy.loc[i] = 1 - y_noisy.loc[i]

# Apply label noise correction
lnc = PolishingLabels(LogisticRegression, 10)
y_corrected = lnc.correct(X_train, y_noisy)

# Train models on the noisy and corrected labels
model = LogisticRegression()
model.fit(X_train, y_noisy)
y_pred_noisy = model.predict(X_test)

model.fit(X_train, y_corrected)
y_pred_corrected = model.predict(X_test)

# Evaluate accuracy before and after correction
accuracy = accuracy_score(y_test, y_pred_noisy)
print("Accuracy before label noise correction:", accuracy)

accuracy = accuracy_score(y_test, y_pred_corrected)
print("Accuracy after label noise correction:", accuracy)
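Test accuracy is an indirect measure of correction quality. When the clean labels are known, as in a synthetic experiment like the one above, the corrected labels can also be compared with them directly. The sketch below uses a hypothetical helper, `correction_report`, which is not part of the package; it compares labels by position, so it assumes the three label sequences are in the same row order.

```python
import numpy as np


def correction_report(y_true, y_noisy, y_corrected):
    """Hypothetical helper (not part of the package): summarise what a
    correction step changed, given the clean labels as ground truth."""
    y_true = np.asarray(y_true)
    y_noisy = np.asarray(y_noisy)
    y_corrected = np.asarray(y_corrected)
    was_wrong = y_noisy != y_true      # noisy before correction
    is_wrong = y_corrected != y_true   # still (or newly) wrong afterwards
    return {
        "noise_rate_before": float(was_wrong.mean()),
        "noise_rate_after": float(is_wrong.mean()),
        "labels_fixed": int((was_wrong & ~is_wrong).sum()),
        "labels_damaged": int((~was_wrong & is_wrong).sum()),
    }


# Tiny hand-made example: positions 1 and 5 are noisy; the "correction"
# repairs position 1, leaves position 5 wrong, and damages position 6.
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_noisy = [0, 0, 1, 0, 1, 1, 0, 1]
y_corrected = [0, 1, 1, 0, 1, 1, 1, 1]

report = correction_report(y_true, y_noisy, y_corrected)
print(report)
```

With the variables from the usage example, the call would be `correction_report(y_train, y_noisy, y_corrected)`, assuming `correct` returns the labels in the same order as `X_train`.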

References

  1. Sun, Jiang-wen, et al. "Identifying and correcting mislabeled training instances." Future Generation Communication and Networking (FGCN 2007). Vol. 1. IEEE, 2007.
  2. Nicholson, Bryce, et al. "Label noise correction methods." 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2015.
  3. Feng, Wei, and Samia Boukir. "Class noise removal and correction for image classification using ensemble margin." 2015 IEEE International Conference on Image Processing (ICIP). IEEE, 2015.
  4. Xu, Jiwei, Yun Yang, and Po Yang. "Hybrid label noise correction algorithm for medical auxiliary diagnosis." 2020 IEEE 18th International Conference on Industrial Informatics (INDIN). Vol. 1. IEEE, 2020.

Contributing

Contributions to this package are welcome! If you have a bug report, a feature request, or a code improvement to contribute, please submit an issue or a pull request on the GitHub repository.

License

This package is distributed under the MIT License.

