biquality-learn
biquality-learn (or bqlearn for short) is a library à la scikit-learn for Biquality Learning.
Biquality Learning
Biquality Learning is a machine learning framework to train classifiers on Biquality Data, where the dataset is split into a trusted and an untrusted part:
- The trusted dataset contains trustworthy samples with clean labels and proper feature distribution.
- The untrusted dataset contains samples that may be corrupted by label noise or covariate shift (distribution shift).
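One way to picture a biquality dataset is as an ordinary dataset plus a per-sample quality flag. The sketch below is plain Python on made-up toy values; the helper name `split_by_quality` is hypothetical and not part of biquality-learn. It assumes the convention used in the quick start, where 1 marks a trusted sample and 0 an untrusted one.

```python
def split_by_quality(samples, labels, sample_quality):
    """Partition (sample, label) pairs into trusted and untrusted lists.

    sample_quality[i] == 1 marks sample i as trusted, 0 as untrusted.
    """
    if not (len(samples) == len(labels) == len(sample_quality)):
        raise ValueError("samples, labels and sample_quality must have equal length")
    trusted, untrusted = [], []
    for x, y, q in zip(samples, labels, sample_quality):
        (trusted if q == 1 else untrusted).append((x, y))
    return trusted, untrusted


# Toy data: four samples, only the first one is trusted.
samples = [[0.1, 0.2], [0.4, 0.3], [0.9, 0.8], [0.7, 0.6]]
labels = [0, 0, 1, 1]
sample_quality = [1, 0, 0, 0]

trusted, untrusted = split_by_quality(samples, labels, sample_quality)
print(len(trusted), len(untrusted))  # 1 3
```

In practice the trusted part is usually much smaller than the untrusted part, which is why the two are kept in a single dataset with a flag rather than in two separate files.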
biquality-learn aims to make well-known and proven biquality learning algorithms accessible and easy to use for everyone, and to enable researchers to run reproducible experiments on biquality data.
Install
biquality-learn requires multiple dependencies:
- numpy>=1.17.3
- scipy>=1.5.0
- scikit-learn>=1.3.0
- scs>=3.2.2
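To see whether an existing environment already satisfies these minimum versions, something like the following may help. It is a minimal sketch using only the standard library; `parse_version` and `check_requirements` are hypothetical helpers, and the version parsing is deliberately simplistic (it reads only the first three numeric components and ignores pre-release suffixes).

```python
from importlib import metadata

# Minimum versions as listed above.
REQUIREMENTS = {
    "numpy": "1.17.3",
    "scipy": "1.5.0",
    "scikit-learn": "1.3.0",
    "scs": "3.2.2",
}


def parse_version(text):
    """Turn '1.17.3' into (1, 17, 3), ignoring any non-numeric suffix."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad '2.0' to (2, 0, 0)
        parts.append(0)
    return tuple(parts)


def check_requirements(requirements=REQUIREMENTS):
    """Return {package: status}, status being 'ok', 'too old' or 'missing'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = "ok" if ok else "too old"
    return report


for name, status in check_requirements().items():
    print(f"{name}: {status}")
```

pip resolves these constraints on its own during installation, so a check like this is mostly useful for diagnosing an environment that was set up by other means.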
The package is available on PyPI. To install biquality-learn, run the following command:
pip install biquality-learn
A dev version is available on TestPyPI:
pip install --index-url https://test.pypi.org/simple/ biquality-learn
Quick Start
For a quick example, we are going to train one of the available biquality classifiers, KKMM, on the digits dataset with synthetic asymmetric label noise.
Loading Data
First, we load the dataset with scikit-learn and split it into a trusted and an untrusted dataset.
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedShuffleSplit
X, y = load_digits(return_X_y=True)
trusted, untrusted = next(StratifiedShuffleSplit(train_size=0.1).split(X, y))
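To make the idea behind the split concrete, here is a rough plain-Python sketch of a stratified split that keeps about 10% of each class as trusted indices and the rest as untrusted. It is not scikit-learn's implementation; `stratified_split` is a hypothetical helper and the rounding rule is an assumption made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.1, seed=0):
    """Return (trusted_idx, untrusted_idx) with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    trusted, untrusted = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one trusted sample per class.
        n_trusted = max(1, round(train_fraction * len(indices)))
        trusted.extend(indices[:n_trusted])
        untrusted.extend(indices[n_trusted:])
    return sorted(trusted), sorted(untrusted)


labels = [0] * 50 + [1] * 30 + [2] * 20
trusted, untrusted = stratified_split(labels, train_fraction=0.1)
print(len(trusted), len(untrusted))  # 10 90
```

Stratifying matters here because the trusted set is small: a purely random 10% sample could leave a rare class with no trusted example at all.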
Simulating Label Noise
Then we generate label noise on the untrusted dataset.
from bqlearn.corruption import make_label_noise
y[untrusted] = make_label_noise(y[untrusted], "flip", noise_ratio=0.8)
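As an illustration of what asymmetric label noise can look like, the sketch below flips each label to the next class with probability `noise_ratio`. This assumes that "flip" refers to pair flipping (class c becomes class c + 1, wrapping around); the actual behavior of `make_label_noise` may differ, and `pair_flip` is a hypothetical stand-in written in plain Python.

```python
import random


def pair_flip(labels, noise_ratio, n_classes, seed=0):
    """Flip each label to the next class with probability noise_ratio."""
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must be in [0, 1]")
    rng = random.Random(seed)
    noisy = []
    for label in labels:
        if rng.random() < noise_ratio:
            # Asymmetric noise: class c is only ever confused with c + 1.
            noisy.append((label + 1) % n_classes)
        else:
            noisy.append(label)
    return noisy


clean = [i % 10 for i in range(1000)]
noisy = pair_flip(clean, noise_ratio=0.8, n_classes=10)
flipped = sum(c != n for c, n in zip(clean, noisy))
print(flipped / len(clean))  # roughly 0.8
```

With a noise ratio of 0.8, most untrusted labels are wrong, so a classifier trained on them alone would mostly learn the shifted classes; the trusted samples are what let a biquality method correct for this.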
Training Biquality Classifier
Finally, we train KKMM on the biquality dataset by providing the sample_quality metadata, which indicates whether each sample is trusted (1) or untrusted (0).
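import numpy as np
from sklearn.linear_model import LogisticRegression
from bqlearn.density_ratio import KKMM

# Wrap a base scikit-learn classifier in the KKMM biquality classifier.
bqclf = KKMM(LogisticRegression(), kernel="rbf")

# 1 marks a trusted sample, 0 an untrusted one.
sample_quality = np.ones(X.shape[0])
sample_quality[untrusted] = 0

bqclf.fit(X, y, sample_quality=sample_quality)
bqclf.predict(X)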
Citation
If you use biquality-learn in your research, please consider citing us:
@misc{nodet2023biqualitylearn,
title={biquality-learn: a Python library for Biquality Learning},
author={Pierre Nodet and Vincent Lemaire and Alexis Bondu and Antoine Cornuéjols},
year={2023},
eprint={2308.09643},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Acknowledgment
This work has been funded by Orange Labs.
File details
Details for the file biquality-learn-0.1.0.tar.gz.
File metadata
- Download URL: biquality-learn-0.1.0.tar.gz
- Upload date:
- Size: 76.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 503d7b22551a6cdf73f8351cd40e5c52543017efaf310e2df705d4a3606dc013 |
| MD5 | a09b0a89d13ca762e1dddd8fe2176b21 |
| BLAKE2b-256 | 6a129ecff4b397cb527772bc288d0c1d4cc72e71c8297cfb5eaa6958da2060e8 |
File details
Details for the file biquality_learn-0.1.0-py3-none-any.whl.
File metadata
- Download URL: biquality_learn-0.1.0-py3-none-any.whl
- Upload date:
- Size: 71.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 054f4370d07e17bc9033967dcf260b031b74dfa28ca37f30555b10c82748e107 |
| MD5 | 6f9b60be806b8a21ff59cbe58cfcfca6 |
| BLAKE2b-256 | b84ab4c1b1575c89f6de93192e6ce393c3f6585d1c3aa852442ee9320770b8ec |