MGS-GRF for imbalanced-mixed-tabular data
If you face imbalanced data in your machine learning project, this package can pre-process it for you. It is an efficient, ready-to-use implementation of MGS-GRF, an oversampling strategy presented at the ECML-PKDD 2025 conference and designed to handle large-scale, mixed imbalanced datasets, that is, datasets with both continuous and categorical features.
🛠 Installation
First, clone the repository:
git clone git@github.com:artefactory/mgs-grf.git
Then install the required packages into your environment (conda, mamba or pip):
pip install -r requirements.txt
🚀 How to use the MGS-GRF Algorithm to learn on imbalanced data
Here is a short example of how to use MGS-GRF:
import lightgbm as lgb
import numpy as np
from sklearn.preprocessing import OneHotEncoder

from mgs_grf import MGSGRFOverSampler

# X_train_imbalanced, y_train_imbalanced and X_test are assumed to be NumPy arrays;
# numeric_features and categorical_features are lists of column indices.

# Apply the MGS-GRF procedure to oversample the data
mgs_grf = MGSGRFOverSampler(categorical_features=categorical_features, random_state=0)
X_train_balanced, y_train_balanced = mgs_grf.fit_resample(X_train_imbalanced, y_train_imbalanced)

# One-hot encode the categorical variables (fit on the training data only)
enc = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
X_train_balanced_enc = np.hstack((
    X_train_balanced[:, numeric_features],
    enc.fit_transform(X_train_balanced[:, categorical_features]),
))
X_test_enc = np.hstack((
    X_test[:, numeric_features],
    enc.transform(X_test[:, categorical_features]),
))

# Fit the final classifier on the augmented data
clf = lgb.LGBMClassifier(n_estimators=100, verbosity=-1, random_state=0)
clf.fit(X_train_balanced_enc, y_train_balanced)
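The example above relies on variables that are defined elsewhere (`X_train_imbalanced`, `y_train_imbalanced`, `X_test`, `numeric_features`, `categorical_features`). As a minimal sketch, here is one way they might be prepared from synthetic data using NumPy only. The data, the integer coding of the categorical column, and the helper names `stratified_split` and `class_counts` are illustrative assumptions, not part of the package:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Synthetic mixed data (assumption): two continuous columns (0 and 1)
# and one integer-coded categorical column (2).
X_num = rng.normal(size=(n_samples, 2))
X_cat = rng.integers(0, 3, size=(n_samples, 1))
X = np.hstack((X_num, X_cat))

# Imbalanced binary target: roughly 5% of the samples are in the minority class.
y = (rng.random(n_samples) < 0.05).astype(int)

# Column indices, used the same way as in the example above.
numeric_features = [0, 1]
categorical_features = [2]


def stratified_split(X, y, test_size=0.2, seed=0):
    """Hypothetical helper: hold out `test_size` of each class for testing."""
    split_rng = np.random.default_rng(seed)
    test_parts = []
    for label in np.unique(y):
        idx = split_rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(test_size * len(idx)))
        test_parts.append(idx[:n_test])
    test_mask = np.zeros(len(y), dtype=bool)
    test_mask[np.concatenate(test_parts)] = True
    return X[~test_mask], X[test_mask], y[~test_mask], y[test_mask]


def class_counts(y):
    """Hypothetical helper: number of samples per class label."""
    labels, counts = np.unique(y, return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))


X_train_imbalanced, X_test, y_train_imbalanced, y_test = stratified_split(X, y)
print(class_counts(y_train_imbalanced))
```

Calling `class_counts` on `y_train_imbalanced` before resampling, and on `y_train_balanced` afterwards, is a quick way to check how the class proportions changed. Only the training split should be oversampled; the test split keeps its original class proportions.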
A more detailed example is available in this notebook.
🙏 Acknowledgements
This work was done through a partnership between Artefact Research Center and the Laboratoire de Probabilités Statistiques et Modélisation (LPSM) of Sorbonne University.
📜 Citation
If you find the code useful, please consider citing us:
@inproceedings{sakho2025harnessing,
title={Harnessing Mixed Features for Imbalance Data Oversampling: Application to Bank Customers Scoring},
author={Sakho, Abdoulaye and Malherbe, Emmanuel and Gauthier, Carl-Erik and Scornet, Erwan},
booktitle={Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
pages={247--264},
year={2025},
organization={Springer}
}