
MGS-GRF for imbalanced mixed tabular data


MGS-GRF


If your machine learning project involves imbalanced data, this package is designed to pre-process it. It provides an efficient, ready-to-use implementation of MGS-GRF, an oversampling strategy presented at the ECML-PKDD 2025 conference and designed to handle large-scale, imbalanced mixed datasets, that is, datasets with both continuous and categorical features.
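To give an intuition of what oversampling mixed data involves, here is a deliberately simplified sketch of a mixed-type oversampler. It is not the MGS-GRF algorithm and not the package's code. The continuous step follows the general idea of Gaussian, SMOTE-like generation (sample around a minority point with a covariance estimated on its nearest minority neighbors), while the categorical step is replaced by a plain nearest-neighbor copy instead of the package's own procedure. The function name `toy_mixed_oversample` and its parameters are hypothetical.

```python
import numpy as np


def toy_mixed_oversample(X_num, X_cat, n_new, k=5, seed=0):
    """Hypothetical illustration only, NOT the mgs_grf implementation.

    X_num: (n, p) float array, continuous features of the minority samples.
    X_cat: (n, q) array, categorical features of the same samples.
    Returns (new_num, new_cat) with n_new synthetic rows each.
    """
    rng = np.random.default_rng(seed)
    n, p = X_num.shape
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances, computed on the continuous part only
    d2 = ((X_num[:, None, :] - X_num[None, :, :]) ** 2).sum(axis=-1)
    new_num = np.empty((n_new, p))
    new_cat = np.empty((n_new, X_cat.shape[1]), dtype=X_cat.dtype)
    for i in range(n_new):
        center = rng.integers(n)
        # The chosen minority point and its k nearest minority neighbors
        neigh = np.argsort(d2[center])[: k + 1]
        # Continuous part: Gaussian centered on the chosen point, covariance
        # estimated on its neighborhood (a small ridge keeps it positive definite)
        cov = np.cov(X_num[neigh], rowvar=False).reshape(p, p) + 1e-6 * np.eye(p)
        new_num[i] = rng.multivariate_normal(X_num[center], cov)
        # Categorical part (simplified stand-in for the package's categorical step):
        # copy the categories of a randomly chosen neighbor, so only observed values appear
        new_cat[i] = X_cat[rng.choice(neigh)]
    return new_num, new_cat


# Usage on toy minority data: 20 samples, 2 continuous columns and 1 categorical column
rng = np.random.default_rng(1)
X_num = rng.normal(size=(20, 2))
X_cat = rng.choice(np.array(["a", "b", "c"]), size=(20, 1))
new_num, new_cat = toy_mixed_oversample(X_num, X_cat, n_new=30)
print(new_num.shape, new_cat.shape)  # (30, 2) (30, 1)
```

The point of treating the two feature types separately is that categorical values are never interpolated: every synthetic row carries category levels that actually occur in the data, while continuous values vary smoothly around existing minority points.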

🛠 Installation

First, clone the repository and move into it:

git clone git@github.com:artefactory/mgs-grf.git
cd mgs-grf

Then install the required packages into your environment (conda, mamba or pip):

pip install -r requirements.txt

🚀 How to use the MGS-GRF Algorithm to learn on imbalanced data

Here is a short example of how to use MGS-GRF:

import lightgbm as lgb
import numpy as np
from sklearn.preprocessing import OneHotEncoder

from mgs_grf import MGSGRFOverSampler

# X_train_imbalanced, y_train_imbalanced and X_test are NumPy arrays;
# numeric_features and categorical_features are lists of column indices.

# Apply the MGS-GRF procedure to oversample the training data
mgs_grf = MGSGRFOverSampler(categorical_features=categorical_features, random_state=0)
X_train_balanced, y_train_balanced = mgs_grf.fit_resample(X_train_imbalanced, y_train_imbalanced)

# One-hot encode the categorical variables (the encoder is fitted on training data only)
enc = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
X_train_balanced_enc = np.hstack((X_train_balanced[:, numeric_features],
                                  enc.fit_transform(X_train_balanced[:, categorical_features])))
X_test_enc = np.hstack((X_test[:, numeric_features], enc.transform(X_test[:, categorical_features])))

# Fit the final classifier on the augmented data
clf = lgb.LGBMClassifier(n_estimators=100, verbosity=-1, random_state=0)
clf.fit(X_train_balanced_enc, y_train_balanced)
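The snippet above assumes that X_train_imbalanced, y_train_imbalanced, X_test, numeric_features and categorical_features already exist. To try it end to end without your own data, a hypothetical synthetic dataset such as the following can stand in. The sizes, column layout and imbalance ratio are arbitrary choices, and it assumes MGSGRFOverSampler accepts integer-coded categorical columns stored in a float array; the package's notebook is the reference for the expected input format.

```python
import numpy as np

# Hypothetical toy data, only meant to produce the variables the example expects
rng = np.random.default_rng(0)
n_samples = 1000

# Two continuous columns
X_numeric = rng.normal(size=(n_samples, 2))
# Two categorical columns encoded as integer codes 0, 1, 2
X_categorical = rng.integers(0, 3, size=(n_samples, 2))
X = np.hstack((X_numeric, X_categorical)).astype(float)

# Imbalanced binary target: only a few percent of rows are positive
y = (X_numeric[:, 0] + 0.5 * rng.normal(size=n_samples) > 1.8).astype(int)

numeric_features = [0, 1]
categorical_features = [2, 3]

# Simple hold-out split: first 800 rows for training, last 200 for testing
X_train_imbalanced, y_train_imbalanced = X[:800], y[:800]
X_test, y_test = X[800:], y[800:]
```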

A more detailed example is available in this notebook.

🙏 Acknowledgements

This work was done through a partnership between Artefact Research Center and the Laboratoire de Probabilités Statistiques et Modélisation (LPSM) of Sorbonne University.


📜 Citation

If you find the code useful, please consider citing us:

@inproceedings{sakho2025harnessing,
  title={Harnessing Mixed Features for Imbalance Data Oversampling: Application to Bank Customers Scoring},
  author={Sakho, Abdoulaye and Malherbe, Emmanuel and Gauthier, Carl-Erik and Scornet, Erwan},
  booktitle={Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
  pages={247--264},
  year={2025},
  organization={Springer}
}



Download files


Source Distribution

mgs_grf-0.0.1.tar.gz (14.1 kB)

Uploaded Source

Built Distribution


mgs_grf-0.0.1-py3-none-any.whl (11.9 kB)

Uploaded Python 3

File details

Details for the file mgs_grf-0.0.1.tar.gz.

File metadata

  • Download URL: mgs_grf-0.0.1.tar.gz
  • Size: 14.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for mgs_grf-0.0.1.tar.gz
Algorithm Hash digest
SHA256 a507518d9b21fe15a22f45d293645f5ea1e9aa728a2e43a5e544eaaf23ef4501
MD5 dba7304c033bfb7d245ea7c66881fe0a
BLAKE2b-256 80145c724d71369d92926a892e73d53fc392a4653fdf0590d64586e5206ef0da


File details

Details for the file mgs_grf-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: mgs_grf-0.0.1-py3-none-any.whl
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for mgs_grf-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 d71755300c3c74a2063da48101f589a9e414394d59a3ff969976b66bbac87e89
MD5 4d9b9a20be1bf4cca5b2fc7fe7ff17f5
BLAKE2b-256 86fc82a692fba0673013d5e087dab5d0bc484f3ad7ac5e168a51b5ede5555d0f
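The SHA256 digests listed for the two files can be used to check that a downloaded file is intact. A minimal sketch using only Python's standard library, assuming the file keeps its original name; the helper names `sha256_of` and `verify_download` are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digests listed for the mgs_grf 0.0.1 distribution files
EXPECTED_SHA256 = {
    "mgs_grf-0.0.1.tar.gz": "a507518d9b21fe15a22f45d293645f5ea1e9aa728a2e43a5e544eaaf23ef4501",
    "mgs_grf-0.0.1-py3-none-any.whl": "d71755300c3c74a2063da48101f589a9e414394d59a3ff969976b66bbac87e89",
}


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's SHA256 matches the digest listed for its file name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no expected digest recorded for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify_download("mgs_grf-0.0.1-py3-none-any.whl")` should return True for an untampered wheel. pip can also enforce this at install time through its hash-checking mode, using `--hash=sha256:<digest>` entries in a requirements file.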

