
mdatagen: A Python Library for the Generation of Artificial Missing Data


This package has been developed to address a gap in machine learning research: the artificial generation of missing data. Santos et al. (2019) published a survey of strategies for generating missing data in both univariate and multivariate scenarios, but the Python community still lacks implementations of these strategies. In addition, Pereira et al. (2023) proposed new benchmark strategies for the Missing Not At Random (MNAR) mechanism, and these methods also lacked a Python implementation. To fill this gap, missing-data-generator (mdatagen) is a Python package that implements methods for generating missing values in data under the Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR) mechanisms, in both univariate and multivariate scenarios.
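To make the three mechanisms concrete: under MCAR, whether a value is missing is independent of the data; under MAR, it depends only on other, observed variables; under MNAR, it depends on the unobserved value itself. The sketch below illustrates each mechanism for a single feature (the univariate scenario) using NumPy only. It is not mdatagen's API: the function names are hypothetical, and the simple deterministic rules used for MAR (remove the entries whose driver variable is lowest) and MNAR (remove the largest values) are illustrative choices that may differ from the strategies the package actually implements.

```python
import numpy as np

# Hypothetical helpers for illustration only; they are not part of mdatagen's API.


def univariate_mcar(x: np.ndarray, missing_rate: float, rng: np.random.Generator) -> np.ndarray:
    """MCAR: every entry of x has the same chance of being removed."""
    x = x.astype(float).copy()
    n_missing = int(round(missing_rate * x.size))
    idx = rng.choice(x.size, size=n_missing, replace=False)
    x[idx] = np.nan
    return x


def univariate_mar(x: np.ndarray, observed: np.ndarray, missing_rate: float) -> np.ndarray:
    """MAR: remove x where the fully observed variable `observed` is lowest."""
    x = x.astype(float).copy()
    n_missing = int(round(missing_rate * x.size))
    idx = np.argsort(observed, kind="stable")[:n_missing]
    x[idx] = np.nan
    return x


def univariate_mnar(x: np.ndarray, missing_rate: float) -> np.ndarray:
    """MNAR: remove the largest values of x itself, so missingness depends on the lost values."""
    x = x.astype(float).copy()
    n_missing = int(round(missing_rate * x.size))
    idx = np.argsort(-x, kind="stable")[:n_missing]
    x[idx] = np.nan
    return x


rng = np.random.default_rng(0)
feature = rng.normal(size=100)   # variable that will receive missing values
driver = rng.normal(size=100)    # fully observed variable that drives MAR
x_mcar = univariate_mcar(feature, 0.2, rng)
x_mar = univariate_mar(feature, driver, 0.2)
x_mnar = univariate_mnar(feature, 0.2)
print(np.isnan(x_mcar).sum(), np.isnan(x_mar).sum(), np.isnan(x_mnar).sum())  # 20 20 20
```

Only the MAR and MNAR helpers look at the data when choosing what to remove; the MCAR helper ignores the values entirely, which is exactly what separates the mechanisms.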

This Python package is a collaboration between researchers at the Aeronautics Institute of Technology (Brazil) and the University of Coimbra (Portugal).

User Guide

Please refer to the univariate docs or multivariate docs for more details.

Installation

To install the package, use pip:

pip install mdatagen

Usage examples

For examples of how to use the mdatagen package, ranging from basic examples that generate artificial missing data under a single mechanism to complete examples that impute the data with Multiple Imputation by Chained Equations (MICE) from scikit-learn, see these examples.
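As a rough illustration of such a pipeline, the sketch below removes 10% of the entries of the Iris feature matrix completely at random and then imputes them with scikit-learn's `IterativeImputer`, the library's MICE-inspired imputer, which must be enabled through `sklearn.experimental.enable_iterative_imputer`. The `multivariate_mcar` helper is hypothetical and stands in for mdatagen's own generators, whose API is not shown here. Note that with the default `sample_posterior=False`, `IterativeImputer` returns a single imputation rather than multiple imputations.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables IterativeImputer)
from sklearn.impute import IterativeImputer


def multivariate_mcar(X: np.ndarray, missing_rate: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper (not mdatagen's API): remove a fraction of all entries of X uniformly at random."""
    X = X.astype(float).copy()
    n_missing = int(round(missing_rate * X.size))
    flat_idx = rng.choice(X.size, size=n_missing, replace=False)
    rows, cols = np.unravel_index(flat_idx, X.shape)
    X[rows, cols] = np.nan
    return X


rng = np.random.default_rng(42)
X = load_iris().data                      # 150 samples x 4 numeric features
X_missing = multivariate_mcar(X, 0.1, rng)
mask = np.isnan(X_missing)                # True where a value was removed

# Chained-equations imputation: each feature is regressed on the others in turn.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(X_missing)

# Compare imputed values with the ground truth that was removed.
rmse = np.sqrt(np.mean((X_imputed[mask] - X[mask]) ** 2))
print(f"{mask.sum()} values removed, imputation RMSE = {rmse:.3f}")
```

Because the removed values are known, the imputed entries can be scored directly against the ground truth, which is the main reason to generate missing data artificially in the first place.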

Contributions

Contributions are welcome! Feel free to open issues, submit pull requests, or provide feedback.

Citation

If you use mdatagen in your research, please cite the mdatagen paper.

BibTeX entry:

@article{mdatagen2024,
  author  = {Arthur D Mangussi and Filipe Loyola Lopes and Miriam Seoane Santos and Ricardo Cardoso Pereira and Ana Carolina Lorena and Pedro Henriques Abreu},
  title   = {mdatagen: A Python Library for the Generation of Artificial Missing Data},
  journal = {Journal of Machine Learning Research},
  year    = {2024},
  volume  = {24},
  pages   = {1--6},
}

Acknowledgements

The authors gratefully acknowledge the Brazilian funding agency FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo) for grants 2022/10553-6 and 2021/06870-3. This research was also supported in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, and by the Portuguese Recovery and Resilience Plan (PRR) through project C645008882-00000055, Center for Responsible AI.

Download files

Source distribution: mdatagen-0.0.87.tar.gz (14.2 kB)

Built distribution (Python 3): mdatagen-0.0.87-py3-none-any.whl (20.6 kB)
