
Tools to impute

Project description

hlbotterman@quantmetry.com, jroussel@quantmetry.com, tmorzadec@quantmetry.com, rhajou@quantmetry.com, fdakhli@quantmetry.com

License: new BSD
Project-URL: Bug Tracker, https://github.com/Quantmetry/qolmat
Project-URL: Documentation, https://qolmat.readthedocs.io/en/latest/
Project-URL: Source Code, https://github.com/Quantmetry/qolmat
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved
Classifier: Topic :: Software Development
Classifier: Topic :: Scientific/Engineering
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >=3.8
Description-Content-Type: text/x-rst
Provides-Extra: tests
Provides-Extra: docs
Provides-Extra: tensorflow
License-File: LICENSE


Welcome to Qolmat’s documentation!

The Qolmat package was created to implement and compare imputation methods. It has two main parts:

  1. Impute missing values via multiple algorithms;

  2. Compare the imputation methods in a supervised manner.

1 - Imputation methods

For univariate time series:

  • ImputerMean / ImputerMedian / ImputerMode : Replaces missing entries with the mean, median or mode of each column. It uses pd.DataFrame.fillna().

  • ImputerShuffle : Replaces missing entries with a value drawn at random from the same column.

  • ImputerLOCF / ImputerNOCB : Replaces missing entries by carrying the last observation forward / the next observation backward, for each column.

  • ImputerInterpolation: Replaces missing entries using one of the interpolation strategies supported by pd.Series.interpolate.

  • ImputerResiduals: Imputes values using a residuals method. The series are de-seasonalised, the residuals are imputed, and the result is then re-seasonalised.

  • ImputerRPCA: Imputes values via a Robust Principal Component Analysis (RPCA) method.
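Several of the univariate strategies listed above map directly onto pandas operations. The following is a minimal sketch in plain pandas and NumPy, not Qolmat's own code: the toy data and the helper `impute_shuffle` are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Toy univariate series with gaps (illustrative data, not from Qolmat).
df = pd.DataFrame({"x": [1.0, np.nan, 3.0, np.nan, np.nan, 6.0]})

# Mean / median imputation: fill each column with its own statistic.
imputed_mean = df.fillna(df.mean())
imputed_median = df.fillna(df.median())

# LOCF: carry the last observation forward.
imputed_locf = df.ffill()

# NOCB: carry the next observation backward.
imputed_nocb = df.bfill()

# Interpolation: any strategy accepted by pandas' interpolate.
imputed_interp = df.interpolate(method="linear")


def impute_shuffle(col: pd.Series, rng: np.random.Generator) -> pd.Series:
    """Hypothetical helper: fill gaps with values drawn at random
    from the observed values of the same column."""
    observed = col.dropna().to_numpy()
    out = col.copy()
    mask = out.isna()
    out[mask] = rng.choice(observed, size=int(mask.sum()))
    return out


rng = np.random.default_rng(0)
imputed_shuffle = df.apply(impute_shuffle, rng=rng)
```

LOCF leaves leading gaps unfilled and NOCB leaves trailing gaps unfilled, so in practice the two are sometimes chained.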

For multivariate time series:

  • ImputerKNN : Replaces missing entries with the k-nearest neighbors. It uses the sklearn.impute.KNNImputer.

  • ImputerIterative : Imputes each Series within a DataFrame multiple times, iterating fits and transformations until the imputation reaches a stable state. It uses sklearn.impute.IterativeImputer.

  • ImputerMICE : Imputes each Series within a DataFrame multiple times, iterating fits and transformations until the imputation reaches a stable state. It uses sklearn.impute.IterativeImputer.

  • ImputerRegressor: Imputes each Series with missing values within a DataFrame using a regression model whose features are the complete columns only.

  • ImputerRPCA: Imputes values via a Robust Principal Component Analysis (RPCA) method.

  • ImputerEM: Imputation of missing values using a multivariate Gaussian model through EM optimization and using a projected (Ornstein-Uhlenbeck) process.
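The KNN and iterative imputers above are described as relying on scikit-learn. As a rough sketch, this is how the underlying scikit-learn estimators can be called directly. It does not show Qolmat's wrapper classes, and the toy data and parameter values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# IterativeImputer is still experimental in scikit-learn and must be
# enabled explicitly before it can be imported.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer

# Toy multivariate data with gaps (illustrative, not from Qolmat).
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, np.nan, 4.0, 5.0],
        "b": [2.0, 4.0, 6.0, np.nan, 10.0],
    }
)

# k-nearest neighbours: each gap is filled from the closest rows.
knn = KNNImputer(n_neighbors=2)
imputed_knn = pd.DataFrame(
    knn.fit_transform(df), columns=df.columns, index=df.index
)

# Iterative imputation: each column is regressed on the others,
# repeating until the imputed values stabilise or max_iter is reached.
iterative = IterativeImputer(max_iter=10, random_state=0)
imputed_iter = pd.DataFrame(
    iterative.fit_transform(df), columns=df.columns, index=df.index
)
```

Both estimators return NumPy arrays, which is why the results are wrapped back into DataFrames with the original columns and index.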

2 - Comparator

The Comparator class implements a way to compare multiple imputation methods. It is based on the standard approach of selecting some observations, setting their status to missing, and comparing their imputed values with their true values.

More specifically, starting from the initial dataframe, which already contains missing values, we generate additional missing values N times, which yields N samples. Missing values can be generated following the MCAR (missing completely at random) mechanism.

  • In the MCAR setting, each value is masked according to the realisation of a Bernoulli random variable with a fixed parameter.

On each sample, the different imputation models are tested and reconstruction errors are computed on the artificially missing entries. The errors of each imputation model are then averaged, so that we obtain a single error score per model. This procedure allows different models to be compared on the same dataset.
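The procedure described above can be sketched in a few lines of pandas and NumPy. This illustrates the general idea only and is not the Comparator class itself: the function `compare`, its parameters `n_samples` and `p_mask`, the synthetic data, the three candidate imputers and the choice of mean absolute error are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic smooth data with about 5% of entries already missing
# (illustrative only).
t = np.arange(200)
df_true = pd.DataFrame({"x": np.sin(t / 10.0), "y": np.cos(t / 15.0)})
df_obs = df_true.mask(rng.random(df_true.shape) < 0.05)

# Candidate imputation methods, each mapping a DataFrame to a DataFrame.
imputers = {
    "mean": lambda d: d.fillna(d.mean()),
    "median": lambda d: d.fillna(d.median()),
    "interpolation": lambda d: d.interpolate(limit_direction="both"),
}


def compare(df_obs, imputers, n_samples=5, p_mask=0.1, rng=None):
    """Hypothetical helper: average reconstruction error per imputer."""
    rng = np.random.default_rng() if rng is None else rng
    errors = {name: [] for name in imputers}
    for _ in range(n_samples):
        # MCAR: each entry is masked with probability p_mask (Bernoulli).
        # Only entries that were observed can be scored afterwards.
        holes = (rng.random(df_obs.shape) < p_mask) & df_obs.notna().to_numpy()
        df_masked = df_obs.mask(holes)
        for name, impute in imputers.items():
            df_imputed = impute(df_masked)
            # Error on the artificially missing entries only.
            diff = (df_imputed - df_obs).to_numpy()[holes]
            errors[name].append(np.mean(np.abs(diff)))
    # One averaged score per model.
    return pd.Series({name: float(np.mean(e)) for name, e in errors.items()})


scores = compare(df_obs, imputers, rng=rng)
print(scores.sort_values())
```

Because the same masks are applied to every model within a sample, the scores are directly comparable across models.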

docs/images/comparator.png

3 - Installation

🔗 Requirements

Python 3.8+

🛠 Installation

Installation for conda users

conda env create -f environment.dev.yml
conda activate env_qolmat_dev

Install pre-commit

Once the environment is created, pre-commit is installed but needs to be activated with the following command:

pre-commit install

📝 Contributing

This work is under development, and many changes are still to be made.

🔍 Further reading

