ae-imputer
ae-imputer is a python package used for missing data imputation via autoencoders.
As of now, only numerical values are supported for imputation.
The method is based on the paper by McCoy, Kroon and Auret (2018); see the BibTeX entry below.
Installing
Note that ae-imputer uses PyTorch for all of its underlying AutoEncoder implementations.
Requirements:
- Python 3.8 or greater
- numpy
- scikit-learn
- pytorch
pip install ae-imputer
Usage
The ae-imputer package is designed to match the calling API of scikit-learn imputers.
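import numpy as np
from aeimputer import AEImputer
# Toy dataset; missing entries are marked with np.nan
X = [[1, 2, 3], [2, np.nan, 4], [np.nan, 5, 6], [np.nan, 2, 3], [2, 3, 4], [4, 5, 6]]
# Build an imputer backed by a 5-layer AutoEncoder
imputer = AEImputer(n_layers=5)
# Fit on X and return a copy with the missing entries filled in
X_imputed = imputer.fit_transform(X)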
It is recommended to normalize your data before fitting and imputation. The example above is only illustrative: AEImputer is meant to be used with much larger amounts of data in order to properly utilize its capabilities.
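One way to normalize data that still contains missing values is to compute per-column statistics over the observed entries only. The sketch below uses plain numpy; the helper names `nan_standardize` and `undo_standardize` are hypothetical and not part of ae-imputer, and the commented-out AEImputer call assumes that `fit_transform` returns an array-like result.

```python
import numpy as np

def nan_standardize(X):
    """Scale each column to zero mean and unit variance, ignoring NaNs."""
    X = np.asarray(X, dtype=float)
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - mean) / std, mean, std

def undo_standardize(X_scaled, mean, std):
    """Map standardized values back to the original units."""
    return np.asarray(X_scaled, dtype=float) * std + mean

X = np.array([[1, 2, 3], [2, np.nan, 4], [np.nan, 5, 6],
              [np.nan, 2, 3], [2, 3, 4], [4, 5, 6]], dtype=float)

# NaNs stay NaN after scaling, so the imputer still sees them as missing.
X_scaled, mean, std = nan_standardize(X)

# With ae-imputer installed, the scaled data could then be imputed and
# mapped back to the original units (assumed usage, not verified here):
#   from aeimputer import AEImputer
#   X_imputed = undo_standardize(AEImputer(n_layers=5).fit_transform(X_scaled), mean, std)
```

Keeping `mean` and `std` around makes it possible to report the imputed values in the original units rather than in standardized ones.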
There are a number of parameters that can be set for the AEImputer class; the major ones are as follows:
- model_type : 'variational' or 'vanilla', default='variational'. Type of AutoEncoder architecture to use.
- n_layers : int, default=3. The number of layers in the AutoEncoder network.
- hidden_dims : list of int, default=None. The number of neurons for each hidden layer in the AutoEncoder network. If None, it is determined automatically.
- preimpute_at_train : bool, default=False. By default, AEImputer uses only complete rows of data during fitting. If set to True, the missing values are imputed with 'preimpute_strategy' before training. Advised if the fraction of rows with missing values is significant.
- max_epochs : int, default=1000. The maximum number of epochs to train the AutoEncoder.
- lr : float, default=1e-3. The learning rate for the optimizer during training.
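Since fitting uses only complete rows by default, it can help to check how many rows would be discarded before deciding on preimpute_at_train. The sketch below is a rough heuristic, not part of ae-imputer: the helper `incomplete_row_fraction` and the 0.3 cutoff are hypothetical and chosen only for illustration.

```python
import numpy as np

def incomplete_row_fraction(X):
    """Return the fraction of rows that contain at least one NaN."""
    X = np.asarray(X, dtype=float)
    return float(np.isnan(X).any(axis=1).mean())

X = [[1, 2, 3], [2, np.nan, 4], [np.nan, 5, 6],
     [np.nan, 2, 3], [2, 3, 4], [4, 5, 6]]
frac = incomplete_row_fraction(X)

# Hypothetical cutoff for illustration only; not taken from ae-imputer.
PREIMPUTE_THRESHOLD = 0.3
use_preimpute = frac > PREIMPUTE_THRESHOLD

print(f"{frac:.0%} of rows have missing values -> preimpute_at_train={use_preimpute}")

# With ae-imputer installed, the flag would be passed to the constructor:
#   from aeimputer import AEImputer
#   imputer = AEImputer(preimpute_at_train=use_preimpute)
```

In the toy dataset, half of the rows are incomplete, so training on complete rows alone would discard half of the data.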
@article{MCCOY2018141,
title = {Variational Autoencoders for Missing Data Imputation with Application to a Simulated Milling Circuit},
journal = {IFAC-PapersOnLine},
volume = {51},
number = {21},
pages = {141-146},
year = {2018},
note = {5th IFAC Workshop on Mining, Mineral and Metal Processing MMM 2018},
issn = {2405-8963},
doi = {https://doi.org/10.1016/j.ifacol.2018.09.406},
url = {https://www.sciencedirect.com/science/article/pii/S2405896318320949},
author = {John T. McCoy and Steve Kroon and Lidia Auret},
}