

ae-imputer

ae-imputer is a Python package for imputing missing data with autoencoders.

As of now, only numerical values are supported for imputation.

The method used is based on the paper:

John T. McCoy, Steve Kroon, Lidia Auret: Variational Autoencoders for Missing Data Imputation with Application to a Simulated Milling Circuit, IFAC-PapersOnLine, 2018

Installing

Note that ae-imputer uses PyTorch for all of its underlying AutoEncoder implementations.

Requirements:

  • Python 3.8 or greater
  • NumPy
  • scikit-learn
  • PyTorch

Install from PyPI with pip:

pip install ae-imputer

Usage

The ae-imputer package is designed to match the calling API of scikit-learn imputers.

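import numpy as np
from aeimputer import AEImputer

# Toy dataset; missing entries are marked with np.nan
X = [[1,2,3],[2,np.nan,4],[np.nan,5,6],[np.nan,2,3],[2,3,4],[4,5,6]]
imputer = AEImputer(n_layers=5)

# Fit the autoencoder and fill in the missing values in one call
X_imputed = imputer.fit_transform(X)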

Normalizing your data before fitting and imputation is recommended. The example above is only a toy: AEImputer is meant to be used with much larger amounts of data, where it can properly utilize its capabilities.
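As a minimal sketch of the normalization advice above, the helper below standardizes each column while ignoring NaNs, runs an imputer on the scaled data, and maps the result back to the original units. The function names `standardize_with_nans` and `impute_normalized` are hypothetical and not part of ae-imputer; the only assumption about the imputer is that it exposes `fit_transform`, as AEImputer does in the usage example.

```python
import numpy as np


def standardize_with_nans(X):
    """Z-score each column, ignoring NaNs. Returns (scaled, mean, std)."""
    X = np.asarray(X, dtype=float)
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return (X - mean) / std, mean, std


def impute_normalized(imputer, X):
    """Scale X, impute in the scaled space, then undo the scaling.

    `imputer` is any object with a fit_transform method
    (hypothetical helper, not part of ae-imputer).
    """
    X_scaled, mean, std = standardize_with_nans(X)
    X_imputed = np.asarray(imputer.fit_transform(X_scaled), dtype=float)
    return X_imputed * std + mean


# Usage with ae-imputer (requires the package to be installed):
# from aeimputer import AEImputer
# X_imputed = impute_normalized(AEImputer(), X)
```

NaNs stay in place through the scaling step, so the imputer still sees the same missing-value pattern as in the raw data.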

There are a number of parameters that can be set for the AEImputer class; the major ones are as follows:

  • model_type : 'variational' or 'vanilla', default='variational'. Type of AutoEncoder architecture to use.

  • n_layers : int, default=3. The number of layers in the AutoEncoder network.

  • hidden_dims : list of int, default=None. The number of neurons for each hidden layer in the AutoEncoder network. If None, it is determined automatically.

  • preimpute_at_train : bool, default=False. By default, AEImputer uses only complete rows of data during fitting. If set to True, the missing values are imputed with 'preimpute_strategy' before training. Advised when a significant fraction of rows contain missing values.

  • max_epochs : int, default=1000. The maximum number of epochs to train the AutoEncoder.

  • lr : float, default=1e-3. The learning rate for the optimizer during training.
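To illustrate how these parameters might be chosen, the sketch below measures the fraction of complete rows and uses it to decide whether to enable preimpute_at_train. The helper `complete_row_fraction` and the 0.7 threshold are hypothetical and only for illustration; the parameter names and defaults are the ones listed above.

```python
import numpy as np


def complete_row_fraction(X):
    """Fraction of rows that contain no missing values (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    return float(np.mean(~np.isnan(X).any(axis=1)))


X = [[1, 2, 3], [2, np.nan, 4], [np.nan, 5, 6], [np.nan, 2, 3], [2, 3, 4], [4, 5, 6]]
frac = complete_row_fraction(X)

# Assumed cut-off for illustration only; pick a value that suits your data.
MIN_COMPLETE_FRACTION = 0.7

params = {
    "model_type": "variational",
    "n_layers": 3,
    "hidden_dims": None,  # let the package choose the layer sizes
    # Too few complete rows to train on: pre-impute before fitting.
    "preimpute_at_train": frac < MIN_COMPLETE_FRACTION,
    "max_epochs": 1000,
    "lr": 1e-3,
}

# With ae-imputer installed:
# from aeimputer import AEImputer
# imputer = AEImputer(**params)
```

In the toy dataset only half of the rows are complete, so this sketch switches preimpute_at_train on.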

@article{MCCOY2018141,
    title = {Variational Autoencoders for Missing Data Imputation with Application to a Simulated Milling Circuit},
    journal = {IFAC-PapersOnLine},
    volume = {51},
    number = {21},
    pages = {141-146},
    year = {2018},
    note = {5th IFAC Workshop on Mining, Mineral and Metal Processing MMM 2018},
    issn = {2405-8963},
    doi = {https://doi.org/10.1016/j.ifacol.2018.09.406},
    url = {https://www.sciencedirect.com/science/article/pii/S2405896318320949},
    author = {John T. McCoy and Steve Kroon and Lidia Auret},
}
