MIDASpy
Multiple Imputation with Denoising Autoencoders
Deprecation notice
MIDASpy is deprecated. Please use midasverse-midas, which replaces MIDASpy with a faster PyTorch-based backend, a simpler sklearn-style API (no manual preprocessing), and fewer dependencies (no TensorFlow). MIDASpy will remain on PyPI for existing users but will not receive new features or bug fixes.
A migration guide is available below: Migrating to midasverse-midas.
Install the replacement:
pip install midasverse-midas
Overview
MIDASpy is a Python package for multiply imputing missing data using deep learning methods. The MIDASpy algorithm offers significant accuracy and efficiency advantages over other multiple imputation strategies, particularly when applied to large datasets with complex features. In addition to implementing the algorithm, the package contains functions for processing data before and after model training, running imputation model diagnostics, generating multiple completed datasets, and estimating regression models on these datasets.
For an implementation in R, see rMIDAS2.
Background and suggested citations
For more information on MIDAS, the method underlying the software, see:
Lall, Ranjit, and Thomas Robinson. 2022. "The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning." Political Analysis 30, no. 2: 179-196. doi:10.1017/pan.2020.49. Published version. Accepted version.
Lall, Ranjit, and Thomas Robinson. 2023. "Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS." Journal of Statistical Software 107, no. 9: 1-38. doi:10.18637/jss.v107.i09. Published version.
Installation
To install via pip, enter the following command into the terminal:
pip install MIDASpy
The latest development version (potentially unstable) can be installed
via the terminal with:
pip install git+https://github.com/MIDASverse/MIDASpy.git
MIDAS requires:
- Python (>=3.6; <3.11)
- NumPy (>=1.5, <=1.26.4)
- Pandas (>=0.19)
- TensorFlow (<2.12)
- Matplotlib
- Statsmodels
- SciPy
- TensorFlow Addons (<0.20)
TensorFlow itself has a number of additional requirements, particularly if GPU acceleration is desired. See https://www.tensorflow.org/install/ for details.
Examples
For a simple demonstration of MIDASpy, see our Jupyter Notebook examples.
Migrating to midasverse-midas
Why midasverse-midas?
| | MIDASpy | midasverse-midas |
|---|---|---|
| Backend | TensorFlow 1.x / 2.x | PyTorch |
| Preprocessing | Manual (`binary_conv()`, `cat_conv()`, column sorting) | Automatic column-type detection |
| API style | Separate init / `build_model()` / `train_model()` / `generate_samples()` | sklearn-style `fit()` / `transform()` / `fit_transform()` |
| Python versions | 3.6--3.10 | 3.9+ |
| TensorFlow required | Yes | No |
Installation
pip install midasverse-midas
Side-by-side comparison
1. Preprocessing
MIDASpy required manual conversion of binary and categorical columns before building the model:
```python
# --- MIDASpy ---
import pandas as pd
from MIDASpy import Midas, binary_conv, cat_conv

df['income'] = binary_conv(df['income'])
cat_encoded, cat_cols = cat_conv(df[['workclass', 'marital_status']])
df = pd.concat([df.drop(['workclass', 'marital_status'], axis=1), cat_encoded], axis=1)
```
midasverse-midas detects column types automatically:
```python
# --- midasverse-midas ---
from midas2 import MIDAS

# No preprocessing needed -- just pass your DataFrame directly
```
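Automatic detection plausibly inspects each column's dtype and cardinality. A minimal sketch of how such detection could work, using only pandas; the `detect_column_types` helper is hypothetical, not part of the midasverse-midas API:

```python
import pandas as pd

def detect_column_types(df: pd.DataFrame) -> dict:
    """Classify each column as binary, categorical, or numeric.

    Hypothetical helper illustrating the kind of inference an
    automatic-detection step might perform; not the actual
    midasverse-midas implementation.
    """
    types = {}
    for col in df.columns:
        series = df[col].dropna()
        if pd.api.types.is_numeric_dtype(series) and series.nunique() > 2:
            types[col] = "numeric"
        elif series.nunique() == 2:
            types[col] = "binary"
        else:
            types[col] = "categorical"
    return types

df = pd.DataFrame({
    "age": [25, 38, 52],
    "income": ["<=50K", ">50K", "<=50K"],
    "workclass": ["Private", "State-gov", "Self-emp"],
})
```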
2. Model construction and training
MIDASpy required three separate steps -- instantiate, build, train:
```python
# --- MIDASpy ---
imputer = Midas(
    layer_structure=[256, 256, 256],
    learn_rate=0.0004,
    input_drop=0.8,
    train_batch=16,
    seed=42,
)
imputer.build_model(
    imputation_target=df,
    binary_columns=['income'],
    softmax_columns=cat_cols,
)
imputer.train_model(training_epochs=20)
```
midasverse-midas combines these into a single fit() call:
```python
# --- midasverse-midas ---
mod = MIDAS(hidden_layers=[256, 128, 64], dropout_prob=0.5)
mod.fit(df, epochs=20, lr=0.001, corrupt_rate=0.8, seed=42)
```
Parameter name changes:
| MIDASpy (`__init__` / `build_model` / `train_model`) | midasverse-midas (`__init__` / `fit`) | Notes |
|---|---|---|
| `layer_structure` | `hidden_layers` | Default changed from `[256, 256, 256]` to `[256, 128, 64]` |
| `learn_rate` | `lr` | Default changed from 0.0004 to 0.001 |
| `input_drop` | `corrupt_rate` | Moved to `fit()` |
| `train_batch` | `batch_size` | Default changed from 16 to 64 |
| `dropout_level` | `dropout_prob` | Moved to `__init__()` |
| `cont_adj` | `num_adj` | Moved to `fit()` |
| `softmax_adj` | `cat_adj` | Moved to `fit()` |
| `binary_adj` | `bin_adj` | Moved to `fit()` |
| `training_epochs` | `epochs` | Moved to `fit()` |
| `binary_columns` / `softmax_columns` | Automatic | No manual specification needed |
3. Generating imputations
MIDASpy used generate_samples() to store imputations in an attribute:
```python
# --- MIDASpy ---
imputer.generate_samples(m=10)
completed_datasets = imputer.output_list
```
midasverse-midas uses transform(), which returns a generator:
```python
# --- midasverse-midas ---
imputations = list(mod.transform(m=10))
```
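Because `transform()` returns a generator, the m completed datasets can be consumed one at a time rather than all held in memory at once. A plain-Python sketch of that contract; `gen_imputations` is an illustrative stand-in, not part of the package API:

```python
def gen_imputations(m):
    """Stand-in for mod.transform(m): yields one completed dataset at a time."""
    for i in range(m):
        yield {"draw": i}  # in practice, each item would be a completed DataFrame

# Materialize all m completed datasets at once:
all_draws = list(gen_imputations(3))

# Or iterate lazily, holding only one completed dataset in memory at a time:
draw_ids = [d["draw"] for d in gen_imputations(3)]
```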
Or use fit_transform() for an all-in-one approach:
```python
# --- midasverse-midas ---
imputations = list(mod.fit_transform(df, m=10, epochs=20))
```
4. Rubin's rules regression
MIDASpy's `combine()` took separate `y_var` and `X_vars` arguments along with a list of DataFrames:
```python
# --- MIDASpy ---
from MIDASpy import combine

results = combine(
    y_var='income',
    X_vars=['age', 'hours_per_week'],
    df_list=imputer.output_list,
)
```
midasverse-midas's `combine()` takes `dfs`, `y`, and an optional `ind_vars` (defaulting to all non-outcome columns):
```python
# --- midasverse-midas ---
from midas2 import combine

results = combine(imputations, y='income', ind_vars=['age', 'hours_per_week'])

# Or use all predictors:
results = combine(imputations, y='income')
```
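The pooling step behind both versions of `combine()` follows Rubin's rules: average the m point estimates, then combine within- and between-imputation variance. A self-contained NumPy sketch of the rules themselves (an illustration, not the package's implementation):

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool m point estimates and their variances via Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()       # pooled point estimate
    u_bar = variances.mean()       # within-imputation variance
    b = estimates.var(ddof=1)      # between-imputation variance
    t = u_bar + (1 + 1 / m) * b    # total variance
    return q_bar, t

# e.g. a coefficient estimated in each of m=3 completed datasets:
est, tot_var = rubin_pool([1.0, 1.2, 0.8], [0.04, 0.05, 0.045])
```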
5. Mean imputation (new)
midasverse-midas adds an imp_mean() utility:
```python
# --- midasverse-midas only ---
from midas2 import imp_mean

mean_imputed = imp_mean(mod.transform(m=10), pandas=True)
```
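Conceptually, this averages each cell across the m completed datasets. A rough pandas sketch of that idea; illustrative only, not the `imp_mean` source, whose signature and options may differ:

```python
import pandas as pd

def mean_of_imputations(dfs):
    """Element-wise mean across completed DataFrames that share
    the same index and numeric columns (illustrative sketch)."""
    stacked = pd.concat(list(dfs))          # transform() yields a generator
    return stacked.groupby(level=0).mean()  # average each cell across draws

d1 = pd.DataFrame({"x": [1.0, 2.0]})
d2 = pd.DataFrame({"x": [3.0, 4.0]})
avg = mean_of_imputations(iter([d1, d2]))
```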
Complete migration example
MIDASpy (old)
```python
import pandas as pd
from MIDASpy import Midas, binary_conv, cat_conv, combine

# 1. Preprocess
df['income'] = binary_conv(df['income'])
cat_encoded, cat_cols = cat_conv(df[['workclass', 'marital_status']])
df_processed = pd.concat([df.drop(['workclass', 'marital_status'], axis=1), cat_encoded], axis=1)

# 2. Build and train
imputer = Midas(layer_structure=[256, 256, 256], seed=42)
imputer.build_model(df_processed, binary_columns=['income'], softmax_columns=cat_cols)
imputer.train_model(training_epochs=20)

# 3. Generate imputations
imputer.generate_samples(m=5)

# 4. Analyse
combine(y_var='income', X_vars=['age', 'hours_per_week'], df_list=imputer.output_list)
```
midasverse-midas (new)
```python
from midas2 import MIDAS, combine

# 1. Fit (no preprocessing needed)
mod = MIDAS()
mod.fit(df, epochs=20, seed=42)

# 2. Impute
imputations = list(mod.transform(m=5))

# 3. Analyse
combine(imputations, y='income', ind_vars=['age', 'hours_per_week'])
```
Quick-reference cheat sheet
| Task | MIDASpy | midasverse-midas |
|---|---|---|
| Import | `from MIDASpy import Midas` | `from midas2 import MIDAS` |
| Preprocess binary | `binary_conv(col)` | Automatic |
| Preprocess categorical | `cat_conv(df[cols])` | Automatic |
| Instantiate | `Midas(layer_structure, ...)` | `MIDAS(hidden_layers, ...)` |
| Build model | `imputer.build_model(df, binary_columns, softmax_columns)` | Not needed |
| Train | `imputer.train_model(training_epochs)` | `mod.fit(df, epochs, ...)` |
| Generate imputations | `imputer.generate_samples(m)` | `mod.transform(m)` |
| All-in-one | Not available | `mod.fit_transform(df, m, ...)` |
| Access imputations | `imputer.output_list` | `list(mod.transform(m))` |
| Mean imputation | Not available | `imp_mean(imputations)` |
| Rubin's rules | `combine(y_var, X_vars, df_list)` | `combine(dfs, y, ind_vars)` |
Version history
Version 1.4.1 (August 2024)
- Adds support for non-negative output columns via a new `positive_columns` argument
Version 1.3.1 (October 2023)
- Minor update to reflect publication of accompanying article in Journal of Statistical Software
- Further updates to make documentation and URLs consistent, including removing unused metadata
Version 1.2.4 (August 2023)
- Adds support for Python 3.9 and 3.10
- Addresses deprecation warnings and other minor bug fixes
- Resolves dependency issues and includes an updated `setup.py` file
- Adds GitHub Actions workflows that trigger automatic tests on the latest Ubuntu, macOS, and Windows for Python versions 3.7 to 3.10 each time a push or pull request is made to the main branch
- Adds a Jupyter Notebook example demonstrating the core functionalities of MIDASpy
Version 1.2.3 (December 2022)
v1.2.3 adds support for installation on Apple Silicon hardware (i.e. M1 and M2 Macs).
Version 1.2.2 (July 2022)
v1.2.2 makes minor efficiency changes to the codebase. Full details are available in the Release logs.
Version 1.2.1 (January 2021)
v1.2.1 adds new pre-processing functionality and a multiple imputation regression function.
Users can now automatically preprocess binary and categorical columns prior to running the MIDAS algorithm using `binary_conv()` and `cat_conv()`.
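Conceptually, binary_conv() maps a two-category column to 0/1 and cat_conv() one-hot encodes multi-category columns. A rough pandas equivalent of the idea; illustrative only, as the real helpers differ in signature (cat_conv() also returns the new column names) and in missing-value handling:

```python
import pandas as pd

df = pd.DataFrame({"income": ["<=50K", ">50K", "<=50K"],
                   "workclass": ["Private", "State-gov", "Private"]})

# Roughly what binary_conv() does: map the two categories to 0/1
df["income"] = (df["income"] == ">50K").astype(int)

# Roughly what cat_conv() does: one-hot encode a categorical column
onehot = pd.get_dummies(df["workclass"], prefix="workclass", dtype=int)
df = pd.concat([df.drop(columns="workclass"), onehot], axis=1)
```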
The new combine() function allows users to run regression analysis across the complete data, following Rubin's combination rules.
Previous versions
Version 1.1.1 (October 2020)
Key changes:
- Adds full TensorFlow 2.X support:
  - Users can now run the MIDAS algorithm in TensorFlow 2.X (TF1 support retained)
  - Tidier handling of random seed setting across both TensorFlow and NumPy
- Fixes a minor dependency bug
- Other minor bug fixes
Version 1.0.2 (September 2020)
Key changes:
- Minor, mainly cosmetic, changes to the underlying source code.
- Renamed the `categorical_columns` argument in `build_model()` to `binary_columns` to avoid confusion
- Added plotting arguments to the `overimputation()` method to suppress intermediary overimputation plots (`plot_main`) and all plots (`skip_plot`)
- Changed `overimputation()` plot titles, labels, and legends
- Added a TensorFlow 2.0 version check on import
- Fixed seed-setting bug in earlier versions
Alpha 0.2:
Variational autoencoder enabled. More flexibility in model specification, although defaulting to a simple mirrored system. Deeper analysis tools within .overimpute() for checking fit on continuous values. Constructor code deconflicted. Individual output specification enabled for very large datasets.
Key added features:
- Variational autoencoder capacity added, including encoding to and sampling from latent space
Alpha 0.1:
- Basic functionality feature-complete.
- Support for mixed categorical and continuous data types
- An "additional data" pipeline, allowing data that may be relevant to the imputation to be included (without being included in error generating statistics)
- Simplified calibration for model complexity through the "overimputation" function, including visualization of reconstructed features
- Basic large dataset functionality
Project details
Download files
File details
Details for the file midaspy-1.4.2.tar.gz.
File metadata
- Download URL: midaspy-1.4.2.tar.gz
- Upload date:
- Size: 30.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `9e2147d0e4e71a2f86d44f248f9aab51e627966446e504dfdead16c23a69a876` |
| MD5 | `6526158b954a55566cd38e8e507db5b7` |
| BLAKE2b-256 | `96b766140e746df08fa89e24e9c67cb4d5cdc52a0c44008273feb33e272560f7` |
File details
Details for the file midaspy-1.4.2-py3-none-any.whl.
File metadata
- Download URL: midaspy-1.4.2-py3-none-any.whl
- Upload date:
- Size: 30.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `5a78ec362a0b2b3267d1c787564b5938d6834c7ac8466eab4621f14634d17049` |
| MD5 | `cb29521beb9071d2d8d6a78e6711fd56` |
| BLAKE2b-256 | `94a27f8e3b41e98f5aee5e198999c867a9d8edd43c5b9e2e0d00afc3b2c89f3b` |