Benchmarking imputation methods for microdata
Microimpute
Microimpute is a Python package for imputing variables from one survey dataset onto another. It wraps five imputation methods behind a common interface so you can benchmark them on your data and pick the one that works best, rather than defaulting to a single approach.
Methods
- Statistical Matching: distance-based matching to find similar donor observations
- Ordinary Least Squares (OLS): linear regression imputation
- Quantile Regression: models conditional quantiles instead of the conditional mean
- Quantile Regression Forests (QRF): non-parametric, tree-based quantile estimation
- Mixture Density Networks (MDN): neural network with a Gaussian mixture output
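To make the first method concrete, here is a minimal sketch of distance-based donor matching using only the standard library. It is not microimpute's implementation: the function name `nearest_donor_impute`, the toy records, and the choice of unscaled Euclidean distance are all assumptions made for illustration.

```python
import math


def nearest_donor_impute(donors, receivers, predictors, target):
    """Copy `target` from the closest donor row to each receiver row.

    Closeness is plain Euclidean distance over `predictors`. Real use
    would normally standardize the predictors first so that no single
    variable dominates the distance.
    """
    imputed = []
    for rec in receivers:
        rec_point = [rec[p] for p in predictors]
        best = min(
            donors,
            key=lambda d: math.dist([d[p] for p in predictors], rec_point),
        )
        # Build a new record so the receiver data is left unmodified.
        imputed.append({**rec, target: best[target]})
    return imputed


# Toy donor survey (has net_worth) and receiver survey (does not).
donors = [
    {"age": 30, "income": 40.0, "net_worth": 50.0},
    {"age": 45, "income": 80.0, "net_worth": 300.0},
    {"age": 60, "income": 55.0, "net_worth": 450.0},
]
receivers = [
    {"age": 32, "income": 42.0},
    {"age": 58, "income": 60.0},
]

result = nearest_donor_impute(donors, receivers, ["age", "income"], "net_worth")
print([row["net_worth"] for row in result])  # → [50.0, 450.0]
```

Each receiver simply inherits the donor's observed value, so imputed values are always ones that actually occur in the donor survey. The regression-based methods in the list instead predict from a fitted model.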
Autoimpute
The autoimpute function tunes hyperparameters, runs cross-validation across all five methods, and selects the best performer based on quantile loss (for numerical targets) or log loss (for categorical targets). It handles numerical, categorical, and boolean variables.
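The two selection metrics named above can be sketched in a few lines of plain Python. This is an illustration of the metrics and of "lowest mean loss across folds wins", not the code autoimpute runs: the helper names `quantile_loss`, `log_loss`, and `pick_best` are hypothetical, the log loss shown is the binary case only, and the per-fold numbers are made up to show the selection step.

```python
import math


def quantile_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss at quantile q, with 0 < q < 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log loss; p_pred is the predicted probability of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def pick_best(losses_by_method):
    """Return the method whose mean loss across folds is lowest."""
    return min(
        losses_by_method,
        key=lambda m: sum(losses_by_method[m]) / len(losses_by_method[m]),
    )


print(quantile_loss([10, 20], [12, 15], q=0.5))  # → 1.75

# Made-up per-fold losses for two placeholder methods.
fold_losses = {"method_a": [2.0, 2.2, 2.1], "method_b": [1.5, 1.7, 1.6]}
print(pick_best(fold_losses))  # → method_b
```

Because quantile loss is evaluated at a chosen quantile, it rewards methods that capture the shape of the conditional distribution and not only its center, which is why it suits comparing quantile-based imputers.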
API
All models follow a fit() / predict() interface. The package supports sample weights to account for survey design, and validates inputs automatically. Adding a custom imputation method is straightforward since new models just need to implement the same interface.
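As a rough picture of what implementing the same interface can look like, the sketch below defines a toy model with `fit()` and `predict()` that honors sample weights. Everything here is an assumption for illustration: the `Imputer` protocol, the `WeightedMeanImputer` class, and the argument names (`X`, `y`, `sample_weight`) are hypothetical and are not taken from microimpute, whose actual base class and signatures are described in its documentation.

```python
from typing import Optional, Protocol, Sequence


class Imputer(Protocol):
    """Hypothetical shape of a fit()/predict() model (not microimpute's API)."""

    def fit(self, X, y, sample_weight=None): ...

    def predict(self, X): ...


class WeightedMeanImputer:
    """Toy model: predicts the weighted mean of the training target for every row."""

    def fit(
        self,
        X: Sequence[Sequence[float]],
        y: Sequence[float],
        sample_weight: Optional[Sequence[float]] = None,
    ) -> "WeightedMeanImputer":
        weights = list(sample_weight) if sample_weight is not None else [1.0] * len(y)
        # Basic input validation before fitting.
        if not (len(X) == len(y) == len(weights)):
            raise ValueError("X, y, and sample_weight must have the same length")
        self.mean_ = sum(w * v for w, v in zip(weights, y)) / sum(weights)
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> list:
        return [self.mean_] * len(X)


# The last training row carries twice the survey weight of the others.
model = WeightedMeanImputer().fit(
    [[1.0], [2.0], [3.0]], [10.0, 20.0, 30.0], sample_weight=[1.0, 1.0, 2.0]
)
predictions = model.predict([[4.0], [5.0]])
print(predictions)  # → [22.5, 22.5]
```

Keeping every model behind the same two methods is what lets a benchmarking loop treat all of them interchangeably.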
Documentation and paper
- Documentation with examples and interactive notebooks
- Paper presenting microimpute and demonstrating it for SCF-to-CPS net worth imputation
Dashboard
An interactive dashboard for exploring imputation results is available at https://microimpute-dashboard.vercel.app/. It supports file upload, URL loading, direct GitHub artifact integration, and sample data.
Installation
pip install microimpute
For image export (PNG/JPG):
pip install "microimpute[images]"
Contributing
Pull requests are welcome. If you find a bug or have a feature idea, open an issue or submit a PR.
File details
Details for the file microimpute-2.0.4.tar.gz.
File metadata
- Download URL: microimpute-2.0.4.tar.gz
- Upload date:
- Size: 144.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 85b43f941af4f99903371d77dc4434ecdf7533d6f9e789783654c69eed6fea9c |
| MD5 | 5fdf948b42680075bb4ae5a2863e5c86 |
| BLAKE2b-256 | 9a9d50d97ab8eea569caeda5e132b771c30cf961d36d7f72014997ee53f7c183 |
File details
Details for the file microimpute-2.0.4-py3-none-any.whl.
File metadata
- Download URL: microimpute-2.0.4-py3-none-any.whl
- Upload date:
- Size: 125.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6787d0738468ef374d2414edaf1777603274f6cbfb7b51a9de8b831c8ede1c68 |
| MD5 | 9dbfe82e0cae35a1711df950fd57ceb9 |
| BLAKE2b-256 | 5acf3ae796573e81ace825916a94b73f1e1d7887bf6264bda368aba3f0bddff8 |