Benchmarking imputation methods for microdata

Microimpute

Microimpute is a Python package for imputing variables from one survey dataset onto another. It wraps five imputation methods behind a common interface so you can benchmark them on your data and pick the one that works best, rather than defaulting to a single approach.

Methods

  • Statistical Matching: distance-based matching to find similar donor observations
  • Ordinary Least Squares (OLS): linear regression imputation
  • Quantile Regression: models conditional quantiles instead of the conditional mean
  • Quantile Random Forests (QRF): non-parametric, tree-based quantile estimation
  • Mixture Density Networks (MDN): neural network with a Gaussian mixture output
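To make the first method concrete, here is a minimal sketch of distance-based donor matching in plain Python. It is not Microimpute's implementation: the function name `nearest_donor_impute`, the record layout, and the use of unscaled Euclidean distance are all assumptions chosen for illustration.

```python
from math import sqrt


def nearest_donor_impute(donors, receivers, predictors, target):
    """Copy `target` from the closest donor record onto each receiver record.

    Hypothetical helper for illustration only. Distance is plain Euclidean
    over the shared predictors, with no scaling, so variables with large
    magnitudes (such as income) dominate the match.
    """
    imputed = []
    for rec in receivers:
        closest = min(
            donors,
            key=lambda d: sqrt(sum((d[p] - rec[p]) ** 2 for p in predictors)),
        )
        # Keep the receiver's own fields and add the donated target value.
        imputed.append({**rec, target: closest[target]})
    return imputed


# Donor survey: has the variable we want to impute (net_worth).
donors = [
    {"age": 30, "income": 40_000, "net_worth": 50_000},
    {"age": 60, "income": 90_000, "net_worth": 800_000},
]
# Receiver survey: shares the predictors but lacks net_worth.
receivers = [
    {"age": 58, "income": 85_000},
    {"age": 33, "income": 42_000},
]

result = nearest_donor_impute(donors, receivers, ["age", "income"], "net_worth")
for row in result:
    print(row)
```

In practice the predictors would usually be standardized before computing distances. The regression-based methods in the list replace the donor lookup with a fitted model.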

Autoimpute

The autoimpute function tunes hyperparameters, runs cross-validation across all five methods, and selects the best performer based on quantile loss (for numerical targets) or log loss (for categorical targets). It handles numerical, categorical, and boolean variables.
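The selection criterion for numerical targets can be sketched as follows. This is a plain-Python illustration of quantile (pinball) loss and of picking the method with the lowest average loss. It does not call Microimpute, and the names `quantile_loss`, `select_best`, and the toy predictions are assumptions.

```python
def quantile_loss(y_true, y_pred, q):
    """Average pinball loss of predictions for quantile q, with 0 < q < 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)


def select_best(y_true, predictions_by_method, quantiles):
    """Return the method with the lowest loss averaged over all quantiles."""
    scores = {}
    for method, preds_by_q in predictions_by_method.items():
        losses = [quantile_loss(y_true, preds_by_q[q], q) for q in quantiles]
        scores[method] = sum(losses) / len(losses)
    best = min(scores, key=scores.get)
    return best, scores


# Toy held-out values and per-quantile predictions from two hypothetical methods.
y_true = [10.0, 20.0, 30.0]
predictions_by_method = {
    "method_a": {0.5: [12.0, 18.0, 33.0], 0.9: [15.0, 26.0, 38.0]},
    "method_b": {0.5: [10.0, 25.0, 20.0], 0.9: [11.0, 21.0, 29.0]},
}

best, scores = select_best(y_true, predictions_by_method, quantiles=[0.5, 0.9])
print(best, scores)
```

A full run would compute these losses on held-out folds rather than on a single toy sample. For categorical targets, log loss would take the place of `quantile_loss`.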

API

All models follow a fit() / predict() interface. The package supports sample weights to account for survey design, and validates inputs automatically. Adding a custom imputation method is straightforward since new models just need to implement the same interface.
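As a rough picture of the fit() / predict() pattern with sample weights, the sketch below implements a one-predictor weighted least-squares imputer in plain Python. The class name `WeightedLinearImputer`, its argument names, and its list-based inputs are hypothetical. It does not subclass anything from Microimpute, so check the documentation for the interface a real custom model must implement.

```python
class WeightedLinearImputer:
    """Hypothetical one-predictor imputer with a fit()/predict() interface."""

    def fit(self, x, y, weights=None):
        # Default to equal weights when no survey weights are supplied.
        if weights is None:
            weights = [1.0] * len(x)
        # Basic input validation before fitting.
        if not (len(x) == len(y) == len(weights)):
            raise ValueError("x, y, and weights must have the same length")

        w_sum = sum(weights)
        x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
        y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
        sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
        sxy = sum(
            w * (xi - x_bar) * (yi - y_bar)
            for w, xi, yi in zip(weights, x, y)
        )
        if sxx == 0:
            raise ValueError("x has no weighted variance; cannot fit a slope")

        # Closed-form weighted least-squares solution for a single predictor.
        self.slope_ = sxy / sxx
        self.intercept_ = y_bar - self.slope_ * x_bar
        return self

    def predict(self, x):
        return [self.intercept_ + self.slope_ * xi for xi in x]


# Fit on donor records with survey weights, then impute for new records.
model = WeightedLinearImputer().fit(
    x=[1.0, 2.0, 3.0],
    y=[2.0, 4.0, 6.0],
    weights=[1.0, 2.0, 1.0],
)
imputed = model.predict([4.0])
print(imputed)
```

Because fitting and prediction are separate steps, the same fitted model can be scored on held-out data and then applied to the receiver survey.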

Documentation and paper

  • Documentation with examples and interactive notebooks
  • Paper presenting microimpute and demonstrating it for SCF-to-CPS net worth imputation

Dashboard

An interactive dashboard for exploring imputation results is available at https://microimpute-dashboard.vercel.app/. It supports file upload, URL loading, direct GitHub artifact integration, and sample data.

Installation

pip install microimpute

For image export (PNG/JPG):

pip install "microimpute[images]"

Contributing

Pull requests are welcome. If you find a bug or have a feature idea, open an issue or submit a PR.
