Microdata synthesis and reweighting using normalizing flows
microplex
Overview
microplex creates rich, calibrated microdata through:
- Conditional relationships: Generate target variables given demographics
- Zero-inflated distributions: Handle variables that are 0 for many observations
- Joint correlations: Preserve relationships between target variables
- Hierarchical structures: Keep household/firm compositions intact
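To make the first two bullets concrete, here is a minimal sketch of a two-part (hurdle) model: one step decides whether a value is exactly zero, the other draws a positive amount, and both depend on a demographic. It illustrates the general idea only and is not microplex's implementation; the function name `sample_zero_inflated` and every coefficient in it are made up for illustration.

```python
import math
import random


def sample_zero_inflated(age, rng):
    """Draw one value from a toy two-part (hurdle) model conditioned on age.

    The functional forms and coefficients are invented for illustration.
    """
    # Part 1: probability that the value is exactly zero, given age.
    p_zero = 1.0 / (1.0 + math.exp((age - 20) / 5.0))
    if rng.random() < p_zero:
        return 0.0
    # Part 2: a strictly positive amount whose location also depends on age.
    return rng.lognormvariate(10.0 + 0.01 * age, 0.5)


rng = random.Random(0)
young = [sample_zero_inflated(18, rng) for _ in range(5000)]
older = [sample_zero_inflated(45, rng) for _ in range(5000)]

share_zero_young = sum(v == 0.0 for v in young) / len(young)
share_zero_older = sum(v == 0.0 for v in older) / len(older)
print(f"zero share at 18: {share_zero_young:.2f}, at 45: {share_zero_older:.2f}")
```

Fitting a single continuous density to such a variable would smear the spike at zero into nearby positive values; keeping the zero decision separate avoids that.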
Installation
pip install microplex
Quick Start
from microplex import Synthesizer
import pandas as pd
# Load training data with known target variables
training_data = pd.read_csv("survey_with_income.csv")
# Initialize synthesizer
synth = Synthesizer(
    target_vars=["income", "expenditure", "savings"],
    condition_vars=["age", "education", "region"],
)
# Fit on training data
synth.fit(training_data, weight_col="weight", epochs=100)
# Generate synthetic targets for new demographics
new_demographics = pd.read_csv("demographics_only.csv")
synthetic = synth.generate(new_demographics)
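Because the training data carries survey weights (`weight_col="weight"` above), any comparison between training and synthetic values should be weighted too. The helper below is a hypothetical, standard-library sketch of such a check and is not part of microplex. It works on plain lists, and the numbers are toy values rather than real survey data.

```python
def weighted_summary(values, weights):
    """Return the weighted mean and the weighted share of exact zeros."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(v * w for v, w in zip(values, weights)) / total
    zero_share = sum(w for v, w in zip(values, weights) if v == 0) / total
    return mean, zero_share


# Toy numbers, not real survey data.
income = [0.0, 0.0, 30000.0, 60000.0]
weight = [1.0, 3.0, 2.0, 2.0]

mean, zero_share = weighted_summary(income, weight)
print(f"weighted mean: {mean:.0f}, weighted zero share: {zero_share:.2f}")
```

Running the same summary on a training column and on the corresponding synthetic column gives a quick first check that both the zero share and the level of the variable are in the right range.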
Why microplex?
| Feature | microplex | CT-GAN | TVAE | synthpop |
|---|---|---|---|---|
| Conditional generation | ✅ | ❌ | ❌ | ❌ |
| Zero-inflation handling | ✅ | ❌ | ❌ | ⚠️ |
| Exact likelihood | ✅ | ❌ | ❌ | N/A |
| Stable training | ✅ | ⚠️ | ✅ | ✅ |
| Preserves source structure | ✅ | ❌ | ❌ | ⚠️ |
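The "Exact likelihood" row refers to a general property of normalizing flows: since the model is an invertible map to a simple base distribution, the density of a data point follows from the change-of-variables formula, log p(x | c) = log p_z(f(x; c)) + log |df/dx|. The sketch below uses a single conditional affine map, which is far simpler than a real flow. The conditioner (how `shift` and `log_scale` depend on `context`) is hypothetical and not taken from microplex.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of a standard normal at z."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_logpdf(x, context):
    """Exact log-density of x under z = (x - shift) / scale with z ~ N(0, 1)."""
    # Hypothetical conditioner: shift and log-scale are linear in the context.
    shift = 2.0 * context
    log_scale = 0.1 * context
    z = (x - shift) * math.exp(-log_scale)
    # For an affine map, log |dz/dx| = -log_scale.
    return standard_normal_logpdf(z) - log_scale


# Sanity check: a valid density integrates to one (Riemann sum on a wide grid).
context = 1.0
step = 0.01
grid = [-20.0 + step * i for i in range(4401)]
total_mass = sum(math.exp(affine_flow_logpdf(x, context)) for x in grid) * step
print(f"density integrates to about {total_mass:.4f}")
```

Real flows stack many such invertible layers with learned conditioners, and the log-determinant terms add up across layers in the same way.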
Use Cases
- Survey enhancement: Impute income variables from tax data onto census demographics
- Privacy-preserving synthesis: Generate synthetic data that preserves statistical properties without copying real records
- Data fusion: Combine variables from multiple surveys with different sample designs
- Missing data imputation: Fill in missing values conditioned on observed variables
Architecture
┌────────────────────────────────────────────────────────┐
│                      Synthesizer                       │
├────────────────────────────────────────────────────────┤
│                                                        │
│  Training:                                             │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │ Training │───▶│ Transformer  │───▶│ Normalizing  │  │
│  │ Data     │    │ (log, std)   │    │ Flow         │  │
│  └──────────┘    └──────────────┘    └──────────────┘  │
│                                                        │
│  Generation:                                           │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │ Context  │───▶│ Zero + Flow  │───▶│ Inverse      │  │
│  │ Vars     │    │ Sampling     │    │ Transform    │  │
│  └──────────┘    └──────────────┘    └──────────────┘  │
│                                                        │
└────────────────────────────────────────────────────────┘
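The "Transformer (log, std)" and "Inverse Transform" boxes can be pictured as a pair of functions that undo each other: values are logged and standardized before the flow sees them, and samples are mapped back to the original scale afterwards. The class below is a toy stand-in under stated assumptions and not microplex's code. The name `LogStandardizer`, the use of `log1p` rather than a plain log, and the zero-spread guard are all choices made for this sketch.

```python
import math
import statistics


class LogStandardizer:
    """Toy forward/inverse transform: log1p, then z-score."""

    def fit(self, values):
        logged = [math.log1p(v) for v in values]
        self.mean_ = statistics.fmean(logged)
        # Fall back to 1.0 when all values are identical, to avoid dividing by zero.
        self.std_ = statistics.pstdev(logged) or 1.0
        return self

    def transform(self, values):
        return [(math.log1p(v) - self.mean_) / self.std_ for v in values]

    def inverse_transform(self, zs):
        return [math.expm1(z * self.std_ + self.mean_) for z in zs]


# Toy positive amounts; zeros would be handled by a separate zero/non-zero step.
incomes = [12000.0, 35000.0, 58000.0, 250000.0]
scaler = LogStandardizer().fit(incomes)
z = scaler.transform(incomes)
recovered = scaler.inverse_transform(z)
```

The log compresses the long right tail typical of income-like variables, and the standardization puts every target on a comparable scale before density estimation.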
Documentation
Full documentation is available at cosilicoai.github.io/microplex.
Benchmarks
See benchmarks/ for comparisons against:
- CT-GAN: Conditional Tabular GAN (from SDV)
- TVAE: Tabular VAE (from SDV)
- Copulas: Gaussian copula synthesis (from SDV)
- synthpop: CART-based synthesis (R package, via rpy2)
Citation
@software{microplex2024,
  author = {Cosilico},
  title = {microplex: Microdata synthesis and reweighting using normalizing flows},
  year = {2024},
  url = {https://github.com/CosilicoAI/microplex}
}
License
MIT License - see LICENSE for details.