Microdata synthesis and reweighting using normalizing flows

microplex

Overview

microplex generates calibrated synthetic microdata. It supports:

  • Conditional relationships: Generate target variables given demographics
  • Zero-inflated distributions: Handle variables that are 0 for many observations
  • Joint correlations: Preserve relationships between target variables
  • Hierarchical structures: Keep household/firm compositions intact
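
The zero-inflation idea above can be illustrated with a two-part sampler: first decide whether a value is zero, then draw a positive amount for the rest. This is a minimal NumPy sketch and not microplex's implementation; the helper name `sample_zero_inflated`, the zero probability, and the log-normal choice for positive values are all assumptions made for illustration.

```python
import numpy as np


def sample_zero_inflated(n, p_zero, mu, sigma, seed=0):
    """Hypothetical helper: values are exactly 0 with probability p_zero,
    and log-normal(mu, sigma) otherwise."""
    rng = np.random.default_rng(seed)
    is_zero = rng.random(n) < p_zero          # part 1: zero vs. positive
    positive = rng.lognormal(mu, sigma, n)    # part 2: positive amounts
    return np.where(is_zero, 0.0, positive)


values = sample_zero_inflated(n=10_000, p_zero=0.4, mu=10.0, sigma=1.0)
share_zero = float((values == 0).mean())
print(f"share of zeros: {share_zero:.3f}")
```

Splitting the draw this way keeps the point mass at zero exact, which a single continuous distribution cannot represent.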

Installation

pip install microplex

Quick Start

from microplex import Synthesizer
import pandas as pd

# Load training data with known target variables
training_data = pd.read_csv("survey_with_income.csv")

# Initialize synthesizer
synth = Synthesizer(
    target_vars=["income", "expenditure", "savings"],
    condition_vars=["age", "education", "region"],
)

# Fit on training data
synth.fit(training_data, weight_col="weight", epochs=100)

# Generate synthetic targets for new demographics
new_demographics = pd.read_csv("demographics_only.csv")
synthetic = synth.generate(new_demographics)
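
To try the Quick Start without a real survey file, a toy table with the same column names can be built with pandas and NumPy. This is only a sketch: the category labels, the distributions, and the sample size are invented, and whether `Synthesizer` accepts string-valued condition columns is an assumption not confirmed by this page.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500  # arbitrary toy sample size

# Demographics and survey weights (labels and ranges are made up)
toy = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "education": rng.choice(["primary", "secondary", "tertiary"], size=n),
    "region": rng.choice(["north", "south", "east", "west"], size=n),
    "weight": rng.uniform(50.0, 500.0, size=n),
})

# Targets: income is zero for roughly 30% of rows, the rest is log-normal
has_income = rng.random(n) > 0.3
toy["income"] = np.where(has_income, rng.lognormal(10.0, 0.8, size=n), 0.0)
toy["expenditure"] = 0.6 * toy["income"] + rng.lognormal(8.0, 0.5, size=n)
toy["savings"] = np.maximum(toy["income"] - toy["expenditure"], 0.0)

# Check that every column named in the Quick Start is present
target_vars = ["income", "expenditure", "savings"]
condition_vars = ["age", "education", "region"]
missing = set(target_vars + condition_vars + ["weight"]) - set(toy.columns)
print("missing columns:", missing)
```

Calling `toy.to_csv("survey_with_income.csv", index=False)` would then write a file with the name the Quick Start reads.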

Why microplex?

Features compared across microplex, CT-GAN, TVAE, and synthpop:

  • Conditional generation
  • Zero-inflation handling
  • Exact likelihood
  • Stable training
  • Preserves source structure
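
The exact-likelihood feature listed above is a general property of normalizing flows: because a flow is an invertible map from a simple base distribution, the density of a data point follows from the change-of-variables formula. The sketch below uses a single affine layer, which is far simpler than any flow a real synthesizer would use; the function names are hypothetical and nothing here reflects microplex's internals.

```python
import numpy as np


def standard_normal_logpdf(z):
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z ** 2 + np.log(2.0 * np.pi))


def affine_flow_logpdf(x, shift, scale):
    """Exact log-density of x = shift + scale * z with z ~ N(0, 1)."""
    z = (x - shift) / scale        # invert the flow
    log_det = -np.log(scale)       # log |dz/dx| from change of variables
    return standard_normal_logpdf(z) + log_det


x = np.array([-1.0, 0.5, 3.0])
flow_lp = affine_flow_logpdf(x, shift=1.0, scale=2.0)

# Closed-form log-density of N(1, 2^2), for comparison
direct_lp = -0.5 * ((x - 1.0) / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))
print("matches closed form:", np.allclose(flow_lp, direct_lp))
```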

Use Cases

  • Survey enhancement: Impute income variables from tax data onto census demographics
  • Privacy-preserving synthesis: Generate synthetic data that preserves statistical properties without copying real records
  • Data fusion: Combine variables from multiple surveys with different sample designs
  • Missing data imputation: Fill in missing values conditioned on observed variables

Architecture

┌─────────────────────────────────────────────────────────┐
│                      Synthesizer                         │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  Training:                                               │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │ Training │───▶│ Transformer  │───▶│ Normalizing  │  │
│  │   Data   │    │ (log, std)   │    │    Flow      │  │
│  └──────────┘    └──────────────┘    └──────────────┘  │
│                                                          │
│  Generation:                                             │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │ Context  │───▶│ Zero + Flow  │───▶│  Inverse     │  │
│  │  Vars    │    │   Sampling   │    │  Transform   │  │
│  └──────────┘    └──────────────┘    └──────────────┘  │
│                                                          │
└─────────────────────────────────────────────────────────┘
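
The "Transformer (log, std)" and "Inverse Transform" boxes in the diagram can be read as a log transform followed by standardization, undone in reverse order after sampling. The following is a minimal sketch under that reading. Using `log1p` rather than a plain log, and the function names themselves, are assumptions and not taken from the microplex source.

```python
import numpy as np


def forward_transform(x):
    """Apply log1p, then standardize. Returns the transformed values
    and the (mean, std) needed to invert the transform later."""
    logged = np.log1p(x)
    mean, std = logged.mean(), logged.std()
    return (logged - mean) / std, (mean, std)


def inverse_transform(z, stats):
    """Undo standardization, then undo log1p."""
    mean, std = stats
    return np.expm1(z * std + mean)


positive_income = np.array([12_000.0, 35_000.0, 58_000.0, 250_000.0])
z, stats = forward_transform(positive_income)
recovered = inverse_transform(z, stats)
print("round trip ok:", np.allclose(recovered, positive_income))
```

Working in the transformed space gives the flow roughly unit-scale inputs, while the stored statistics let generated samples be mapped back to the original units.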

Documentation

Full documentation is available at cosilicoai.github.io/microplex.

Benchmarks

See benchmarks/ for comparisons against:

  • CT-GAN: Conditional Tabular GAN (from SDV)
  • TVAE: Tabular VAE (from SDV)
  • Copulas: Gaussian copula synthesis (from SDV)
  • synthpop: CART-based synthesis (R package, via rpy2)

Citation

@software{microplex2024,
  author = {Cosilico},
  title = {microplex: Microdata synthesis and reweighting using normalizing flows},
  year = {2024},
  url = {https://github.com/CosilicoAI/microplex}
}

License

MIT License - see LICENSE for details.
