
micro

Conditional microdata synthesis using normalizing flows.

Overview

micro synthesizes survey microdata while preserving:

  • Conditional relationships: Generate target variables given demographics
  • Zero-inflated distributions: Handle variables that are 0 for many observations
  • Joint correlations: Preserve relationships between target variables
  • Hierarchical structures: Keep household/firm compositions intact
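A common way to model zero-inflated variables is a two-part (hurdle) model: one component decides whether a value is zero, and a continuous component generates the positive values. The sketch below illustrates the idea with an overall zero share and a lognormal for positives. It ignores the conditioning variables for brevity, assumes non-negative targets, and is not micro's implementation; the helper names `fit_hurdle` and `sample_hurdle` and the lognormal choice are hypothetical. In micro the continuous part is presumably the normalizing flow shown in the architecture diagram.

```python
import numpy as np


def fit_hurdle(values: np.ndarray) -> dict:
    """Fit a two-part model: P(value == 0) plus a lognormal for positive values.

    Assumes non-negative values (hypothetical sketch, not micro's code).
    """
    values = np.asarray(values, dtype=float)
    # Share of exact zeros (the "hurdle").
    p_zero = float(np.mean(values == 0))
    positive = values[values > 0]
    if positive.size == 0:
        # Degenerate case: every value is zero, so the positive part is never used.
        return {"p_zero": p_zero, "mu": 0.0, "sigma": 1.0}
    # Lognormal parameters estimated on the log of the positive values only.
    log_pos = np.log(positive)
    return {"p_zero": p_zero, "mu": float(log_pos.mean()), "sigma": float(log_pos.std())}


def sample_hurdle(params: dict, n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n values: zero with probability p_zero, otherwise a lognormal draw."""
    is_zero = rng.random(n) < params["p_zero"]
    positive = rng.lognormal(params["mu"], params["sigma"], size=n)
    return np.where(is_zero, 0.0, positive)


rng = np.random.default_rng(0)
# Toy data: about 60% zeros, the rest lognormal positives.
observed = np.where(rng.random(10_000) < 0.6, 0.0, rng.lognormal(10.0, 1.0, 10_000))
params = fit_hurdle(observed)
synthetic = sample_hurdle(params, 10_000, rng)
```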

Installation

pip install microsynth

The package is published on PyPI as microsynth and imported as micro.

Quick Start

from micro import Synthesizer
import pandas as pd

# Load training data with known target variables
training_data = pd.read_csv("survey_with_income.csv")

# Initialize synthesizer
synth = Synthesizer(
    target_vars=["income", "expenditure", "savings"],
    condition_vars=["age", "education", "region"],
)

# Fit on training data
synth.fit(training_data, weight_col="weight", epochs=100)

# Generate synthetic targets for new demographics
new_demographics = pd.read_csv("demographics_only.csv")
synthetic = synth.generate(new_demographics)
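Survey weights are passed to `fit` through `weight_col`. This README does not say how the weights enter training; one common approach, assumed here, is to scale each record's contribution to the negative log-likelihood by its share of the total weight, so the fitted model represents the weighted population rather than the raw sample. The helper `weighted_nll` is hypothetical and independent of micro.

```python
import numpy as np


def weighted_nll(log_likelihoods: np.ndarray, weights: np.ndarray) -> float:
    """Survey-weighted mean negative log-likelihood (hypothetical helper).

    Each record's log-likelihood is weighted by its share of the total weight,
    so the loss targets the weighted population rather than the raw sample.
    """
    log_likelihoods = np.asarray(log_likelihoods, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("survey weights must be non-negative")
    # Normalize so the loss scale does not depend on the units of the weights.
    shares = weights / weights.sum()
    return float(-(shares * log_likelihoods).sum())


# Two records: the second represents three times as many people as the first.
loss = weighted_nll(np.array([-1.0, -2.0]), np.array([1.0, 3.0]))
```

With equal weights this reduces to the ordinary mean negative log-likelihood.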

Why micro?

micro is compared with CT-GAN, TVAE, and synthpop on the following features:

  • Conditional generation
  • Zero-inflation handling
  • Exact likelihood
  • Stable training
  • Preservation of source structure
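The "Exact likelihood" feature is a general property of normalizing flows: the flow is an invertible map z = f(x; c) from the data to a simple base distribution, so the change-of-variables formula gives log p(x | c) = log p_Z(f(x; c)) + log |det ∂f(x; c)/∂x| exactly. By contrast, GANs such as CT-GAN define no explicit density, and VAEs such as TVAE optimize only a lower bound on the log-likelihood. The sketch below checks the formula for the simplest conditional flow, a one-dimensional affine map whose shift and log-scale are linear in a single context variable. The parameterization (`a`, `b`, `s0`, `s1`) is illustrative only; practical flows stack many learned, nonlinear invertible layers.

```python
import math

import numpy as np


def affine_flow_log_prob(x, c, a, b, s0, s1):
    """Exact log p(x | c) for the conditional affine flow z = (x - mu(c)) / sigma(c).

    mu(c) = a + b * c and log sigma(c) = s0 + s1 * c (illustrative choices).
    """
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    mu = a + b * c
    log_sigma = s0 + s1 * c
    z = (x - mu) / np.exp(log_sigma)
    # Standard normal base density: log N(z; 0, 1).
    log_base = -0.5 * z**2 - 0.5 * math.log(2 * math.pi)
    # Change of variables: log |dz/dx| = -log sigma(c).
    return log_base - log_sigma


def normal_log_pdf(x, mean, std):
    """Reference Gaussian log-density, used only for comparison."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * math.log(2 * math.pi)


x = np.array([0.5, 2.0, -1.0])
c = np.array([0.0, 1.0, 2.0])
flow_lp = affine_flow_log_prob(x, c, a=1.0, b=0.5, s0=0.0, s1=0.2)
direct_lp = normal_log_pdf(x, mean=1.0 + 0.5 * c, std=np.exp(0.2 * c))
```

Both computations agree, and the flow density integrates to one for any fixed context, which is what makes the likelihood exact rather than approximate.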

Use Cases

  • Survey enhancement: Impute income variables from tax data onto census demographics
  • Privacy-preserving synthesis: Generate synthetic data that preserves statistical properties without copying real records
  • Data fusion: Combine variables from multiple surveys with different sample designs
  • Missing data imputation: Fill in missing values conditioned on observed variables
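For the missing-data imputation use case, one plausible workflow (an assumption, not documented behavior) is to fit on records whose target variables are all observed and generate targets for the rest. The pandas-only helper below, `split_for_imputation` (hypothetical), performs that split; the two frames would then play the roles of `training_data` and `new_demographics` in the Quick Start.

```python
import numpy as np
import pandas as pd


def split_for_imputation(df: pd.DataFrame, target_vars: list[str], condition_vars: list[str]):
    """Split rows into donors (all targets observed) and recipients (any target missing).

    Recipients keep only the conditioning columns, since their targets will be synthesized.
    """
    complete = df[target_vars].notna().all(axis=1)
    donors = df[complete]
    recipients = df.loc[~complete, condition_vars]
    return donors, recipients


survey = pd.DataFrame(
    {
        "age": [34, 51, 29, 62],
        "region": ["N", "S", "S", "W"],
        "income": [42_000.0, np.nan, 0.0, 58_000.0],
        "savings": [1_000.0, 2_500.0, np.nan, 12_000.0],
    }
)
donors, recipients = split_for_imputation(survey, ["income", "savings"], ["age", "region"])
```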

Architecture

┌────────────────────────────────────────────────────────┐
│                      Synthesizer                       │
├────────────────────────────────────────────────────────┤
│                                                        │
│  Training:                                             │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │ Training │───▶│ Transformer  │───▶│ Normalizing  │  │
│  │   Data   │    │ (log, std)   │    │    Flow      │  │
│  └──────────┘    └──────────────┘    └──────────────┘  │
│                                                        │
│  Generation:                                           │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │ Context  │───▶│ Zero + Flow  │───▶│  Inverse     │  │
│  │  Vars    │    │   Sampling   │    │  Transform   │  │
│  └──────────┘    └──────────────┘    └──────────────┘  │
│                                                        │
└────────────────────────────────────────────────────────┘
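The Transformer stage is labelled "(log, std)", i.e. a log transform followed by standardization, and generation ends with the inverse transform. A minimal sketch of such a reversible transform is shown below, assuming `log1p` (so zeros stay finite) and statistics estimated on positive values only, since the diagram suggests zeros are handled by a separate zero model. The class name `LogStandardizer` and these choices are assumptions, not micro's code; the standard normal draws stand in for real flow samples.

```python
import numpy as np


class LogStandardizer:
    """Reversible log + standardize transform for non-negative targets (hypothetical).

    Statistics are estimated on positive values only, on the assumption that
    zeros are handled separately by a zero/non-zero model.
    """

    def fit(self, values: np.ndarray) -> "LogStandardizer":
        logged = np.log1p(np.asarray(values, dtype=float))
        positive = logged[logged > 0]
        self.mean_ = float(positive.mean())
        self.std_ = float(positive.std())
        return self

    def transform(self, values: np.ndarray) -> np.ndarray:
        return (np.log1p(np.asarray(values, dtype=float)) - self.mean_) / self.std_

    def inverse_transform(self, scaled: np.ndarray) -> np.ndarray:
        return np.expm1(np.asarray(scaled, dtype=float) * self.std_ + self.mean_)


rng = np.random.default_rng(1)
income = np.array([0.0, 0.0, 25_000.0, 40_000.0, 90_000.0])
scaler = LogStandardizer().fit(income)

# Round trip: inverse_transform undoes transform, up to floating-point error.
round_trip = scaler.inverse_transform(scaler.transform(income))

# Generation sketch: a zero mask plus "flow" draws (standard normal stand-ins),
# mapped back to the original scale by the inverse transform.
is_zero = rng.random(5) < 0.4
flow_draws = rng.standard_normal(5)
generated = np.where(is_zero, 0.0, scaler.inverse_transform(flow_draws))
```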

Documentation

Full documentation is available at cosilicoai.github.io/micro.

Benchmarks

See benchmarks/ for comparisons against:

  • CT-GAN: Conditional Tabular GAN (from SDV)
  • TVAE: Tabular VAE (from SDV)
  • Copulas: Gaussian copula synthesis (from SDV)
  • synthpop: CART-based synthesis (R package, via rpy2)

Citation

@software{micro2024,
  author = {Cosilico},
  title = {micro: Conditional microdata synthesis using normalizing flows},
  year = {2024},
  url = {https://github.com/CosilicoAI/micro}
}

License

MIT License - see LICENSE for details.


Download files

Download the file for your platform.

Source Distribution

microsynth-0.1.0.tar.gz (21.2 kB)

Uploaded Source

Built Distribution

microsynth-0.1.0-py3-none-any.whl (15.4 kB)

Uploaded Python 3

File details

Details for the file microsynth-0.1.0.tar.gz.

File metadata

  • Download URL: microsynth-0.1.0.tar.gz
  • Upload date:
  • Size: 21.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for microsynth-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       1f4f8c312ec24aee26aa6e6deebe06e0217934572b8dfe757fc9486bcc8ff9bd
MD5          cdf88f20c8f93e0f3f160c34ce750206
BLAKE2b-256  6c5d0b2bfdaf8b8920a5da8af799e133a450003e8862c439181060b5f5b6049d
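To check that a downloaded file matches the SHA256 digest listed above, compute the digest locally and compare. A minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `matches_published` are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Published digest for the source distribution listed above.
SDIST_SHA256 = "1f4f8c312ec24aee26aa6e6deebe06e0217934572b8dfe757fc9486bcc8ff9bd"
# After downloading: matches_published(Path("microsynth-0.1.0.tar.gz"), SDIST_SHA256)
```

pip's hash-checking mode performs the same check at install time: list `microsynth==0.1.0 --hash=sha256:<digest>` in a requirements file (one `--hash` per distribution file pip may select) and install with `pip install --require-hashes -r requirements.txt`.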

File details

Details for the file microsynth-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: microsynth-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 15.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for microsynth-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       7a3983270ebf76d524c8b4613712337f337404a7132a474e3eff9a872bc2cd22
MD5          5fede3a3e6c25747832681ec210e552b
BLAKE2b-256  58b82f3ca105a41d1e05cfe340db620ece98f2f79b0374db8a66ec21028fd6fe
