Evaluation of how good a synthetic dataset is compared to the original with presupposing structural constraints
MAP-alignment fidelity and dataset distance for synthetic tabular data
This package implements the one-sided MAP-alignment fidelity statistic introduced by Chattopadhyay et al. and described in the manuscript “How Good Is Your Synthetic Data?”.
The core idea
For a synthetic record to be realistic, each coordinate should agree with the conditional MAP prediction inferred from real data.
Formally, for a data record x and coordinate i:
υ(x, i) = φ_i(x_i | x_{-i}) / max_y φ_i(y | x_{-i})
Averaging υ(x, i) over all samples and coordinates gives the dataset-level statistic:
Υ(D) ∈ [0, 1]
- High Υ: the synthetic data preserves the real conditional structure
- Low Υ: structural distortion (even if marginals and covariances match)
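The definition above can be sketched in a few lines. This is only an illustration, not the package's implementation: it assumes the conditional distribution φ_i(· | x_{-i}) for one coordinate is already available as a vector of probabilities over that coordinate's possible values, and the helper names `upsilon_coordinate` and `upsilon_dataset` are hypothetical.

```python
import numpy as np


def upsilon_coordinate(cond_probs, observed_index):
    """Ratio of the observed value's conditional probability to the MAP probability.

    cond_probs: probabilities phi_i(y | x_{-i}) over the possible values y.
    observed_index: position of the observed value x_i in that vector.
    """
    cond_probs = np.asarray(cond_probs, dtype=float)
    return float(cond_probs[observed_index] / cond_probs.max())


def upsilon_dataset(scores):
    """Average per-coordinate scores over all samples and coordinates."""
    return float(np.mean(scores))


# One coordinate with three possible values (assumed conditional distribution).
phi = [0.6, 0.3, 0.1]
print(upsilon_coordinate(phi, 0))  # observed value is the MAP value
print(upsilon_coordinate(phi, 1))  # observed value is half as likely as the MAP value

# Rows are samples, columns are coordinates.
print(upsilon_dataset([[1.0, 0.5], [0.5, 1.0]]))
```

A coordinate scores 1 exactly when its observed value is the most probable one given the rest of the record, so Υ(D) measures how often, and by how much, records depart from the MAP prediction.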
Installation
pip install lsynth
Quick Example
import pandas as pd
from lsynth import compute_upsilon

# Real data: a random sample of 200 records
df_real = pd.read_csv("gss_2018.csv").sample(200)

# Generate synthetic records with the LSM generator and score them
ups_lsm, syn_lsm = compute_upsilon(
    num=100,
    model_path="gss_2018.joblib",
    generate=True,
    gen_algorithm="LSM",
    orig_df=df_real,
    n_workers=8,
)

print("LSM mean Upsilon:", ups_lsm.mean())
Interpretation
- ~1.0: synthetic matches conditional structure closely
- ~0.7: Gaussian-like distortions
- < 0.7: strong structural mismatch
Why MAP-alignment?
Because covariance matching is insufficient.
Section VII of the manuscript gives explicit examples where:
- Real and synthetic data share identical means, variances, and covariance matrices
- Yet they differ strongly in conditional structure
- MAP-alignment catches the discrepancy immediately
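A toy example (not taken from the manuscript) makes the point concrete. In the real data below, y equals 1 exactly when x equals 0; the synthetic data keeps both marginals but makes the columns independent. The conditionals are estimated by simple frequency counts, which is an assumption made for illustration and not the package's estimator; `empirical_upsilon` is a hypothetical helper.

```python
from collections import Counter

import numpy as np


def empirical_upsilon(real, synth):
    """Mean MAP-alignment of `synth` under conditionals counted from `real`.

    Toy version for two discrete columns: phi_i(v | other) is the frequency of
    value v in column i among real rows whose other column equals `other`.
    """
    scores = []
    for i in (0, 1):
        j = 1 - i
        # counts[context][value] = number of real rows with that pair
        counts = {}
        for row in real:
            counts.setdefault(row[j], Counter())[row[i]] += 1
        for row in synth:
            ctx = counts.get(row[j])
            if ctx is None:
                continue  # context never seen in real data: skipped in this toy
            scores.append(ctx[row[i]] / max(ctx.values()))
    return float(np.mean(scores))


# Real: y == 1 exactly when x == 0. Synthetic: same marginals, independent columns.
real = [(-1, 0), (0, 1), (1, 0)] * 3
synth = [(x, y) for x in (-1, 0, 1) for y in (0, 0, 1)]

print(np.allclose(np.mean(real, axis=0), np.mean(synth, axis=0)))
print(np.allclose(np.cov(np.array(real).T), np.cov(np.array(synth).T)))
print(empirical_upsilon(real, real))
print(empirical_upsilon(real, synth))
```

Both comparisons of means and covariance matrices print `True`, yet the real data scores 1.0 against itself while the independent synthetic data scores 10/18 (about 0.56), because many of its records pair values that never co-occur in the real data.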
This method:
- Detects nonlinear and higher-order structure
- Avoids feature-embedding artifacts
- Comes with finite-sample uncertainty control
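This page does not say which finite-sample bound the manuscript uses. As one generic possibility, and purely as an assumption, a Hoeffding interval applies to any average of independent scores bounded in [0, 1], such as per-record means of υ. The function name `hoeffding_interval` and the parameter `delta` are hypothetical.

```python
import math

import numpy as np


def hoeffding_interval(record_scores, delta=0.05):
    """Two-sided Hoeffding interval for the mean of i.i.d. scores in [0, 1].

    With probability at least 1 - delta, the true mean lies in the interval.
    """
    scores = np.asarray(record_scores, dtype=float)
    n = scores.size
    # From P(|mean - mu| >= t) <= 2 * exp(-2 * n * t**2), solved for t
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    mean = float(scores.mean())
    return max(0.0, mean - half_width), min(1.0, mean + half_width)


# 200 per-record scores (placeholder values, not real results)
low, high = hoeffding_interval(np.full(200, 0.8), delta=0.05)
print(low, high)
```

The interval width shrinks like 1/sqrt(n) and does not depend on the distribution of the scores, which is why it holds at any sample size.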
Supported Generators
- "LSM": uses QuasiNet as a generative model via qsample
- "BASELINE": independent-column null model
- "CTGAN": uses the SDV CTGAN synthesizer
- Custom generators are also supported
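To illustrate what an independent-column null model does, here is a minimal sketch that resamples every column separately from its own marginal. It is not the package's "BASELINE" code; the function name `independent_column_baseline` and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd


def independent_column_baseline(df, n, seed=0):
    """Draw n synthetic rows by sampling each column independently.

    Marginal distributions are preserved in expectation, while every
    dependence between columns is destroyed.
    """
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            col: rng.choice(df[col].to_numpy(), size=n, replace=True)
            for col in df.columns
        }
    )


# Two perfectly dependent columns (made-up values)
df = pd.DataFrame(
    {
        "age": ["young", "old", "old", "young"],
        "vote": ["yes", "no", "no", "yes"],
    }
)
synthetic = independent_column_baseline(df, n=1000, seed=0)
print(synthetic.head())
```

Because such a model keeps the marginals but none of the conditional structure, it is a natural low-fidelity reference point when comparing Υ across generators.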
Citation
Chattopadhyay I, et al.
"How Good Is Your Synthetic Data?"
Source Distribution
File details
Details for the file lsynth-0.1.13.tar.gz.
File metadata
- Download URL: lsynth-0.1.13.tar.gz
- Upload date:
- Size: 12.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 489eecce521c75e0445e7ef2ad49bbf28e44cd369d3f183fbb886beecd2b64b2 |
| MD5 | 748bb60292a46cd7eab72e26f648c3d1 |
| BLAKE2b-256 | adb0e40ec16f290a79176f493a29540ecb3368950c386065f46205d726422da3 |