A library for evaluation & visualization of synthetic data.

About

Syndat is a software package that provides basic functionality for the evaluation and visualisation of synthetic data. Quality scores can be computed on three base metrics (Discrimination, Correlation and Distribution), and data can be visualized to inspect correlation structures or statistical distributions.

Syndat also allows users to generate stratified and interpretable visualisations, including raincloud plots, GOF plots, and trajectory comparisons, offering deeper insights into the quality of synthetic clinical data across different subgroups.

Installation

Install via pip:

pip install syndat

Usage

Fidelity metrics

Jensen-Shannon Distance

The Jensen-Shannon distance measures the similarity between two probability distributions. Syndat estimates a probability distribution for each feature in both datasets and compares them, which yields a per-feature measure of statistical similarity between two dataframes.

It is bounded between 0 and 1, with 0 indicating identical distributions.
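
As a minimal sketch of the idea (not Syndat's implementation), the distance for a single numeric feature could be computed by histogramming both columns on shared bins and comparing the two resulting distributions. The helper name `js_distance`, the number of bins and the base-2 logarithm (which gives the 0 to 1 bound) are assumptions made for illustration.

```python
import numpy as np


def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logarithm) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize counts to probabilities
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.sqrt(max(divergence, 0.0)))  # guard against tiny negative rounding


# Hypothetical single feature from a real and a synthetic dataset
real_feature = np.array([1, 2, 3, 4, 5])
synthetic_feature = np.array([1, 2, 2, 3, 3])

# Shared bin edges so that both histograms describe the same support
edges = np.histogram_bin_edges(np.concatenate([real_feature, synthetic_feature]), bins=5)
p, _ = np.histogram(real_feature, bins=edges)
q, _ = np.histogram(synthetic_feature, bins=edges)

distance = js_distance(p, q)
print(distance)
```

Identical histograms give a distance of 0, and histograms with no overlapping bins give a distance of 1.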

(Normalized) Correlation Difference

Beyond the statistical similarity of individual features, synthetic data should also preserve the correlations between features. The normalized correlation difference measures how similar the correlation matrices of two dataframes are.

A low correlation difference near zero indicates that the correlation structure of the synthetic data is similar to the real data.
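
The following sketch illustrates the general idea under stated assumptions; it is not Syndat's formula. It compares the Pearson correlation matrices of the numeric columns and averages the absolute off-diagonal differences, divided by 2 (the largest possible gap between two correlation coefficients) so that the result lies between 0 and 1. The function name `correlation_difference` and this particular normalization are hypothetical.

```python
import numpy as np
import pandas as pd


def correlation_difference(real, synthetic):
    """Mean absolute off-diagonal difference of two correlation matrices, scaled to [0, 1]."""
    # Assumes both frames contain the same numeric columns in the same order
    c_real = real.select_dtypes("number").corr().to_numpy()
    c_syn = synthetic.select_dtypes("number").corr().to_numpy()
    off_diag = ~np.eye(c_real.shape[0], dtype=bool)  # ignore the trivial diagonal of ones
    return float(np.abs(c_real - c_syn)[off_diag].mean() / 2.0)


rng = np.random.default_rng(0)
x = rng.normal(size=200)

# Real data: columns "a" and "b" are strongly correlated
real = pd.DataFrame({"a": x, "b": x + rng.normal(scale=0.1, size=200)})
# Synthetic data that lost the correlation: "b" is independent noise
synthetic_bad = pd.DataFrame({"a": x, "b": rng.normal(size=200)})

print(correlation_difference(real, real))           # 0.0, identical structure
print(correlation_difference(real, synthetic_bad))  # clearly above zero
```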

Discriminator AUC

A classifier is trained to discriminate between real and synthetic data. Based on the Receiver Operating Characteristic (ROC) curve, we compute the area under the curve (AUC) as a measure of how well the classifier can distinguish between the two datasets.

An AUC of 0.5 indicates that the classifier is unable to distinguish between the two datasets, while an AUC of 1.0 indicates perfect discrimination.
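
To make the interpretation concrete, here is a rough sketch of such a discriminator built with scikit-learn. The choice of a random forest, 5-fold cross-validation and the function name `discriminator_auc_sketch` are assumptions for illustration; Syndat's own classifier and settings may differ.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict


def discriminator_auc_sketch(real, synthetic, seed=0):
    """Cross-validated ROC AUC of a classifier separating real (0) from synthetic (1) rows."""
    X = pd.concat([real, synthetic], ignore_index=True)
    y = np.concatenate([np.zeros(len(real)), np.ones(len(synthetic))])
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    # Out-of-fold probabilities, so the AUC is not inflated by overfitting
    proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)


rng = np.random.default_rng(0)
cols = ["x1", "x2"]
real = pd.DataFrame(rng.normal(0, 1, size=(200, 2)), columns=cols)
# Drawn from the same distribution as the real data: hard to tell apart
synthetic_close = pd.DataFrame(rng.normal(0, 1, size=(200, 2)), columns=cols)
# Shifted mean: easy to tell apart
synthetic_shifted = pd.DataFrame(rng.normal(3, 1, size=(200, 2)), columns=cols)

auc_close = discriminator_auc_sketch(real, synthetic_close)
auc_shifted = discriminator_auc_sketch(real, synthetic_shifted)
print(auc_close, auc_shifted)
```

The first value should land near 0.5 and the second near 1.0, matching the two extremes described above.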

Example usage of the three metrics with Syndat:

import pandas as pd
from syndat.metrics import (
    jensen_shannon_distance,
    normalized_correlation_difference,
    discriminator_auc
)

real = pd.DataFrame({
    'feature1': [1, 2, 3, 4, 5],
    'feature2': ['A', 'B', 'A', 'B', 'C']
})

synthetic = pd.DataFrame({
    'feature1': [1, 2, 2, 3, 3],
    'feature2': ['A', 'B', 'A', 'C', 'C']
})

print(jensen_shannon_distance(real, synthetic))
# >> {'feature1': 0.4990215421876156, 'feature2': 0.22141025172133794}

print(normalized_correlation_difference(real, synthetic))
# >> 0.24571345029108108

print(discriminator_auc(real, synthetic))
# >> 0.6

Scoring Functions

For convenience and easier interpretation, a normalized score can be computed for each of the metrics instead:

import syndat

# `real` and `synthetic` are the dataframes defined in the example above
# the JSD score is aggregated over all features
distribution_similarity_score = syndat.scores.distribution(real, synthetic)
discrimination_score = syndat.scores.discrimination(real, synthetic)
correlation_score = syndat.scores.correlation(real, synthetic)

Scores are defined in a range of 0-100, with a higher score corresponding to better data fidelity.
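
The exact normalization Syndat applies is not described here. Purely as an illustration of how raw metrics can be mapped onto a 0-100 scale where higher is better, one could use linear mappings like the ones below; both function names and both formulas are hypothetical.

```python
def distance_to_score(distance):
    """Map a distance in [0, 1] to 0-100; a distance of 0 (identical) gives 100."""
    return 100.0 * (1.0 - distance)


def auc_to_score(auc):
    """Map a discriminator AUC to 0-100; 0.5 (indistinguishable) gives 100, 0.0 or 1.0 gives 0."""
    return 100.0 * (1.0 - 2.0 * abs(auc - 0.5))


print(distance_to_score(0.0))  # 100.0
print(auc_to_score(0.5))       # 100.0
print(auc_to_score(1.0))       # 0.0
```

Note that the AUC mapping is symmetric around 0.5, because a classifier that is consistently wrong is as informative as one that is consistently right.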

Visualization

Visualize real vs. synthetic data distributions, summary statistics and discriminating features:

import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# plot the distributions of all features and the correlation matrices, and store them as image files
syndat.visualization.plot_distributions(real, synthetic, store_destination="results/plots")
syndat.visualization.plot_correlations(real, synthetic, store_destination="results/plots")

# plot and display the distribution of one specific feature
syndat.visualization.plot_numerical_feature("feature_xy", real, synthetic)

# plot SHAP values for the features that best differentiate real from synthetic data
syndat.visualization.plot_shap_discrimination(real, synthetic)

Postprocessing

Postprocess synthetic data to improve data fidelity:

import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# postprocess synthetic data, feeding the result of the first step into the second
synthetic_post = syndat.postprocessing.assert_minmax(real, synthetic)
synthetic_post = syndat.postprocessing.normalize_float_precision(real, synthetic_post)
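
What these functions do internally is not documented in this section. Going only by the name `assert_minmax`, one plausible reading is that synthetic values are constrained to the value range observed in the real data. The sketch below shows that idea with plain pandas; the helper `clip_to_real_range` is hypothetical and is not part of Syndat.

```python
import pandas as pd


def clip_to_real_range(real, synthetic):
    """Return a copy of `synthetic` with numeric columns clipped to the real min/max."""
    clipped = synthetic.copy()
    for col in real.select_dtypes("number").columns:
        if col in clipped.columns:
            clipped[col] = clipped[col].clip(lower=real[col].min(), upper=real[col].max())
    return clipped


real = pd.DataFrame({"age": [30, 45, 60]})
synthetic = pd.DataFrame({"age": [25, 50, 72]})

result = clip_to_real_range(real, synthetic)
print(result["age"].tolist())  # [30, 50, 60]
```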

Evaluation and Visualization of Synthetic Clinical Trial Data

An example that demonstrates how to compute distribution, discrimination, and correlation scores, and how to generate stratified visualizations (GOF plots, raincloud plots, and others), is available in examples/rct_example.py.

Acknowledgements

This work was done as part of the NFDI4Health Consortium.

It is currently also being extended as part of the SYNTHIA collaboration.

Current version: 0.13.8 (released Mar 9, 2026)