A library for evaluation & visualization of synthetic data.
Syndat
Syndat is a software package that provides basic functionality for the evaluation and visualization of synthetic data. Quality scores can be computed on three base metrics (Discrimination, Correlation and Distribution), and data can be visualized to inspect correlation structures or statistical distributions.
Installation
Install via pip:
pip install syndat
Usage
Quality metrics
Compute data quality metrics by comparing real and synthetic data in terms of their separation complexity, distribution similarity or pairwise feature correlations:
import pandas as pd
import syndat
real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")
# How similar are the statistical distributions of real and synthetic features
distribution_similarity_score = syndat.scores.distribution(real, synthetic)
# How hard is it for a classifier to discriminate real and synthetic data
discrimination_score = syndat.scores.discrimination(real, synthetic)
# How well are pairwise feature correlations preserved
correlation_score = syndat.scores.correlation(real, synthetic)
Scores range from 0 to 100; a higher score indicates better data fidelity.
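To make the 0-100 scale concrete, here is a minimal sketch of how a correlation-preservation score could be mapped onto that range. It is not Syndat's implementation: the function name `correlation_score_sketch` and the normalization (mean absolute difference of pairwise Pearson correlations, divided by its maximum of 2) are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd


def correlation_score_sketch(real: pd.DataFrame, synthetic: pd.DataFrame) -> float:
    """Hypothetical 0-100 score; 100 means identical pairwise correlations."""
    # Use only numeric columns that exist in both data sets.
    cols = [c for c in real.select_dtypes("number").columns if c in synthetic.columns]
    if len(cols) < 2:
        # With fewer than two features there are no pairs to compare.
        return 100.0
    corr_real = real[cols].corr().to_numpy()
    corr_syn = synthetic[cols].corr().to_numpy()
    # Compare each feature pair once (upper triangle, diagonal excluded).
    upper = np.triu_indices(len(cols), k=1)
    # Correlations lie in [-1, 1], so each absolute difference is at most 2.
    mean_abs_diff = np.nanmean(np.abs(corr_real[upper] - corr_syn[upper]))
    return float(100.0 * (1.0 - mean_abs_diff / 2.0))
```

Under this assumed definition, a synthetic data set that reproduces every pairwise correlation scores 100, and one that flips the sign of every perfect correlation scores 0.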
Visualization
Visualize real vs. synthetic data distributions, summary statistics and discriminating features:
import pandas as pd
import syndat
real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")
# Plot all feature distributions and correlations and store the image files
syndat.visualization.plot_distributions(real, synthetic, store_destination="results/plots")
syndat.visualization.plot_correlations(real, synthetic, store_destination="results/plots")
# Plot and display the distribution of a specific feature
syndat.visualization.plot_numerical_feature("feature_xy", real, synthetic)
# Plot a SHAP plot of the features that best differentiate real from synthetic data
syndat.visualization.plot_shap_discrimination(real, synthetic)
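When comparing a real and a synthetic feature distribution by hand, both histograms need to share the same bin edges, otherwise the bars are not comparable. The sketch below shows that step with NumPy only. The helper `shared_bin_histograms` is hypothetical and is not part of Syndat; how Syndat bins data internally is not stated here.

```python
import numpy as np


def shared_bin_histograms(real_values, synthetic_values, bins=10):
    """Return bin edges plus per-data-set densities computed on the same bins."""
    real_values = np.asarray(real_values, dtype=float)
    synthetic_values = np.asarray(synthetic_values, dtype=float)
    # Derive the edges from both samples so the two histograms line up.
    combined = np.concatenate([real_values, synthetic_values])
    edges = np.histogram_bin_edges(combined, bins=bins)
    # density=True normalizes each histogram so its area is 1.
    real_density, _ = np.histogram(real_values, bins=edges, density=True)
    synthetic_density, _ = np.histogram(synthetic_values, bins=edges, density=True)
    return edges, real_density, synthetic_density
```

The returned arrays can be drawn with any plotting library, for example with matplotlib's `plt.stairs(real_density, edges)`.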
Postprocessing
Postprocess synthetic data to improve data fidelity:
import pandas as pd
import syndat
real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")
# Postprocess synthetic data, feeding the result of each step into the next
synthetic_post = syndat.postprocessing.assert_minmax(real, synthetic)
synthetic_post = syndat.postprocessing.normalize_float_precision(real, synthetic_post)
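As an illustration of what range-based postprocessing can look like, the sketch below clips every shared numeric column of the synthetic data to the minimum and maximum observed in the real data. This is an assumption about the general idea, not a description of `assert_minmax`: whether Syndat clips, drops or otherwise handles out-of-range values is not stated here, and the helper name `clip_to_real_range` is hypothetical.

```python
import pandas as pd


def clip_to_real_range(real: pd.DataFrame, synthetic: pd.DataFrame) -> pd.DataFrame:
    """Clip shared numeric columns of `synthetic` to the range seen in `real`."""
    # Work on a copy so the caller's synthetic data stays untouched.
    result = synthetic.copy()
    for col in real.select_dtypes("number").columns:
        if col in result.columns:
            result[col] = result[col].clip(lower=real[col].min(), upper=real[col].max())
    return result
```

Clipping keeps the number of synthetic rows unchanged, at the cost of piling out-of-range values up at the boundaries; dropping such rows would be the alternative design.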
File details
Details for the file syndat-0.10.2.tar.gz.
File metadata
- Download URL: syndat-0.10.2.tar.gz
- Upload date:
- Size: 14.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | 77dd867196ba9fc118d542843fd7016144ad9000c648f1e7d2f03968f23e2dc2
MD5 | 073f72c99907a5a28ecbff889d01578e
BLAKE2b-256 | f1bb0eb243ff9774f27794b27faf7d4dbfd92578234e579b4839cd157cb25fc5
File details
Details for the file syndat-0.10.2-py3-none-any.whl.
File metadata
- Download URL: syndat-0.10.2-py3-none-any.whl
- Upload date:
- Size: 11.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7941e23678454df05f8f76440e0e8310ab2b24a64a0fc3b952c25619e684701f
MD5 | e7c9a1dfecceaeefcfa5f50a6722a93b
BLAKE2b-256 | c8c3682a8b4c1abd78c3b16f7a1199bf10ba64395b15f255c878313408298e64