A library for evaluation & visualization of synthetic data.

Syndat

Syndat is a software package that provides basic functionality for evaluating and visualizing synthetic data. Quality scores can be computed on three base metrics (discrimination, correlation, and distribution), and data can be visualized to inspect correlation structures or statistical distributions.

Installation

Install via pip:

pip install syndat

Usage

Quality metrics

Compute data quality metrics by comparing real and synthetic data in terms of their separation complexity, distribution similarity or pairwise feature correlations:

import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# How similar are the statistical distributions of real and synthetic features 
distribution_similarity_score = syndat.scores.distribution(real, synthetic)

# How hard is it for a classifier to discriminate real and synthetic data
discrimination_score = syndat.scores.discrimination(real, synthetic)

# How well are pairwise feature correlations preserved
correlation_score = syndat.scores.correlation(real, synthetic)

Scores range from 0 to 100; a higher score corresponds to better data fidelity.
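To give an intuition for how a 0-100 fidelity score can arise, here is a minimal sketch of a correlation-style score. It is not Syndat's actual formula: the function names `pearson` and `toy_correlation_score`, the dict-of-lists input format, and the linear mapping to 0-100 are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (norm_x * norm_y)


def toy_correlation_score(real, synthetic):
    """Hypothetical score, NOT Syndat's implementation.

    `real` and `synthetic` map column names to lists of numbers and
    must share the same columns. Returns a value between 0 and 100.
    """
    pairs = list(combinations(sorted(real), 2))
    if not pairs:
        return 100.0  # nothing to compare
    diffs = [
        abs(pearson(real[a], real[b]) - pearson(synthetic[a], synthetic[b]))
        for a, b in pairs
    ]
    mean_diff = sum(diffs) / len(diffs)  # lies in [0, 2]
    return 100.0 * (1.0 - mean_diff / 2.0)


real = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8]}
same = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8]}
flipped = {"x": [1, 2, 3, 4], "y": [8, 6, 4, 2]}

score_same = toy_correlation_score(real, same)
score_flipped = toy_correlation_score(real, flipped)
```

In this toy setup, a synthetic dataset that reproduces every pairwise correlation scores at the top of the range, while one that inverts them scores at the bottom.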

Visualization

Visualize real vs. synthetic data distributions, summary statistics and discriminating features:

import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# plot all feature distributions and correlations, and store the image files
syndat.visualization.plot_distributions(real, synthetic, store_destination="results/plots")
syndat.visualization.plot_correlations(real, synthetic, store_destination="results/plots")

# plot and display the distribution of one specific feature
syndat.visualization.plot_numerical_feature("feature_xy", real, synthetic)

# SHAP plot of the features that best differentiate real from synthetic data
syndat.visualization.plot_shap_discrimination(real, synthetic)
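The example above writes plots to `results/plots`. Whether Syndat creates that directory itself is not stated here, so a cautious script can create it up front. The helper name `ensure_plot_dir` is hypothetical and not part of Syndat:

```python
from pathlib import Path


def ensure_plot_dir(path):
    """Create `path` (and any missing parents) if needed; return it as a string."""
    target = Path(path)
    target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return str(target)
```

The returned string could then be passed as `store_destination`, for example `store_destination=ensure_plot_dir("results/plots")`.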

Postprocessing

Postprocess synthetic data to improve data fidelity:

import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# postprocess synthetic data; each step builds on the output of the previous one
synthetic_post = syndat.postprocessing.assert_minmax(real, synthetic)
synthetic_post = syndat.postprocessing.normalize_float_precision(real, synthetic_post)
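The function names suggest two kinds of cleanup: keeping synthetic values within the range seen in the real data, and matching the decimal precision of the real data. The sketch below shows one plausible reading of those ideas on plain lists. It is an assumption about the general technique, not Syndat's implementation, and the names `clip_to_real_range`, `decimal_places` and `round_to_real_precision` are hypothetical.

```python
from decimal import Decimal


def clip_to_real_range(real_values, synthetic_values):
    """Clamp each synthetic value into [min(real), max(real)]."""
    lo, hi = min(real_values), max(real_values)
    return [min(max(v, lo), hi) for v in synthetic_values]


def decimal_places(value):
    """Number of digits after the decimal point in str(value).

    Assumes a finite number (NaN and infinity are not handled).
    """
    exponent = Decimal(str(value)).as_tuple().exponent
    return max(0, -exponent)


def round_to_real_precision(real_values, synthetic_values):
    """Round synthetic values to the finest precision found in the real values."""
    places = max(decimal_places(v) for v in real_values)
    return [round(v, places) for v in synthetic_values]


real_col = [1.5, 2.25, 3.0]
synthetic_col = [0.123456, 2.98765, 9.9]

clipped = clip_to_real_range(real_col, synthetic_col)
cleaned = round_to_real_precision(real_col, clipped)
```

As in the Syndat example, the second step is applied to the output of the first, so both corrections end up in the final result.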
