
A library for evaluation & visualization of synthetic data.


Syndat


Syndat is a software package that provides basic functionality for the evaluation and visualization of synthetic data. Quality scores can be computed on three base metrics (discrimination, correlation and distribution), and data can be visualized to inspect correlation structures or statistical distributions.

Installation

Install via pip:

pip install syndat

Usage

Quality metrics

Compute data quality metrics by comparing real and synthetic data in terms of their separation complexity, distribution similarity or pairwise feature correlations:

import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# How similar are the statistical distributions of real and synthetic features 
distribution_similarity_score = syndat.scores.distribution(real, synthetic)

# How hard is it for a classifier to discriminate real and synthetic data
discrimination_score = syndat.scores.discrimination(real, synthetic)

# How well are pairwise feature correlations preserved
correlation_score = syndat.scores.correlation(real, synthetic)

Scores range from 0 to 100; a higher score indicates better data fidelity.
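To give an intuition for what a 0-100 distribution score can mean, here is a minimal, dependency-free sketch for a single numerical feature. It is not Syndat's implementation: the function names (`js_divergence`, `histogram`, `toy_distribution_score`), the choice of the Jensen-Shannon divergence, and the fixed binning are all assumptions made for illustration.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) of two discrete distributions.

    With base-2 logarithms the result lies in [0, 1].
    """
    def kl(a, b):
        # Terms with a == 0 contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def histogram(values, lo, hi, bins):
    """Relative frequencies of `values` over `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # The maximum value would fall just outside, so clamp it to the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def toy_distribution_score(real_values, synthetic_values, bins=10):
    """Hypothetical 0-100 score: 100 means identical histograms.

    Assumes both inputs are non-empty lists of finite numbers.
    """
    lo = min(min(real_values), min(synthetic_values))
    hi = max(max(real_values), max(synthetic_values))
    if lo == hi:
        # Both samples are the same constant value.
        return 100.0
    p = histogram(real_values, lo, hi, bins)
    q = histogram(synthetic_values, lo, hi, bins)
    return 100.0 * (1.0 - js_divergence(p, q))
```

Identical samples yield the top of the scale, samples with no overlapping bins yield the bottom, and everything else falls in between. The scores returned by `syndat.scores` may be computed quite differently.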

Visualization

Visualize real vs. synthetic data distributions, summary statistics and discriminating features:

import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# plot all feature distributions and correlations, and store the image files
syndat.visualization.plot_distributions(real, synthetic, store_destination="results/plots")
syndat.visualization.plot_correlations(real, synthetic, store_destination="results/plots")

# plot and display specific feature distribution plot
syndat.visualization.plot_numerical_feature("feature_xy", real, synthetic)

# plot a SHAP plot of the features that differentiate real from synthetic data
syndat.visualization.plot_shap_discrimination(real, synthetic)

Postprocessing

Postprocess synthetic data to improve data fidelity:

import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# postprocess synthetic data
synthetic_post = syndat.postprocessing.assert_minmax(real, synthetic)
# pass the result of the previous step on, so that both steps are applied
synthetic_post = syndat.postprocessing.normalize_float_precision(real, synthetic_post)
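The README does not spell out what the two postprocessing steps do. Judging only by their names, one plausible reading is that synthetic values are constrained to the range seen in the real data and rounded to the precision of the real data. The sketch below illustrates that reading for a single numerical column using plain Python lists; `clip_to_real_range` and `match_float_precision` are hypothetical helpers, not part of Syndat, and the library's actual behavior may differ.

```python
from decimal import Decimal


def clip_to_real_range(real_values, synthetic_values):
    """Clamp each synthetic value to [min(real), max(real)] (assumed behavior)."""
    lo, hi = min(real_values), max(real_values)
    return [min(max(v, lo), hi) for v in synthetic_values]


def decimal_places(value):
    """Number of digits after the decimal point in the shortest repr of a float."""
    exponent = Decimal(str(value)).as_tuple().exponent
    # A positive exponent (e.g. 1e+16) means there are no fractional digits.
    return max(0, -exponent)


def match_float_precision(real_values, synthetic_values):
    """Round synthetic values to the largest precision found in the real data.

    Assumes finite float inputs (no NaN or infinity).
    """
    places = max(decimal_places(v) for v in real_values)
    return [round(v, places) for v in synthetic_values]


# Apply both steps in sequence, feeding the output of the first into the second.
real_column = [0.5, 2.25, 10.0]
synthetic_column = [-3.0, 2.123456, 12.5]
clipped = clip_to_real_range(real_column, synthetic_column)
cleaned = match_float_precision(real_column, clipped)
```

Chaining the steps matters here as well: rounding the unclipped column would discard the result of the clipping step.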
