
A library for evaluation & visualization of synthetic data.


Syndat


Syndat is a software package that provides basic functionality for the evaluation and visualization of synthetic data. Quality scores can be computed on three base metrics (Discrimination, Correlation and Distribution), and data can be visualized to inspect correlation structures or statistical distributions.

Installation

Install via pip:

pip install syndat

Usage

Quality metrics

Compute data quality metrics by comparing real and synthetic data in terms of their separation complexity, distribution similarity or pairwise feature correlations:

import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# How similar are the statistical distributions of real and synthetic features 
distribution_similarity_score = syndat.scores.distribution(real, synthetic)

# How hard is it for a classifier to discriminate real and synthetic data
discrimination_score = syndat.scores.discrimination(real, synthetic)

# How well are pairwise feature correlations preserved
correlation_score = syndat.scores.correlation(real, synthetic)

Scores range from 0 to 100; a higher score corresponds to better data fidelity.
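To make the idea of a 0-100 fidelity score more concrete, here is a minimal sketch of how a correlation-preservation score could be computed with pandas and NumPy. The function name `correlation_preservation_score` and the formula (mean absolute difference of pairwise Pearson correlations, rescaled to 0-100) are assumptions made for illustration; this is not necessarily how `syndat.scores.correlation` is implemented.

```python
import numpy as np
import pandas as pd


def correlation_preservation_score(real: pd.DataFrame, synthetic: pd.DataFrame) -> float:
    """Illustrative 0-100 score (hypothetical helper, not syndat's formula)."""
    # Use only numeric columns that exist in both data frames
    cols = [c for c in real.select_dtypes(include="number").columns
            if c in synthetic.columns]
    if len(cols) < 2:
        raise ValueError("need at least two shared numeric columns")
    real_corr = real[cols].corr().to_numpy()
    syn_corr = synthetic[cols].corr().to_numpy()
    # Compare each feature pair once (upper triangle, diagonal excluded)
    upper = np.triu_indices(len(cols), k=1)
    diff = np.abs(real_corr[upper] - syn_corr[upper])
    # Correlations lie in [-1, 1], so absolute differences lie in [0, 2]
    return float(100.0 * (1.0 - np.nanmean(diff) / 2.0))


# Toy data: "b" depends strongly on "a" in the real data
rng = np.random.default_rng(0)
a = rng.normal(size=500)
real_demo = pd.DataFrame({"a": a, "b": 2 * a + rng.normal(scale=0.1, size=500)})

# A perfect copy preserves all correlations
score_same = correlation_preservation_score(real_demo, real_demo.copy())

# Shuffling "b" independently destroys the a-b correlation
shuffled_demo = real_demo.copy()
shuffled_demo["b"] = rng.permutation(shuffled_demo["b"].to_numpy())
score_shuffled = correlation_preservation_score(real_demo, shuffled_demo)

print(score_same, score_shuffled)
```

In this sketch, an exact copy of the real data scores 100, while synthetic data that loses the dependency between the two features scores noticeably lower.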

Visualization

Visualize real vs. synthetic data distributions, summary statistics and discriminating features:

import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# plot all feature distributions and correlations, and store the image files
syndat.visualization.plot_distributions(real, synthetic, store_destination="results/plots")
syndat.visualization.plot_correlations(real, synthetic, store_destination="results/plots")

# plot and display the distribution of a specific feature
syndat.visualization.plot_numerical_feature("feature_xy", real, synthetic)

# plot a SHAP plot of the features that differentiate real from synthetic data
syndat.visualization.plot_shap_discrimination(real, synthetic)

Postprocessing

Postprocess synthetic data to improve data fidelity:

import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# postprocess synthetic data; each step takes the output of the previous one
synthetic_post = syndat.postprocessing.assert_minmax(real, synthetic)
synthetic_post = syndat.postprocessing.normalize_float_precision(real, synthetic_post)
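
As a rough illustration of what this kind of postprocessing could involve, the following sketch clips synthetic values to the range observed in the real data and rounds float columns to the number of decimal places used in the real data. The helpers `clip_to_real_range` and `match_decimal_places` are hypothetical names written for this example with plain pandas; they are assumptions about the general technique, not syndat's implementation of `assert_minmax` or `normalize_float_precision`.

```python
import pandas as pd


def clip_to_real_range(real: pd.DataFrame, synthetic: pd.DataFrame) -> pd.DataFrame:
    """Clip numeric synthetic columns to the min/max seen in the real data (hypothetical helper)."""
    out = synthetic.copy()
    for col in real.select_dtypes(include="number").columns:
        if col in out.columns:
            out[col] = out[col].clip(lower=real[col].min(), upper=real[col].max())
    return out


def _decimal_places(value: float) -> int:
    # Count decimals after stripping trailing zeros (capped at 10 places)
    text = f"{value:.10f}".rstrip("0")
    return len(text.split(".")[1])


def match_decimal_places(real: pd.DataFrame, synthetic: pd.DataFrame) -> pd.DataFrame:
    """Round float synthetic columns to the precision of the real data (hypothetical helper)."""
    out = synthetic.copy()
    for col in real.select_dtypes(include="float").columns:
        values = real[col].dropna()
        if col not in out.columns or values.empty:
            continue
        decimals = int(values.map(_decimal_places).max())
        out[col] = out[col].round(decimals)
    return out


real_demo = pd.DataFrame({"age": [20.5, 35.0, 61.5], "score": [0.25, 0.75, 0.5]})
synthetic_demo = pd.DataFrame({"age": [18.123, 40.456, 70.789], "score": [0.3333, 0.9, -0.1]})

# Chain the steps so the second one operates on the clipped data
clipped = clip_to_real_range(real_demo, synthetic_demo)
rounded = match_decimal_places(real_demo, clipped)
print(rounded)
```

Chaining the two steps matters: passing the original `synthetic_demo` to the second helper would discard the clipping done by the first.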
