Skip to main content

A library for evaluation & visualization of synthetic data.

Project description

Syndat

tests docs version

Syndat is a software package that provides basic functionalities for the evaluation and visualisation of synthetic data. Quality scores can be computed on 3 base metrics (Discrimation, Correlation and Distribution) and data may be visualized to inspect correlation structures or statistical distribution plots.

Installation

Install via pip:

pip install syndat

Usage

Quality metrics

Compute data quality metrics by comparing real and synthetic data in terms of their separation complexity, distribution similarity or pairwise feature correlations:

import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# How similar are the statistical distributions of real and synthetic features 
distribution_similarity_score = syndat.scores.distribution(real, synthetic)

# How hard is it for a classifier to discriminate real and synthetic data
discrimination_score = syndat.scores.discrimination(real, synthetic)

# How well are pairwise feature correlations preserved
correlation_score = syndat.scores.correlation(real, synthetic)

Scores are defined in a range of 0-100, with a higher score corresponding to better data fidelity.

Visualization

Visualize real vs. synthetic data distributions, summary statistics and discriminating features:

import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# plot *all* feature distribution and store image files
syndat.visualization.plot_distributions(real, synthetic, store_destination="results/plots")
syndat.visualization.plot_correlations(real, synthetic, store_destination="results/plots")

# plot and display specific feature distribution plot
syndat.visualization.plot_numerical_feature("feature_xy", real, synthetic)
syndat.visualization.plot_numerical_feature("feature_xy", real, synthetic)

# plot a shap plot of differentiating feature for real and synthetic data
syndat.visualization.plot_shap_discrimination(real, synthetic)

Postprocessing

Postprocess synthetic data to improve data fidelity:

import pandas as pd
import syndat

real = pd.read_csv("real.csv")
synthetic = pd.read_csv("synthetic.csv")

# postprocess synthetic data
synthetic_post = syndat.postprocessing.assert_minmax(real, synthetic)
synthetic_post = syndat.postprocessing.normalize_float_precision(real, synthetic)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

syndat-0.11.0.tar.gz (10.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

syndat-0.11.0-py3-none-any.whl (11.9 kB view details)

Uploaded Python 3

File details

Details for the file syndat-0.11.0.tar.gz.

File metadata

  • Download URL: syndat-0.11.0.tar.gz
  • Upload date:
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.12.9 Linux/6.8.0-1021-azure

File hashes

Hashes for syndat-0.11.0.tar.gz
Algorithm Hash digest
SHA256 d95b9256dcd7f40f12ea52c9d1ec88b0914f5b68c75908dafbaeca6ccf77512b
MD5 0fe5b7ae758f4be675f6ccf3cb9e9a21
BLAKE2b-256 7bae5da54c62da65a3ceccb8f7e2283597c6865ea658a122dbf8266fd1e89213

See more details on using hashes here.

File details

Details for the file syndat-0.11.0-py3-none-any.whl.

File metadata

  • Download URL: syndat-0.11.0-py3-none-any.whl
  • Upload date:
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.12.9 Linux/6.8.0-1021-azure

File hashes

Hashes for syndat-0.11.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9aa65e39253a62ac7004f5d29fbd94845714d834e1830d6cb9a66f4f837c58ba
MD5 b96ce8219f33ccf0aa77705f0e8f8348
BLAKE2b-256 b1169d0e0710a3cc59e7b20bc920259a182741c673cb9cb83d5be7a6530b7a80

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page