Metrics for Synthetic Data Generation Projects

Project description

An Open Source Project from the Data to AI Lab, at MIT

Website: https://sdv.dev
Documentation: https://sdv.dev/SDV
Repository: https://github.com/sdv-dev/SDMetrics
License: MIT
Development Status: Pre-Alpha

Overview

The SDMetrics library provides a set of dataset-agnostic tools for evaluating the quality of a synthetic database by comparing it to the real database that it is modeled after.

It supports multiple data modalities:

Single Columns: Compare one-dimensional NumPy arrays representing individual columns.
Column Pairs: Compare how columns in a pandas.DataFrame relate to each other, in groups of 2.
Single Table: Compare an entire table, represented as a pandas.DataFrame.
Multi Table: Compare multi-table and relational datasets, represented as a Python dict whose values are the tables, each passed as a pandas.DataFrame.
Time Series: Compare tables representing ordered sequences of events.

It includes a variety of metrics such as:

Statistical metrics which use statistical tests to compare the distributions of the real and synthetic data.
Detection metrics which use machine learning to try to distinguish between real and synthetic data.
Efficacy metrics which compare the performance of machine learning models when run on the synthetic and real data.
Bayesian Network and Gaussian Mixture metrics which learn the distribution of the real data and evaluate the likelihood of the synthetic data belonging to the learned distribution.
Privacy metrics which evaluate whether the synthetic data is leaking information about the real data.
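
To make the statistical family more concrete, here is a minimal pure-Python sketch of an inverted two-sample Kolmogorov-Smirnov score (1 minus the D statistic), the quantity that the KSTest row of the report further down is named after. The function name inverted_ks_statistic and the toy columns are hypothetical, and this is not the library's implementation, only an illustration of the idea under those assumptions.

```python
from bisect import bisect_right


def inverted_ks_statistic(real, synthetic):
    """Return 1 - D, where D is the two-sample Kolmogorov-Smirnov statistic.

    Hypothetical helper for illustration; not part of the SDMetrics API.
    """
    real_sorted = sorted(real)
    synthetic_sorted = sorted(synthetic)
    if not real_sorted or not synthetic_sorted:
        raise ValueError("both samples must be non-empty")

    # D is the largest gap between the two empirical CDFs. The CDFs only
    # change at observed values, so checking those points is sufficient.
    max_gap = 0.0
    for value in set(real_sorted) | set(synthetic_sorted):
        real_cdf = bisect_right(real_sorted, value) / len(real_sorted)
        synthetic_cdf = bisect_right(synthetic_sorted, value) / len(synthetic_sorted)
        max_gap = max(max_gap, abs(real_cdf - synthetic_cdf))

    # Inverting the statistic makes higher scores mean "more similar".
    return 1.0 - max_gap


# Toy columns, made up for this sketch.
real_column = [1.0, 2.0, 3.0, 4.0]
synthetic_column = [1.5, 2.5, 3.5, 10.0]
score = inverted_ks_statistic(real_column, synthetic_column)
print(score)
```

A score of 1 means the two empirical distributions coincide, and a score of 0 means they do not overlap at all.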

Install

SDMetrics is part of the SDV project and is automatically installed alongside it. For details about this process, please visit the SDV Installation Guide.

Optionally, SDMetrics can also be installed as a standalone library using the following commands:

Using pip:

pip install sdmetrics

Using conda:

conda install -c sdv-dev -c conda-forge -c pytorch sdmetrics

For more installation options, please visit the SDMetrics Installation Guide.
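
After installing, it can be useful to confirm which version ended up in the environment. The sketch below uses only the standard library (importlib.metadata, available from Python 3.8); the helper name installed_version is hypothetical and is not part of SDMetrics.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("sdmetrics")
if version is None:
    print("sdmetrics is not installed in this environment")
else:
    print(f"sdmetrics {version} is installed")
```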

Usage

SDMetrics is included as part of the framework offered by SDV to evaluate the quality of your synthetic dataset. For more details about how to use it please visit the corresponding User Guide:

Evaluating Synthetic Data

Standalone usage

SDMetrics can also be used as a standalone library to run metrics individually.

In this short example we show how to use it to evaluate a toy multi-table dataset and its synthetic replica by running all the compatible multi-table metrics on it:

import sdmetrics

# Load the demo data, which includes:
# - A dict containing the real tables as pandas.DataFrames.
# - A dict containing the synthetic clones of the real data.
# - A dict containing metadata about the tables.
real_data, synthetic_data, metadata = sdmetrics.load_demo()

# Obtain the list of multi table metrics, which is returned as a dict
# containing the metric names and the corresponding metric classes.
metrics = sdmetrics.multi_table.MultiTableMetric.get_subclasses()

# Run all the compatible metrics and get a report
sdmetrics.compute_metrics(metrics, real_data, synthetic_data, metadata=metadata)

The output will be a table with the details of each executed metric and its score:

metric	name	score	min_value	max_value	goal
CSTest	Chi-Squared	0.76651	0	1	MAXIMIZE
KSTest	Inverted Kolmogorov-Smirnov D statistic	0.75	0	1	MAXIMIZE
KSTestExtended	Inverted Kolmogorov-Smirnov D statistic	0.777778	0	1	MAXIMIZE
LogisticDetection	LogisticRegression Detection	0.882716	0	1	MAXIMIZE
SVCDetection	SVC Detection	0.833333	0	1	MAXIMIZE
BNLikelihood	BayesianNetwork Likelihood	nan	0	1	MAXIMIZE
BNLogLikelihood	BayesianNetwork Log Likelihood	nan	-inf	0	MAXIMIZE
LogisticParentChildDetection	LogisticRegression Detection	0.619444	0	1	MAXIMIZE
SVCParentChildDetection	SVC Detection	0.916667	0	1	MAXIMIZE
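
Two rows in the report above have a nan score, so any summary has to skip them explicitly. The sketch below copies the scores from the table into plain Python tuples and computes a mean over the valid rows; averaging is only an illustrative aggregation chosen for this example, not something the library is stated to do, and the variable names are hypothetical.

```python
import math

# Scores copied from the report above; NaN marks metrics with no usable score.
report = [
    ("CSTest", 0.76651),
    ("KSTest", 0.75),
    ("KSTestExtended", 0.777778),
    ("LogisticDetection", 0.882716),
    ("SVCDetection", 0.833333),
    ("BNLikelihood", float("nan")),
    ("BNLogLikelihood", float("nan")),
    ("LogisticParentChildDetection", 0.619444),
    ("SVCParentChildDetection", 0.916667),
]

# NaN never compares equal to anything, so use math.isnan to filter it out.
valid = [(name, score) for name, score in report if not math.isnan(score)]
skipped = [name for name, score in report if math.isnan(score)]

# All remaining metrics share the 0 to 1 range and the MAXIMIZE goal,
# so a plain mean is at least comparing like with like.
mean_score = sum(score for _, score in valid) / len(valid)
lowest_name, lowest_score = min(valid, key=lambda item: item[1])

print(f"metrics skipped because of NaN: {skipped}")
print(f"mean of valid scores: {mean_score:.4f}")
print(f"weakest metric: {lowest_name} ({lowest_score})")
```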

What's next?

If you want to read more about each individual metric, please visit the following folders:

Single Column Metrics: sdmetrics/single_column
Single Table Metrics: sdmetrics/single_table
Multi Table Metrics: sdmetrics/multi_table
Time Series Metrics: sdmetrics/timeseries

The Synthetic Data Vault

This repository is part of The Synthetic Data Vault Project.

Website: https://sdv.dev
Documentation: https://sdv.dev/SDV

History

v0.3.0 - 2021-03-30

This release adds privacy metrics, which evaluate whether the real data could be obtained or deduced from the synthetic samples. Additionally, all metrics now have a normalize method, which takes the raw_score generated by the metric and returns a value between 0 and 1.
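
As an illustration of what mapping a raw score into the 0 to 1 range can look like, here is a minimal sketch. The standalone function, its signature and the specific formulas (min-max scaling for bounded ranges, an exponential for one-sided ranges, a sigmoid for fully unbounded ranges) are assumptions made for this example; the release note above only states that a normalize method exists and what range it returns.

```python
import math


def normalize(raw_score, min_value, max_value, goal="MAXIMIZE"):
    """Map a raw score into [0, 1], where 1 is always the best value.

    Hypothetical standalone sketch; the formulas are assumptions.
    """
    # NaN fails this comparison too, so it is rejected here as well.
    if not min_value <= raw_score <= max_value:
        raise ValueError("raw_score is outside the metric's range")

    lower_unbounded = math.isinf(min_value)
    upper_unbounded = math.isinf(max_value)

    if lower_unbounded and upper_unbounded:
        # Sigmoid, written in two branches to avoid overflow in exp().
        if raw_score >= 0:
            score = 1.0 / (1.0 + math.exp(-raw_score))
        else:
            score = math.exp(raw_score) / (1.0 + math.exp(raw_score))
    elif lower_unbounded:
        # Range (-inf, max_value]: reaches 1 at max_value, decays towards 0.
        score = math.exp(raw_score - max_value)
    elif upper_unbounded:
        # Range [min_value, inf): 0 at min_value, approaches 1 as the score grows.
        score = 1.0 - math.exp(min_value - raw_score)
    else:
        # Bounded range: plain min-max scaling.
        score = (raw_score - min_value) / (max_value - min_value)

    # For metrics that should be minimized, flip so that 1 is still "best".
    return 1.0 - score if goal == "MINIMIZE" else score


print(normalize(0.75, 0.0, 1.0))
print(normalize(-1.0, float("-inf"), 0.0))
```

The second call uses the range of a log-likelihood style metric, whose raw scores run from negative infinity up to 0.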

Issues closed

Add normalize method to metrics - Issue #51 by @csala and @fealho
Implement privacy metrics - Issue #36 by @ZhuofanXie and @fealho

v0.2.0 - 2021-02-24

Dependency upgrades to ensure compatibility with the rest of the SDV ecosystem.

v0.1.3 - 2021-02-13

Updates the required dependencies to facilitate a conda release.

Issues closed

Upgrade sktime - Issue #49 by @fealho

v0.1.2 - 2021-01-27

Bug-fixing release that addresses several minor errors.

Issues closed

More splits than classes - Issue #46 by @fealho
Scipy 1.6.0 causes an AttributeError - Issue #44 by @fealho
Time series metrics fails with variable length timeseries - Issue #42 by @fealho
ParentChildDetection metrics KeyError - Issue #39 by @csala

v0.1.1 - 2020-12-30

This version adds Time Series Detection and Efficacy metrics, as well as a fix to ensure that Single Table binary classification efficacy metrics work well with binary targets which are not boolean.

Issues closed

Timeseries efficacy metrics - Issue #35 by @csala
Timeseries detection metrics - Issue #34 by @csala
Ensure binary classification targets are bool - Issue #33 by @csala

v0.1.0 - 2020-12-18

This release introduces a new project organization in which metrics are grouped by data modality and share a common API:

Single Column
Column Pair
Single Table
Multi Table
Time Series

Within each data modality, different families of metrics have been implemented:

Statistical
Detection
Bayesian Network and Gaussian Mixture Likelihood
Machine Learning Efficacy

v0.0.4 - 2020-11-27

Patch release to relax dependencies and avoid conflicts when using the latest SDV version.

v0.0.3 - 2020-11-20

Fix error on detection metrics when input data contains infinity or NaN values.

Issues closed

ValueError: Input contains infinity or a value too large for dtype('float64') - Issue #11 by @csala

v0.0.2 - 2020-08-08

Add support for Python 3.8 and a broader range of dependencies.

v0.0.1 - 2020-06-26

First release to PyPI.

Release history

0.27.1

Feb 13, 2026

0.27.1.dev0 pre-release

Feb 13, 2026

0.27.0

Jan 29, 2026

0.26.1.dev0 pre-release

Jan 29, 2026

0.26.0

Jan 27, 2026

0.25.1.dev0 pre-release

Jan 23, 2026

0.25.0

Jan 8, 2026

0.24.1.dev0 pre-release

Jan 8, 2026

0.24.0

Nov 3, 2025

0.23.1.dev0 pre-release

Oct 30, 2025

0.23.0

Aug 14, 2025

0.22.1.dev0 pre-release

Aug 13, 2025

0.22.0

Jul 24, 2025

0.21.1.dev0 pre-release

Jul 24, 2025

0.21.0

May 29, 2025

0.21.0.dev0 pre-release

May 29, 2025

0.20.1

Apr 14, 2025

0.20.1.dev0 pre-release

Apr 14, 2025

0.20.0 yanked

Apr 11, 2025

Reason this release was yanked:

Imports crashed unless torch was installed

0.20.0.dev0 pre-release

Apr 10, 2025

0.19.0

Feb 25, 2025

0.19.0.dev0 pre-release

Feb 24, 2025

0.18.0

Dec 13, 2024

0.18.0.dev0 pre-release

Dec 13, 2024

0.17.1

Dec 4, 2024

0.17.1.dev0 pre-release

Dec 4, 2024

0.17.0

Nov 15, 2024

0.17.0.dev0 pre-release

Nov 14, 2024

0.16.0

Sep 25, 2024

0.16.0.dev0 pre-release

Sep 25, 2024

0.15.1

Aug 13, 2024

0.15.1.dev0 pre-release

Aug 13, 2024

0.15.0

Jul 15, 2024

0.15.0.dev0 pre-release

Jul 12, 2024

0.14.1

May 13, 2024

0.14.1.dev0 pre-release

May 13, 2024

0.14.0

Apr 11, 2024

0.14.0.dev0 pre-release

Apr 10, 2024

0.13.1

Mar 14, 2024

0.13.1.dev0 pre-release

Mar 14, 2024

0.13.0

Dec 4, 2023

0.13.0.dev0 pre-release

Nov 30, 2023

0.12.1

Nov 1, 2023

0.12.1.dev0 pre-release

Nov 1, 2023

0.12.0

Nov 1, 2023

0.12.0.dev0 pre-release

Oct 31, 2023

0.11.1

Sep 14, 2023

0.11.1.dev0 pre-release

Sep 14, 2023

0.11.0

Aug 10, 2023

0.11.0.dev0 pre-release

Aug 10, 2023

0.10.1

Jun 6, 2023

0.10.1.dev0 pre-release

Jun 5, 2023

0.10.0

May 4, 2023

0.10.0.dev2 pre-release

May 3, 2023

0.10.0.dev1 pre-release

May 3, 2023

0.10.0.dev0 pre-release

May 2, 2023

0.9.3

Apr 12, 2023

0.9.3.dev0 pre-release

Apr 11, 2023

0.9.2

Mar 8, 2023

0.9.2.dev0 pre-release

Mar 7, 2023

0.9.1

Feb 17, 2023

0.9.1.dev0 pre-release

Feb 16, 2023

0.9.0

Jan 18, 2023

0.9.0.dev0 pre-release

Jan 18, 2023

0.8.1

Dec 10, 2022

0.8.1.dev0 pre-release

Dec 8, 2022

0.8.0

Nov 2, 2022

0.8.0.dev0 pre-release

Nov 2, 2022

0.7.0

Sep 27, 2022

0.7.0.dev0 pre-release

Sep 27, 2022

0.6.0

Aug 12, 2022

0.6.0.dev1 pre-release

Aug 12, 2022

0.6.0.dev0 pre-release

Aug 12, 2022

0.5.1.dev0 pre-release

Jul 10, 2022

0.5.0

May 11, 2022

0.5.0.dev0 pre-release

May 11, 2022

0.4.2 yanked

May 10, 2022

Reason this release was yanked:

dependency conflict

0.4.2.dev0 pre-release

May 10, 2022

0.4.1

Dec 9, 2021

0.4.1.dev0 pre-release

Dec 9, 2021

0.4.0

Nov 16, 2021

0.4.0.dev0 pre-release

Nov 16, 2021

0.3.3.dev0 pre-release

Nov 5, 2021

0.3.2

Aug 17, 2021

0.3.2.dev1 pre-release

Aug 17, 2021

0.3.2.dev0 pre-release

Aug 17, 2021

0.3.1

Jul 12, 2021

0.3.1.dev1 pre-release

Jul 7, 2021

This version

0.3.1.dev0 pre-release

Jul 2, 2021

0.3.0

Mar 31, 2021

0.3.0.dev1 pre-release

Mar 31, 2021

0.3.0.dev0 pre-release

Mar 29, 2021

0.2.1.dev0 pre-release

Mar 29, 2021

0.2.0

Feb 24, 2021

0.2.0.dev0 pre-release

Feb 23, 2021

0.1.3

Feb 13, 2021

0.1.3.dev0 pre-release

Feb 13, 2021

0.1.2

Jan 27, 2021

0.1.2.dev2 pre-release

Jan 27, 2021

0.1.2.dev1 pre-release

Jan 27, 2021

0.1.2.dev0 pre-release

Jan 27, 2021

0.1.1

Dec 30, 2020

0.1.1.dev0 pre-release

Dec 29, 2020

0.1.0

Dec 18, 2020

0.1.0.dev2 pre-release

Dec 18, 2020

0.1.0.dev1 pre-release

Dec 18, 2020

0.1.0.dev0 pre-release

Dec 16, 2020

0.0.4

Nov 27, 2020

0.0.4.dev0 pre-release

Nov 27, 2020

0.0.3

Nov 20, 2020

0.0.3.dev1 pre-release

Nov 20, 2020

0.0.3.dev0 pre-release

Nov 20, 2020

0.0.2

Aug 8, 2020

0.0.2.dev1 pre-release

Aug 7, 2020

0.0.2.dev0 pre-release

Jul 9, 2020

0.0.1

Jun 26, 2020

0.0.1.dev2 pre-release

Jun 26, 2020

0.0.1.dev1 pre-release

Jun 25, 2020

0.0.1.dev0 pre-release

Jun 25, 2020

0.0.0

Mar 20, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sdmetrics-0.3.1.dev0.tar.gz (198.6 kB)

Uploaded Jul 2, 2021 Source

Built Distribution

sdmetrics-0.3.1.dev0-py2.py3-none-any.whl (95.7 kB)

Uploaded Jul 2, 2021 Python 2, Python 3

File details

Details for the file sdmetrics-0.3.1.dev0.tar.gz.

File metadata

Download URL: sdmetrics-0.3.1.dev0.tar.gz
Upload date: Jul 2, 2021
Size: 198.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.10

File hashes

Hashes for sdmetrics-0.3.1.dev0.tar.gz
Algorithm	Hash digest
SHA256	`f1c7f731c3d123c87e98efc96ba4256662b4815a15dfbd591157e164c14203fb`
MD5	`fde3e3cb4a14921292b493021cb257df`
BLAKE2b-256	`18ad8f2c01ac5428771b74edf4063db4259328bea8d85023b9e75440992eeb93`

See more details on using hashes here.
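
A downloaded archive can be checked against the SHA256 digest listed above using only the standard library. The sketch below is one way to do it; the helper names sha256_of_file and matches_published_digest are hypothetical, and the archive is assumed to sit in the current working directory.

```python
import hashlib
import hmac
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


# Digest copied from the table above for sdmetrics-0.3.1.dev0.tar.gz.
EXPECTED_SHA256 = "f1c7f731c3d123c87e98efc96ba4256662b4815a15dfbd591157e164c14203fb"

# Assumption: the archive was downloaded into the current directory.
archive = "sdmetrics-0.3.1.dev0.tar.gz"
if os.path.exists(archive):
    print(matches_published_digest(archive, EXPECTED_SHA256))
else:
    print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of the archive size.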

File details

Details for the file sdmetrics-0.3.1.dev0-py2.py3-none-any.whl.

File metadata

Download URL: sdmetrics-0.3.1.dev0-py2.py3-none-any.whl
Upload date: Jul 2, 2021
Size: 95.7 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.10

File hashes

Hashes for sdmetrics-0.3.1.dev0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`c17e540f71c67d8b32abf50ace84434ed878c8424e5776d3f55d537a97ac01bb`
MD5	`a3305950706efe35ca0537070f8cd2d6`
BLAKE2b-256	`2114b7ff2f9af45bbe95b26f6d2482650cbb15b944e09e6bb28c464275dfcdd1`

See more details on using hashes here.

sdmetrics 0.3.1.dev0
