Package for Evaluation of Synthetic Tabular Data Quality

Synthetic-Eval

Synthetic-Eval is a package for the comprehensive evaluation of synthetic tabular datasets.

1. Installation

Install using pip:

pip install synthetic-eval

2. Supported Metrics

  • Statistical Fidelity
    1. KL-Divergence (KL)
    2. Goodness-of-Fit (Kolmogorov-Smirnov test & Chi-Squared test) (GoF)
    3. Maximum Mean Discrepancy (MMD)
    4. Cramer-Wold Distance (CW)
    5. (naive) $\alpha$-precision & $\beta$-recall (alpha_precision, beta_recall)
  • Machine Learning Utility (classification task)
    1. Accuracy (base_cls, syn_cls)
    2. Model Selection Performance (model_selection)
    3. Feature Selection Performance (feature_selection)
  • Privacy Preservation
    1. $k$-Anonymization (Kanon_base, Kanon_syn)
    2. $k$-Map (KMap)
    3. Distance to Closest Record (DCR_RS, DCR_RR, DCR_SS)
    4. Attribute Disclosure (AD)
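
To make one of the privacy metrics above concrete, here is a minimal sketch of Distance to Closest Record. It is not the package's implementation. The helper name `distance_to_closest_record`, the use of Euclidean distance, and the reading of the suffixes (R for real, S for synthetic) are assumptions made for illustration only.

```python
import math

def distance_to_closest_record(queries, references, exclude_self=False):
    """For each row in `queries`, return the Euclidean distance to its
    nearest row in `references`.

    Hypothetical helper for illustration; rows are sequences of numbers.
    Set exclude_self=True when comparing a dataset with itself, so that a
    row is not matched to its own copy at distance zero.
    """
    closest = []
    for i, q in enumerate(queries):
        candidates = [
            math.dist(q, r)
            for j, r in enumerate(references)
            if not (exclude_self and i == j)
        ]
        closest.append(min(candidates))
    return closest

# Toy numeric rows, invented for this example only.
real = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
synthetic = [(0.0, 1.0), (4.0, 5.0)]

# Assumed reading: synthetic rows measured against real rows.
dcr_rs = distance_to_closest_record(synthetic, real)
# Assumed reading: real rows measured against the other real rows.
dcr_rr = distance_to_closest_record(real, real, exclude_self=True)
```

Very small synthetic-to-real distances compared with the real-to-real baseline would suggest that synthetic rows sit close to individual training records. How Synthetic-Eval aggregates these distances into a single score is not stated here.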

3. Usage

from synthetic_eval import evaluation
evaluation.evaluate # function for evaluating synthetic data quality

Example

"""import libraries"""
import pandas as pd
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

"""the ground-truth (training, test) and synthetic dataset"""
data = pd.read_csv('./loan.csv')
# len(data) # 5,000
train = data.iloc[:2000]
test = data.iloc[2000:4000]
# rows 4000 onward (1,000 rows) of the same file serve as the synthetic set in this example
syndata = data.iloc[4000:]

"""specify column types"""
continuous_features = [
    'Age',
    'Experience',
    'Income', 
    'CCAvg',
    'Mortgage',
]
categorical_features = [
    'Family',
    'Personal Loan',
    'Securities Account',
    'CD Account',
    'Online',
    'CreditCard'
]
target = 'Personal Loan' # machine learning utility target column

"""load Synthetic-Eval"""
from synthetic_eval import evaluation
results = evaluation.evaluate(
    syndata, train, test, 
    target, continuous_features, categorical_features, device
)

"""print results"""
for x, y in results._asdict().items():
    print(f"{x}: {y:.3f}")
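
The loop above applies the `.3f` format to every field, which raises an error if any field is not a number. The sketch below is a slightly more defensive variant. The helper name `format_results` is hypothetical. `DemoResults` and its values are placeholders standing in for the object returned by `evaluation.evaluate`; the real field names and types are not confirmed here.

```python
import numbers
from collections import namedtuple

def format_results(results, digits=3):
    """Return one 'name: value' line per field of a namedtuple.

    Numeric fields are rounded to `digits` decimals; anything else is
    printed as-is instead of raising a formatting error.
    """
    lines = []
    for name, value in results._asdict().items():
        if isinstance(value, numbers.Real):
            lines.append(f"{name}: {value:.{digits}f}")
        else:
            lines.append(f"{name}: {value}")
    return lines

# Placeholder namedtuple with invented values, for illustration only.
DemoResults = namedtuple("DemoResults", ["KL", "MMD", "note"])
demo = DemoResults(KL=0.01234, MMD=0.5, note="placeholder")

for line in format_results(demo):
    print(line)
```

With the real return value, the call would be `format_results(results)`, assuming it is a namedtuple as the `_asdict()` call in the example suggests.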
