Package for Evaluation of Synthetic Tabular Data Quality

Synthetic-Eval

Synthetic-Eval is a package for the comprehensive evaluation of synthetic tabular datasets.

1. Installation

Install using pip:

pip install synthetic-eval

2. Supported Metrics

  • Statistical Fidelity
    1. KL-Divergence (KL)
    2. Goodness-of-Fit (Kolmogorov-Smirnov test & Chi-Squared test) (GoF)
    3. Maximum Mean Discrepancy (MMD)
    4. Cramer-Wold Distance (CW)
    5. (naive) $\alpha$-precision & $\beta$-recall (alpha_precision, beta_recall)
  • Machine Learning Utility (classification task)
    1. Accuracy (base_cls, syn_cls)
    2. Model Selection Performance (model_selection)
    3. Feature Selection Performance (feature_selection)
  • Privacy Preservation
    1. $k$-Anonymization (Kanon_base, Kanon_syn)
    2. $k$-Map (KMap)
    3. Distance to Closest Record (DCR_RS, DCR_RR, DCR_SS)
    4. Attribute Disclosure (AD)
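
To make one of the privacy metrics above concrete, here is a minimal sketch of the general Distance to Closest Record idea: for each row in one table, find the Euclidean distance to its nearest row in another table. This is not Synthetic-Eval's implementation. The function name `distance_to_closest_record`, the toy values, and the reading of the `RS`/`RR`/`SS` suffixes as real-to-synthetic, real-to-real, and synthetic-to-synthetic pairings are all assumptions made for illustration.

```python
import math

def distance_to_closest_record(queries, references, exclude_self=False):
    """For each query row, return the Euclidean distance to its nearest reference row.

    With exclude_self=True, the row at the same index is skipped, which is
    needed when a table is compared against itself.
    """
    distances = []
    for i, q in enumerate(queries):
        candidates = (
            math.dist(q, r)
            for j, r in enumerate(references)
            if not (exclude_self and i == j)
        )
        distances.append(min(candidates))
    return distances

# Toy numeric tables (hypothetical values, assumed to be on a comparable scale).
real = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
synthetic = [(0.0, 0.1), (3.0, 4.0)]

# Synthetic rows against real rows: small distances hint at near-copies.
dcr_synthetic_to_real = distance_to_closest_record(synthetic, real)

# Real rows against the other real rows: a baseline for comparison.
dcr_real_to_real = distance_to_closest_record(real, real, exclude_self=True)

print(dcr_synthetic_to_real)
print(dcr_real_to_real)
```

A synthetic row that sits much closer to a real row than real rows sit to each other may be a near-copy of training data. How the package aggregates these per-row distances into a single score is not stated here, so the sketch leaves them unaggregated.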

3. Usage

from synthetic_eval import evaluation
evaluation.evaluate # function for evaluating synthetic data quality

Example

"""import libraries"""
import pandas as pd
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

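"""the ground-truth (training, test) and synthetic dataset"""
data = pd.read_csv('./loan.csv')
# len(data) # 5,000
train = data.iloc[:2000]
test = data.iloc[2000:4000]
# for illustration only, the remaining 1,000 real rows stand in for synthetic data;
# in practice, pass the output of your generative model here
syndata = data.iloc[4000:]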

"""specify column types"""
continuous_features = [
    'Age',
    'Experience',
    'Income', 
    'CCAvg',
    'Mortgage',
]
categorical_features = [
    'Family',
    'Personal Loan',
    'Securities Account',
    'CD Account',
    'Online',
    'CreditCard'
]
target = 'Personal Loan' # machine learning utility target column

"""load Synthetic-Eval"""
from synthetic_eval import evaluation
results = evaluation.evaluate(
    syndata, train, test, 
    target, continuous_features, categorical_features, device
)

"""print results"""
for x, y in results._asdict().items():
    print(f"{x}: {y:.3f}")
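
The `_asdict()` call in the loop above suggests that `results` behaves like a namedtuple. Under that assumption, the scores can be rounded and saved for later comparison across generators. The sketch below uses a hypothetical stand-in namedtuple: the field names are borrowed from the metric list, and the values are made up for illustration, not actual output.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for the object returned by evaluation.evaluate;
# the chosen fields and their values are illustrative, not real output.
Results = namedtuple("Results", ["KL", "MMD", "base_cls"])
results = Results(KL=0.01234, MMD=0.00567, base_cls=0.91234)

# Round each metric to three decimals and serialize the mapping as JSON text.
rounded = {name: round(float(value), 3) for name, value in results._asdict().items()}
report = json.dumps(rounded, indent=2)
print(report)
```

Writing `report` to a file per generator gives a simple way to compare runs side by side.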
