Skip to main content

Package for Evaluation of Synthetic Tabular Data Quality

Project description

Synthetic-Eval

Synthetic-Eval is a package for the comprehensive evaluation of synthetic tabular datasets.

1. Installation

Install using pip:

pip install synthetic-eval

2. Supported Metrics

  • Statistical Fidelity
    1. KL-Divergence (KL)
    2. Goodness-of-Fit (Kolmogorov-Smirnov test & Chi-Squared test) (GoF)
    3. Maximum Mean Discrepancy (MMD)
    4. Cramer-Wold Distance (CW)
    5. (naive) $\alpha$-precision & $\beta$-recall (alpha_precision, beta_recall)
  • Machine Learning Utility (classification task)
    1. Accuracy (base_cls, syn_cls)
    2. Model Selection Performance (model_selection)
    3. Feature Selection Performance (feature_selection)
  • Privacy Preservation
    1. $k$-Anonymization (Kanon_base, Kanon_syn)
    2. $k$-Map (KMap)
    3. Distance to Closest Record (DCR_RS, DCR_RR, DCR_SS)
    4. Attribute Disclosure (AD)

3. Usage

from synthetic_eval import evaluation
evaluation.evaluate # function for evaluating synthetic data quality

Example

  • Please ensure that the target column for the machine learning utility is the last column of the dataset.
"""import libraries"""
import pandas as pd
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

"""specify column types"""
data = pd.read_csv('./loan.csv') 
# len(data) # 5,000

"""specify column types"""
continuous_features = [
    'Age',
    'Experience',
    'Income', 
    'CCAvg',
    'Mortgage',
]
categorical_features = [
    'Family',
    'Securities Account',
    'CD Account',
    'Online',
    'CreditCard',
    'Personal Loan', 
]
target = 'Personal Loan' # machine learning utility target column

### the target column should be the last column
data = data[continuous_features + [x for x in categorical_features if x != target] + [target]] 

"""training, test, synthetic datasets"""
data[categorical_features] = data[categorical_features].apply(
        lambda col: col.astype('category').cat.codes) 

train = data.iloc[:2000]
test = data.iloc[2000:4000]
syndata = data.iloc[4000:]

"""load Synthetic-Eval"""
from synthetic_eval import evaluation
results = evaluation.evaluate(
    syndata, train, test, 
    target, continuous_features, categorical_features, device
)

"""print results"""
for x, y in results._asdict().items():
    print(f"{x}: {y:.3f}")

3. References

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

synthetic_eval-0.1.4.tar.gz (11.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

synthetic_eval-0.1.4-py3-none-any.whl (12.6 kB view details)

Uploaded Python 3

File details

Details for the file synthetic_eval-0.1.4.tar.gz.

File metadata

  • Download URL: synthetic_eval-0.1.4.tar.gz
  • Upload date:
  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for synthetic_eval-0.1.4.tar.gz
Algorithm Hash digest
SHA256 07b00b1f1d2467e7777669d0ba1c9d3b03d1b93e385fa47e8a00b6e4db725645
MD5 e75bf66492e6d1059f887d9141b7301a
BLAKE2b-256 c5a594670a6171f92f86905d96dffa75d16a695ac841049bc0ea4f5a9e0f0a7a

See more details on using hashes here.

File details

Details for the file synthetic_eval-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: synthetic_eval-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 12.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for synthetic_eval-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 126d64d68b4989b885834c137903d5824686275e9873138c076d7a7bfe84ba3b
MD5 abf802978172a64d709b6b83093f64fa
BLAKE2b-256 672877798917ce24408187f1c565407e52f68def52d85d458e9147f9f1a38b48

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page