Package for Evaluation of Synthetic Tabular Data Quality

Synthetic-Eval

Synthetic-Eval is a package for the comprehensive evaluation of synthetic tabular datasets.

1. Installation

Install using pip:

pip install synthetic-eval

2. Supported Metrics

  • Statistical Fidelity
    1. KL-Divergence (KL)
    2. Goodness-of-Fit (Kolmogorov-Smirnov test & Chi-Squared test) (GoF)
    3. Maximum Mean Discrepancy (MMD)
    4. Cramer-Wold Distance (CW)
    5. (naive) $\alpha$-precision & $\beta$-recall (alpha_precision, beta_recall)
  • Machine Learning Utility (classification task)
    1. Accuracy (base_cls, syn_cls)
    2. Model Selection Performance (model_selection)
    3. Feature Selection Performance (feature_selection)
  • Privacy Preservation
    1. $k$-Anonymization (Kanon_base, Kanon_syn)
    2. $k$-Map (KMap)
    3. Distance to Closest Record (DCR_RS, DCR_RR, DCR_SS)
    4. Attribute Disclosure (AD)
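
To make two of the metrics above concrete, here is a minimal standard-library sketch of a KL-divergence for a single categorical column and of Distance to Closest Record. It is an illustration under stated assumptions, not Synthetic-Eval's implementation: the function names `kl_divergence` and `distance_to_closest_record`, the `eps` smoothing parameter, and the choice of Euclidean distance are all hypothetical, and the package may handle mixed column types, scaling, and aggregation differently.

```python
import math
from collections import Counter


def kl_divergence(real_values, synthetic_values, eps=1e-6):
    """KL(P_real || P_syn) for one categorical column (hypothetical helper).

    Additive smoothing with `eps` is an assumption made here so that a
    category missing from one dataset does not produce log(0).
    """
    categories = sorted(set(real_values) | set(synthetic_values))
    real_counts = Counter(real_values)
    syn_counts = Counter(synthetic_values)
    real_total = len(real_values) + eps * len(categories)
    syn_total = len(synthetic_values) + eps * len(categories)
    kl = 0.0
    for c in categories:
        p = (real_counts[c] + eps) / real_total
        q = (syn_counts[c] + eps) / syn_total
        kl += p * math.log(p / q)
    return kl


def distance_to_closest_record(queries, references):
    """For each query row, the Euclidean distance to its nearest reference row.

    Rows are sequences of numbers of equal length (hypothetical helper).
    """
    return [min(math.dist(q, r) for r in references) for q in queries]


real_col = ["a", "a", "b", "b"]
syn_col = ["a", "b", "b", "b"]
print(kl_divergence(real_col, real_col))  # identical columns: no divergence
print(kl_divergence(real_col, syn_col))   # shifted frequencies: positive value

train_rows = [(0.0, 0.0), (1.0, 1.0)]
syn_rows = [(0.0, 0.1), (3.0, 4.0)]
print(distance_to_closest_record(syn_rows, train_rows))
```

A very small distance to the closest training record suggests that a synthetic row is nearly a copy of a real one, which is why this quantity appears under privacy rather than fidelity. The brute-force search here is quadratic in the number of rows, so it is only suitable for small datasets.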

3. Usage

from synthetic_eval import evaluation
evaluation.evaluate # function for evaluating synthetic data quality

Example

  • Please ensure that the target column for the machine learning utility is the last column of the dataset.
"""import libraries"""
import pandas as pd
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

"""load dataset"""
data = pd.read_csv('./loan.csv')
# len(data) # 5,000

"""specify column types"""
continuous_features = [
    'Age',
    'Experience',
    'Income', 
    'CCAvg',
    'Mortgage',
]
categorical_features = [
    'Family',
    'Securities Account',
    'CD Account',
    'Online',
    'CreditCard',
    'Personal Loan', 
]
target = 'Personal Loan' # machine learning utility target column

### the target column should be the last column
data = data[continuous_features + [x for x in categorical_features if x != target] + [target]] 

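"""encode categorical columns as integer codes"""
data[categorical_features] = data[categorical_features].apply(
    lambda col: col.astype('category').cat.codes)

"""training, test, synthetic datasets"""
train = data.iloc[:2000]
test = data.iloc[2000:4000]
# a held-out slice of the real data stands in for generated synthetic data here
syndata = data.iloc[4000:]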

"""load Synthetic-Eval"""
from synthetic_eval import evaluation
results = evaluation.evaluate(
    syndata, train, test, 
    target, continuous_features, categorical_features, device
)

"""print results"""
for x, y in results._asdict().items():
    print(f"{x}: {y:.3f}")
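
The requirement that the target be the last column can be checked before calling `evaluation.evaluate`. The helper below is a hypothetical sketch and not part of Synthetic-Eval: the name `check_column_layout` and its specific checks (target last, every declared feature present, no feature declared as both continuous and categorical) are assumptions about what a reasonable pre-flight check might look like.

```python
def check_column_layout(columns, continuous_features, categorical_features, target):
    """Raise ValueError if the column layout breaks the conventions of the example.

    Hypothetical helper; `columns` is any iterable of column names,
    for example `data.columns` from a pandas DataFrame.
    """
    columns = list(columns)
    # The target must be the last column of the dataset.
    if not columns or columns[-1] != target:
        raise ValueError(f"target {target!r} must be the last column")
    # Every declared feature must exist in the data.
    declared = set(continuous_features) | set(categorical_features)
    missing = declared - set(columns)
    if missing:
        raise ValueError(f"declared features not in data: {sorted(missing)}")
    # A feature cannot be both continuous and categorical.
    overlap = set(continuous_features) & set(categorical_features)
    if overlap:
        raise ValueError(f"features declared with two types: {sorted(overlap)}")


# Layout that follows the convention: passes silently.
check_column_layout(
    ["Age", "Income", "Family", "Personal Loan"],
    continuous_features=["Age", "Income"],
    categorical_features=["Family", "Personal Loan"],
    target="Personal Loan",
)
```

With the example above, the call would be `check_column_layout(data.columns, continuous_features, categorical_features, target)`, placed just before `evaluation.evaluate(...)`.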
