Package for Evaluation of Synthetic Tabular Data Quality

Synthetic-Eval

Synthetic-Eval is a package for the comprehensive evaluation of synthetic tabular datasets.

1. Installation

Install using pip:

pip install synthetic-eval

2. Supported Metrics

  • Statistical Fidelity
    1. KL-Divergence (KL)
    2. Goodness-of-Fit (Kolmogorov-Smirnov test & Chi-Squared test) (GoF)
    3. Maximum Mean Discrepancy (MMD)
    4. Cramer-Wold Distance (CW)
    5. (naive) $\alpha$-precision & $\beta$-recall (alpha_precision, beta_recall)
  • Machine Learning Utility (classification task)
    1. Accuracy (base_cls, syn_cls)
    2. Model Selection Performance (model_selection)
    3. Feature Selection Performance (feature_selection)
  • Privacy Preservation
    1. $k$-Anonymization (Kanon_base, Kanon_syn)
    2. $k$-Map (KMap)
    3. Distance to Closest Record (DCR_RS, DCR_RR, DCR_SS)
    4. Attribute Disclosure (AD)
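
To give a feel for what two of the metrics above measure, here is a minimal standard-library sketch of a KL divergence between the category frequencies of one column, and of a distance to closest record between two sets of numeric rows. The function names `kl_divergence` and `distance_to_closest_record`, the `eps` smoothing, and the choice of Euclidean distance are assumptions made for illustration; this is not the package's implementation, which may differ in every detail.

```python
import math
from collections import Counter


def kl_divergence(real_values, syn_values, eps=1e-10):
    """KL(P_real || P_syn) between the category frequencies of one column."""
    categories = sorted(set(real_values) | set(syn_values))
    real_counts, syn_counts = Counter(real_values), Counter(syn_values)
    # eps (an illustrative choice) keeps the ratio finite when a category
    # is missing from one of the two samples
    p = [real_counts[c] / len(real_values) + eps for c in categories]
    q = [syn_counts[c] / len(syn_values) + eps for c in categories]
    p_total, q_total = sum(p), sum(q)
    return sum(
        (pi / p_total) * math.log((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p, q)
    )


def distance_to_closest_record(queries, references):
    """Euclidean distance from each query row to its nearest reference row."""
    return [min(math.dist(row, ref) for ref in references) for row in queries]


# toy data, made up for this sketch
real_rows = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
syn_rows = [(0.0, 1.0), (2.0, 2.0)]
dcr = distance_to_closest_record(syn_rows, real_rows)  # one value per synthetic row

kl_same = kl_divergence([0, 1, 1, 2], [0, 1, 1, 2])  # identical distributions
kl_diff = kl_divergence([0, 0, 1], [0, 1, 1])        # different distributions
```

A distance of zero for a synthetic row means it coincides with a real record, which is why small distances are read as a privacy risk.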

3. Usage

from synthetic_eval import evaluation
evaluation.evaluate # function for evaluating synthetic data quality

Example

  • Please ensure that the target column for the machine learning utility is the last column of the dataset.
"""import libraries"""
import pandas as pd
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

"""load dataset"""
data = pd.read_csv('./loan.csv')
# len(data) # 5,000

"""specify column types"""
continuous_features = [
    'Age',
    'Experience',
    'Income', 
    'CCAvg',
    'Mortgage',
]
categorical_features = [
    'Family',
    'Securities Account',
    'CD Account',
    'Online',
    'CreditCard',
    'Personal Loan', 
]
target = 'Personal Loan' # machine learning utility target column

### the target column should be the last column
data = data[continuous_features + [x for x in categorical_features if x != target] + [target]] 

"""training, test, synthetic datasets"""
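# encode each categorical column as integer codes
data[categorical_features] = data[categorical_features].apply(
    lambda col: col.astype('category').cat.codes)

train = data.iloc[:2000]
test = data.iloc[2000:4000]
# in this example, the remaining real rows stand in for a synthetic dataset
syndata = data.iloc[4000:]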

"""load Synthetic-Eval"""
from synthetic_eval import evaluation
results = evaluation.evaluate(
    syndata, train, test, 
    target, continuous_features, categorical_features, device
)

"""print results"""
for x, y in results._asdict().items():
    print(f"{x}: {y:.3f}")
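
Since the loop above relies on `results._asdict()`, the returned object behaves like a named tuple, so the metrics can also be written out for later comparison. The sketch below assumes that behaviour and uses a hypothetical stand-in `Results` tuple; its field names and numbers are placeholders for illustration, not real output of `evaluation.evaluate`, and `results_to_csv` is a helper invented here.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for the object returned by evaluation.evaluate;
# field names and values are placeholders, not real results.
Results = namedtuple("Results", ["KL", "GoF", "MMD"])
results = Results(KL=0.0123, GoF=0.0456, MMD=0.0789)


def results_to_csv(results):
    """Serialize a named tuple of metric values to a two-column CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["metric", "value"])
    for name, value in results._asdict().items():
        # same three-decimal formatting as the print loop
        writer.writerow([name, f"{value:.3f}"])
    return buffer.getvalue()


csv_text = results_to_csv(results)
```

The resulting string can be written to a file, one per synthetic dataset, to compare generators side by side.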
