Package for Evaluation of Synthetic Tabular Data Quality
Synthetic-Eval
Synthetic-Eval is a package for the comprehensive evaluation of synthetic tabular datasets.
1. Installation
Install using pip:
pip install synthetic-eval
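To confirm that the install succeeded, the installed version can be queried from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Synthetic-Eval.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string when the package is installed, otherwise None.
print(installed_version("synthetic-eval"))
```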
2. Supported Metrics
- Statistical Fidelity
  - KL-Divergence (KL)
  - Goodness-of-Fit (Kolmogorov-Smirnov test & Chi-Squared test) (GoF)
  - Maximum Mean Discrepancy (MMD)
  - Cramer-Wold Distance (CW)
  - (naive) $\alpha$-precision & $\beta$-recall (alpha_precision, beta_recall)
- Machine Learning Utility (classification task)
  - Accuracy (base_cls, syn_cls)
  - Model Selection Performance (model_selection)
  - Feature Selection Performance (feature_selection)
- Privacy Preservation
  - $k$-Anonymization (Kanon_base, Kanon_syn)
  - $k$-Map (KMap)
  - Distance to Closest Record (DCR_RS, DCR_RR, DCR_SS)
  - Attribute Disclosure (AD)
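To give a feel for what a few of the metrics listed above measure, here is a minimal sketch of textbook versions of KL-divergence on one categorical column, distance to closest record, and $k$-anonymity. This is not the package's implementation: the function names, the Euclidean distance, and the `eps` smoothing are assumptions made for illustration, and Synthetic-Eval may bin, scale, or aggregate differently.

```python
import numpy as np
import pandas as pd


def categorical_kl(real: pd.Series, syn: pd.Series, eps: float = 1e-12) -> float:
    """KL(real || syn) between the empirical category frequencies of one column."""
    categories = sorted(set(real.unique()) | set(syn.unique()))
    p = real.value_counts(normalize=True).reindex(categories, fill_value=0.0).to_numpy()
    q = syn.value_counts(normalize=True).reindex(categories, fill_value=0.0).to_numpy()
    mask = p > 0  # terms with p == 0 contribute nothing
    # eps (an assumption) avoids division by zero when syn lacks a category
    return float(np.sum(p[mask] * np.log(p[mask] / np.clip(q[mask], eps, None))))


def distance_to_closest_record(query: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """For each query row, the Euclidean distance to its nearest reference row."""
    diffs = query[:, None, :] - reference[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)


def k_anonymity(df: pd.DataFrame, quasi_identifiers: list[str]) -> int:
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    return int(df.groupby(quasi_identifiers, observed=True).size().min())


# Toy data, made up for illustration only
real = pd.DataFrame({"Family": [1, 1, 2, 2], "Online": [0, 0, 1, 1]})
syn = pd.DataFrame({"Family": [1, 1, 1, 2], "Online": [0, 1, 1, 1]})

print(categorical_kl(real["Family"], syn["Family"]))
print(distance_to_closest_record(syn.to_numpy(dtype=float), real.to_numpy(dtype=float)))
print(k_anonymity(syn, ["Family", "Online"]))
```

A lower KL-divergence indicates closer marginal distributions, while very small distances to the closest real record or a small $k$ can point to privacy risk.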
3. Usage
from synthetic_eval import evaluation
evaluation.evaluate # function for evaluating synthetic data quality
- See example.ipynb for a detailed example and its results with the loan dataset.
  - Link to download the loan dataset: https://www.kaggle.com/datasets/teertha/personal-loan-modeling
Example
- Please ensure that the target column for the machine learning utility is the last column of the dataset.
"""import libraries"""
import pandas as pd
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
"""load dataset"""
data = pd.read_csv('./loan.csv')
# len(data) # 5,000
"""specify column types"""
continuous_features = [
    'Age',
    'Experience',
    'Income',
    'CCAvg',
    'Mortgage',
]
categorical_features = [
    'Family',
    'Securities Account',
    'CD Account',
    'Online',
    'CreditCard',
    'Personal Loan',
]
target = 'Personal Loan' # machine learning utility target column
### the target column should be the last column
data = data[continuous_features + [x for x in categorical_features if x != target] + [target]]
"""training, test, synthetic datasets"""
data[categorical_features] = data[categorical_features].apply(
    lambda col: col.astype('category').cat.codes)
train = data.iloc[:2000]
test = data.iloc[2000:4000]
# in this example, held-out real rows stand in for the synthetic dataset
syndata = data.iloc[4000:]
"""load Synthetic-Eval"""
from synthetic_eval import evaluation
results = evaluation.evaluate(
    syndata, train, test,
    target, continuous_features, categorical_features, device
)
"""print results"""
for x, y in results._asdict().items():
    print(f"{x}: {y:.3f}")
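The example above cuts the training, test, and synthetic sets by row position, in the order the rows appear in the CSV file. If the file happens to be sorted, shuffling before splitting may give more representative subsets. The following is a minimal sketch of that idea; the helper `shuffled_split` and the toy DataFrame are hypothetical and not part of Synthetic-Eval.

```python
import pandas as pd


def shuffled_split(df: pd.DataFrame, n_train: int, n_test: int, seed: int = 0):
    """Shuffle rows once, then cut into train / test / remainder."""
    if n_train + n_test > len(df):
        raise ValueError("n_train + n_test exceeds the number of rows")
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    train = shuffled.iloc[:n_train]
    test = shuffled.iloc[n_train:n_train + n_test]
    rest = shuffled.iloc[n_train + n_test:]
    return train, test, rest


# Toy data, made up for illustration only
toy = pd.DataFrame({"x": range(10), "y": [i % 2 for i in range(10)]})
train, test, rest = shuffled_split(toy, n_train=4, n_test=4, seed=42)
print(len(train), len(test), len(rest))  # 4 4 2
```

With the loan data, the same call with `n_train=2000` and `n_test=2000` would reproduce the split sizes used in the example, with the remainder playing the role of `syndata`.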