A toolkit for generating and evaluating synthetic data in terms of utility, privacy, and similarity
Synthetic Data Generation Toolkit
This repository provides a toolkit for generating synthetic data with seven different models. The toolkit evaluates the generated data for utility, fidelity (similarity), and privacy, and is tailored to tabular datasets with a binary classification target (e.g., True/False, Yes/No).
Models Included
The project implements the following models for synthetic data generation:
- CopulaGAN
- CTGAN
- Gaussian Copula
- TVAE
- Gaussian Multivariate
- WGAN
- ARF
Quick Start
Step 1: Install the Package
Install the package using pip:
pip install synthius
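To confirm that the installation succeeded, a short check along the following lines may help. It is a minimal sketch that relies only on the Python standard library; the helper name `installed_version` is hypothetical and is not part of the toolkit.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is missing.

    Hypothetical helper for illustration; not part of the synthius package.
    """
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


found = installed_version("synthius")
print(f"synthius version: {found}" if found else "synthius is not installed")
```

If the check reports that the package is not installed, make sure pip ran in the same Python environment as the interpreter you are using.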
Step 2: Usage Example
To understand how to use this package, explore the three example Jupyter notebooks included in the repository:
- First notebook:
  - Demonstrates how to generate synthetic data using the seven models.
  - Update paths and configurations (e.g., file paths, target column) to fit your dataset.
  - Run the cells to generate synthetic datasets.
- Second notebook:
  - Evaluates the utility of the generated data.
  - Update the paths as needed to analyze your data.
- Third notebook:
  - Provides examples of computing metrics for evaluating synthetic data, including:
    - Utility
    - Fidelity/Similarity
    - Privacy
  - Update paths and dataset-specific configurations and run the cells to compute the results.
These notebooks serve as practical examples of how to use the toolkit.
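To give a feel for what fidelity and privacy metrics measure, the sketch below implements two generic ones in plain Python. It does not use the toolkit's API and does not claim to match the metrics the notebooks compute; the function names and the toy data are assumptions made for illustration. Total variation distance compares category frequencies in one column (0.0 means identical, 1.0 means disjoint), and the exact-match rate is the share of synthetic rows that copy a real row verbatim.

```python
from collections import Counter


def total_variation_distance(real_values, synthetic_values):
    """Fidelity sketch: half the summed absolute gap between category frequencies."""
    real_counts = Counter(real_values)
    synth_counts = Counter(synthetic_values)
    categories = set(real_counts) | set(synth_counts)
    return 0.5 * sum(
        abs(real_counts[c] / len(real_values) - synth_counts[c] / len(synthetic_values))
        for c in categories
    )


def exact_match_rate(real_rows, synthetic_rows):
    """Privacy sketch: fraction of synthetic rows identical to some real row."""
    real_set = {tuple(row) for row in real_rows}
    copies = sum(1 for row in synthetic_rows if tuple(row) in real_set)
    return copies / len(synthetic_rows)


# Toy data invented for illustration: (binary target, age).
real = [("yes", 34), ("no", 51), ("no", 29), ("yes", 42)]
synthetic = [("yes", 36), ("no", 51), ("no", 30), ("no", 45)]

tvd = total_variation_distance([r[0] for r in real], [s[0] for s in synthetic])
copied = exact_match_rate(real, synthetic)
print(f"TVD on target column: {tvd:.2f}")
print(f"Exact-match rate: {copied:.2f}")
```

A low distance suggests the synthetic target column follows the real class balance, while a high exact-match rate suggests the generator may be memorizing real records.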
Additional Setup for Mac Users
Mac users may encounter errors during installation. To resolve these issues, install the required dependencies and set up the environment:
- Install dependencies using Homebrew:
brew install libomp llvm
- Set up the environment:
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
export CC=$(brew --prefix llvm)/bin/clang
export CXX=$(brew --prefix llvm)/bin/clang++
export CXXFLAGS="-I$(brew --prefix llvm)/include -I$(brew --prefix libomp)/include"
export LDFLAGS="-L$(brew --prefix llvm)/lib -L$(brew --prefix libomp)/lib -lomp"
Acknowledgments
Special thanks to all contributors and the libraries used in this project.
File details
Details for the file synthius-0.2.0.tar.gz.
File metadata
- Download URL: synthius-0.2.0.tar.gz
- Upload date:
- Size: 58.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | adeee5f1cfde9e716265a547eb464bb34a8ea077e1286cfeb69051e41b605c08 |
| MD5 | cef357dac3648cec7cfe7e7880588b6e |
| BLAKE2b-256 | 802463f0587054883de10f0142c5a1050942c3f615a1f2e96919704655da790e |
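A downloaded archive can be checked against the published SHA256 digest before it is installed. The following is a minimal sketch using Python's standard `hashlib` module; the helper names and the local file path in the usage note are assumptions, and the expected value is the SHA256 digest listed for synthius-0.2.0.tar.gz.

```python
import hashlib

# SHA256 digest published for synthius-0.2.0.tar.gz.
EXPECTED_SHA256 = "adeee5f1cfde9e716265a547eb464bb34a8ea077e1286cfeb69051e41b605c08"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

Assuming the archive was saved to the current directory, calling `matches_expected("synthius-0.2.0.tar.gz")` should return True for an unmodified download.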
Provenance
The following attestation bundles were made for synthius-0.2.0.tar.gz:
Publisher: publish.yml on calgo-lab/Synthius
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: synthius-0.2.0.tar.gz
  - Subject digest: adeee5f1cfde9e716265a547eb464bb34a8ea077e1286cfeb69051e41b605c08
- Sigstore transparency entry: 164139919
- Sigstore integration time:
- Permalink: calgo-lab/Synthius@d0726a1f9b5396607b678789d309224b39096ab8
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/calgo-lab
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d0726a1f9b5396607b678789d309224b39096ab8
- Trigger Event: release
File details
Details for the file synthius-0.2.0-py3-none-any.whl.
File metadata
- Download URL: synthius-0.2.0-py3-none-any.whl
- Upload date:
- Size: 77.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2909587b4ad12a91ae17acb12f9d8d4ae0c733e27ad3399f01c421f423b4cd40 |
| MD5 | f52db77016826caeeab2adaf352d7de6 |
| BLAKE2b-256 | 1ce16f332f2dc50d4eccede86d82f86d8515dcb83b5173e5496a53912f62844e |
Provenance
The following attestation bundles were made for synthius-0.2.0-py3-none-any.whl:
Publisher: publish.yml on calgo-lab/Synthius
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: synthius-0.2.0-py3-none-any.whl
  - Subject digest: 2909587b4ad12a91ae17acb12f9d8d4ae0c733e27ad3399f01c421f423b4cd40
- Sigstore transparency entry: 164139922
- Sigstore integration time:
- Permalink: calgo-lab/Synthius@d0726a1f9b5396607b678789d309224b39096ab8
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/calgo-lab
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d0726a1f9b5396607b678789d309224b39096ab8
- Trigger Event: release