SyntheRela - Synthetic Relational Data Generation Benchmark
About SyntheRela
SyntheRela is a comprehensive benchmark designed to evaluate and compare synthetic relational database generation methods. It provides a standardized framework for assessing both the fidelity and utility of synthetic data across multiple real-world databases. The benchmark includes novel evaluation metrics, particularly for relational data, and supports various open-source and commercial synthetic data generation methods.
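To make "fidelity for relational data" concrete, the sketch below compares how many child rows each parent row has in a real database versus a synthetic one. It is an illustration only and is not SyntheRela's API or one of its metrics: the function names (`children_per_parent`, `ks_statistic`, `cardinality_similarity`) are hypothetical, the parent IDs are assumed to be unique, and the score is simply one minus a two-sample Kolmogorov-Smirnov statistic computed with the standard library.

```python
from bisect import bisect_right
from collections import Counter
from typing import Hashable, Iterable, List


def children_per_parent(parent_ids: Iterable[Hashable],
                        child_fks: Iterable[Hashable]) -> List[int]:
    """Count child rows per parent, including parents with zero children.

    Assumes parent_ids are unique (hypothetical helper, not a SyntheRela function).
    """
    counts = Counter(child_fks)
    return [counts.get(pid, 0) for pid in parent_ids]


def ks_statistic(a: List[int], b: List[int]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in sorted(set(a_sorted) | set(b_sorted)):
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


def cardinality_similarity(real_parents, real_fks, synth_parents, synth_fks) -> float:
    """Score in [0, 1]; 1.0 means identical children-per-parent distributions."""
    real = children_per_parent(real_parents, real_fks)
    synth = children_per_parent(synth_parents, synth_fks)
    return 1.0 - ks_statistic(real, synth)


if __name__ == "__main__":
    # Toy data: three parents whose child counts are 2, 1 and 3 in the real table.
    real_parents, real_fks = [1, 2, 3], [1, 1, 2, 3, 3, 3]
    # A synthetic table that assigns every child to a single parent.
    synth_parents, synth_fks = [10, 20, 30], [10, 10, 10, 10, 10, 10]
    print(cardinality_similarity(real_parents, real_fks, synth_parents, synth_fks))
```

A single-table metric would not notice the problem in the toy data above, because the child table has the same number of rows in both cases; only the relationship between the tables differs.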
SyntheRela is highly extensible, allowing users to benchmark on their own custom datasets and implement new evaluation metrics to suit specific use cases.
Our research on SyntheRela is presented in the paper "SyntheRela: A Benchmark For Synthetic Relational Database Generation" at the ICLR 2025 Workshop "Will Synthetic Data Finally Solve the Data Access Problem?", available on OpenReview.
We maintain a public leaderboard on Hugging Face where you can compare the performance of different synthetic data generation methods.
Installation
To install only the benchmark package, run the following command:
pip install syntherela
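After installing, a quick way to confirm that the package is visible to the current Python environment is to query its distribution metadata. The snippet below is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical, and the only fact taken from this page is the distribution name `syntherela`.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("syntherela")
    if version is None:
        print("syntherela is not installed in this environment")
    else:
        print(f"syntherela {version} is installed")
```

Reading the metadata rather than importing the package keeps the check lightweight, since it does not load any of the package's own dependencies.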
Replicating the paper's results
For detailed instructions on how to replicate the paper's results, please refer to docs/REPLICATING_RESULTS.md.
Adding a new metric
The documentation for adding a new metric can be found in docs/ADDING_A_METRIC.md.
Synthetic Data Methods
Open Source Methods
- SDV: The Synthetic Data Vault
- RCTGAN: Row Conditional-TGAN for Generating Synthetic Relational Databases
- REaLTabFormer: Generating Realistic Relational and Tabular Data using Transformers
- ClavaDDPM: Multi-relational Data Synthesis with Cluster-guided Diffusion Models
- IRG: Generating Synthetic Relational Databases using GANs
- RGCLD: Relational Data Generation with Graph Neural Networks and Latent Diffusion Models
- Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders*
- Generative Modeling of Complex Data*
- BayesM2M & NeuralM2M: Synthetic Data Generation of Many-to-Many Datasets via Random Graph Generation*
* Denotes a method without a publicly available implementation.
Commercial Providers
A list of commercial synthetic relational data providers is available in docs/SYNTHETIC_DATA_TOOLS.md.
Conflicts of Interest
The authors declare no conflict of interest and are not associated with any of the evaluated commercial synthetic data providers.
Citation
If you use SyntheRela in your work, please cite our paper:
@inproceedings{iclrsyntheticdata2025syntherela,
  title={SyntheRela: A Benchmark For Synthetic Relational Database Generation},
  author={Martin Jurkovic and Valter Hudovernik and Erik {\v{S}}trumbelj},
  booktitle={Will Synthetic Data Finally Solve the Data Access Problem?},
  year={2025},
  url={https://openreview.net/forum?id=ZfQofWYn6n}
}
License
This project is licensed under the MIT License. See the LICENSE file for details.