SyntheRela - Synthetic Relational Data Generation Benchmark

About SyntheRela

SyntheRela is a comprehensive benchmark designed to evaluate and compare synthetic relational database generation methods. It provides a standardized framework for assessing both the fidelity and utility of synthetic data across multiple real-world databases. The benchmark includes novel evaluation metrics, particularly for relational data, and supports various open-source and commercial synthetic data generation methods.

SyntheRela is highly extensible, allowing users to benchmark on their own custom datasets and implement new evaluation metrics to suit specific use cases.

Our research on SyntheRela is presented in the paper "SyntheRela: A Benchmark For Synthetic Relational Database Generation" at the ICLR 2025 Workshop "Will Synthetic Data Finally Solve the Data Access Problem?", available on OpenReview.

We maintain a public leaderboard on Hugging Face where you can compare the performance of different synthetic data generation methods.

Installation

To install only the benchmark package, run the following command:

pip install syntherela
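To confirm that the install succeeded, the sketch below looks up the installed distribution using only the Python standard library. The helper name `installed_version` is our own illustration and is not part of SyntheRela.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing.

    Hypothetical helper for illustration; not part of the SyntheRela package.
    """
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


syntherela_version = installed_version("syntherela")
if syntherela_version is None:
    print("syntherela is not installed; run: pip install syntherela")
else:
    print(f"syntherela {syntherela_version} is installed")
```

The check reads package metadata rather than importing the package, so it does not depend on any particular module layout.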

Replicating the paper's results

For detailed instructions on how to replicate the paper's results, please refer to docs/REPLICATING_RESULTS.md.

Adding a new metric

The documentation for adding a new metric can be found in docs/ADDING_A_METRIC.md.
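Before reading that guide, it may help to see the kind of quantity a relational fidelity metric can measure. The sketch below is plain Python and does not use the SyntheRela API. The function names, the toy data, and the choice of total variation distance are all assumptions made for illustration, not a metric taken from the benchmark. It compares how many child rows each parent row has in a real and a synthetic database.

```python
from collections import Counter


def children_per_parent(parent_ids, child_foreign_keys):
    """Count the child rows attached to each parent row (hypothetical helper)."""
    counts = Counter(child_foreign_keys)
    return [counts.get(parent_id, 0) for parent_id in parent_ids]


def cardinality_distance(real_counts, synthetic_counts):
    """Total variation distance between two children-per-parent distributions.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    Illustrative only; not a metric defined by SyntheRela.
    """
    if not real_counts or not synthetic_counts:
        raise ValueError("both inputs must contain at least one parent row")
    real_freq = Counter(real_counts)
    synth_freq = Counter(synthetic_counts)
    support = set(real_freq) | set(synth_freq)
    return 0.5 * sum(
        abs(real_freq[k] / len(real_counts) - synth_freq[k] / len(synthetic_counts))
        for k in support
    )


# Toy example: three parents in each database, three child rows in each.
real = children_per_parent([1, 2, 3], [1, 1, 2])       # one parent has no children
synthetic = children_per_parent([1, 2, 3], [1, 2, 3])  # every parent has exactly one
distance = cardinality_distance(real, synthetic)
print(f"cardinality distance: {distance:.3f}")
```

A single-table metric would not detect this difference, because both child tables have the same number of rows. The mismatch only appears when the foreign-key relationship is taken into account.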

Synthetic Data Methods

Open Source Methods

* Denotes a method without a publicly available implementation.

Commercial Providers

A list of commercial synthetic relational data providers is available in docs/SYNTHETIC_DATA_TOOLS.md.

Conflicts of Interest

The authors declare no conflict of interest and are not associated with any of the evaluated commercial synthetic data providers.

Citation

If you use SyntheRela in your work, please cite our paper:

@inproceedings{
    iclrsyntheticdata2025syntherela,
    title={SyntheRela: A Benchmark For Synthetic Relational Database Generation},
    author={Martin Jurkovic and Valter Hudovernik and Erik {\v{S}}trumbelj},
    booktitle={Will Synthetic Data Finally Solve the Data Access Problem?},
    year={2025},
    url={https://openreview.net/forum?id=ZfQofWYn6n}
}

License

This project is licensed under the MIT License. See the LICENSE file for details.
