Exploring the Effects of Disjoint Generation of Synthetic Tabular Data.
Disjoint Generative Models
Disjoint Generative Models (DGMs) is a framework for generating synthetic data by distributing the generation of different attributes to different generative models. DGMs enable mixed-model generation, letting the user choose the "correct tool for the correct job", and can increase privacy, since no single model has access to all the data.
The library provides a simple API for generating synthetic data using a variety of generative models and joining strategies. It supports several generative model backends, namely SynthCity, DataSynthesizer and Synthpop, and additional backends can be added in the adapters module. Similarly, several joining methods are available for combining the generated data, and more can be added in the joining strategies module.
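The idea of disjoint generation can be illustrated without the library. The sketch below is a minimal, standard-library-only illustration and does not use the `disjoint-generation` API: every function name in it is hypothetical, the "generative model" is a toy bootstrap resampler standing in for a real backend, and the join simply pairs shuffled partitions at random.

```python
import random


def split_attributes(columns, n_partitions, rng):
    """Randomly assign column names to n_partitions disjoint groups."""
    shuffled = list(columns)
    rng.shuffle(shuffled)
    return [shuffled[i::n_partitions] for i in range(n_partitions)]


def bootstrap_generator(rows, columns, n_samples, rng):
    """Toy stand-in for a generative model (an assumption for illustration).

    It only ever sees the columns of its own partition and resamples
    the projected rows with replacement.
    """
    projected = [{c: row[c] for c in columns} for row in rows]
    return [dict(rng.choice(projected)) for _ in range(n_samples)]


def random_join(parts, rng):
    """Shuffle each generated partition independently, then merge row by row."""
    for part in parts:
        rng.shuffle(part)
    return [
        {key: value for piece in pieces for key, value in piece.items()}
        for pieces in zip(*parts)
    ]


def disjoint_generate(rows, n_partitions=2, n_samples=None, seed=0):
    """Partition the attributes, generate each partition separately, and join."""
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    n_samples = n_samples or len(rows)
    partitions = split_attributes(columns, n_partitions, rng)
    parts = [bootstrap_generator(rows, cols, n_samples, rng) for cols in partitions]
    return partitions, random_join(parts, rng)


# Made-up example records, used only to exercise the sketch.
real = [
    {"age": 34, "sex": "F", "income": 52000, "smoker": False},
    {"age": 51, "sex": "M", "income": 61000, "smoker": True},
    {"age": 27, "sex": "F", "income": 39000, "smoker": False},
    {"age": 45, "sex": "M", "income": 75000, "smoker": False},
]

partitions, synthetic = disjoint_generate(real, n_partitions=2, n_samples=5, seed=42)
print("partitions:", partitions)
for record in synthetic:
    print(record)
```

Because the partitions are joined at random, dependencies between attributes in different partitions are not preserved in this sketch, whereas dependencies within a partition are.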
Installation
To install the library, run the following command:
pip install disjoint-generation
One of the generative model backends, synthpop, requires a working R installation on the system. The library calls R by running the Rscript command through a subprocess, so make sure that Rscript works in your terminal.
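A quick way to confirm that Rscript is reachable from Python is sketched below. This is not part of the library; the helper name `rscript_available` is hypothetical, and the check only assumes that `Rscript --version` exits with status 0 on a working installation.

```python
import shutil
import subprocess


def rscript_available(command="Rscript"):
    """Return True if the command is on PATH and runs successfully."""
    path = shutil.which(command)
    if path is None:
        return False
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if rscript_available():
    print("Rscript found; the synthpop backend should be able to call R.")
else:
    print("Rscript not found; install R or add it to PATH before using synthpop.")
```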
Tutorial and Codebooks
Below are the codebooks that can be used to replicate the results shown in the paper.
| Link | Description | Fig. refs. |
|---|---|---|
| Tutorial | A simple tutorial on how to use the library | NA |
| Codebook 1 | Introductory experiments, random joining, increasing number of partitions | Fig. 2 |
| Codebook 2 | High-dimensional dataset example with validation, correlated partitions study | Fig. 3, 4, 5 |
| Codebook 3 | Mixed-model generation and combinatorics | Fig. 6, Tab. 2, 3 |
| Codebook 4 | Study of the joining validator model, optimisation and calibration | Fig. 7, 8, 9 |
Additional examples of how to use the library can be found in the documentation in the source code folder.
Requirements
The library requires Python 3.10 (we use version 3.10.12) and the following packages:
- numpy ~= 1.26
- pandas ~= 2.2.3
- scipy ~= 1.12
- scikit-learn ~= 1.5
- synthcity >= 0.2.11
- DataSynthesizer ~= 0.1.13
- pyod >= 2.0
Additionally, the synthpop generative model is accessed through R (we used version 4.1.2) and requires the following R package:
- synthpop ~= 1.8.0
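To see which of the Python requirements above are present in the current environment, a small check along these lines may help. It is only a sketch: it uses the standard `importlib.metadata` module, assumes the distribution names match the list above, and reports installed versions without comparing them against the version specifiers.

```python
import sys
from importlib import metadata

# Distribution names taken from the requirements list above.
REQUIRED = [
    "numpy",
    "pandas",
    "scipy",
    "scikit-learn",
    "synthcity",
    "DataSynthesizer",
    "pyod",
]


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def python_is_3_10():
    """Check that the interpreter is a Python 3.10 release."""
    return sys.version_info[:2] == (3, 10)


print("Python 3.10:", python_is_3_10())
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```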