Exploring the Effects of Disjoint Generation of Synthetic Tabular Data.

Disjoint Generative Models

Disjoint Generative Models (DGMs) is a framework for generating synthetic data by distributing the generation of different attributes to different generative models. DGMs enable mixed-model generation, letting the user choose the "correct tool for the correct job", and can increase privacy because no single model has access to all the data.

The library provides a simple API for generating synthetic data using a variety of generative models and joining strategies. It supports several generative model backends, namely SynthCity, DataSynthesizer and Synthpop, and additional backends can be added in the adapters module. Similarly, several joining methods are available for combining the generated data, and more can be added in the joining strategies module.
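To make the idea concrete, here is a minimal sketch of disjoint generation in plain Python. It does not use the library's API: the names `bootstrap_generator` and `disjoint_generate` are hypothetical, the bootstrap resampler is only a stand-in for a real generative backend, and pairing the parts by position is just one simple form of random joining.

```python
import random


def bootstrap_generator(rows, n, rng):
    """Hypothetical stand-in for a generative model: resample rows with replacement."""
    return [dict(rng.choice(rows)) for _ in range(n)]


def disjoint_generate(rows, partitions, n, seed=0):
    """Generate n synthetic records using one independent generator per column partition."""
    rng = random.Random(seed)
    columns = list(rows[0])
    flat = [c for part in partitions for c in part]
    if sorted(flat) != sorted(columns):
        raise ValueError("partitions must be disjoint and cover every column")

    synthetic_parts = []
    for part in partitions:
        # Each generator only ever sees the columns of its own partition.
        projected = [{c: row[c] for c in part} for row in rows]
        synthetic_parts.append(bootstrap_generator(projected, n, rng))

    # Random join: the parts were sampled independently, so pairing them
    # by position combines partial records at random.
    joined = []
    for pieces in zip(*synthetic_parts):
        record = {}
        for piece in pieces:
            record.update(piece)
        joined.append({c: record[c] for c in columns})
    return joined


real = [
    {"age": 23, "income": 30, "smoker": "no"},
    {"age": 35, "income": 52, "smoker": "yes"},
    {"age": 47, "income": 61, "smoker": "no"},
    {"age": 59, "income": 48, "smoker": "no"},
]
synthetic = disjoint_generate(real, [["age", "income"], ["smoker"]], n=6)
```

In this sketch the relationship between `age` and `income` is preserved because both columns sit in the same partition, while any dependence between `smoker` and the other columns is lost in the random join. That loss is what a more informed joining strategy would try to recover.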

Installation

To install the library, run the following command:

pip install disjoint-generation

One of the generative model backends, "synthpop", requires a working R installation on the system. The library accesses R by running an Rscript command through subprocess, so make sure that the Rscript command works in your terminal.
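One way to check this from Python before using the synthpop backend is sketched below. The helper `rscript_available` is a hypothetical name, not part of the library; it only tests that the command is on the PATH and exits successfully.

```python
import shutil
import subprocess


def rscript_available(command="Rscript"):
    """Return True if `command --version` can be found and runs successfully."""
    if shutil.which(command) is None:
        return False
    try:
        result = subprocess.run(
            [command, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("Rscript found" if rscript_available() else "Rscript not found on PATH")
```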

Tutorial and Codebooks

Below are codebooks that can be used to replicate the results shown in the paper.

| Link | Description | Fig. refs. |
| --- | --- | --- |
| Tutorial | A simple tutorial on how to use the library | NA |
| Codebook 1 | Introductory experiments, random joining, increasing number of partitions | Fig. 2 |
| Codebook 2 | High-dimensional dataset example with validation, correlated partitions study | Fig. 3, 4, 5 |
| Codebook 3 | Mixed-model generation and combinatorics | Fig. 6, Tab. 2, 3 |
| Codebook 4 | Study of the joining validator model, optimisation and calibration | Fig. 7, 8, 9 |

Additional examples of how to use the library can be found in the documentation in the source code folder.

Requirements

The library requires Python 3.10 (we use version 3.10.12) and the following packages:

  • numpy ~= 1.26
  • pandas ~= 2.2.3
  • scipy ~= 1.12
  • scikit-learn ~= 1.5
  • synthcity >= 0.2.11
  • DataSynthesizer ~= 0.1.13
  • pyod >= 2.0

Additionally, the synthpop generative model is accessed through R (we used version 4.1.2) and requires the following R package:

  • synthpop ~= 1.8.0
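To see how an existing environment compares with the Python requirements listed above, a short standard-library sketch such as the following can report what is installed. The helper name `installed_versions` is hypothetical, and the script only prints versions; it does not enforce the version specifiers.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the requirements list above.
REQUIRED = [
    "numpy",
    "pandas",
    "scipy",
    "scikit-learn",
    "synthcity",
    "DataSynthesizer",
    "pyod",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print("Python", ".".join(map(str, sys.version_info[:3])))
for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'not installed'}")
```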

Project details


Release history

This version

1.0

Download files

Source Distribution

disjoint_generation-1.0.tar.gz (18.5 kB view details)

Uploaded Source

Built Distribution

disjoint_generation-1.0-py3-none-any.whl (22.2 kB view details)

Uploaded Python 3

File details

Details for the file disjoint_generation-1.0.tar.gz.

File metadata

  • Download URL: disjoint_generation-1.0.tar.gz
  • Upload date:
  • Size: 18.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for disjoint_generation-1.0.tar.gz
Algorithm Hash digest
SHA256 4d99e5c2a831d59314a9c2021b675430def43b39bea07638919a869745afba3e
MD5 86bc16b72c69057c041f084a529068e1
BLAKE2b-256 d027b063295e98c55a143be969d3719f0507c756580eef9f7884e8af9289c28f
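A downloaded archive can be checked against the SHA256 digest listed above with a few lines of standard-library Python. This is a minimal sketch: the helper name `sha256_of` and the assumption that the file sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the table above.
EXPECTED = "4d99e5c2a831d59314a9c2021b675430def43b39bea07638919a869745afba3e"

# Assumed download location; adjust to wherever the file was saved.
archive = Path("disjoint_generation-1.0.tar.gz")
if archive.exists():
    print("OK" if sha256_of(archive) == EXPECTED else "MISMATCH")
else:
    print(f"{archive} not found")
```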

Provenance

The following attestation bundles were made for disjoint_generation-1.0.tar.gz:

Publisher: release.yml on notna07/disjoint-synthetic-data-generation

File details

Details for the file disjoint_generation-1.0-py3-none-any.whl.

File hashes

Hashes for disjoint_generation-1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 55d8c4c58a9d2174351da35243029d5dd5bd1877597e6f449f836d0e59222f76
MD5 199b07a54e0569c81e5f065149998422
BLAKE2b-256 60e8b39ac700fbf772968884ef35228c4a66a978753109c729275958b8bd1d39

Provenance

The following attestation bundles were made for disjoint_generation-1.0-py3-none-any.whl:

Publisher: release.yml on notna07/disjoint-synthetic-data-generation
