Differentially Private Synthetic Data Generation (DPSDG)

This repository contains the official implementation of the paper "Measuring Privacy Risks and Tradeoffs in Financial Synthetic Data Generation", presented at the TIME workshop of WebConf 2026. The models follow the same specifications as the original CTGAN and TVAE models from the ctgan package.

Installation

From PyPI

pip install dpsdg

From Source

Clone the repository and install it locally:

git clone https://github.com/brains-group/dpsdg.git
cd dpsdg
pip install .

Or install directly from GitHub without cloning first:

pip install git+https://github.com/brains-group/dpsdg.git

Usage

This package extends CTGAN and TVAE with differentially private training via DP-SGD. The API closely mirrors the original ctgan API: you swap the model class and add a few privacy parameters, and DP-CTGAN additionally requires one fit_transformer call before fit.

Your data should be a pandas DataFrame with:

  • Continuous columns as float
  • Discrete/categorical columns as int or str
  • No missing values
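The requirements above can be checked before training. The following is a minimal sketch, not part of dpsdg: the helper name check_training_frame and the toy column names are hypothetical, and it assumes that any column not listed in discrete_columns is meant to be continuous.

```python
import pandas as pd


def check_training_frame(df, discrete_columns):
    """Return a list of problems that would violate the data requirements."""
    problems = []
    # Requirement: no missing values anywhere in the frame.
    missing = df.columns[df.isna().any()].tolist()
    if missing:
        problems.append(f"missing values in: {missing}")
    for name in df.columns:
        dtype = df[name].dtype
        if name in discrete_columns:
            # Discrete columns: int or str (object or string dtype in pandas).
            ok = (pd.api.types.is_integer_dtype(dtype)
                  or pd.api.types.is_object_dtype(dtype)
                  or pd.api.types.is_string_dtype(dtype))
            if not ok:
                problems.append(f"discrete column {name!r} has dtype {dtype}")
        elif not pd.api.types.is_float_dtype(dtype):
            # Assumption: every column not listed as discrete is continuous.
            problems.append(f"continuous column {name!r} has dtype {dtype}")
    return problems


# Hypothetical toy data; the column names are illustrative only.
real_data = pd.DataFrame({
    "amount": [12.5, 80.0, 3.25],
    "merchant_type": ["grocery", "travel", "grocery"],
    "is_weekend": [0, 1, 0],
})
discrete_columns = ["merchant_type", "is_weekend"]
problems = check_training_frame(real_data, discrete_columns)
```

An empty list means the frame satisfies all three requirements. An integer-valued continuous column can usually be fixed with astype(float) before fitting.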

DP-CTGAN

Original CTGAN (no privacy):

from ctgan import CTGAN

ctgan = CTGAN(epochs=300)
ctgan.fit(real_data, discrete_columns)
synthetic_data = ctgan.sample(1000)

DP-CTGAN (with differential privacy):

from dpsdg.models.dp_ctgan import DPDPCTGAN

model = DPDPCTGAN(epsilon=1.0, delta=1e-5, epochs=300)
model.fit_transformer(real_data, discrete_columns)  # must be called before fit
model.fit(real_data, discrete_columns)
synthetic_data = model.sample(1000)

The key difference is the fit_transformer call before fit. This sets up the privacy-aware data transformer. After that, fit and sample work exactly as in the original.
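Because the call order matters, one option is to wrap both steps in a small helper. This is only a sketch: fit_in_order is a hypothetical name, and RecordingModel is a stand-in that records calls so the example runs without dpsdg installed. In practice you would pass a real DP-CTGAN model instead.

```python
class RecordingModel:
    """Hypothetical stand-in for a DP-CTGAN model; it only records call order."""

    def __init__(self):
        self.calls = []

    def fit_transformer(self, data, discrete_columns):
        self.calls.append("fit_transformer")

    def fit(self, data, discrete_columns):
        self.calls.append("fit")


def fit_in_order(model, real_data, discrete_columns):
    """Call fit_transformer, then fit, so the required order cannot be skipped."""
    model.fit_transformer(real_data, discrete_columns)
    model.fit(real_data, discrete_columns)
    return model


# Empty placeholders are enough here because the stand-in ignores its inputs.
demo = fit_in_order(RecordingModel(), real_data=[], discrete_columns=[])
```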

DP-TVAE

Original TVAE (no privacy):

from ctgan import TVAE

tvae = TVAE(epochs=300)
tvae.fit(real_data, discrete_columns)
synthetic_data = tvae.sample(1000)

DP-TVAE (with differential privacy):

from dpsdg import DPTVAE

model = DPTVAE(epsilon=1.0, delta=1e-5, epochs=300)
model.fit(real_data, discrete_columns)
synthetic_data = model.sample(1000)

Privacy Parameters

Both models add DP-specific parameters on top of the standard CTGAN/TVAE arguments.

Shared Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| epsilon | float | 0.0 (CTGAN) / 1.0 (TVAE) | Privacy budget (ε). Smaller values give a stronger privacy guarantee at the cost of data utility. Setting epsilon=0.0 disables DP noise entirely, so with its default of 0.0 DP-CTGAN trains without DP noise unless you set epsilon explicitly. Common choices are 1.0, 5.0, or 10.0. |
| delta | float | 1e-5 | Failure probability (δ) for the DP guarantee. Should be much smaller than 1/n, where n is the number of training rows. |
| max_grad_norm | float | 1.0 | Per-sample gradient clipping threshold used in DP-SGD. This bounds the sensitivity of each update. Larger values preserve more gradient signal but require proportionally more noise to achieve the same ε. |
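To illustrate how max_grad_norm interacts with the noise, here is a minimal pure-Python sketch of one generic DP-SGD gradient computation. It is not the package's training loop: the function names and the noise_multiplier argument are illustrative assumptions, and real implementations operate on tensors rather than lists.

```python
import math
import random


def clip_gradient(grad, max_grad_norm):
    """Scale one per-sample gradient so its L2 norm is at most max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_grad_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def dp_sgd_gradient(per_sample_grads, max_grad_norm, noise_multiplier, rng):
    """Clip each sample's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, max_grad_norm) for g in per_sample_grads]
    dim = len(per_sample_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise standard deviation scales with the clipping threshold,
    # which is why a larger max_grad_norm needs proportionally more noise.
    sigma = noise_multiplier * max_grad_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
clipped = [clip_gradient(g, max_grad_norm=1.0) for g in grads]
update = dp_sgd_gradient(grads, max_grad_norm=1.0, noise_multiplier=1.1, rng=rng)
```

The first gradient is scaled down to norm 1.0 while the second is left unchanged, so no single row can move the summed gradient by more than max_grad_norm.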

DP-CTGAN Only

| Parameter | Type | Default | Description |
|---|---|---|---|
| use_gradient_penalty | bool | True | Enables a WGAN-GP style gradient penalty on the discriminator. Recommended to keep enabled, as it stabilizes GAN training under the high noise levels typical of DP. |
| gp_lambda | float | 10 | Weight for the gradient penalty term in the discriminator loss. Higher values enforce the Lipschitz constraint more strongly. |
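For reference, the standard WGAN-GP discriminator loss is written out below, with λ playing the role of gp_lambda. This is the textbook form, given as an assumption about what "WGAN-GP style" means here. The exact loss used in this package may differ. The interpolation weight is called α to avoid confusion with the privacy budget ε.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \alpha x + (1 - \alpha)\tilde{x}, \quad \alpha \sim U[0, 1]
```

Here P_r is the real data distribution and P_g the generator distribution. The last term pushes the gradient norm of D toward 1 at points interpolated between real and synthetic samples.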

DP-TVAE Only

| Parameter | Type | Default | Description |
|---|---|---|---|
| use_opacus_noise_mul | bool | False | When True, delegates the noise multiplier calculation to Opacus. When False (the default), the package computes the multiplier itself by calling opacus.accountants.utils.get_noise_multiplier. |
