Local differential privacy mechanisms
TrasgoDP implements several mechanisms for ε-differential privacy and (ε, δ)-differential privacy. The mechanisms are designed for the local setting, where noise is added directly to the raw data. Two types of mechanisms are implemented:
- For numerical records: Laplace and Gaussian mechanisms. The implementation includes a final clipping step applied to the privatized data.
- For categorical records: Exponential mechanism and Randomized Response (both for binary attributes and the k-ary version).
The library provides dedicated functions that can be applied to pandas dataframes as well as to lists and numpy arrays.
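To make the idea of "noise plus final clipping" concrete, here is a minimal standard-library sketch of a local Laplace mechanism. It is not trasgoDP's implementation: the function name `laplace_with_clipping`, its parameters, and the choice to take the sensitivity as `upper - lower` (and to clip the inputs to the bounds before adding noise) are assumptions made for illustration.

```python
import random


def laplace_with_clipping(values, epsilon, lower, upper, seed=None):
    """Illustrative local Laplace mechanism (hypothetical helper, not trasgoDP's API)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")

    rng = random.Random(seed)
    # Assumption: each record lies in [lower, upper], so the sensitivity
    # of releasing a single record is the width of that interval.
    scale = (upper - lower) / epsilon

    noisy = []
    for v in values:
        v = min(max(v, lower), upper)  # enforce the assumed bounds on the input
        # The difference of two i.i.d. exponentials is Laplace(0, scale).
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        # Final clipping: post-processing, so it does not weaken the guarantee.
        noisy.append(min(max(v + noise, lower), upper))
    return noisy


ages = [17, 40, 95]
print(laplace_with_clipping(ages, epsilon=1.0, lower=18, upper=90, seed=0))
```

Because clipping the output is pure post-processing, it keeps the privatized values in a plausible range without consuming any extra privacy budget.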
Installation
You can install trasgoDP using pip. We recommend using Python 3 with virtualenv:
virtualenv .venv -p python3
source .venv/bin/activate
pip install trasgoDP
Mechanisms implemented
| Mechanism | Type of the attribute | Function in trasgoDP |
|---|---|---|
| Laplace | Numerical | numerical.dp_clip_laplace() |
| Gaussian | Numerical | numerical.dp_clip_gaussian() |
| Exponential | Categorical | categorical.dp_exponential() |
| Randomized response | Categorical (binary) | categorical.dp_randomized_response_binary() |
| k-ary randomized response | Categorical | categorical.dp_randomized_response_kary() |
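For readers unfamiliar with the categorical mechanisms in the table, the following is a minimal sketch of textbook k-ary randomized response: the true category is kept with probability e^ε / (e^ε + k − 1) and otherwise replaced by one of the other k − 1 categories chosen uniformly. The function `kary_randomized_response` and its parameters are hypothetical and written with the standard library only; trasgoDP's `dp_randomized_response_kary()` may differ in signature and details.

```python
import math
import random


def kary_randomized_response(values, categories, epsilon, seed=None):
    """Illustrative k-ary randomized response (hypothetical helper, not trasgoDP's API)."""
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")

    rng = random.Random(seed)
    # Probability of reporting the true category.
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)

    reported = []
    for v in values:
        if rng.random() < p_keep:
            reported.append(v)
        else:
            # Otherwise report one of the remaining categories uniformly at random.
            reported.append(rng.choice([c for c in categories if c != v]))
    return reported


categories = ["Private", "Self-emp", "Gov"]
print(kary_randomized_response(["Private", "Gov", "Private"], categories, epsilon=1.0, seed=0))
```

With k = 2 this reduces to the binary randomized response, where the true answer is kept with probability e^ε / (e^ε + 1).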
Getting started
To apply a DP mechanism to a column of a dataframe, you need to provide:
- The pandas dataframe with the data.
- The column in the dataframe to be privatized.
- The privacy budget (ε).
- The probability of exceeding the privacy budget (δ), required only for the Gaussian mechanism on numerical attributes.
- The upper and lower bounds for numerical attributes (optional).
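The role of ε and δ in the Gaussian mechanism can be illustrated with the classical calibration σ = Δ · sqrt(2 ln(1.25/δ)) / ε, which is known to give (ε, δ)-differential privacy for 0 < ε < 1. The sketch below uses only the standard library; `gaussian_sigma` and `gaussian_mechanism` are hypothetical names, and trasgoDP may calibrate its noise differently.

```python
import math
import random


def gaussian_sigma(epsilon, delta, sensitivity):
    """Classical Gaussian-mechanism noise scale (illustrative, valid for 0 < epsilon < 1)."""
    if not 0 < epsilon < 1:
        raise ValueError("the classical bound assumes 0 < epsilon < 1")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(values, epsilon, delta, lower, upper, seed=None):
    """Add Gaussian noise to each record and clip the result (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: records lie in [lower, upper], so the sensitivity is the interval width.
    sigma = gaussian_sigma(epsilon, delta, upper - lower)
    clipped = (min(max(v, lower), upper) for v in values)
    return [min(max(v + rng.gauss(0.0, sigma), lower), upper) for v in clipped]


print(gaussian_sigma(epsilon=0.5, delta=1e-5, sensitivity=1.0))
print(gaussian_mechanism([25, 40, 63], epsilon=0.5, delta=1e-5, lower=18, upper=90, seed=0))
```

A smaller δ or a smaller ε both increase σ, so stronger guarantees translate directly into noisier outputs.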
Example: apply DP to the adult dataset with the Laplace mechanism for the column age and the Exponential mechanism for the column workclass:
import pandas as pd
from trasgodp.numerical import dp_clip_laplace
from trasgodp.categorical import dp_exponential
# Read and process the data
data = pd.read_csv("examples/adult.csv")
data.columns = data.columns.str.strip()
cols = [
    "workclass",
    "education",
    "marital-status",
    "occupation",
    "sex",
    "native-country",
]
for col in cols:
    data[col] = data[col].str.strip()
# Apply DP for the attribute age:
column_num = "age"
epsilon1 = 10
df = dp_clip_laplace(data, column_num, epsilon1, new_column=True)
# Apply DP for the attribute workclass:
column_cat = "workclass"
epsilon2 = 5
# Pass df (not data) so the privatized age column from the previous step is kept
df = dp_exponential(df, column_cat, epsilon2, new_column=True)
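As background for the categorical step in the example above, here is a minimal sketch of the exponential mechanism, which samples an output r with probability proportional to exp(ε · u(x, r) / (2Δu)). The helper `exponential_mechanism` and the indicator utility `same_category` are assumptions chosen for illustration, using only the standard library; the utility function used inside `dp_exponential()` is not described in this document and may be different.

```python
import math
import random


def same_category(true_value, candidate):
    """Hypothetical utility: 1 if the candidate equals the true value, else 0."""
    return 1.0 if candidate == true_value else 0.0


def exponential_mechanism(true_value, candidates, epsilon, utility,
                          sensitivity=1.0, rng=None):
    """Illustrative exponential mechanism (not trasgoDP's implementation)."""
    rng = rng or random.Random()
    scores = [utility(true_value, c) for c in candidates]
    # Subtracting the maximum score avoids overflow and leaves the
    # sampling probabilities unchanged after normalization.
    top = max(scores)
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


workclasses = ["Private", "Self-emp", "Gov"]
rng = random.Random(0)
print([exponential_mechanism("Private", workclasses, 5, same_category, rng=rng)
       for _ in range(5)])
```

With the indicator utility, the true category is favoured by a factor of exp(ε/2) over every other category, so larger values of ε make the reported value more likely to match the original.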
Warning
This project is under active development.
License
This project is licensed under the Apache 2.0 license.
Related work
If you are using trasgoDP, you may also be interested in:
- pyCANON: a Python library for checking the level of anonymity of a dataset.
- anjana: a Python library for anonymizing tabular datasets.
Funding and acknowledgments
This work is funded by the European Union through the SIESTA project (Horizon Europe) under grant number 101131957.