1-Diffractor: a highly efficient word-level Metric Differential Privacy mechanism.
1-Diffractor
1-Diffractor is a library for fast word-level text perturbation under Metric Differential Privacy. It maps words into 1D sorted embedding spaces and applies noise there, which provides privacy guarantees while aiming to maintain semantic utility.
Key Features
- Metric DP Implementation: Support for both Truncated Geometric and Truncated Exponential (TEM) mechanisms.
- Automated Embedding Management: Automatically downloads and caches filtered embedding models (GloVe, Word2Vec, Numberbatch).
- Parallel Processing: Uses optimized multiprocessing to perturb large batches of text quickly.
- BYOE (Bring Your Own Embeddings): CLI tools to clean and integrate custom embedding files into 1-Diffractor.
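To make the idea of adding noise in a 1D sorted space more concrete, the following minimal sketch samples two-sided geometric noise on a word's position in a sorted list and clamps the result to the list bounds. It is an illustration under assumptions, not the library's implementation: the helper names (`two_sided_geometric`, `perturb_word`) and the toy word list are hypothetical, and clamping is only one simple way to truncate the noise.

```python
import math
import random


def two_sided_geometric(epsilon: float, rng: random.Random) -> int:
    """Sample integer noise k with P(k) proportional to exp(-epsilon * |k|)."""

    def geometric() -> int:
        # Inverse-CDF sampling of a geometric variable on {0, 1, 2, ...}
        # with P(X >= k) = exp(-epsilon * k).
        u = 1.0 - rng.random()  # u lies in (0, 1]
        return int(-math.log(u) / epsilon)

    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return geometric() - geometric()


def perturb_word(word: str, sorted_vocab: list, epsilon: float, rng: random.Random) -> str:
    """Shift the word's index by geometric noise, clamped to the list bounds."""
    idx = sorted_vocab.index(word)  # raises ValueError for unknown words
    noisy = idx + two_sided_geometric(epsilon, rng)
    noisy = max(0, min(len(sorted_vocab) - 1, noisy))
    return sorted_vocab[noisy]


# Toy stand-in for a list sorted along one embedding-derived dimension.
vocab = ["ant", "bee", "cat", "dog", "eel", "fox", "gnu"]
rng = random.Random(42)
print([perturb_word("dog", vocab, epsilon=1.0, rng=rng) for _ in range(5)])
```

Smaller values of `epsilon` spread the noise over more positions, so the returned word tends to land farther from the input in the list.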
Quickstart Guide
Installation
```bash
pip install dp-diffractor
```
Basic Usage
```python
from diffractor import Diffractor, DiffractorConfig

# Configure the privacy mechanism
config = DiffractorConfig(
    method="geometric",
    epsilon=1.0,
    verbose=True,
)

# Perturb a batch of texts inside the context manager
with Diffractor(config) as df:
    texts = ["Differential Privacy is really cool!", "Hello world."]
    perturbed = df.rewrite(texts)
    print(perturbed)
```
Advanced Configuration
The DiffractorConfig object allows you to customize the privatization parameters:
| Parameter | Default | Description |
|---|---|---|
| method | geometric | The DP mechanism: geometric or TEM. |
| epsilon | 1.0 | Privacy budget (ε). Lower is more private. |
| gamma | 5 | Neighborhood radius for the TEM scoring function. |
| sensitivity | 1.0 | Sensitivity of the scoring function. |
| replace_stopwords | False | If False, keeps common stopwords unchanged. |
| verbose | True | Enables progress bars and status logging. |
| seed | 42 | Global seed for reproducible perturbations. |
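To give a feel for how `epsilon`, `gamma`, and `sensitivity` shape a sampling distribution, here is a simplified exponential-mechanism-style weighting over a 1D list of word positions. Candidates within `gamma` positions of the input are scored by negative index distance and weighted by exp(ε · score / (2 · sensitivity)). Everything here is an assumption for illustration: the function name `neighborhood_probabilities` is hypothetical, `gamma` is treated as an index radius, and candidates outside the neighborhood are ignored, so this sketch does not reproduce TEM and carries no formal privacy guarantee on its own.

```python
import math


def neighborhood_probabilities(index, vocab_size, epsilon, gamma=5, sensitivity=1.0):
    """Return {candidate_index: probability} for candidates within gamma of index."""
    lo = max(0, index - gamma)
    hi = min(vocab_size - 1, index + gamma)
    weights = {}
    for j in range(lo, hi + 1):
        score = -abs(j - index)  # closer candidates score higher
        weights[j] = math.exp(epsilon * score / (2.0 * sensitivity))
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Compare how much probability mass stays on the original word.
strict = neighborhood_probabilities(50, 100, epsilon=0.1)
loose = neighborhood_probabilities(50, 100, epsilon=5.0)
print(round(strict[50], 3), round(loose[50], 3))
```

With a low `epsilon` the weights are nearly uniform across the neighborhood, while a high `epsilon` concentrates the mass on the original position.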
Managing Embeddings
1-Diffractor keeps a local cache (by default, ~/.cache/diffractor) to store embedding files.
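As a small illustration, the contents of the cache can be inspected with the standard library. This sketch assumes only the default path stated above (`~/.cache/diffractor`); the helper name `list_cached_files` is hypothetical, and the files it reports depend on what has been downloaded.

```python
from pathlib import Path


def list_cached_files(cache_dir=None):
    """Return the sorted names of regular files in the cache directory."""
    if cache_dir is None:
        # Default location as described in this README.
        cache_dir = Path("~/.cache/diffractor").expanduser()
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []  # nothing cached yet, or a non-default location is in use
    return sorted(p.name for p in cache_dir.iterdir() if p.is_file())


print(list_cached_files())
```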
Custom Embeddings (BYOE)
If you have your own embedding file, you must filter it against the internal vocabulary to ensure it works with the privatization mechanism:
```bash
# In your terminal
diffractor-clean path/to/my_vectors.txt
```

Then, use it during startup:

```python
df = Diffractor(model_names=["my_vectors_filtered"])
```
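The filtering step can be pictured roughly as follows: read a text-format embedding file line by line and keep only the rows whose word appears in a reference vocabulary. This is a sketch under assumptions, not the actual `diffractor-clean` implementation: the space-separated `word v1 v2 ...` file layout, the function name `filter_embeddings`, and the vocabulary passed in are all illustrative.

```python
def filter_embeddings(src, dst, vocabulary):
    """Copy rows of a text embedding file whose word is in `vocabulary`.

    Returns the number of rows written to `dst`.
    """
    kept = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            parts = line.rstrip("\n").split(" ")
            # Skip blank lines and short header lines such as "<count> <dim>".
            if len(parts) < 3:
                continue
            if parts[0] in vocabulary:
                fout.write(line if line.endswith("\n") else line + "\n")
                kept += 1
    return kept
```

Passing the vocabulary as a `set` keeps each membership check constant-time, which matters for embedding files with millions of rows.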
Default Models
By default, 1-Diffractor fetches and uses the following embedding models:
- conceptnet-numberbatch-19-08-300
- glove-twitter-200
- glove-wiki-gigaword-300
- glove-commoncrawl-30
- word2vec-google-news-300
Citation
If you find 1-Diffractor useful or make use of it in your research, please be sure to cite the original paper:
```bibtex
@inproceedings{10.1145/3643651.3659896,
  author = {Meisenbacher, Stephen and Chevli, Maulik and Matthes, Florian},
  title = {1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy},
  year = {2024},
  isbn = {9798400705564},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3643651.3659896},
  doi = {10.1145/3643651.3659896},
  booktitle = {Proceedings of the 10th ACM International Workshop on Security and Privacy Analytics},
  pages = {23–33},
  numpages = {11},
  keywords = {data privacy, differential privacy, natural language processing},
  location = {Porto, Portugal},
  series = {IWSPA '24}
}
```
Please also consider citing the hosted embedding files:
```bibtex
@dataset{meisenbacher_2026_19701515,
  author = {Meisenbacher, Stephen},
  title = {Filtered Embedding Files for 1-Diffractor},
  month = apr,
  year = 2026,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.19701515},
  url = {https://doi.org/10.5281/zenodo.19701515},
}
```