1-Diffractor: a highly efficient word-level Metric Differential Privacy mechanism.

1-Diffractor

1-Diffractor is a high-performance library for word-level text perturbation based on Metric Differential Privacy. It projects words into sorted one-dimensional embedding spaces and applies noise there, which provides metric DP guarantees while preserving semantic utility.

Key Features

  • Metric DP Implementation: Support for both Truncated Geometric and Truncated Exponential (TEM) mechanisms.
  • Automated Embedding Management: Automatically downloads and caches filtered embedding models (GloVe, Word2Vec, Numberbatch).
  • Parallel Processing: Uses optimized multiprocessing to perturb large batches of text quickly.
  • BYOE (Bring Your Own Embeddings): CLI tools to clean and integrate custom embedding files into 1-Diffractor.
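To give some intuition for noise applied in a sorted 1D space, here is a minimal sketch of a truncated geometric mechanism over a sorted word list. It is an illustration only and not the library's implementation: the helper names (`two_sided_geometric`, `perturb_word`), the toy vocabulary, and the choice to truncate by clamping the noisy index to the list bounds are all assumptions.

```python
import math
import random


def two_sided_geometric(epsilon, rng):
    """Sample integer noise k with P(k) proportional to exp(-epsilon * |k|).

    The difference of two i.i.d. geometric variables has this distribution.
    """
    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        return int(math.floor(-math.log(u) / epsilon))

    return geometric() - geometric()


def perturb_word(word, sorted_vocab, epsilon, rng):
    """Shift the word's position in a sorted 1D list by geometric noise.

    `sorted_vocab` stands in for a vocabulary ordered along one embedding
    dimension (hypothetical data). Out-of-range indices are clamped.
    """
    i = sorted_vocab.index(word)
    j = i + two_sided_geometric(epsilon, rng)
    j = max(0, min(len(sorted_vocab) - 1, j))  # truncate to valid positions
    return sorted_vocab[j]


rng = random.Random(42)
vocab = ["apple", "banana", "cherry", "date", "elderberry"]
print(perturb_word("cherry", vocab, epsilon=1.0, rng=rng))
```

With a small epsilon the noise spreads further along the list, so replacements land further from the original word; with a large epsilon the word is almost always kept.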

Quickstart Guide

Installation

pip install dp-diffractor

Basic Usage

from diffractor import Diffractor, DiffractorConfig

# Configure the privacy mechanism
config = DiffractorConfig(
    method="geometric", 
    epsilon=1.0, 
    verbose=True
)

with Diffractor(config) as df:
    texts = ["Differential Privacy is really cool!", "Hello world."]
    perturbed = df.rewrite(texts)
    print(perturbed)

Advanced Configuration

The DiffractorConfig object allows you to customize the privatization parameters:

Parameter | Default | Description
--- | --- | ---
method | geometric | The DP mechanism: geometric or TEM.
epsilon | 1.0 | Privacy budget (ε). Lower is more private.
gamma | 5 | Neighborhood radius for the TEM scoring function.
sensitivity | 1.0 | Sensitivity of the scoring function.
replace_stopwords | False | If False, keeps common stopwords unchanged.
verbose | True | Enables progress bars and status logging.
seed | 42 | Global seed for reproducible perturbations.
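To make the roles of epsilon and sensitivity more concrete, the sketch below computes selection probabilities for a generic exponential mechanism over positions in a 1D list, weighting each candidate by exp(ε · score / (2 · sensitivity)). This is the textbook exponential mechanism, not necessarily how 1-Diffractor's TEM scores or truncates candidates; the score function (negative index distance) and both function names are assumptions, and gamma is deliberately left out.

```python
import math
import random


def exponential_mechanism_probs(index, n, epsilon, sensitivity=1.0):
    """Probability of selecting each of n positions, given the true index.

    Score is the negative index distance (an assumed scoring function).
    """
    scores = [-abs(index - j) for j in range(n)]
    weights = [math.exp(epsilon * s / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(index, n, epsilon, sensitivity=1.0, rng=None):
    """Draw one output position according to the probabilities above."""
    rng = rng or random.Random()
    probs = exponential_mechanism_probs(index, n, epsilon, sensitivity)
    return rng.choices(range(n), weights=probs, k=1)[0]


# Lower epsilon flattens the distribution, so the true position is kept less often.
for eps in (0.1, 1.0, 5.0):
    stay = exponential_mechanism_probs(index=2, n=5, epsilon=eps)[2]
    print(f"epsilon={eps}: P(keep position) = {stay:.3f}")
```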

Managing Embeddings

1-Diffractor keeps a local cache (by default, ~/.cache/diffractor) to store embedding files.
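If you want to check what has been downloaded, a small standard-library helper such as the following can list the cache contents. The helper name `list_cached_files` is hypothetical, and the file layout inside the cache directory is not documented here, so this only reports file names and sizes.

```python
from pathlib import Path


def list_cached_files(cache_dir=None):
    """Return (name, size_in_bytes) for files in the cache directory.

    Defaults to ~/.cache/diffractor, the default location stated above.
    Returns an empty list if the directory does not exist yet.
    """
    if cache_dir is None:
        cache_dir = Path.home() / ".cache" / "diffractor"
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    return sorted(
        (p.name, p.stat().st_size) for p in cache_dir.iterdir() if p.is_file()
    )


for name, size in list_cached_files():
    print(f"{name}: {size / 1e6:.1f} MB")
```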

Custom Embeddings (BYOE)

If you have your own embedding file, you must filter it against the internal vocabulary to ensure it works with the privatization mechanism:

# In your terminal
diffractor-clean path/to/my_vectors.txt

Then pass the name of the filtered model when creating the Diffractor:

df = Diffractor(model_names=["my_vectors_filtered"])
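For intuition about the filtering step, the sketch below keeps only those rows of a GloVe-style text file (one word followed by its vector components per line) whose word appears in a given vocabulary. It is a rough illustration under stated assumptions, not the behavior of `diffractor-clean`: the function name, the input format, and the toy vocabulary are all hypothetical.

```python
def filter_embeddings(lines, vocabulary):
    """Keep embedding rows whose first token is in `vocabulary`.

    `lines` is any iterable of text lines in "word v1 v2 ..." format
    (an assumed format); blank or malformed lines are skipped.
    """
    kept = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # no word/vector pair on this line
        if parts[0] in vocabulary:
            kept.append(line.rstrip("\n"))
    return kept


sample = ["cat 0.1 0.2\n", "zzzz 0.3 0.4\n", "dog 0.5 0.6\n"]
print(filter_embeddings(sample, vocabulary={"cat", "dog"}))
```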

Default Models

By default, 1-Diffractor fetches and uses the following embedding models:

  • conceptnet-numberbatch-19-08-300
  • glove-twitter-200
  • glove-wiki-gigaword-300
  • glove-commoncrawl-30
  • word2vec-google-news-300

Citation

If you find 1-Diffractor useful or make use of it in your research, please be sure to cite the original paper:

@inproceedings{10.1145/3643651.3659896,
author = {Meisenbacher, Stephen and Chevli, Maulik and Matthes, Florian},
title = {1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy},
year = {2024},
isbn = {9798400705564},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3643651.3659896},
doi = {10.1145/3643651.3659896},
booktitle = {Proceedings of the 10th ACM International Workshop on Security and Privacy Analytics},
pages = {23–33},
numpages = {11},
keywords = {data privacy, differential privacy, natural language processing},
location = {Porto, Portugal},
series = {IWSPA '24}
}

Please also consider citing the hosted embedding files:

@dataset{meisenbacher_2026_19701515,
  author       = {Meisenbacher, Stephen},
  title        = {Filtered Embedding Files for 1-Diffractor},
  month        = apr,
  year         = 2026,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.19701515},
  url          = {https://doi.org/10.5281/zenodo.19701515},
}
