
1-Diffractor: a highly efficient word-level Metric Differential Privacy mechanism.

1-Diffractor

1-Diffractor is a high-performance library for word-level text perturbation based on Metric Differential Privacy. It maps words into one-dimensional sorted embedding spaces and applies noise there, providing privacy guarantees while preserving semantic utility.

Key Features

  • Metric DP Implementation: Supports both the Truncated Geometric and Truncated Exponential (TEM) mechanisms.
  • Automated Embedding Management: Automatically downloads and caches filtered embedding models (GloVe, Word2Vec, Numberbatch).
  • Parallel Processing: Uses optimized multiprocessing to perturb large batches of text quickly.
  • BYOE (Bring Your Own Embeddings): CLI tools to clean and integrate custom embedding files into 1-Diffractor.
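
To give some intuition for geometric noise on a sorted one-dimensional word list, here is a minimal sketch. It is not the library's implementation: the function names, the toy vocabulary, and the choice to renormalize a two-sided geometric distribution over the valid list positions are all assumptions made for illustration, and the sketch makes no claim about the formal guarantee.

```python
import math
import random


def sample_noisy_index(i, n, epsilon, rng):
    """Sample an index j in [0, n) with weight exp(-epsilon * |i - j|).

    Illustrative assumption: the two-sided geometric distribution is simply
    renormalized over the valid positions; the library may truncate differently.
    """
    weights = [math.exp(-epsilon * abs(i - j)) for j in range(n)]
    return rng.choices(range(n), weights=weights, k=1)[0]


def perturb_word(word, sorted_vocab, epsilon, rng):
    """Replace a word by a (likely nearby) word from a sorted 1D list."""
    if word not in sorted_vocab:
        return word  # out-of-vocabulary words are left untouched in this sketch
    i = sorted_vocab.index(word)
    return sorted_vocab[sample_noisy_index(i, len(sorted_vocab), epsilon, rng)]


# Hypothetical toy list, imagined as sorted along one embedding dimension.
toy_vocab = ["tiny", "small", "medium", "large", "huge"]
rng = random.Random(0)
print([perturb_word("medium", toy_vocab, 0.5, rng) for _ in range(5)])
```

Smaller values of epsilon flatten the weights, so replacements land farther from the original position; larger values concentrate the mass on the original word.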

Quickstart Guide

Installation

pip install dp-diffractor

Basic Usage

from diffractor import Diffractor, DiffractorConfig

# Configure the privacy mechanism
config = DiffractorConfig(
    method="geometric", 
    epsilon=1.0, 
    verbose=True
)

with Diffractor(config) as df:
    texts = ["Differential Privacy is really cool!", "Hello world."]
    perturbed = df.rewrite(texts)
    print(perturbed)

Advanced Configuration

The DiffractorConfig object allows you to customize the privatization parameters:

Parameter          Default    Description
method             geometric  The DP mechanism: geometric or TEM.
epsilon            1.0        Privacy budget (ε). Lower is more private.
gamma              5          Neighborhood radius for the TEM scoring function.
sensitivity        1.0        Sensitivity of the scoring function.
replace_stopwords  False      If False, keeps common stopwords unchanged.
verbose            True       Enables progress bars and status logging.
seed               42         Global seed for reproducible perturbations.
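
To show how epsilon, gamma, and sensitivity could interact, the sketch below applies a generic exponential mechanism to positions in a sorted list. It is a simplified reading of the parameter descriptions above, not the TEM algorithm as implemented by the library: the scoring function (negative index distance), the hard cutoff at gamma positions, and all function names are assumptions.

```python
import math
import random


def exponential_mechanism_index(i, n, epsilon, gamma, sensitivity, rng):
    """Pick an index near i using exponential-mechanism style weights.

    Assumptions for illustration: candidates are the positions within
    `gamma` of i, and the score of candidate j is -|i - j|.
    """
    lo, hi = max(0, i - gamma), min(n - 1, i + gamma)
    candidates = list(range(lo, hi + 1))
    # Standard exponential mechanism weighting: exp(epsilon * score / (2 * sensitivity)).
    weights = [
        math.exp(epsilon * -abs(i - j) / (2.0 * sensitivity)) for j in candidates
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(42)
samples = [exponential_mechanism_index(10, 100, 1.0, 5, 1.0, rng) for _ in range(10)]
print(samples)  # every sample lies within 5 positions of index 10
```

In this reading, gamma bounds how far a replacement can move, while epsilon and sensitivity control how strongly nearby positions are preferred within that window.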

Managing Embeddings

1-Diffractor keeps a local cache (by default, ~/.cache/diffractor) to store embedding files.
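
To check what is currently stored in the cache, a small standard-library helper like the one below can list the files and their sizes. The helper name is hypothetical and it does not use any 1-Diffractor API; it only assumes the default cache location stated above.

```python
from pathlib import Path


def list_cached_files(cache_dir=None):
    """Return (file name, size in bytes) pairs for files in the cache directory.

    Falls back to ~/.cache/diffractor when no directory is given; returns an
    empty list if the directory does not exist yet.
    """
    root = Path(cache_dir) if cache_dir is not None else Path.home() / ".cache" / "diffractor"
    if not root.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in root.iterdir() if p.is_file())


for name, size in list_cached_files():
    print(f"{name}: {size / 1e6:.1f} MB")
```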

Custom Embeddings (BYOE)

If you have your own embedding file, you must filter it against the internal vocabulary to ensure it works with the privatization mechanism:

# In your terminal
diffractor-clean path/to/my_vectors.txt

Then pass the name of the filtered model when creating the Diffractor instance:

df = Diffractor(model_names=["my_vectors_filtered"])
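
For intuition about what filtering an embedding file against a vocabulary involves, here is a rough sketch. It is not the implementation of diffractor-clean: the function name, the whitespace-separated "word v1 v2 ..." text format, and the example vocabulary are assumptions for illustration only.

```python
def filter_embedding_lines(lines, vocabulary):
    """Keep only embedding lines whose first token is in the vocabulary.

    Assumes a GloVe-style text format: one word per line, followed by its
    vector components, separated by single spaces.
    """
    kept = []
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        if parts[0] in vocabulary:
            kept.append(line.rstrip("\n"))
    return kept


# Hypothetical vocabulary and embedding lines.
vocab = {"privacy", "cool"}
raw = ["privacy 0.1 0.2\n", "xyzzy 0.3 0.4\n", "cool 0.5 0.6\n", "\n"]
print(filter_embedding_lines(raw, vocab))
```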

Default Models

By default, 1-Diffractor fetches and uses the following embedding models:

  • conceptnet-numberbatch-19-08-300
  • glove-twitter-200
  • glove-wiki-gigaword-300
  • glove-commoncrawl-30
  • word2vec-google-news-300

Citation

If you find 1-Diffractor useful or make use of it in your research, please be sure to cite the original paper:

@inproceedings{10.1145/3643651.3659896,
author = {Meisenbacher, Stephen and Chevli, Maulik and Matthes, Florian},
title = {1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy},
year = {2024},
isbn = {9798400705564},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3643651.3659896},
doi = {10.1145/3643651.3659896},
booktitle = {Proceedings of the 10th ACM International Workshop on Security and Privacy Analytics},
pages = {23–33},
numpages = {11},
keywords = {data privacy, differential privacy, natural language processing},
location = {Porto, Portugal},
series = {IWSPA '24}
}

Please also consider citing the hosted embedding files:

@dataset{meisenbacher_2026_19701515,
  author       = {Meisenbacher, Stephen},
  title        = {Filtered Embedding Files for 1-Diffractor},
  month        = apr,
  year         = 2026,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.19701515},
  url          = {https://doi.org/10.5281/zenodo.19701515},
}
