1-Diffractor: a highly efficient word-level Metric Differential Privacy mechanism.

1-Diffractor

1-Diffractor is a high-performance library for word-level text perturbation based on Metric Differential Privacy. It maps words onto one-dimensional sorted embedding lists and applies noise over list positions, which provides metric DP guarantees while preserving semantic utility.

Key Features

  • Metric DP Implementation: Support for both Truncated Geometric and Truncated Exponential (TEM) mechanisms.
  • Automated Embedding Management: Automatically downloads and caches filtered embedding models (GloVe, Word2Vec, Numberbatch).
  • Parallel Processing: Uses optimized multiprocessing to perturb large batches of text quickly.
  • BYOE (Bring Your Own Embeddings): CLI tools to clean custom embedding files and integrate them into 1-Diffractor.
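
To make the mechanism named in the first bullet more concrete, the following is a minimal sketch of truncated geometric noise over a one-dimensional sorted word list, using only the Python standard library. It is not taken from the 1-Diffractor source: the function names, the toy word list, and the choice to truncate by clamping to the list bounds are all assumptions made for illustration.

```python
import math
import random

def two_sided_geometric(epsilon, rng):
    """Sample integer noise Z with P(Z = k) proportional to exp(-epsilon * |k|)."""
    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        return math.floor(math.log(u) / -epsilon)
    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return geometric() - geometric()

def perturb_word(word, sorted_vocab, epsilon, rng):
    """Shift a word's list position by geometric noise, clamped to the list bounds."""
    index = sorted_vocab.index(word)
    noisy = index + two_sided_geometric(epsilon, rng)
    clamped = min(max(noisy, 0), len(sorted_vocab) - 1)  # truncation step (assumed)
    return sorted_vocab[clamped]

# Toy list standing in for a vocabulary sorted along one embedding-derived axis.
rng = random.Random(0)
toy_list = ["tiny", "small", "medium", "large", "huge"]
outputs = [perturb_word("medium", toy_list, 1.0, rng) for _ in range(5)]
print(outputs)
```

With a small epsilon the output tends to land further from the input position; with a large epsilon the input word is returned almost every time.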

Quickstart Guide

Installation

pip install dp-diffractor

Basic Usage

from diffractor import Diffractor, DiffractorConfig

# Configure the privacy mechanism
config = DiffractorConfig(
    method="geometric", 
    epsilon=1.0, 
    verbose=True
)

with Diffractor(config) as df:
    texts = ["Differential Privacy is really cool!", "Hello world."]
    perturbed = df.rewrite(texts)
    print(perturbed)

Advanced Configuration

The DiffractorConfig object allows you to customize the privatization parameters:

Parameter         | Default   | Description
------------------|-----------|--------------------------------------------------
method            | geometric | The DP mechanism: geometric or TEM.
epsilon           | 1.0       | Privacy budget (ε). Lower is more private.
gamma             | 5         | Neighborhood radius for the TEM scoring function.
sensitivity       | 1.0       | Sensitivity of the scoring function.
replace_stopwords | False     | If False, keeps common stopwords unchanged.
verbose           | True      | Enables progress bars and status logging.
seed              | 42        | Global seed for reproducible perturbations.
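
As a rough guide to choosing epsilon, the sketch below computes the probability that a word in the interior of a sorted list keeps its position under two-sided geometric noise with parameter exp(-epsilon). It assumes the geometric method behaves like the textbook two-sided geometric distribution with sensitivity 1, which this README does not confirm; the function name stay_probability is hypothetical and not part of the library.

```python
import math

def stay_probability(epsilon: float) -> float:
    """P(noise == 0) for two-sided geometric noise with alpha = exp(-epsilon)."""
    alpha = math.exp(-epsilon)
    return (1.0 - alpha) / (1.0 + alpha)

# Compare a few privacy budgets (values chosen for illustration only).
for eps in (0.1, 1.0, 5.0, 10.0):
    print(f"epsilon={eps:>4}: P(position unchanged) = {stay_probability(eps):.3f}")
```

Under this assumption, a lower epsilon gives a lower chance of keeping the original position, in line with the note that lower is more private.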

Managing Embeddings

1-Diffractor keeps a local cache (by default, ~/.cache/diffractor) to store embedding files.

Custom Embeddings (BYOE)

If you have your own embedding file, you must filter it against the internal vocabulary to ensure it works with the privatization mechanism:

# In your terminal
diffractor-clean path/to/my_vectors.txt

Then pass the name of the filtered model when constructing Diffractor:

df = Diffractor(model_names=["my_vectors_filtered"])
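
For intuition about what filtering an embedding file against a vocabulary involves, here is a small sketch that keeps only the rows of a text-format embedding file whose first token appears in a given vocabulary. It is an illustration only and does not reproduce what diffractor-clean does internally; the function name filter_embeddings, the whitespace-separated row format, and the example vocabulary are assumptions.

```python
import io
from typing import Iterable, Iterator, Set

def filter_embeddings(rows: Iterable[str], vocabulary: Set[str]) -> Iterator[str]:
    """Yield only the rows whose first whitespace-separated token is in the vocabulary."""
    for row in rows:
        parts = row.split(maxsplit=1)
        # Skip blank lines and rows that carry a word but no vector values.
        if len(parts) == 2 and parts[0] in vocabulary:
            yield row.rstrip("\n")

# Hypothetical in-memory stand-in for a file such as my_vectors.txt.
raw_file = io.StringIO(
    "privacy 0.1 0.2 0.3\n"
    "xqzv 0.9 0.8 0.7\n"
    "hello 0.4 0.5 0.6\n"
)
vocabulary = {"privacy", "hello", "world"}

filtered = list(filter_embeddings(raw_file, vocabulary))
print(filtered)
```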

Default Models

By default, 1-Diffractor fetches and uses the following embedding models:

  • conceptnet-numberbatch-19-08-300
  • glove-twitter-200
  • glove-wiki-gigaword-300
  • glove-commoncrawl-300
  • word2vec-google-news-300

Citation

If you find 1-Diffractor useful or make use of it in your research, please be sure to cite the original paper:

@inproceedings{10.1145/3643651.3659896,
author = {Meisenbacher, Stephen and Chevli, Maulik and Matthes, Florian},
title = {1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy},
year = {2024},
isbn = {9798400705564},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3643651.3659896},
doi = {10.1145/3643651.3659896},
booktitle = {Proceedings of the 10th ACM International Workshop on Security and Privacy Analytics},
pages = {23–33},
numpages = {11},
keywords = {data privacy, differential privacy, natural language processing},
location = {Porto, Portugal},
series = {IWSPA '24}
}

Please also consider citing the hosted embedding files:

@dataset{meisenbacher_2026_19701515,
  author       = {Meisenbacher, Stephen},
  title        = {Filtered Embedding Files for 1-Diffractor},
  month        = apr,
  year         = 2026,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.19701515},
  url          = {https://doi.org/10.5281/zenodo.19701515},
}
