
1-Diffractor: a highly efficient word-level Metric Differential Privacy mechanism.


1-Diffractor

1-Diffractor is a high-performance library for word-level text perturbation leveraging Metric Differential Privacy. It maps text into 1D sorted embedding spaces to apply noise, ensuring privacy guarantees while maintaining semantic utility.

Key Features

  • Metric DP Implementation: Support for both Truncated Geometric and Truncated Exponential (TEM) mechanisms.
  • Automated Embedding Management: Automatically downloads and caches filtered embedding models (GloVe, Word2Vec, Numberbatch).
  • Parallel Processing: Uses optimized multiprocessing to perturb large batches of text quickly.
  • BYOE (Bring Your Own Embeddings): CLI tools to clean and integrate custom embedding files into 1-Diffractor.
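
To make the geometric mechanism concrete, here is a self-contained sketch of the underlying idea, not the library's actual implementation (all names here are illustrative): words are sorted into a 1D index, two-sided geometric noise with P(k) ∝ exp(-ε·|k|) is added to a word's index, and the result is truncated to the vocabulary bounds.

```python
import math
import random

def sample_two_sided_geometric(epsilon, rng):
    """Sample k with P(k) proportional to exp(-epsilon * |k|)."""
    alpha = math.exp(-epsilon)
    # P(k = 0) = (1 - alpha) / (1 + alpha); otherwise sample a magnitude >= 1
    # with P(m) proportional to alpha**m, and a uniform sign.
    if rng.random() < (1 - alpha) / (1 + alpha):
        return 0
    magnitude = 1
    while rng.random() < alpha:
        magnitude += 1
    return magnitude if rng.random() < 0.5 else -magnitude

def perturb_index(index, epsilon, vocab_size, rng):
    """Perturb a 1D vocabulary index and truncate to [0, vocab_size - 1]."""
    noisy = index + sample_two_sided_geometric(epsilon, rng)
    return min(max(noisy, 0), vocab_size - 1)

# Toy example: a tiny "embedding space" sorted into a 1D index.
vocab = sorted(["data", "hello", "privacy", "secret", "world"])
rng = random.Random(42)
idx = vocab.index("privacy")
print(vocab[perturb_index(idx, epsilon=1.0, vocab_size=len(vocab), rng=rng)])
```

Lower epsilon spreads probability mass further along the index, trading utility for privacy; truncation keeps every output inside the vocabulary.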

Quickstart Guide

Installation

pip install dp-diffractor

Basic Usage

from diffractor import Diffractor, DiffractorConfig

# Configure the privacy mechanism
config = DiffractorConfig(
    method="geometric", 
    epsilon=1.0, 
    verbose=True
)

with Diffractor(config) as df:
    texts = ["Differential Privacy is really cool!", "Hello world."]
    perturbed = df.rewrite(texts)
    print(perturbed)

Optionally, you can pass an epsilon directly to rewrite without instantiating the mechanism again. You may also pass a list of epsilon values, in which case the length of the list must exactly match the token count of the text, as determined by nltk.word_tokenize.
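
The per-token epsilon requirement can be sketched as follows. This is an illustration of the length check, not library code; the library tokenizes with nltk.word_tokenize, while this sketch uses a rough regex stand-in so it stays self-contained.

```python
import re

def simple_tokenize(text):
    """Rough stand-in for nltk.word_tokenize: words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def validate_epsilons(text, epsilons):
    """Pair each token with its epsilon; the list lengths must match exactly."""
    tokens = simple_tokenize(text)
    if len(epsilons) != len(tokens):
        raise ValueError(
            f"expected {len(tokens)} epsilon values (one per token), got {len(epsilons)}"
        )
    return list(zip(tokens, epsilons))

print(validate_epsilons("hello world !", [0.5, 1.0, 2.0]))
# [('hello', 0.5), ('world', 1.0), ('!', 2.0)]
```

Note that punctuation counts as tokens, so the epsilon list is usually longer than the whitespace-separated word count.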

Note: due to the nature of the word embedding models, the input text must be all lowercase. The library checks for this and converts inputs to lowercase automatically.

Advanced Configuration

The DiffractorConfig object allows you to customize the privatization parameters:

Parameter          Default    Description
method             geometric  The DP mechanism: geometric or TEM.
epsilon            1.0        Privacy budget (ε); lower values are more private.
gamma              5          Neighborhood radius for the TEM scoring function.
sensitivity        1.0        Sensitivity of the scoring function.
replace_stopwords  False      If False, common stopwords are left unchanged.
verbose            True       Enables progress bars and status logging.
seed               42         Global seed for reproducible perturbations.
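
How gamma, epsilon, and sensitivity interact in the TEM mechanism can be illustrated with a simplified, self-contained sketch (an assumption-laden illustration over a 1D index, not the library's implementation): candidates within gamma index positions of the original word are scored by negative distance and sampled via the exponential mechanism.

```python
import math
import random

def tem_perturb(index, epsilon, gamma, sensitivity, vocab_size, rng):
    """Simplified truncated exponential mechanism over a 1D sorted index.

    Candidates within `gamma` positions of `index` get score -distance and
    are sampled with probability proportional to exp(eps * score / (2 * sens)).
    """
    lo = max(0, index - gamma)
    hi = min(vocab_size - 1, index + gamma)
    candidates = list(range(lo, hi + 1))
    weights = [
        math.exp(epsilon * -abs(c - index) / (2 * sensitivity)) for c in candidates
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(42)
print(tem_perturb(index=10, epsilon=1.0, gamma=5, sensitivity=1.0, vocab_size=100, rng=rng))
```

A larger gamma widens the candidate neighborhood, while a larger epsilon concentrates probability on the original word.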

Managing Embeddings

1-Diffractor keeps a local cache (by default, ~/.cache/diffractor) to store embedding files.

Custom Embeddings (BYOE)

If you have your own embedding file, you must filter it against the internal vocabulary to ensure it works with the privatization mechanism:

# In your terminal
diffractor-clean path/to/my_vectors.txt

Then, use it during startup:

df = Diffractor(model_names=["my_vectors_filtered"])

Default Models

By default, 1-Diffractor fetches and uses the following embedding models:

  • conceptnet-numberbatch-19-08-300
  • glove-twitter-200
  • glove-wiki-gigaword-300
  • glove-commoncrawl-30
  • word2vec-google-news-300

Citation

If you find 1-Diffractor useful or use it in your research, please cite the original paper:

@inproceedings{10.1145/3643651.3659896,
author = {Meisenbacher, Stephen and Chevli, Maulik and Matthes, Florian},
title = {1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy},
year = {2024},
isbn = {9798400705564},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3643651.3659896},
doi = {10.1145/3643651.3659896},
booktitle = {Proceedings of the 10th ACM International Workshop on Security and Privacy Analytics},
pages = {23–33},
numpages = {11},
keywords = {data privacy, differential privacy, natural language processing},
location = {Porto, Portugal},
series = {IWSPA '24}
}

Please also consider citing the hosted embedding files:

@dataset{meisenbacher_2026_19701515,
  author       = {Meisenbacher, Stephen},
  title        = {Filtered Embedding Files for 1-Diffractor},
  month        = apr,
  year         = 2026,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.19701515},
  url          = {https://doi.org/10.5281/zenodo.19701515},
}
