Word-level Metric Local Differential Privacy Mechanisms

MLDP

This repository contains the official implementation for the paper "A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking the Privacy-Utility Trade-off" (LREC-COLING 2024). It provides production-ready, highly optimized implementations of six word-level Metric Local Differential Privacy (MLDP) mechanisms.

Included Mechanisms

The package implements the following MLDP text privatization strategies:

MultivariateCalibrated: paper
TruncatedGumbel: paper
VickreyMechanism: paper
TEM: paper
Mahalanobis: paper
SynTF: paper

Note that the code for SanText is not included as it is already publicly available here.

Installation

Getting started is as simple as installing the package:

pip install mldp-text

Basic Usage

The package exposes a unified factory function called get_mechanism() to seamlessly switch between different MLDP algorithms using string IDs.

Embedding Perturbation Mechanisms

For any of these mechanisms, initialization is straightforward. By default, each mechanism looks for an optimized faiss index to accelerate nearest-neighbor lookups:

import mldp_text

# Initialize your chosen strategy
mechanism = mldp_text.get_mechanism("multivariate_calibrated", epsilon=1, use_faiss=True)

# Privatize individual words
perturbed_word = mechanism.replace_word("pizza")
print(perturbed_word)
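
To make the idea of embedding perturbation concrete, here is a minimal sketch of one common recipe for this family of mechanisms: add noise with density proportional to exp(-epsilon * ||z||) to the word's vector, then return the nearest vocabulary word. It is not the package's implementation. The toy 2-d vectors and the names `TOY_VECTORS`, `sample_noise`, and `perturb_word` are assumptions made for illustration, and the brute-force search stands in for the faiss index mentioned above.

```python
import math
import random

# Toy 2-d "embeddings" (made up); real vectors would come from a model such as GloVe.
TOY_VECTORS = {
    "pizza": (0.9, 0.1),
    "pasta": (0.8, 0.2),
    "burger": (0.7, 0.4),
    "laptop": (-0.6, 0.8),
}


def sample_noise(dim, epsilon, rng):
    """Draw a noise vector with density proportional to exp(-epsilon * ||z||)."""
    # Uniform direction: normalize a standard Gaussian vector.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    # Magnitude: Gamma distribution with shape `dim` and scale 1/epsilon.
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * x / norm for x in direction]


def perturb_word(word, epsilon, rng, vectors=TOY_VECTORS):
    """Add calibrated noise to the word's vector and snap to the nearest word."""
    noise = sample_noise(len(vectors[word]), epsilon, rng)
    noisy = [v + n for v, n in zip(vectors[word], noise)]
    # Brute-force nearest-neighbor search; an index would replace this at scale.
    return min(vectors, key=lambda w: math.dist(vectors[w], noisy))


rng = random.Random(0)
print(perturb_word("pizza", epsilon=1.0, rng=rng))
```

Smaller values of epsilon produce larger noise, so the returned word drifts further from the input; very large values almost always return the input word unchanged.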

SynTF Mechanism

The SynTF mechanism is frequency-driven and requires a document corpus to pre-calculate and cache its reference TF-IDF matrix:

import mldp_text

corpus = ["your list of reference dataset documents here", "another document sample"]

# Initialize SynTF with document data
mechanism = mldp_text.get_mechanism("syntf", epsilon=1.0, data=corpus)

perturbed_word = mechanism.replace_word("pizza")
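
The examples above privatize one word at a time. A minimal sketch of applying a mechanism to a whole text follows, assuming only that the object exposes the `replace_word(word)` method shown above. The helper `privatize_text` and the `UppercaseStub` class are hypothetical; the stub merely keeps the snippet runnable without the package or its embeddings.

```python
class UppercaseStub:
    """Hypothetical stand-in exposing the same replace_word(word) method."""

    def replace_word(self, word):
        return word.upper()


def privatize_text(mechanism, text):
    """Replace each whitespace-separated token independently."""
    return " ".join(mechanism.replace_word(token) for token in text.split())


print(privatize_text(UppercaseStub(), "i love pizza"))  # prints "I LOVE PIZZA"
```

In practice you would pass the object returned by `get_mechanism()` instead of the stub. Because each word is perturbed independently with its own epsilon, the guarantee applies per word rather than to the text as a whole. How punctuation and out-of-vocabulary tokens are handled is not covered here and would need checking against the package.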

Supported Mechanisms

When using get_mechanism(name), you can pass any of the following string variants for the name parameter (case-insensitive, hyphens/underscores are normalized automatically):

MLDP Mechanism	Allowed String IDs (name=)
MultivariateCalibrated	`multivariate_calibrated`
TruncatedGumbel	`truncated_gumbel`
VickreyMechanism	`vickrey`
TEM	`tem`
Mahalanobis	`mahalanobis`
SynTF	`syntf`
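
As an illustration of what this kind of name normalization could look like, the sketch below lowercases the input and maps hyphens to underscores before checking it against the IDs in the table. The functions `normalize_name` and `resolve` are hypothetical and are not taken from the package, whose actual rules may differ.

```python
KNOWN_IDS = {
    "multivariate_calibrated",
    "truncated_gumbel",
    "vickrey",
    "tem",
    "mahalanobis",
    "syntf",
}


def normalize_name(name):
    """Lowercase, trim, and treat hyphens as underscores."""
    return name.strip().lower().replace("-", "_")


def resolve(name):
    """Return the canonical ID or raise ValueError for unknown names."""
    key = normalize_name(name)
    if key not in KNOWN_IDS:
        raise ValueError(f"unknown mechanism {name!r}; expected one of {sorted(KNOWN_IDS)}")
    return key


print(resolve("Truncated-Gumbel"))  # prints "truncated_gumbel"
```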

Embedding Models

By default, the package looks for the glove.840B.300d embedding model pre-filtered to a fixed companion vocabulary (data/vocab.txt). Both assets are derived from the official Stanford GloVe project.

Loading Custom Embeddings

You can pass your own word embedding model to any mechanism. Before loading, the package inspects the first line of your file to confirm it follows the text format that gensim expects, which begins with a header of the form [VOCAB SIZE] [EMBEDDING DIMENSION] (e.g., 400000 300).
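
If you want to check a file yourself before handing it to the package, a header test along these lines should be enough. This is a sketch using only the standard library; `read_word2vec_header` is a hypothetical helper, not a function from the package, and the package's own check may be stricter.

```python
import os
import tempfile


def read_word2vec_header(path):
    """Return (vocab_size, dimension) from the first line, or raise ValueError."""
    with open(path, encoding="utf-8") as handle:
        fields = handle.readline().split()
    try:
        vocab_size, dimension = (int(field) for field in fields)
    except ValueError:
        raise ValueError(f"{path}: first line is not '<vocab size> <dimension>'") from None
    if vocab_size <= 0 or dimension <= 0:
        raise ValueError(f"{path}: header values must be positive")
    return vocab_size, dimension


# Demonstrate on a tiny made-up file with two 3-d vectors.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "toy_vectors.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("2 3\npizza 0.1 0.2 0.3\npasta 0.2 0.1 0.4\n")
    print(read_word2vec_header(demo_path))  # prints (2, 3)
```

Note that raw GloVe text downloads start directly with a word and its vector and carry no header line, so they would fail this check until converted.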

You can feed custom paths into the package in two ways:

Option 1: Session-Wide Override

Change the underlying fallback path before instantiating any mechanisms:

import mldp_text

# Set the default embedding path (the module is imported as mldp_text, not mldp)
mldp_text.utils.EMBED = "/path/to/your/custom_gensim_embeddings.txt"

engine = mldp_text.get_mechanism("mahalanobis", epsilon=1.2)

Option 2: Mechanism Parameter

Pass the file path directly to the instantiation call:

import mldp_text

engine = mldp_text.get_mechanism(
    "vickrey", 
    epsilon=1, 
    embed="/path/to/custom_vectors.txt"
)

Get Privatizing!

With these methods, you can now explore word-level Metric Local Differential Privacy text privatization. In case of any questions or suggestions, feel free to reach out to the authors.

Citation

If you find this work useful, please consider citing the original LREC-COLING work, which implemented and evaluated these MLDP mechanisms:

@inproceedings{meisenbacher-etal-2024-comparative,
    title = "A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking the Privacy-Utility Trade-off",
    author = "Meisenbacher, Stephen  and
      Nandakumar, Nihildev  and
      Klymenko, Alexandra  and
      Matthes, Florian",
    editor = "Calzolari, Nicoletta  and
      Kan, Min-Yen  and
      Hoste, Veronique  and
      Lenci, Alessandro  and
      Sakti, Sakriani  and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.16/",
    pages = "174--185"
}
