LLM-based Differential Privacy mechanisms for sentence-based text rewriting with infilling models.

PrivFill

privfill is a Python package providing LLM-based local Differential Privacy (DP) mechanisms for text privatization via sentence infilling. It offers easy-to-use wrappers for fine-tuned Hugging Face models. This software was originally presented in the NAACL 2025 Findings paper: On the Impact of Noise in Differentially Private Text Rewriting

Installation

Install the package from PyPI:

pip install privfill

Core Prerequisites:

  • Python ≥ 3.9
  • PyTorch (CUDA recommended for faster inference)
  • Transformers & NLTK
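
The prerequisites above can be checked before installing with a short script. This is a minimal sketch, not part of privfill: the helper name check_prerequisites is hypothetical, and the import names torch, transformers, and nltk are assumed to correspond to the listed dependencies.

```python
import sys
from importlib.util import find_spec

# Minimum interpreter version taken from the prerequisites list above.
MIN_PYTHON = (3, 9)
# Import names assumed for the listed dependencies (PyTorch, Transformers, NLTK).
REQUIRED_MODULES = ("torch", "transformers", "nltk")


def check_prerequisites(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a dict describing which prerequisites are satisfied."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_prerequisites().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```

The check only confirms that the modules can be located; it does not verify their versions or whether CUDA is available.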

Basic Usage & Model Selection

Instead of typing Hugging Face repository paths, you can choose from the three built-in models using the SupportedModels enum.

import privfill

# Choose between FLAN_T5_BASE, FLAN_T5_LARGE, and BART_LARGE
engine = privfill.load_pipeline(privfill.SupportedModels.FLAN_T5_BASE, DP=True)

text = "This is a long private document ... which contains sensitive information and should be privatized."
private_text = engine.privatize(text, epsilon=10)

print(private_text)
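
To see how the privacy budget affects the output, it can help to privatize the same text under several epsilon values. The sketch below assumes only the call shape shown above, privatize(text, epsilon=...). The helper privatize_sweep and the stand-in EchoEngine are hypothetical and are not part of privfill. The stand-in exists so the example runs without downloading a model.

```python
def privatize_sweep(engine, text, epsilons):
    """Privatize `text` once per epsilon and return {epsilon: output}.

    `engine` is any object exposing privatize(text, epsilon=...), which is the
    call shape shown in the usage example above.
    """
    results = {}
    for eps in epsilons:
        if eps <= 0:
            raise ValueError(f"epsilon must be positive, got {eps}")
        results[eps] = engine.privatize(text, epsilon=eps)
    return results


class EchoEngine:
    """Hypothetical stand-in so the sketch runs without downloading a model."""

    def privatize(self, text, epsilon):
        return f"[eps={epsilon}] {text}"


if __name__ == "__main__":
    outputs = privatize_sweep(EchoEngine(), "A short example sentence.", [1, 10, 100])
    for eps, out in outputs.items():
        print(eps, "->", out)
```

With a real pipeline, the object returned by privfill.load_pipeline(..., DP=True) would take the place of EchoEngine. In DP generally, smaller epsilon values mean a stronger privacy guarantee and noisier output.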

As described in the paper, we also provide an analogous, non-DP variant of PrivFill. The usage is very similar:

engine = privfill.load_pipeline(privfill.SupportedModels.FLAN_T5_BASE, DP=False)
private_text = engine.privatize(text)

Available Models

Enum                          | Hugging Face Repository               | Base Mechanism
SupportedModels.FLAN_T5_BASE  | sjmeis/flan-t5-base-infill-combined   | DP-Prompt
SupportedModels.FLAN_T5_LARGE | sjmeis/flan-t5-large-infill-combined  | DP-Prompt
SupportedModels.BART_LARGE    | sjmeis/bart-large-infill-combined     | DP-BART
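
For scripts that need the repository name or base mechanism without importing the package, the table can be mirrored as a small lookup. This is a stand-alone sketch: ModelInfo and models_for_mechanism are hypothetical names, and the class is not the package's own SupportedModels enum, whose internal values are not documented here.

```python
from enum import Enum


class ModelInfo(Enum):
    """Stand-alone mirror of the table above: (repository, base mechanism)."""

    FLAN_T5_BASE = ("sjmeis/flan-t5-base-infill-combined", "DP-Prompt")
    FLAN_T5_LARGE = ("sjmeis/flan-t5-large-infill-combined", "DP-Prompt")
    BART_LARGE = ("sjmeis/bart-large-infill-combined", "DP-BART")

    @property
    def repository(self):
        return self.value[0]

    @property
    def base_mechanism(self):
        return self.value[1]


def models_for_mechanism(mechanism):
    """List the model names whose base mechanism matches `mechanism`."""
    return [m.name for m in ModelInfo if m.base_mechanism == mechanism]


if __name__ == "__main__":
    for model in ModelInfo:
        print(model.name, model.repository, model.base_mechanism)
```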

Models

We make our three sentence infilling models public on Hugging Face; see the repositories listed under Available Models.

Comparison Code

We also include the LLMDP class code for DP-BART and DP-Prompt, as used in the paper.

X = LLMDP.DPPrompt()
# or
X = LLMDP.DPBart()

# then
X.privatize(text, epsilon)
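
For intuition about the epsilon argument, the sketch below illustrates the general idea behind DP-Prompt-style generation: logits are clipped to a bounded range and a token is sampled at a temperature tied to epsilon, which is an instance of the exponential mechanism. It is not the LLMDP code. The function names and the clipping bounds are illustrative assumptions, and whether the package interprets epsilon per token is not stated in this README.

```python
import math
import random


def temperature_for_epsilon(epsilon, logit_min, logit_max):
    """Temperature T = 2 * (logit_max - logit_min) / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if logit_max <= logit_min:
        raise ValueError("logit_max must exceed logit_min")
    return 2.0 * (logit_max - logit_min) / epsilon


def sample_token(logits, epsilon, logit_min, logit_max, rng=random):
    """Sample one token index from clipped, temperature-scaled logits."""
    temperature = temperature_for_epsilon(epsilon, logit_min, logit_max)
    # Clipping bounds the sensitivity of the logits used as utility scores.
    clipped = [min(max(x, logit_min), logit_max) for x in logits]
    scaled = [x / temperature for x in clipped]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    toy_logits = [0.5, 2.0, -1.0, 4.0]  # illustrative values only
    for eps in (1, 10, 100):
        picks = [sample_token(toy_logits, eps, -5.0, 5.0, rng) for _ in range(10)]
        print(f"epsilon={eps}: {picks}")
```

Under this view, a small epsilon gives a high temperature and near-uniform sampling, while a large epsilon concentrates the samples on the highest-scoring token.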
