Sanitize text containing PII attributes while maintaining high utility with cryptographic guarantees

Project description

preempt

Prϵϵmpt is a security framework designed to protect personally identifiable information (PII) in text by applying encryption or other privacy-preserving techniques before that data is sent to third-party large language model (LLM) APIs.

Prϵϵmpt achieves high utility for a diverse range of tasks while maintaining cryptographic guarantees. For the experiments and results found in Prϵϵmpt: Sanitizing Sensitive Prompts for LLMs, please refer to this repo.

This is a modular version of Prϵϵmpt, meant to be used as part of other projects.

Setup

Install uv following the instructions here.
Create a virtual environment with Python 3.11, activate it and add preempt:

uv venv --python 3.11
. ./.venv/bin/activate
uv add preempt

Import Prϵϵmpt methods and classes with from preempt.utils import * to use in your code. See the usage examples below and in demo.ipynb.

If you would like to work with the repo itself, clone it, navigate to the base folder (preempt) and run the following:

uv venv --python 3.11
. ./.venv/bin/activate
uv sync

If you already have a project in which you would like to use Prϵϵmpt, use either pip install preempt or uv add preempt, depending on your setup.

Usage

Additional usage examples can be found in demo.ipynb.

We will add support for generalized NER and sanitization in the near future.

Complete Usage Example

This is a complete usage example where we sanitize names and currency values. Make sure you either have Universal NER or Llama-3 8B Instruct available.

Import all utilities:

# Import utils
from preempt.utils import *

Initialize a NER and Sanitizer object:

# Load NER object
# ner_model = NER("/path/to/UniNER-7B-all", device="cuda:1")
ner_model = NER("/path/to/Meta-Llama-3-8B-Instruct/", device="cuda:1")

# Load Sanitizer object
sanitizer_name = Sanitizer(ner_model, key = "EF4359D8D580AA4F7F036D6F04FC6A94", tweak = "D8E7920AFA330A73")
sanitizer_money = Sanitizer(ner_model, key = "FF4359D8D580AA4F7F036D6F04FC6A94", tweak = "E8E7920AFA330A73")

# Sentences
sentences = ["Ben Parker and John Doe went to the bank and withdrew $200.", "Adam won $20 in the lottery."]

Sanitize names in sentences:

# Sanitizing names
sanitized_sentences, _ = sanitizer_name.encrypt(sentences, entity='Name', epsilon=1)
print("Sanitized sentences:")
print(sanitized_sentences)
"""
Prints:

Sanitized sentences:
['Jay Francois and Lamine Franklin went to the bank and withdrew $200.', 'Elie Vinod won $20 in the lottery.']
"""

Sanitize currency values in sanitized_sentences:

# Sanitizing currency values
sanitized_sentences, _ = sanitizer_money.encrypt(sanitized_sentences, entity='Money', epsilon=1)
print("Sanitized sentences:")
print(sanitized_sentences)
"""
Prints:

Sanitized sentences:
['Jay Francois and Lamine Franklin went to the bank and withdrew $769451698.', 'Elie Vinod won $37083668 in the lottery.']
"""

Desanitize encrypted names in sanitized_sentences:

# Desanitizing names
desanitized_sentences = sanitizer_name.decrypt(sanitized_sentences, entity='Name', use_cache=True)
print("Desanitized sentences:")
print(desanitized_sentences)
"""
Prints:

Desanitized sentences:
['Ben Parker and John Doe went to the bank and withdrew $769451698.', 'Adam won $37083668 in the lottery.']
"""

Desanitize encrypted currency values in desanitized_sentences:

# Desanitizing currency values
desanitized_sentences = sanitizer_money.decrypt(desanitized_sentences, entity='Money', use_cache=True)
print("Desanitized sentences:")
print(desanitized_sentences)
"""
Prints:

Desanitized sentences:
['Ben Parker and John Doe went to the bank and withdrew $200.', 'Adam won $20 in the lottery.']
"""

Extraction

We currently support Universal NER and Llama-3 8B Instruct for NER. We will add support for including your own NER models in the near future.

Initialize a NER class object by passing the path to one of the supported NER models mentioned above:

ner_model = NER("/path/to/Meta-Llama-3-8B-Instruct/", device="cuda:0")

Extract PII values found in a list of target strings using ner_model.extract():

sentences = ["Ben Parker and John Doe went to the bank.", "Who was late today? Adam."]
extracted = ner_model.extract(sentences, entity_type='Name')  # entity_type is one of 'Name', 'Money' or 'Age'

Sanitization

We currently only support sanitization for names, currency values and age, using either format-preserving encryption (FPE) or metric local differential privacy (m-LDP).

Initialize a Sanitizer class object by passing the previously initialized ner_model together with a key and a tweak, both of which are required by the FF3 cipher used for FPE:

sanitizer = Sanitizer(ner_model, key = "EF4359D8D580AA4F7F036D6F04FC6A94", tweak = "D8E7920AFA330A73")
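The key and tweak shown above are fixed sample values. A minimal sketch for generating fresh ones is given below, assuming that Sanitizer accepts any hex strings of the same lengths as the sample (32 hex characters for the key, 16 for the tweak). The helper name make_fpe_params is hypothetical and not part of preempt.

```python
import secrets


def make_fpe_params(key_bytes: int = 16, tweak_bytes: int = 8) -> tuple[str, str]:
    """Return a random (key, tweak) pair as uppercase hex strings.

    The default sizes mirror the sample values in this README:
    16 bytes -> 32 hex characters, 8 bytes -> 16 hex characters.
    Whether other sizes are accepted by Sanitizer is not assumed here.
    """
    key = secrets.token_hex(key_bytes).upper()
    tweak = secrets.token_hex(tweak_bytes).upper()
    return key, tweak


key, tweak = make_fpe_params()
print(len(key), len(tweak))

# Hypothetical usage with the objects from this README:
# sanitizer = Sanitizer(ner_model, key=key, tweak=tweak)
```

Decryption needs the same key and tweak that were used for encryption, so store the generated values securely if sanitized text has to be desanitized later.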

Sanitize a list of target strings using sanitizer.encrypt():

sanitized_sentences, _ = sanitizer.encrypt(sentences, entity='Name', epsilon=1, use_fpe=True, use_mdp=False)
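The epsilon argument is a privacy budget: smaller values mean more noise. To give a rough feel for what a metric-LDP style mechanism can look like for integer values such as ages, the sketch below adds two-sided geometric (discrete Laplace) noise, for which the probability of an output changes by at most a factor of exp(epsilon * |x - x'|) between inputs x and x'. This is a textbook mechanism chosen for illustration. It is an assumption and not necessarily what preempt does when use_mdp=True, and the function names are hypothetical.

```python
import math
import random


def discrete_laplace_noise(epsilon: float, rng: random.Random) -> int:
    """Sample integer noise d with probability proportional to exp(-epsilon * |d|)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    alpha = math.exp(-epsilon)

    def geometric() -> int:
        # Number of failures before the first success, success prob. 1 - alpha.
        u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
        return int(math.floor(math.log(u) / math.log(alpha)))

    # The difference of two i.i.d. geometric variables is discrete Laplace.
    return geometric() - geometric()


def noisy_value(value: int, epsilon: float, rng=None) -> int:
    """Return value plus discrete Laplace noise (illustration only)."""
    rng = rng or random.Random()
    return value + discrete_laplace_noise(epsilon, rng)


print(noisy_value(42, epsilon=1.0))
```

With a large epsilon the output almost always equals the input, while a small epsilon spreads outputs widely around it.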

PII values found during NER are stored under sanitizer.new_entities as a nested list.

The mappings between plaintext and ciphertext PII values are stored under sanitizer.entity_mapping. When decrypting with FPE, the sanitizer will typically run NER on the sanitized sentences to extract the encrypted PII values before decrypting them.

Sanitized sentences can be desanitized using sanitizer.decrypt():

desanitized_sentences = sanitizer.decrypt(sanitized_sentences, entity='Name')

If your NER model can't reliably pick up sanitized attributes, consider setting use_cache=True to decrypt using the stored NER values instead:

desanitized_sentences = sanitizer.decrypt(sanitized_sentences, entity='Name', use_cache=True)

Sanitizing multiple PII attributes

If you want to sanitize multiple sensitive attributes, create a sanitizer for each category separately.
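One way to organize this is to keep a list of (sanitizer, entity) pairs and apply them in turn, as in the complete usage example. The sketch below is a minimal illustration. The helper names sanitize_all and desanitize_all are hypothetical and not part of preempt, and the helpers only rely on the encrypt() and decrypt() call shapes shown in this README.

```python
def sanitize_all(sentences, steps, epsilon=1):
    """Apply each (sanitizer, entity) pair in order and return the result.

    encrypt() is assumed to return a (sentences, extra) tuple, as in the
    examples in this README; the second element is ignored here.
    """
    for sanitizer, entity in steps:
        sentences, _ = sanitizer.encrypt(sentences, entity=entity, epsilon=epsilon)
    return sentences


def desanitize_all(sentences, steps):
    """Undo sanitize_all using each sanitizer's cached NER values."""
    for sanitizer, entity in steps:
        sentences = sanitizer.decrypt(sentences, entity=entity, use_cache=True)
    return sentences


# Hypothetical usage with the objects from the complete usage example:
# steps = [(sanitizer_name, "Name"), (sanitizer_money, "Money")]
# sanitized = sanitize_all(sentences, steps)
# restored = desanitize_all(sanitized, steps)
```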

For more examples, check out demo.ipynb.

Usage tips

NER typically works better when the inputs are smaller. Consider breaking a large chunk of text into smaller sentences when using the sanitizer.
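A minimal sketch of such a split is shown below, using only the standard library. The helper name split_sentences is hypothetical, and the rule it uses (split on whitespace that follows '.', '!' or '?') is a simplification that will mis-split abbreviations such as "Dr. Smith".

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text into sentences on whitespace following '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


text = "Ben Parker and John Doe went to the bank. Who was late today? Adam."
sentences = split_sentences(text)
print(sentences)

# The resulting list can then be passed to the sanitizer, for example:
# sanitized_sentences, _ = sanitizer.encrypt(sentences, entity='Name', epsilon=1)
```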

Project details

Release history

0.1.27 (Apr 21, 2025)
0.1.26 (Apr 20, 2025)
0.1.25 (Apr 20, 2025)
0.1.24 (Apr 18, 2025)
0.1.23 (Apr 18, 2025)
0.1.22 (Apr 18, 2025)
0.1.21 (Apr 17, 2025)
0.1.20 (Apr 17, 2025)
0.1.19 (Apr 17, 2025)
0.1.18 (Apr 17, 2025)
0.1.17 (Apr 17, 2025)
0.1.16 (Apr 17, 2025), this version
0.1.15 (Apr 17, 2025)
0.1.14 (Apr 17, 2025)
0.1.13 (Apr 14, 2025)
0.1.12 (Apr 14, 2025)
0.1.11 (Apr 14, 2025)
0.1.10 (Apr 14, 2025)
0.1.9 (Apr 14, 2025)
0.1.8 (Apr 14, 2025)
0.1.7 (Apr 14, 2025)
0.1.6 (Apr 14, 2025)
0.1.5 (Apr 14, 2025)
0.1.4 (Apr 14, 2025)
0.1.3 (Apr 14, 2025)

Download files

Source Distribution

preempt-0.1.16.tar.gz (82.8 kB)

Uploaded Apr 17, 2025 Source

Built Distribution

preempt-0.1.16-py3-none-any.whl (48.7 kB)

Uploaded Apr 17, 2025 Python 3

File details

Details for the file preempt-0.1.16.tar.gz.

File metadata

Download URL: preempt-0.1.16.tar.gz
Upload date: Apr 17, 2025
Size: 82.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for preempt-0.1.16.tar.gz
Algorithm	Hash digest
SHA256	`b555417af0909611bb82e1a066c76cc1db71f979a0a34a2f11113f69613d6a7f`
MD5	`fdb285ba98739b855331b3833941054b`
BLAKE2b-256	`e0d12a438a81ea6c02ba0dd78d3eb3323faee89ae342038867bb3c14086c22ef`
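A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only the standard library. The helper name sha256_of_file and the local file path in the usage comment are assumptions for illustration.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed on this page for preempt-0.1.16.tar.gz
EXPECTED_SHA256 = "b555417af0909611bb82e1a066c76cc1db71f979a0a34a2f11113f69613d6a7f"

# Hypothetical usage, assuming the archive is in the current directory:
# assert sha256_of_file("preempt-0.1.16.tar.gz") == EXPECTED_SHA256
```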

File details

Details for the file preempt-0.1.16-py3-none-any.whl.

File metadata

Download URL: preempt-0.1.16-py3-none-any.whl
Upload date: Apr 17, 2025
Size: 48.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for preempt-0.1.16-py3-none-any.whl
Algorithm	Hash digest
SHA256	`70a4bcd96e65ed2777c2d6ab0a27cb609c3440612d9d182b3fe832278905bdfe`
MD5	`144b375645c17287bbcf535f90792714`
BLAKE2b-256	`9e51dcdcfa21f34c3bfacaf22d1ba3c6ac397f2ce927b19063442211e9291e53`
