A Python package for PII data anonymization
hexanonyme
Hexanonyme is a Python library for anonymizing and de-anonymizing personally identifiable information (PII) in French-language text. It provides a set of anonymizer classes that replace, redact, or transform PII entities while preserving the structure of the text.
Features
- Anonymize PII entities such as names, addresses, dates, and more.
- Redact PII entities from the text, replacing them with placeholders.
- Restore the original PII entities in anonymized or redacted text for auditing or analysis purposes.
Installation
You can install Hexanonyme using pip:
pip install hexanonyme
Usage
Here's a quick example of how to use Hexanonyme to anonymize and de-anonymize French text:
from hexanonyme import ReplaceAnonymizer, RedactAnonymizer
# Initialize ReplaceAnonymizer
replace_anonymizer = ReplaceAnonymizer(entities=["PER", "DATE", "ADDRESS"])
# Anonymize PII entities
text = "Je réside au 11 impasse de la défense 75018 Paris. Je m'appelle Amel Douc. J'habite à Bordeaux. Je suis né le 29/12/2021."
anonymized_text = replace_anonymizer.replace(text)
# De-anonymize the text
original_text = replace_anonymizer.deanonymize(anonymized_text)
# Initialize RedactAnonymizer
redact_anonymizer = RedactAnonymizer(entities=["PER", "ADDRESS"])
# Redact PII entities
text = "Je réside au 11 impasse de la défense 75018 Paris. Je m'appelle Amel Douc. J'habite à Bordeaux. Je suis né le 29/12/2021."
redacted_text = redact_anonymizer.redact(text)
# De-anonymize the redacted text
restored_text = redact_anonymizer.deanonymize(redacted_text)
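To make the anonymize/de-anonymize round trip more concrete, here is a minimal, self-contained sketch of the general idea: keep a mapping from each placeholder to the value it replaced, then reverse it. This is not Hexanonyme's implementation. The class name `ReversibleDateMasker`, the `<DATE_n>` placeholder format, and the regex-based date detection are all assumptions made for illustration (Hexanonyme detects entities with NER models, not regexes).

```python
import re
from itertools import count

# Stand-in detector (assumption): a regex for dd/mm/yyyy dates.
# Hexanonyme itself relies on NER models to find entities.
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


class ReversibleDateMasker:
    """Hypothetical helper, not part of hexanonyme."""

    def __init__(self):
        self._mapping = {}         # placeholder -> original value
        self._counter = count(1)   # gives each placeholder a unique index

    def anonymize(self, text):
        def substitute(match):
            placeholder = f"<DATE_{next(self._counter)}>"
            self._mapping[placeholder] = match.group(0)
            return placeholder

        return DATE_PATTERN.sub(substitute, text)

    def deanonymize(self, text):
        # Put every stored original value back in place of its placeholder.
        for placeholder, original in self._mapping.items():
            text = text.replace(placeholder, original)
        return text


source = "Je suis né le 29/12/2021."
masker = ReversibleDateMasker()
masked = masker.anonymize(source)
restored = masker.deanonymize(masked)
```

Because the mapping lives on the instance, the text can only be restored by the same object that anonymized it, which is consistent with calling `deanonymize` on the same anonymizer in the usage example above.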
Why Data Anonymization Matters
Data anonymization is crucial for protecting individuals' privacy and complying with data protection regulations. When training AI-based language models, it's vital to ensure that personally identifiable information (PII) is not exposed. This library allows you to prepare your data before providing it to large language models like ChatGPT by removing or replacing PII.
How it works
Hexanonyme uses CamemBERT-based Named Entity Recognition (NER) models fine-tuned for French. The list of available entities currently includes:
PER (person names)
LOC (locations, cities, birthplaces)
DATE (birthdates)
ADDRESS (postal addresses)
These NER models identify the PII entities in French text that the anonymizers then replace or redact.
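As a rough illustration of how NER output can drive redaction, the sketch below applies a list of entity spans to a text. It is not Hexanonyme's code. The span dictionaries use the `entity_group`, `start`, and `end` keys that the Hugging Face `transformers` token-classification pipeline returns when entities are aggregated; the function name `redact_spans`, the `[LABEL]` placeholder format, and the hand-built spans (standing in for real model output) are assumptions.

```python
def redact_spans(text, entities, allowed_labels):
    """Replace each span whose label is in allowed_labels with "[LABEL]".

    Hypothetical helper: spans are processed from right to left so that
    the character offsets of earlier spans stay valid after each edit.
    """
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        if ent["entity_group"] in allowed_labels:
            placeholder = f"[{ent['entity_group']}]"
            text = text[:ent["start"]] + placeholder + text[ent["end"]:]
    return text


def make_span(text, label, value):
    """Build a span dict by hand; a real NER model would produce these."""
    start = text.index(value)
    return {"entity_group": label, "word": value,
            "start": start, "end": start + len(value)}


text = "Je m'appelle Amel Douc. J'habite à Bordeaux."
entities = [
    make_span(text, "PER", "Amel Douc"),
    make_span(text, "LOC", "Bordeaux"),
]

only_names = redact_spans(text, entities, {"PER"})
names_and_places = redact_spans(text, entities, {"PER", "LOC"})
```

Filtering on `allowed_labels` mirrors the role of the `entities=[...]` argument in the usage example: only the requested entity types are touched, and everything else in the text is left as it was.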
File details
Details for the file hexanonyme-0.1.0.tar.gz.
File metadata
- Download URL: hexanonyme-0.1.0.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0c0d7544f001ccd48b52d6b51451ffa0f764aa2e8ab77e8dc62b3177a8b478c8
MD5 | 030a2e8bee72031d1e9ea679ddab22fb
BLAKE2b-256 | 8ba25bf44beb6965bed8a2457c9732ba203608cd45b38cd5e4120758f35314a7
File details
Details for the file hexanonyme-0.1.0-py3-none-any.whl.
File metadata
- Download URL: hexanonyme-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | b46e81d386be0f223482f8cc659a206bd8940f5502395ab0168af7a0e09d7536
MD5 | 4d71dc776f2d0e4587e86141d3f5f47d
BLAKE2b-256 | 683dc6e60957ecfa0b736ccdef98cdebdd6bcfcb4b7c004eaef3a53ebd72691f
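If you download one of these files manually, you can check it against the SHA256 digest listed above. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the file path in the commented usage lines is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage (assumes the sdist was saved in the current directory):
# ok = matches_published_hash(
#     "hexanonyme-0.1.0.tar.gz",
#     "0c0d7544f001ccd48b52d6b51451ffa0f764aa2e8ab77e8dc62b3177a8b478c8",
# )
```

When installing through pip, the same check can be delegated to pip's hash-checking mode by listing the digests in a requirements file.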