A Python package for PII data anonymization

hexanonyme

Hexanonyme is a Python library for anonymizing and de-anonymizing personally identifiable information (PII) in French-language text. It provides a set of anonymizer classes that replace or redact PII entities while preserving the structure of the surrounding text.

Features

  • Anonymize PII entities such as names, addresses, dates, and more.

  • Redact PII entities from the text, replacing them with placeholders.

  • Restore the original PII entities in anonymized or redacted text (de-anonymization) for auditing or analysis purposes.

Installation

You can install Hexanonyme using pip:


pip install hexanonyme

Usage

Here's a quick example of how to use Hexanonyme to anonymize and de-anonymize French text:


from hexanonyme import ReplaceAnonymizer, RedactAnonymizer

# Initialize ReplaceAnonymizer
replace_anonymizer = ReplaceAnonymizer(entities=["PER", "DATE", "ADDRESS"])

# Anonymize PII entities
text = "Je réside au 11 impasse de la défense 75018 Paris. Je m'appelle Amel Douc. J'habite à Bordeaux. Je suis né le 29/12/2021."
anonymized_text = replace_anonymizer.replace(text)

# De-anonymize the text
original_text = replace_anonymizer.deanonymize(anonymized_text)

# Initialize RedactAnonymizer
redact_anonymizer = RedactAnonymizer(entities=["PER", "ADDRESS"])

# Redact PII entities
text = "Je réside au 11 impasse de la défense 75018 Paris. Je m'appelle Amel Douc. J'habite à Bordeaux. Je suis né le 29/12/2021."
redacted_text = redact_anonymizer.redact(text)

# De-anonymize the redacted text
restored_text = redact_anonymizer.deanonymize(redacted_text)
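
The example above calls deanonymize on the same object that produced the anonymized text, which suggests the anonymizer remembers what it replaced. The sketch below illustrates that general idea in plain Python and does not use or reproduce Hexanonyme's internals: SurrogateMapper, its methods, and the surrogate pools are hypothetical names invented for illustration, and the entity values are supplied by hand rather than detected by a model.

```python
import itertools

# Hypothetical pools of stand-in values, one pool per entity label.
SURROGATES = {
    "PER": ["Camille Martin", "Dominique Bernard"],
    "LOC": ["Lyon", "Nantes"],
}


class SurrogateMapper:
    """Illustrative only: swaps known entity values for surrogates and back."""

    def __init__(self, pools):
        # cycle() reuses surrogates once a pool is exhausted; two originals
        # would then share one surrogate and could not be told apart later.
        self._pools = {label: itertools.cycle(values) for label, values in pools.items()}
        self._forward = {}  # original value -> surrogate

    def replace(self, text, found):
        """Replace each (label, value) pair in `found` with a stable surrogate."""
        for label, value in found:
            if value not in self._forward:
                self._forward[value] = next(self._pools[label])
            text = text.replace(value, self._forward[value])
        return text

    def deanonymize(self, text):
        """Reverse the substitutions recorded by replace()."""
        for value, surrogate in self._forward.items():
            text = text.replace(surrogate, value)
        return text


text = "Je m'appelle Amel Douc. J'habite à Bordeaux."
mapper = SurrogateMapper(SURROGATES)
anonymized = mapper.replace(text, [("PER", "Amel Douc"), ("LOC", "Bordeaux")])
restored = mapper.deanonymize(anonymized)
```

Because the mapping lives on the instance, a fresh SurrogateMapper could not restore text that another instance anonymized. This sketch also assumes the surrogate strings do not already occur in the input.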

Why Data Anonymization Matters

Data anonymization is crucial for protecting individuals' privacy and complying with data protection regulations. When working with AI language models, it is vital to ensure that personally identifiable information (PII) is not exposed. This library lets you prepare your data before sending it to large language models such as ChatGPT by removing or replacing PII.

How it works

Hexanonyme uses CamemBERT-based Named Entity Recognition (NER) models fine-tuned for French. The available entity types currently are:

  • PER (person names)

  • LOC (locations, cities, birthplaces)

  • DATE (birthdates)

  • ADDRESS (postal addresses)

These NER models locate the PII entities in French text that the anonymizers then replace or redact.
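
To make the link between NER output and redaction concrete, here is a minimal sketch of how detected entity spans could be turned into placeholders and later restored. It is an illustration under stated assumptions, not Hexanonyme's implementation: redact_spans, restore, the span helper, the placeholder format, and the dict layout (label, start, end) are all hypothetical, and the entities are written by hand instead of coming from a CamemBERT model.

```python
def redact_spans(text, entities, keep):
    """Replace selected entity spans with numbered placeholders.

    `entities` is a list of dicts with "label", "start", and "end" keys
    (character offsets), a hypothetical stand-in for NER model output.
    Returns the redacted text and a placeholder -> original mapping.
    """
    mapping = {}
    counters = {}
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        # Skip labels that were not requested and spans overlapping a previous one.
        if ent["label"] not in keep or ent["start"] < cursor:
            continue
        label = ent["label"]
        counters[label] = counters.get(label, 0) + 1
        placeholder = f"[{label}_{counters[label]}]"
        pieces.append(text[cursor:ent["start"]])
        pieces.append(placeholder)
        mapping[placeholder] = text[ent["start"]:ent["end"]]
        cursor = ent["end"]
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


def restore(redacted, mapping):
    """Put the original values back in place of their placeholders."""
    for placeholder, original in mapping.items():
        redacted = redacted.replace(placeholder, original)
    return redacted


text = "Je m'appelle Amel Douc. J'habite à Bordeaux."


def span(label, value):
    # Helper for this example: build a span by locating `value` in `text`.
    start = text.index(value)
    return {"label": label, "start": start, "end": start + len(value)}


entities = [span("PER", "Amel Douc"), span("LOC", "Bordeaux")]
redacted, mapping = redact_spans(text, entities, keep={"PER"})
restored = restore(redacted, mapping)
```

Only the PER entity is redacted here because LOC is not in keep, which mirrors how the entities argument in the usage example selects the types to process.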
