The data anonymization package

Data anonymization package, supporting different anonymization strategies

Documentation: https://eriknovak.github.io/anonipy

Source code: https://github.com/eriknovak/anonipy


The anonipy package is a Python package for data anonymization. It is designed to be simple to use and highly customizable, supporting different anonymization strategies. It is powered by LLMs.

Requirements

Before starting, make sure the following requirement is available:

  • Python. The Python programming language (v3.8, v3.9, v3.10).
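
As a quick sanity check before installing, the running interpreter can be compared against the versions listed above. This is a minimal sketch using only the standard library; the helper name `is_supported` is hypothetical and not part of anonipy.

```python
import sys

# Versions listed in the requirements above.
SUPPORTED_VERSIONS = {(3, 8), (3, 9), (3, 10)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one of the listed versions."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    current = f"{sys.version_info[0]}.{sys.version_info[1]}"
    status = "listed" if is_supported() else "not listed"
    print(f"Python {current} is {status} among the supported versions")
```

A version that is not listed may still work; the check only reports what the requirements state.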

Install

pip install anonipy

Upgrade

pip install anonipy --upgrade

Example

original_text = """\
Medical Record

Patient Name: John Doe
Date of Birth: 15-01-1985
Date of Examination: 20-05-2024
Social Security Number: 123-45-6789

Examination Procedure:
John Doe underwent a routine physical examination. The procedure included measuring vital signs (blood pressure, heart rate, temperature), a comprehensive blood panel, and a cardiovascular stress test. The patient also reported occasional headaches and dizziness, prompting a neurological assessment and an MRI scan to rule out any underlying issues.

Medication Prescribed:

Ibuprofen 200 mg: Take one tablet every 6-8 hours as needed for headache and pain relief.
Lisinopril 10 mg: Take one tablet daily to manage high blood pressure.
Next Examination Date:
15-11-2024
"""

Use the language detector to detect the language of the text:

from anonipy.utils.language_detector import LanguageDetector

language_detector = LanguageDetector()
language = language_detector(original_text)

Prepare the entity extractor and extract the personal information from the original text:

from anonipy.anonymize.extractors import NERExtractor

# define the labels to be extracted and anonymized
labels = [
    {"label": "name", "type": "string"},
    {"label": "social security number", "type": "custom"},
    {"label": "date of birth", "type": "date"},
    {"label": "date", "type": "date"},
]

# initialize the NER extractor for the language and labels
extractor = NERExtractor(labels, lang=language, score_th=0.5)

# extract the entities from the original text
doc, entities = extractor(original_text)

# display the entities in the original text
extractor.display(doc)
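
The `score_th=0.5` argument above is presumably a minimum confidence score for keeping an extracted entity. The sketch below illustrates that idea in isolation, under that assumption. The `Candidate` class, the `filter_by_score` helper, and the score values are hypothetical illustrations, not anonipy's API or real model output, and whether the library compares with `>=` or `>` is not confirmed here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for an extracted entity candidate."""
    text: str
    label: str
    score: float


def filter_by_score(candidates, score_th=0.5):
    """Keep candidates whose score is at least score_th (assumed semantics)."""
    return [c for c in candidates if c.score >= score_th]


# Illustrative scores only; a real extractor would compute these.
candidates = [
    Candidate("John Doe", "name", 0.93),
    Candidate("15-01-1985", "date of birth", 0.81),
    Candidate("routine", "name", 0.12),
]

kept = filter_by_score(candidates, score_th=0.5)
print([c.text for c in kept])
```

Raising the threshold trades recall for precision: fewer spurious entities are replaced, but some personal information may be missed.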

Use generators to create substitutes for the entities:

from anonipy.anonymize.generators import (
    LLMLabelGenerator,
    DateGenerator,
    NumberGenerator,
)

# initialize the generators
llm_generator = LLMLabelGenerator()
date_generator = DateGenerator()
number_generator = NumberGenerator()

# prepare the anonymization mapping
def anonymization_mapping(text, entity):
    if entity.type == "string":
        return llm_generator.generate(entity, temperature=0.7)
    if entity.label == "date":
        return date_generator.generate(entity, output_gen="MIDDLE_OF_THE_MONTH")
    if entity.label == "date of birth":
        return date_generator.generate(entity, output_gen="MIDDLE_OF_THE_YEAR")
    if entity.label == "social security number":
        return number_generator.generate(entity)
    return "[REDACTED]"
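
The branch order in the mapping matters: the `type` check runs first, so any entity of type "string" goes to the LLM generator regardless of its label. The sketch below mirrors that dispatch without loading any models, so the routing can be checked on its own. `StubEntity` and `describe_route` are hypothetical helpers that carry only the `label` and `type` fields the mapping reads.

```python
from dataclasses import dataclass


@dataclass
class StubEntity:
    """Hypothetical stand-in with only the fields the mapping inspects."""
    text: str
    label: str
    type: str


def describe_route(entity):
    """Mirror the branch order of anonymization_mapping and name the branch taken."""
    if entity.type == "string":
        return "llm_generator"
    if entity.label == "date":
        return "date_generator (MIDDLE_OF_THE_MONTH)"
    if entity.label == "date of birth":
        return "date_generator (MIDDLE_OF_THE_YEAR)"
    if entity.label == "social security number":
        return "number_generator"
    return "[REDACTED]"


examples = [
    StubEntity("John Doe", "name", "string"),
    StubEntity("20-05-2024", "date", "date"),
    StubEntity("15-01-1985", "date of birth", "date"),
    StubEntity("123-45-6789", "social security number", "custom"),
]
for entity in examples:
    print(f"{entity.label!r} -> {describe_route(entity)}")
```

Any label without a matching branch falls through to the "[REDACTED]" fallback, which keeps unexpected entities from leaking into the output unchanged.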

Anonymize the text using the anonymization mapping:

from anonipy.anonymize.strategies import PseudonymizationStrategy

# initialize the pseudonymization strategy
pseudo_strategy = PseudonymizationStrategy(mapping=anonymization_mapping)

# anonymize the original text
anonymized_text, replacements = pseudo_strategy.anonymize(original_text, entities)
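
Conceptually, pseudonymization swaps each detected span for its generated substitute. The sketch below shows one common way to do this with plain string slicing, replacing from right to left so that earlier character offsets stay valid. It is an illustration of the general technique, not the library's implementation; `apply_replacements` and the `(start, end, substitute)` tuple layout are assumptions made for the example.

```python
def apply_replacements(text, spans):
    """Replace (start, end, substitute) spans, working right to left.

    Processing the rightmost span first means a substitute of a different
    length never shifts the offsets of spans that are still pending.
    """
    result = text
    for start, end, substitute in sorted(spans, key=lambda s: s[0], reverse=True):
        result = result[:start] + substitute + result[end:]
    return result


text = "Patient Name: John Doe"
start = text.index("John Doe")
# "Jane Roe" is an illustrative substitute, not generator output.
spans = [(start, start + len("John Doe"), "Jane Roe")]
print(apply_replacements(text, spans))  # Patient Name: Jane Roe
```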

Acknowledgements

Anonipy is developed by the Department for Artificial Intelligence at the Jozef Stefan Institute, and other contributors.

The project has received funding from the European Union's Horizon Europe research and innovation programme under Grant Agreement No 101080288 (PREPARE).
