
A module to anonymize French text data

Project description

Incognito

Description

Incognito is a Python module for anonymizing French text. It uses regular expressions (regex) and other strategies to mask names and the personal information provided by the user.
This module was specifically designed for medical reports, ensuring that disease names remain unaltered.



Installation

From pip

pip install incognito-anonymizer

From this repository

  1. Clone the repository:

    git clone https://github.com/Micropot/incognito
    cd incognito
    
  2. Install the package and its dependencies (declared in pyproject.toml):

    pip install .
    

Usage

Python API

Example: Providing Personal Information Directly in Code

from incognito import anonymizer

# Initialize the anonymizer
ano = anonymizer.Anonymizer()

# Define personal information
infos = {
    "first_name": "Bob",
    "last_name": "Jungels",
    "birth_name": "",
    "birthdate": "1992-09-22",
    "ipp": "0987654321",
    "postal_code": "01000",
    "adress": ""
}

# Configure the anonymizer
ano.set_info(infos)
ano.add_analyser('pii')
ano.add_analyser('regex')
ano.add_analyser('lossy')  # triggers a warning: this strategy can produce false positives (see its docstring)
ano.set_mask('placeholder')

# Read and anonymize text
text_to_anonymize = ano.open_text_file("/path/to/file.txt")
anonymized_text = ano.anonymize(text_to_anonymize)

print(anonymized_text)

Example: Using JSON File for Personal Information

from incognito import anonymizer

# Initialize the anonymizer
ano = anonymizer.Anonymizer()

# Load personal information from JSON
infos_json = ano.open_json_file("/path/to/infofile.json")

# Configure the anonymizer
ano.set_info(infos_json)
ano.add_analyser('pii')
ano.add_analyser('regex')
ano.set_mask('placeholder')

# Read and anonymize text
text_to_anonymize = ano.open_text_file("/path/to/file.txt")
anonymized_text = ano.anonymize(text_to_anonymize)

print(anonymized_text)
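The layout of /path/to/infofile.json is not documented here. A reasonable assumption is that it holds the same keys as the infos dictionary from the previous example. The sketch below writes and reloads such a file with the standard json module; the helper names write_info_file and read_info_file are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical infofile.json content: assumes the file uses the same keys
# as the `infos` dictionary from the first example.
infos = {
    "first_name": "Bob",
    "last_name": "Jungels",
    "birth_name": "",
    "birthdate": "1992-09-22",
    "ipp": "0987654321",
    "postal_code": "01000",
    "adress": "",
}


def write_info_file(data: dict, path: Path) -> None:
    """Write the patient information as UTF-8 JSON, keeping accents readable."""
    path.write_text(json.dumps(data, ensure_ascii=False, indent=4), encoding="utf-8")


def read_info_file(path: Path) -> dict:
    """Load the JSON file back into a plain dictionary."""
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        info_path = Path(tmp) / "infofile.json"
        write_info_file(infos, info_path)
        print(read_info_file(info_path) == infos)  # True
```

ensure_ascii=False keeps accented French characters readable in the file instead of escaping them.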

Example: Annotating a file

from incognito import anonymizer

# Initialize the anonymizer
ano = anonymizer.Anonymizer()

# Load personal information from JSON
infos_json = ano.open_json_file("/path/to/infofile.json")

# Configure the annotator
ano.set_info(infos_json)
ano.add_analyser('pii')
ano.add_analyser('regex')
ano.set_annotator('placeholder')

# Read and annotate text
text_to_annotate = ano.open_text_file("/path/to/file.txt")
annotated_text = ano.annotate(text_to_annotate)

print(annotated_text)

Command-Line Interface (CLI)

Basic Usage

python -m incognito --input myinputfile.txt --output myanonymizedfile.txt --strategies mystrategies --mask mymasks

Find Available Strategies, Masks and Annotators

python -m incognito --help

Anonymization with JSON File

python -m incognito --input myinputfile.txt --output myanonymizedfile.txt --strategies mystrategies --mask mymasks json --json myjsonfile.json

To view the help options for the json subcommand:

python -m incognito json --help

Anonymization with Personal Information in CLI

python -m incognito --input myinputfile.txt --output myanonymizedfile.txt --strategies mystrategies --mask mymasks infos --first_name Bob --last_name Dylan --birthdate 1800-01-01 --ipp 0987654312 --postal_code 75001
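To anonymize several files with the same patient information, the command above can be scripted. The sketch below only reuses the flags shown in this section; the helper names are hypothetical, and it assumes (unverified) that --strategies accepts several space-separated names and that each non-empty key maps to an infos flag of the same name. Check python -m incognito infos --help before relying on it.

```python
import subprocess
import sys


def build_infos_command(input_path: str, output_path: str, strategies: list[str],
                        mask: str, infos: dict[str, str]) -> list[str]:
    """Build the argument list for the `infos` form of the CLI shown above."""
    cmd = [
        sys.executable, "-m", "incognito",
        "--input", input_path,
        "--output", output_path,
        # Assumption: --strategies accepts several space-separated names.
        "--strategies", *strategies,
        "--mask", mask,
        "infos",
    ]
    for key, value in infos.items():
        if value:  # skip empty fields; keys must match the flags listed by `infos --help`
            cmd += [f"--{key}", value]
    return cmd


def anonymize_files(pairs, strategies, mask, infos) -> None:
    """Run the CLI once per (input, output) pair; raises CalledProcessError on failure."""
    for input_path, output_path in pairs:
        subprocess.run(
            build_infos_command(input_path, output_path, strategies, mask, infos),
            check=True,
        )
```

Passing a list to subprocess.run (rather than a shell string) avoids quoting problems with names that contain spaces or accents.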

To view the help options for the infos subcommand:

python -m incognito infos --help

Annotation

python -m incognito --input myinputfile.txt --output annotationfile.ann --strategies mystrategies --annotate myannotator infos --first_name Bob --last_name Dylan --birthdate 1800-01-01 --ipp 0987654312 --postal_code 75001

Unit Tests

The repository includes unit tests that check the module's functionality; you can adapt them to your needs.

To run the tests:

make test

To check code coverage:

make cov

Anonymization Process Details

Regex Strategy

The Regex strategy uses regular expressions to detect and mask specific information in the input text, such as:

  • Email addresses
  • Phone numbers
  • French NIR (social security number)
  • First and last names (if preceded by titles like "Monsieur", "Madame", "Mr", "Mme", "Docteur", "Professeur", etc.)

For more details, see the RegexStrategy class and its self.title_regex attribute.
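To illustrate the idea, here is a minimal regex-masking sketch. The patterns and the <EMAIL>, <PHONE> and <NAME> placeholders are assumptions chosen for the example; they are not the patterns used by RegexStrategy.

```python
import re

# Illustrative patterns only; these are NOT the patterns used by RegexStrategy.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# French phone numbers: 06 12 34 56 78, 06.12.34.56.78, +33 6 12 34 56 78
PHONE_RE = re.compile(r"(?:\+33\s?|\b0)[1-9](?:[\s.-]?\d{2}){4}\b")
# A capitalized word that follows a title such as "Monsieur", "Mme" or "Docteur"
TITLE_NAME_RE = re.compile(
    r"\b(?:Monsieur|Madame|Mr|Mme|Docteur|Dr|Professeur|Pr)\.?\s+([A-ZÀ-Ý][\w-]*)"
)


def mask_with_regex(text: str) -> str:
    """Mask emails, phone numbers and title-preceded names with placeholders."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = PHONE_RE.sub("<PHONE>", text)
    # Keep the title and replace only the captured name (group 1 ends the match).
    return TITLE_NAME_RE.sub(
        lambda m: m.group(0)[: m.start(1) - m.start(0)] + "<NAME>", text
    )


print(mask_with_regex("Vu par le Docteur Martin. Contact : bob@example.fr ou 06 12 34 56 78."))
# Vu par le Docteur <NAME>. Contact : <EMAIL> ou <PHONE>.
```

Keeping the title in the output preserves the sentence structure of the report while removing the name itself.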

PII Strategy

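This strategy detects the patient's personal information that you provide (first name, last name, birthdate, IPP, postal code, etc.).

In Python, pass this information with set_info(); from the CLI, provide it either with the infos subcommand or in a JSON file.

See the Command-Line Interface (CLI) section for examples.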
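The principle of masking known values can be sketched as follows. This is a hypothetical illustration, not the PII strategy's code: the <KEY> placeholder naming and the extra DD/MM/YYYY spelling for an ISO birthdate are assumptions made for the example.

```python
import re


def candidate_forms(key: str, value: str) -> list[str]:
    """Spellings to look for; the birthdate variant is an assumption."""
    forms = [value]
    if key == "birthdate" and re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        year, month, day = value.split("-")
        forms.append(f"{day}/{month}/{year}")  # French DD/MM/YYYY
    return forms


def mask_known_values(text: str, infos: dict[str, str]) -> str:
    """Replace each non-empty value of `infos` with a <KEY> placeholder."""
    pairs = [(form, key) for key, value in infos.items() if value
             for form in candidate_forms(key, value)]
    # Longest forms first, so a value contained in a longer one is not masked early.
    for form, key in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        # Case-insensitive, and never inside a longer word or number.
        pattern = re.compile(rf"(?<!\w){re.escape(form)}(?!\w)", re.IGNORECASE)
        text = pattern.sub(f"<{key.upper()}>", text)
    return text


infos = {"first_name": "Bob", "last_name": "Jungels", "birth_name": "",
         "birthdate": "1992-09-22", "ipp": "0987654321"}
print(mask_known_values("Patient : JUNGELS Bob, né le 22/09/1992, IPP 0987654321.", infos))
# Patient : <LAST_NAME> <FIRST_NAME>, né le <BIRTHDATE>, IPP <IPP>.
```

The word-boundary lookarounds keep a short first name such as "Bob" from being masked inside "Bobby".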

Lossy Strategy

The Lossy strategy masks name-like patterns such as "DUPONT Marc" or "Marc DUPONT".

Warning:

This strategy can produce false positives. It may match unexpected words and remove information from your text.
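A hypothetical approximation of the Lossy idea (not the LossyStrategy code) shows both the intended matches and why the warning above matters: any all-caps word next to a capitalized word looks like a name, so a medical abbreviation such as "IRM" followed by "Normale" is masked too.

```python
import re

# Hypothetical approximation of the Lossy idea (not LossyStrategy's code):
# an all-caps word next to a capitalized word is read as a name pair.
UPPER = r"[A-ZÀ-Ý]{2,}"
CAPITALIZED = r"[A-ZÀ-Ý][a-zà-ÿ]+"
NAME_PAIR_RE = re.compile(rf"\b(?:{UPPER}\s+{CAPITALIZED}|{CAPITALIZED}\s+{UPPER})\b")


def lossy_mask(text: str) -> str:
    """Replace every "LASTNAME Firstname" or "Firstname LASTNAME" pair with <NOM>."""
    return NAME_PAIR_RE.sub("<NOM>", text)


print(lossy_mask("Vu par DUPONT Marc puis par Marc DUPONT."))
# Vu par <NOM> puis par <NOM>.
print(lossy_mask("IRM Normale."))
# <NOM>.  (a false positive: the finding is lost)
```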

Get the matched entities

To check what the anonymizer matched, call the get_entities() method after anonymize():

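    from incognito.anonymizer import Anonymizer

    ano = Anonymizer()
    ano.add_analyzer("regex")
    ano.add_analyzer("lossy")
    ano.set_mask("placeholder")
    text = ano.open_text_file("/path/to/file.txt")
    ano.anonymize(text)
    entities = ano.get_entities()
    print(entities)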

get_entities() returns a list of dictionaries like this one:

    [
        {"original": "DUPONT", "replacement": "<NOM>", "type": "NOM", "start": 42, "end": 48},
        {"original": "01/01/1970", "replacement": "<DATE>", "type": "DATE", "start": 80, "end": 90},
    ]
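One way to check such a list is to rebuild the masked text from it and count the matches per type. This sketch assumes start/end are 0-based, end-exclusive character offsets into the original text (consistent with the date entry, where 90 - 80 equals the 10 characters of 01/01/1970); apply_entities and count_by_type are hypothetical helpers, not part of incognito.

```python
from collections import Counter


def apply_entities(text: str, entities: list[dict]) -> str:
    """Rebuild the masked text from entity dicts.

    Assumes `start`/`end` are 0-based, end-exclusive offsets into the original text.
    """
    # Work from the last entity to the first so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        if text[ent["start"]:ent["end"]] != ent["original"]:
            raise ValueError(f"offsets do not match: {ent}")
        text = text[: ent["start"]] + ent["replacement"] + text[ent["end"]:]
    return text


def count_by_type(entities: list[dict]) -> Counter:
    """Count matched entities per type, e.g. to spot a strategy over-matching."""
    return Counter(ent["type"] for ent in entities)
```

A sudden spike in one type (for example many NOM entities after enabling the Lossy strategy) is a quick hint that false positives are creeping in.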

For more details on the Lossy strategy, see the LossyStrategy class.

Annotation Process Details

Standoff Strategy

You can create an annotation file in the standoff format.

The file is generated automatically from the matched entities.

See the Python API and CLI sections for examples.
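The README does not spell out the exact layout, but the .ann extension in the CLI example suggests the brat standoff convention, where each text-bound annotation is one line of the form ID, tab, "TYPE start end", tab, matched text. Under that assumption, a sketch turning entity dictionaries (as returned by get_entities()) into such lines could look like this; to_standoff is a hypothetical helper, and the file incognito actually writes may differ.

```python
def to_standoff(entities: list[dict]) -> str:
    """Serialize entities as brat-style text-bound annotations.

    Each line is: T<n> <TAB> <TYPE> <start> <end> <TAB> <matched text>
    """
    ordered = sorted(entities, key=lambda e: (e["start"], e["end"]))
    return "".join(
        f"T{i}\t{ent['type']} {ent['start']} {ent['end']}\t{ent['original']}\n"
        for i, ent in enumerate(ordered, start=1)
    )
```

Because standoff annotations point into the original text by offset, the source text is left untouched and the .ann file can be reviewed alongside it in an annotation tool.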


License

This project is licensed under the terms of the MIT License.


Contributors

  • Maintainer: Micropot
    Feel free to open issues or contribute via pull requests!



Download files

Download the file for your platform.

Source Distribution

incognito_anonymizer-1.4.6.tar.gz (123.9 kB)

Uploaded Source

Built Distribution


incognito_anonymizer-1.4.6-py3-none-any.whl (16.4 kB)

Uploaded Python 3

File details

Details for the file incognito_anonymizer-1.4.6.tar.gz.

File metadata

  • Download URL: incognito_anonymizer-1.4.6.tar.gz
  • Size: 123.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.1 (CI, Ubuntu 24.04)

File hashes

Hashes for incognito_anonymizer-1.4.6.tar.gz
Algorithm Hash digest
SHA256 f36cb2bea48419100959b5442f3586b512a4fcf55b8995c3ebcc56410db98fcf
MD5 da21a9eaf986bee366712ac413b6296d
BLAKE2b-256 bad590d169de3f22995941e7993d935fd99beb8b93221563871108160118ca5e


File details

Details for the file incognito_anonymizer-1.4.6-py3-none-any.whl.

File metadata

  • Download URL: incognito_anonymizer-1.4.6-py3-none-any.whl
  • Size: 16.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.1 (CI, Ubuntu 24.04)

File hashes

Hashes for incognito_anonymizer-1.4.6-py3-none-any.whl
Algorithm Hash digest
SHA256 c7ccc6e1aa179956be99d2e73b88650a4c367418ecc8028dcc07edbb60b5b155
MD5 aee43aaae9df280f95b712a3b4b2e75e
BLAKE2b-256 b4589f307e422cd4274f38d95ab5f50d1bbd4c7e9cac5bf5f3568c1606ee6f22

