A module to anonymize French text data

Incognito

Description

Incognito is a Python module for anonymizing French text. It uses regular expressions and other strategies to mask names and other personal information supplied by the user.
This module was specifically designed for medical reports, ensuring that disease names remain unaltered.

Installation

From pip

pip install incognito-anonymizer

From this repository

  1. Clone the repository:

    git clone https://github.com/Micropot/incognito
    
  2. From the cloned directory, install the package and its dependencies (defined in pyproject.toml):

    pip install .
    

Usage

Python API

Example: Providing Personal Information Directly in Code

# Absolute import: a relative import only works inside the package itself
from incognito import anonymizer

# Initialize the anonymizer
ano = anonymizer.Anonymizer()

# Define personal information
infos = {
    "first_name": "Bob",
    "last_name": "Jungels",
    "birth_name": "",
    "birthdate": "1992-09-22",
    "ipp": "0987654321",
    "postal_code": "01000",
    "adress": ""
}

# Configure the anonymizer
ano.set_info(infos)
ano.set_strategies(['regex', 'pii'])
ano.set_masks('placeholder')

# Read and anonymize text
text_to_anonymize = ano.open_text_file("/path/to/file.txt")
anonymized_text = ano.anonymize(text_to_anonymize)

print(anonymized_text)
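
To make the idea of masking user-supplied information concrete, here is a minimal standalone sketch. It is not Incognito's implementation: the function name `mask_known_values` and the `<KEY>` tag format are assumptions chosen for illustration, and only the standard library is used.

```python
import re


def mask_known_values(text: str, infos: dict[str, str]) -> str:
    """Replace each non-empty value of `infos` found in `text` with a <KEY> tag.

    Hypothetical helper for illustration; not part of the Incognito API.
    """
    for key, value in infos.items():
        if not value:  # skip empty fields such as "birth_name": ""
            continue
        # re.escape keeps characters like "-" literal; \b avoids partial-word hits
        pattern = re.compile(rf"\b{re.escape(value)}\b", flags=re.IGNORECASE)
        text = pattern.sub(f"<{key.upper()}>", text)
    return text


infos = {
    "first_name": "Bob",
    "last_name": "Jungels",
    "birth_name": "",
    "postal_code": "01000",
}
sample = "Le patient Bob JUNGELS habite à Bourg-en-Bresse (01000)."
masked = mask_known_values(sample, infos)
print(masked)
# Le patient <FIRST_NAME> <LAST_NAME> habite à Bourg-en-Bresse (<POSTAL_CODE>).
```

Matching is case-insensitive here because names in reports are often written in capitals; whether the library does the same is not stated in this README.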

Example: Using JSON File for Personal Information

# Absolute import: a relative import only works inside the package itself
from incognito import anonymizer

# Initialize the anonymizer
ano = anonymizer.Anonymizer()

# Load personal information from JSON
infos_json = ano.open_json_file("/path/to/infofile.json")

# Configure the anonymizer
ano.set_info(infos_json)
ano.set_strategies(['regex', 'pii'])
ano.set_masks('placeholder')

# Read and anonymize text
text_to_anonymize = ano.open_text_file("/path/to/file.txt")
anonymized_text = ano.anonymize(text_to_anonymize)

print(anonymized_text)
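
The README does not show what the JSON file should contain. A reasonable assumption is that it mirrors the dictionary from the first example, with the same keys. The sketch below writes and re-reads such a file using only the standard library; the file name and the flat layout are assumptions, not a documented format.

```python
import json
import tempfile
from pathlib import Path

# Same keys as the in-code example (note the "adress" spelling used there)
infos = {
    "first_name": "Bob",
    "last_name": "Jungels",
    "birth_name": "",
    "birthdate": "1992-09-22",
    "ipp": "0987654321",
    "postal_code": "01000",
    "adress": "",
}

# Write the file to a fresh temporary directory (illustrative location)
info_path = Path(tempfile.mkdtemp()) / "infofile.json"
info_path.write_text(
    json.dumps(infos, ensure_ascii=False, indent=2), encoding="utf-8"
)

# Read it back to check that the content round-trips unchanged
loaded = json.loads(info_path.read_text(encoding="utf-8"))
print(loaded == infos)
```

Keeping every value as a string (including `ipp` and `postal_code`) preserves leading zeros, which a numeric JSON type would drop.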

Command-Line Interface (CLI)

Basic Usage

python -m incognito --input myinputfile.txt --output myanonymizedfile.txt --strategies mystrategies --mask mymasks

Find Available Strategies and Masks

python -m incognito --help

Anonymization with JSON File

python -m incognito --input myinputfile.txt --output myanonymizedfile.txt --strategies mystrategies --mask mymasks json --json myjsonfile.json

To view helper options for the JSON submodule:

python -m incognito json --help

Anonymization with Personal Information in CLI

python -m incognito --input myinputfile.txt --output myanonymizedfile.txt --strategies mystrategies --mask mymasks infos --first_name Bob --last_name Dylan --birthdate 1800-01-01 --ipp 0987654312 --postal_code 75001

To view helper options for the "infos" submodule:

python -m incognito infos --help
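
In the commands above, the global options come first and the submodule name (json or infos) follows with its own options. The sketch below reproduces that layout with `argparse` to show how such a command line is split. It is a hypothetical reconstruction, not the project's actual parser: the option types, defaults, and the `build_parser` name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the command layout shown in this README."""
    parser = argparse.ArgumentParser(prog="incognito")
    # Global options, given before the submodule name
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--strategies", nargs="+", default=[])
    parser.add_argument("--mask", default=None)

    sub = parser.add_subparsers(dest="command")

    # "json" submodule: personal information read from a file
    json_cmd = sub.add_parser("json")
    json_cmd.add_argument("--json", required=True)

    # "infos" submodule: personal information passed as options
    infos_cmd = sub.add_parser("infos")
    for field in ("first_name", "last_name", "birthdate", "ipp", "postal_code"):
        infos_cmd.add_argument(f"--{field}", default="")
    return parser


argv = (
    "--input myinputfile.txt --output myanonymizedfile.txt "
    "--strategies regex pii --mask placeholder "
    "infos --first_name Bob --last_name Dylan --birthdate 1800-01-01 "
    "--ipp 0987654312 --postal_code 75001"
).split()
args = build_parser().parse_args(argv)
print(args.command, args.strategies, args.first_name, args.postal_code)
```

With this layout, anything after the submodule name is handled by that submodule's parser, which is why `--json` or `--first_name` must come after `json` or `infos`.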

Unit Tests

Unit tests are included to ensure the module's functionality. You can modify them based on your needs.

To run the tests:

make test

To check code coverage:

make cov

Anonymization Process Details

Regex Strategy

One available anonymization strategy is Regex. It can extract and mask specific information from the input text, such as:

  • Email addresses
  • Phone numbers
  • French NIR (social security number)
  • First and last names (if preceded by titles like "Monsieur", "Madame", "Mr", "Mme", "Docteur", "Professeur", etc.)

For more details, see the RegexStrategy class and the self.title_regex variable.
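
As a rough illustration of the kind of patterns involved, here is a simplified standalone sketch. The expressions, the tag names, and the `mask_with_regex` function are assumptions written for this example; they are not the patterns used by RegexStrategy, and NIR matching is left out for brevity.

```python
import re

# Simplified, illustrative patterns (not the library's own)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# French phone number: 10 digits starting with 0, optionally separated in pairs
PHONE_RE = re.compile(r"(?<!\d)0[1-9](?:[ .-]?\d{2}){4}(?!\d)")
# A title followed by one or two capitalized words
TITLE_NAME_RE = re.compile(
    r"\b(Monsieur|Madame|Mme|Mr|Docteur|Professeur)\.?\s+"
    r"((?:[A-ZÀ-Ý][\w'-]+)(?:\s+[A-ZÀ-Ý][\w'-]+)?)"
)


def mask_with_regex(text: str) -> str:
    """Mask emails, phone numbers, and names that follow a title."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = PHONE_RE.sub("<PHONE>", text)
    # Keep the title itself and replace only the name after it
    text = TITLE_NAME_RE.sub(lambda m: f"{m.group(1)} <NAME>", text)
    return text


sample = (
    "Madame Dupont (marie.dupont@example.org, 06 12 34 56 78) "
    "a consulté le Docteur Martin."
)
masked = mask_with_regex(sample)
print(masked)
# Madame <NAME> (<EMAIL>, <PHONE>) a consulté le Docteur <NAME>.
```

Anchoring name detection on a preceding title is what keeps capitalized medical terms and disease names from being masked by mistake, at the cost of missing names that appear without a title.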


Documentation

The documentation is available here.

License

This project is licensed under the terms of the MIT License.


Contributors

  • Maintainer: Micropot
    Feel free to open issues or contribute via pull requests!

Similar project

EDS NLP
