Anonymizes a pandas DataFrame and provides a hash dictionary for de-anonymization
NER Anonymizer
This repository contains development tools for anonymizing a pandas DataFrame.
NER Anonymizer contains a class, DataAnonymizer, which handles anonymization in free-text columns. It uses named entity recognition (NER) with a pretrained model from the transformers package to pick up entities such as locations and persons, generates an MD5 hash for each entity, replaces the entity with the hash, and stores the hash-to-entity mapping in a dictionary for de-anonymization. Categorical columns go through a similar process, without NER.
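The hash-and-replace step described above can be sketched with the standard library alone. This is a minimal illustration, not the DataAnonymizer implementation: the function names (md5_hash, anonymize_text, anonymize_categorical, deanonymize_text) are hypothetical, the entity list is passed in by hand where the real class would get it from an NER model, and plain strings and lists stand in for pandas columns.

```python
import hashlib


def md5_hash(value: str) -> str:
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def anonymize_text(text: str, entities: list[str], hash_to_entity: dict[str, str]) -> str:
    """Replace each entity in `text` with its MD5 hash.

    `entities` stands in for the output of an NER model (an assumption
    made for this sketch). The mapping is updated in place.
    """
    # Replace longer entities first so a short entity cannot split a longer one.
    for entity in sorted(set(entities), key=len, reverse=True):
        digest = md5_hash(entity)
        hash_to_entity[digest] = entity
        text = text.replace(entity, digest)
    return text


def anonymize_categorical(values: list[str], hash_to_entity: dict[str, str]) -> list[str]:
    """Hash every value of a categorical column; no NER is needed."""
    anonymized = []
    for value in values:
        digest = md5_hash(value)
        hash_to_entity[digest] = value
        anonymized.append(digest)
    return anonymized


def deanonymize_text(text: str, hash_to_entity: dict[str, str]) -> str:
    """Reverse the anonymization by swapping each hash back for its entity."""
    for digest, entity in hash_to_entity.items():
        text = text.replace(digest, entity)
    return text


# Usage: one shared dictionary records every hash-to-entity pair.
mapping: dict[str, str] = {}
masked = anonymize_text("Alice moved to Paris.", ["Alice", "Paris"], mapping)
colours = anonymize_categorical(["red", "blue", "red"], mapping)
restored = deanonymize_text(masked, mapping)
```

Because MD5 is deterministic, equal values always map to the same hash, so categorical columns keep their grouping structure after anonymization. The dictionary is what makes the process reversible, so it needs to be stored as carefully as the original data.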
Example Usage
Open a terminal and run the following commands (this assumes you have Python 3 installed):
git clone https://github.com/kelvnt/data_anonymizer.git
cd data_anonymizer
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
jupyter-lab
Open example_usage.ipynb to explore how DataAnonymizer works.
Hashes for ner_anonymizer-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d18c0d0e23f9c9cb8dd92661212dab113893cfc3a6e8231f97f01d536d04c361
MD5 | fa96d450cfae79e52a9b0d897e6440b9
BLAKE2b-256 | 89e6cc2352b8d5158b26fb8090c67915d9710e55f0820356cc3a9f1ddbae23b9