Text anonymization using Faker
Anonymization
Text anonymization in many languages for Python 3.6+ using Faker.
Install
pip install anonymization
Example
Replace emails and named entities in English
This example uses NamedEntitiesAnonymizer, which requires spaCy and a spaCy model.
pip install spacy
python -m spacy download en_core_web_lg
>>> from anonymization import Anonymization, AnonymizerChain, EmailAnonymizer, NamedEntitiesAnonymizer
>>> text = "Hi John,\nthanks for you for subscribing to Superprogram, feel free to ask me any question at secret.mail@Superprogram.com \n Superprogram the best program!"
>>> anon = AnonymizerChain(Anonymization('en_US'))
>>> anon.add_anonymizers(EmailAnonymizer, NamedEntitiesAnonymizer('en_core_web_lg'))
>>> anon.anonymize(text)
'Hi Holly,\nthanks for you for subscribing to Ariel, feel free to ask me any question at shanestevenson@gmail.com \n Ariel the best program!'
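The chaining idea in the example above can be illustrated without the package installed. The following is a minimal sketch, assuming a chain simply applies each anonymizer's anonymize method in the order it was added (an assumption, not confirmed by this README). SimpleChain, EmailMasker and WordMasker are hypothetical stand-ins, not classes from the package, and the replacement values are hard-coded instead of generated by Faker.

```python
import re


class EmailMasker:
    """Hypothetical anonymizer: replaces anything that looks like an email."""

    def anonymize(self, text: str) -> str:
        return re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "user@example.com", text)


class WordMasker:
    """Hypothetical anonymizer: replaces whole words from a fixed mapping."""

    def __init__(self, replacements: dict):
        self.replacements = replacements

    def anonymize(self, text: str) -> str:
        for word, fake in self.replacements.items():
            text = re.sub(rf"\b{re.escape(word)}\b", fake, text)
        return text


class SimpleChain:
    """Hypothetical chain: runs each anonymizer in insertion order."""

    def __init__(self):
        self.anonymizers = []

    def add_anonymizers(self, *anonymizers):
        self.anonymizers.extend(anonymizers)

    def anonymize(self, text: str) -> str:
        for anonymizer in self.anonymizers:
            text = anonymizer.anonymize(text)
        return text


chain = SimpleChain()
chain.add_anonymizers(EmailMasker(), WordMasker({"John": "Holly", "Superprogram": "Ariel"}))
print(chain.anonymize("Hi John, write to secret.mail@Superprogram.com about Superprogram."))
# prints: Hi Holly, write to user@example.com about Ariel.
```

Each step only needs an anonymize(text) method, which is the same shape the custom anonymizers described later in this README use.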
Or make it reversible with pseudonymize:
>>> from anonymization import Anonymization, AnonymizerChain, EmailAnonymizer, NamedEntitiesAnonymizer
>>> text = "Hi John,\nthanks for you for subscribing to Superprogram, feel free to ask me any question at secret.mail@Superprogram.com \n Superprogram the best program!"
>>> anon = AnonymizerChain(Anonymization('en_US'))
>>> anon.add_anonymizers(EmailAnonymizer, NamedEntitiesAnonymizer('en_core_web_lg'))
>>> clean_text, patch = anon.pseudonymize(text)
>>> print(clean_text)
'Christopher, \nthanks for you for subscribing to Audrey, feel free to ask me any question at colemanwesley@hotmail.com \n Audrey the best program!'
>>> revert_text = anon.revert(clean_text, patch)
>>> print(text == revert_text)
True
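To make the round trip above more concrete, here is a minimal sketch of how a reversible replacement could work. It is not the package's implementation: the shape of the patch (a dict mapping each fake value back to its original) is an assumption, the function names are hypothetical, and the fake values are supplied by hand instead of being generated by Faker.

```python
import re
from typing import Dict, Tuple


def _swap(text: str, mapping: Dict[str, str]) -> str:
    """Replace every key of `mapping` found in `text` in a single pass."""
    if not mapping:
        return text
    # Longest keys first, so an email is matched before a name it contains.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: mapping[match.group(0)], text)


def pseudonymize(text: str, fakes: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return the cleaned text and a patch mapping each fake to its original."""
    patch = {fake: original for original, fake in fakes.items() if original in text}
    return _swap(text, fakes), patch


def revert(clean_text: str, patch: Dict[str, str]) -> str:
    """Undo pseudonymize, assuming no fake value occurs in the original text."""
    return _swap(clean_text, patch)


text = "Hi John,\nthanks for subscribing to Superprogram, ask me at secret.mail@Superprogram.com"
fakes = {  # hand-written stand-ins for Faker output
    "John": "Holly",
    "Superprogram": "Ariel",
    "secret.mail@Superprogram.com": "shanestevenson@gmail.com",
}
clean_text, patch = pseudonymize(text, fakes)
print(clean_text)
print(revert(clean_text, patch) == text)  # True
```

The single-pass substitution avoids one replacement rewriting the output of another. The round trip is only guaranteed when the fake values are distinct and do not already appear in the original text.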
Replace a French phone number with a fake one
The package supports many languages, along with their locale-specific formats.
For example, we can generate a French phone number:
>>> from anonymization import Anonymization, PhoneNumberAnonymizer
>>>
>>> text = "C'est bien le 0611223344 ton numéro ?"
>>> anon = Anonymization('fr_FR')
>>> phoneAnonymizer = PhoneNumberAnonymizer(anon)
>>> phoneAnonymizer.anonymize(text)
"C'est bien le 0144939332 ton numéro ?"
More examples in /examples
Included anonymizers
Files
name | lang |
---|---|
FilePathAnonymizer | - |
Internet
name | lang |
---|---|
EmailAnonymizer | - |
UriAnonymizer | - |
MacAddressAnonymizer | - |
Ipv4Anonymizer | - |
Ipv6Anonymizer | - |
Phone numbers
name | lang |
---|---|
PhoneNumberAnonymizer | 47+ |
msisdnAnonymizer | 47+ |
Date
name | lang |
---|---|
DateAnonymizer | - |
Other
name | lang |
---|---|
NamedEntitiesAnonymizer | 7+ |
DictionaryAnonymizer | - |
SignatureAnonymizer | 7+ |
Custom anonymizers
Custom anonymizers can be easily created to fit your needs:
class CustomAnonymizer():
    def __init__(self, anonymization: Anonymization):
        self.anonymization = anonymization

    def anonymize(self, text: str) -> str:
        # return the text modified however you need
        return modified_text
        # or replace regex pattern matches in text using a faker provider
        return self.anonymization.regex_anonymizer(text, pattern, provider)
        # or replace all occurrences using a faker provider
        return self.anonymization.replace_all(text, matchs, provider)
You may also add a new Faker provider with the helper Anonymization.add_provider(FakerProvider), or access the Faker instance directly through Anonymization.faker.
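As a worked illustration of the custom anonymizer pattern above, the sketch below masks ticket identifiers such as TICKET-9812. So that it runs with the standard library only, it uses StubAnonymization, a hypothetical stand-in for the package's Anonymization class. Its regex_anonymizer and replace_all methods mirror the argument order shown above, but their behavior here (a provider looked up by name, with the same original always mapped to the same fake) is an assumption. The "ticket_id" provider and TicketAnonymizer are made up for the example.

```python
import re
from itertools import count


class StubAnonymization:
    """Hypothetical stand-in for Anonymization; not the package's code."""

    def __init__(self):
        counter = count(1)
        # Provider name -> callable returning a fake value (assumed design).
        self.providers = {"ticket_id": lambda: f"REF-{next(counter):04d}"}
        self.cache = {}

    def replace_all(self, text: str, matches, provider: str) -> str:
        for match in matches:
            if match not in self.cache:
                self.cache[match] = self.providers[provider]()
            text = text.replace(match, self.cache[match])
        return text

    def regex_anonymizer(self, text: str, pattern: str, provider: str) -> str:
        return self.replace_all(text, re.findall(pattern, text), provider)


class TicketAnonymizer:
    """Custom anonymizer following the constructor/anonymize shape above."""

    def __init__(self, anonymization):
        self.anonymization = anonymization

    def anonymize(self, text: str) -> str:
        return self.anonymization.regex_anonymizer(text, r"TICKET-\d+", "ticket_id")


anonymizer = TicketAnonymizer(StubAnonymization())
print(anonymizer.anonymize("See TICKET-9812 and TICKET-77, then TICKET-9812 again."))
# prints: See REF-0001 and REF-0002, then REF-0001 again.
```

With the real package, the stub would be replaced by an Anonymization instance and the provider by one registered on its Faker instance.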
Benchmark
This module is benchmarked on the synth_dataset from presidio-research and reaches an accuracy of 0.79, compared with 0.75 for Microsoft's solution.
You can run the benchmark using docker:
docker build . -f ./benchmark/dockerfile -t anonbench
docker run -it --rm --name anonbench anonbench
License
MIT