Text anonymization using Faker
Anonymization
Text anonymization in many languages for Python 3.6+ using Faker.
Install
pip install anonymization
Example
Replace a French phone number with a fake one
>>> from anonymization import Anonymization, PhoneNumberAnonymizer
>>>
>>> text = "C'est bien le 0611223344 ton numéro ?"
>>> anon = Anonymization('fr_FR')
>>> phoneAnonymizer = PhoneNumberAnonymizer(anon)
>>> phoneAnonymizer.anonymize(text)
"C'est bien le 0144939332 ton numéro ?"
Replace emails and named entities in English
This example uses NamedEntitiesAnonymizer, which requires spaCy and a spaCy model.
pip install spacy
python -m spacy download en
>>> from anonymization import Anonymization, AnonymizerChain, EmailAnonymizer, NamedEntitiesAnonymizer
>>> text = "Hi John,\nthanks for you for subscribing to Superprogram, feel free to ask me any question at secret.mail@Superprogram.com \n Superprogram the best program!"
>>> anon = AnonymizerChain(Anonymization('en_US'))
>>> anon.add_anonymizers(EmailAnonymizer, NamedEntitiesAnonymizer('en'))
>>> anon.anonymize(text)
'Hi Holly,\nthanks for you for subscribing to Ariel, feel free to ask me any question at shanestevenson@gmail.com \n Ariel the best program!'
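The chaining idea in the example above can be sketched without the package. This is a minimal illustration under stated assumptions, not the library's implementation: `SimpleChain`, `EmailMasker`, and `DigitMasker` are hypothetical names, the chain is assumed to apply each anonymizer in the order it was added, and the stand-ins use fixed replacements instead of Faker values. Unlike the real `add_anonymizers` call shown above, this sketch accepts instances only.

```python
import re


class EmailMasker:
    """Hypothetical anonymizer: replaces email-like strings with a fixed address."""

    PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

    def anonymize(self, text: str) -> str:
        return self.PATTERN.sub("user@example.com", text)


class DigitMasker:
    """Hypothetical anonymizer: replaces every digit with 0."""

    PATTERN = re.compile(r"\d")

    def anonymize(self, text: str) -> str:
        return self.PATTERN.sub("0", text)


class SimpleChain:
    """Hypothetical stand-in for a chain (assumption: anonymizers run in order)."""

    def __init__(self):
        self._anonymizers = []

    def add_anonymizers(self, *anonymizers) -> None:
        self._anonymizers.extend(anonymizers)

    def anonymize(self, text: str) -> str:
        # Feed the output of each anonymizer into the next one.
        for anonymizer in self._anonymizers:
            text = anonymizer.anonymize(text)
        return text


chain = SimpleChain()
chain.add_anonymizers(EmailMasker(), DigitMasker())
masked = chain.anonymize("Mail john.doe@corp.io or call 0611223344.")
print(masked)  # Mail user@example.com or call 0000000000.
```

Order can matter in such a chain: an anonymizer that runs later sees the text already rewritten by the earlier ones.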
Included anonymizers
Files
name | lang |
---|---|
FilePathAnonymizer | - |
Internet
name | lang |
---|---|
EmailAnonymizer | - |
UriAnonymizer | - |
MacAddressAnonymizer | - |
Ipv4Anonymizer | - |
Ipv6Anonymizer | - |
Phone numbers
name | lang |
---|---|
PhoneNumberAnonymizer | 47+ |
msisdnAnonymizer | 47+ |
Spacy
name | lang |
---|---|
NamedEntitiesAnonymizer | 7+ |
Custom anonymizers
Custom anonymizers can be easily created to fit your needs:
from anonymization import Anonymization

class CustomAnonymizer():
    def __init__(self, anonymization: Anonymization):
        self.anonymization = anonymization

    def anonymize(self, text: str) -> str:
        # pick ONE of the following strategies:
        # return the text modified by your own logic
        return modified_text
        # or replace regex pattern matches in text using a faker provider
        return self.anonymization.regex_anonymizer(text, pattern, provider)
        # or replace all occurrences of the given matches using a faker provider
        return self.anonymization.replace_all(text, matches, provider)
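The template above can be made concrete with a small, self-contained sketch. To keep it runnable without the package installed, it uses `StandInAnonymization`, a hypothetical stand-in that only mimics the `regex_anonymizer(text, pattern, provider)` call shape from the template. Two assumptions are made for illustration: the provider is a plain Python callable rather than a Faker provider, and repeated occurrences of the same value receive the same fake (as appears to happen with "Superprogram" in the English example). `OrderIdAnonymizer` and the `ORD-12345` format are invented for the example.

```python
import random
import re


class StandInAnonymization:
    """Hypothetical stand-in, NOT the real anonymization.Anonymization class."""

    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)
        self._cache = {}  # original value -> fake value

    def regex_anonymizer(self, text: str, pattern: str, provider) -> str:
        def substitute(match):
            original = match.group(0)
            # Assumption: the same original always maps to the same fake.
            if original not in self._cache:
                self._cache[original] = provider(self._rng)
            return self._cache[original]

        return re.sub(pattern, substitute, text)


def fake_order_id(rng: random.Random) -> str:
    """Hypothetical provider: builds a random ID shaped like ORD-12345."""
    return "ORD-" + "".join(rng.choice("0123456789") for _ in range(5))


class OrderIdAnonymizer:
    """Custom anonymizer following the template's __init__/anonymize shape."""

    def __init__(self, anonymization):
        self.anonymization = anonymization

    def anonymize(self, text: str) -> str:
        return self.anonymization.regex_anonymizer(
            text, r"ORD-\d{5}", fake_order_id
        )


anonymizer = OrderIdAnonymizer(StandInAnonymization(seed=42))
result = anonymizer.anonymize("Orders ORD-12345 and ORD-67890; again ORD-12345.")
print(result)
```

The first and third IDs in the printed text are identical because they share an original value. With the real library, the stand-in would be replaced by an `Anonymization` instance and the callable by whatever provider argument the library expects.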
You may also add a new Faker provider with the helper Anonymization.add_provider(FakerProvider), or access the Faker instance directly through Anonymization.faker.
Contribution
Contributions are welcome, both to improve the code base and to add new anonymizers. Feel free to open PRs and issues.
For new anonymizers, make sure:
- that they work with as many languages as possible
- to use type hinting
- to document them in the table
License
MIT