Skip to main content

Text anonymization using Faker

Project description

Anonymization

Text anonymization in many languages for python3.6+ using Faker.

Install

pip install anonymization

Example

Replace a french phone number with a fake one

>>> from anonymization import Anonymization, PhoneNumberAnonymizer
>>>
>>> text = "C'est bien le 0611223344 ton numéro ?"
>>> anon = Anonymization('fr_FR')
>>> phoneAnonymizer = PhoneNumberAnonymizer(anon)
>>> phoneAnonymizer.anonymize(text)
"C'est bien le 0144939332 ton numéro ?"

Replace emails and named entities in english

This example use NamedEntitiesAnonymizer which require spacy and a spacy model.

pip install spacy
python -m spacy download en
>>> from anonymization import Anonymization, AnonymizerChain, EmailAnonymizer, NamedEntitiesAnonymizer

>>> text = "Hi John,\nthanks for you for subscribing to Superprogram, feel free to ask me any question at secret.mail@Superprogram.com \n Superprogram the best program!"
>>> anon = AnonymizerChain(Anonymization('en_US'))
>>> anon.add_anonymizers(EmailAnonymizer, NamedEntitiesAnonymizer('en'))
>>> anon.anonymize(text)
'Hi Holly,\nthanks for you for subscribing to Ariel, feel free to ask me any question at shanestevenson@gmail.com \n Ariel the best program!'

Included anonymizers

Files

name lang
FilePathAnonymizer -

Internet

name lang
EmailAnonymizer -
UriAnonymizer -
MacAddressAnonymizer -
Ipv4Anonymizer -
Ipv6Anonymizer -

Phone numbers

name lang
PhoneNumberAnonymizer 47+
msisdnAnonymizer 47+

Spacy

name lang
NamedEntitiesAnonymizer 7+

Custom anonymizers

Custom anonymizers can be easily created to fit your needs:

class CustomAnonymizer():
    def __init__(self, anonymization: Anonymization):
        self.anonymization = anonymization

    def anonymize(self, text: str) -> str:
        return modified_text
        # or replace by regex patterns in text using a faker provider
        return self.anonymization.regex_anonymizer(text, pattern, provider)
        # or replace all occurences using a faker provider
        return self.anonymization.replace_all(text, matchs, provider)

You may also add new faker provider with the helper Anonymization.add_provider(FakerProvider) or access the faker instance directly Anonymization.faker.

Contribution

Contributions are welcome both to improve the code base and add new anonymizers. Feel free to open PR & issues.

For new anonymizers, make sure:

  • that they works with as many languages as possible
  • to use type hinting
  • to document them in the table

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

anonymization-0.1.1.tar.gz (5.6 kB view hashes)

Uploaded Source

Built Distribution

anonymization-0.1.1-py3-none-any.whl (7.1 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page