Skip to main content

Text anonymization using Faker

Project description

Anonymization

Text anonymization in many languages for python3.6+ using Faker.

Install

pip install anonymization

Example

Replace emails and named entities in english

This example use NamedEntitiesAnonymizer which require spacy and a spacy model.

pip install spacy
python -m spacy download en_core_web_lg
>>> from anonymization import Anonymization, AnonymizerChain, EmailAnonymizer, NamedEntitiesAnonymizer

>>> text = "Hi John,\nthanks for you for subscribing to Superprogram, feel free to ask me any question at secret.mail@Superprogram.com \n Superprogram the best program!"
>>> anon = AnonymizerChain(Anonymization('en_US'))
>>> anon.add_anonymizers(EmailAnonymizer, NamedEntitiesAnonymizer('en_core_web_lg'))
>>> anon.anonymize(text)
'Hi Holly,\nthanks for you for subscribing to Ariel, feel free to ask me any question at shanestevenson@gmail.com \n Ariel the best program!'

Or make it reversible with pseudonymize:

>>> from anonymization import Anonymization, AnonymizerChain, EmailAnonymizer, NamedEntitiesAnonymizer

>>> text = "Hi John,\nthanks for you for subscribing to Superprogram, feel free to ask me any question at secret.mail@Superprogram.com \n Superprogram the best program!"
>>> anon = AnonymizerChain(Anonymization('en_US'))
>>> anon.add_anonymizers(EmailAnonymizer, NamedEntitiesAnonymizer('en_core_web_lg'))
>>> clean_text, patch = anon.pseudonymize(text)

>>> print(clean_text)
'Christopher, \nthanks for you for subscribing to Audrey, feel free to ask me any question at colemanwesley@hotmail.com \n Audrey the best program!'

revert_text = anon.revert(clean_text, patch)

>>> print(text == revert_text)
true

Replace a french phone number with a fake one

Our solution supports many languages along with their specific information formats.

For example, we can generate a french phone number:

>>> from anonymization import Anonymization, PhoneNumberAnonymizer
>>>
>>> text = "C'est bien le 0611223344 ton numéro ?"
>>> anon = Anonymization('fr_FR')
>>> phoneAnonymizer = PhoneNumberAnonymizer(anon)
>>> phoneAnonymizer.anonymize(text)
"C'est bien le 0144939332 ton numéro ?"

More examples in /examples

Included anonymizers

Files

name lang
FilePathAnonymizer -

Internet

name lang
EmailAnonymizer -
UriAnonymizer -
MacAddressAnonymizer -
Ipv4Anonymizer -
Ipv6Anonymizer -

Phone numbers

name lang
PhoneNumberAnonymizer 47+
msisdnAnonymizer 47+

Date

name lang
DateAnonymizer -

Other

name lang
NamedEntitiesAnonymizer 7+
DictionaryAnonymizer -
SignatureAnonymizer 7+

Custom anonymizers

Custom anonymizers can be easily created to fit your needs:

class CustomAnonymizer():
    def __init__(self, anonymization: Anonymization):
        self.anonymization = anonymization

    def anonymize(self, text: str) -> str:
        return modified_text
        # or replace by regex patterns in text using a faker provider
        return self.anonymization.regex_anonymizer(text, pattern, provider)
        # or replace all occurences using a faker provider
        return self.anonymization.replace_all(text, matchs, provider)

You may also add new faker provider with the helper Anonymization.add_provider(FakerProvider) or access the faker instance directly Anonymization.faker.

Benchmark

This module is benchmarked on synth_dataset from presidio-research and returns accuracy result(0.79) better than Microsoft's solution(0.75)

You can run the benchmark using docker:

docker build . -f ./benchmark/dockerfile -t anonbench
docker run -it --rm --name anonbench anonbench

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

anonymization-0.1.9.tar.gz (25.4 kB view details)

Uploaded Source

Built Distribution

anonymization-0.1.9-py3-none-any.whl (27.9 kB view details)

Uploaded Python 3

File details

Details for the file anonymization-0.1.9.tar.gz.

File metadata

  • Download URL: anonymization-0.1.9.tar.gz
  • Upload date:
  • Size: 25.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.22.0 setuptools/51.3.3 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.10

File hashes

Hashes for anonymization-0.1.9.tar.gz
Algorithm Hash digest
SHA256 947e37ff2cd89bb094e9b4152cc18132826befcafda68627b9806af61d5b9372
MD5 03535f7a96598cb5ee8e4008208f8814
BLAKE2b-256 deb2b94c4f0a612e08412f8417f1b9d463fe18ebde37208fd0534dc2b076a68a

See more details on using hashes here.

File details

Details for the file anonymization-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: anonymization-0.1.9-py3-none-any.whl
  • Upload date:
  • Size: 27.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.22.0 setuptools/51.3.3 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.10

File hashes

Hashes for anonymization-0.1.9-py3-none-any.whl
Algorithm Hash digest
SHA256 d288f2c2e5bfbac69d313c0d95ca467d34100dc59606be5e4e91f7d3af2723d5
MD5 9ce3e5985477a29dfa9df111c283d453
BLAKE2b-256 07dc1a144f5d384d5d18d5adebf3695358b4ca3e09e6f97cca19255413df405a

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page