Clean personally identifiable information from dirty dirty text.

Project description

Remove personally identifiable information from free text. Sometimes we have additional metadata about the people we wish to anonymize; other times we don’t. This package makes it easy to scrub personal information from free text without compromising the privacy of the people we are trying to protect.

scrubadub currently supports removing:

  • Names

  • Email addresses

  • Addresses/Postal codes (US, GB, CA)

  • Credit card numbers

  • Dates of birth

  • URLs

  • Phone numbers

  • Username and password combinations

  • Skype/twitter usernames

  • Social security numbers (US and GB national insurance numbers)

  • Tax numbers (GB)

  • Driving licence numbers (GB)

Quick start

Getting started with scrubadub is as easy as running pip install scrubadub and importing it into your Python scripts like this:

>>> import scrubadub

# My cat may be more tech-savvy than most, but he doesn't want other people to know it.
>>> text = "My cat can be contacted on example@example.com, or 1800 555-5555"

# Replaces the phone number and email address with anonymous IDs.
>>> scrubadub.clean(text)
'My cat can be contacted on {{EMAIL}}, or {{PHONE}}'

There are many ways to tailor the behavior of scrubadub using different Detectors and PostProcessors. Scrubadub is highly configurable and supports localisation for different languages and regions.
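
To make the detect-then-replace idea concrete, here is a minimal sketch that uses only the Python standard library. It is not scrubadub's implementation and does not use its API: the PATTERNS table and the detect and clean_sketch helpers are hypothetical names invented for illustration, and the two regular expressions are deliberately simple assumptions that cover only the quick start example.

```python
import re

# Hypothetical, simplified patterns for illustration only; scrubadub's real
# detectors are far more thorough than these two expressions.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{4} \d{3}-\d{4}\b"),
}


def detect(text):
    """Return sorted (start, end, label) spans for every pattern match."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def clean_sketch(text):
    """Replace each detected span with a {{LABEL}} placeholder."""
    pieces = []
    cursor = 0
    for start, end, label in detect(text):
        if start < cursor:
            continue  # skip a span that overlaps one already replaced
        pieces.append(text[cursor:start])
        pieces.append("{{" + label + "}}")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "My cat can be contacted on example@example.com, or 1800 555-5555"
print(clean_sketch(text))
```

Keeping detection (finding spans) separate from replacement (rewriting the text) is what lets a library of this kind swap detectors in and out without touching the replacement step.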

Installation

To install scrubadub using pip, simply type:

pip install scrubadub

Several other packages can optionally be installed to enable extra detectors: scrubadub_address, scrubadub_spacy and scrubadub_stanford. Because they require additional dependencies, see the relevant documentation (the address detector documentation and the name detector documentation) for more information. This package requires Python 3.6 or later. For Python 2.7 or 3.5, use v1.2.2, the last version that supports them.
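
The requirements above can be checked from Python before relying on the optional detectors. The following is a small sketch using only the standard library; the helper names python_is_supported and installed_optional_packages are hypothetical, and the check assumes the optional packages are importable under the same names used above.

```python
import importlib.util
import sys

# Minimum Python version stated in the installation notes.
MIN_PYTHON = (3, 6)

# Optional add-on packages named in the installation notes; this sketch
# assumes their import names match these package names.
OPTIONAL_PACKAGES = ["scrubadub_address", "scrubadub_spacy", "scrubadub_stanford"]


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_optional_packages(names=OPTIONAL_PACKAGES):
    """Return the subset of names that can be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is not None]


print("Python supported:", python_is_supported())
print("Optional packages found:", installed_optional_packages())
```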

New maintainers

LeapBeyond are excited to be supporting scrubadub with ongoing maintenance and development. Thanks to all of the contributors who made this package a success, but especially @deanmalmgren, IDEO and Datascope.

Download files

Source Distribution

cs_scrubadub-2.1.1.tar.gz (44.1 kB)

Uploaded Source

Built Distribution

cs_scrubadub-2.1.1-py3-none-any.whl (64.1 kB)

Uploaded Python 3

File details

Details for the file cs_scrubadub-2.1.1.tar.gz.

File metadata

  • Download URL: cs_scrubadub-2.1.1.tar.gz
  • Size: 44.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.10.4

File hashes

Hashes for cs_scrubadub-2.1.1.tar.gz
Algorithm    Hash digest
SHA256       fbbc879a7bbc5f3f3d95707eaddfd01baac11b96ce11f37f06959c49dec8e2f5
MD5          2c458b4f938d55f880b0cf4a3719d366
BLAKE2b-256  c191fee02c3e8c221c88c7f335c76ecb8f557b9e7b6ded709e775190884cfdb0
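
One way to use these digests is to compare the SHA256 value with the hash of the downloaded file. The sketch below assumes the source distribution has been saved locally; the helper names sha256_of_file and matches_expected are hypothetical, and the file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib

# SHA256 digest listed above for cs_scrubadub-2.1.1.tar.gz.
EXPECTED_SHA256 = "fbbc879a7bbc5f3f3d95707eaddfd01baac11b96ce11f37f06959c49dec8e2f5"


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected.lower()


# Usage, assuming the archive is in the current directory:
# matches_expected("cs_scrubadub-2.1.1.tar.gz")
```

pip can also enforce hashes at install time through its hash-checking mode, which avoids a manual comparison.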

File details

Details for the file cs_scrubadub-2.1.1-py3-none-any.whl.

File metadata

  • Download URL: cs_scrubadub-2.1.1-py3-none-any.whl
  • Size: 64.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.10.4

File hashes

Hashes for cs_scrubadub-2.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       02e00d6bf2e77ca48271bf04cec790f7a5aff37dd007f0d46799e816599bfaf0
MD5          0c9ec2adaa3012a2a70616a2749f3878
BLAKE2b-256  fb64d3d08010c19947ea465c42a251b2b023c2dff8d4a12d937be0f6fc2b8a77
