Use pyap and libpostal to clean addresses from dirty dirty text.

Project description

scrubadub removes personally identifiable information from text. scrubadub_address is an extension that uses pyap and libpostal to remove addresses from text.

This package contains one extra detector:

  • scrubadub_address.detectors.AddressDetector - A detector that finds British, American and Canadian addresses.

For more information on how to use this package see the scrubadub address documentation and the scrubadub repository.
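
As a minimal sketch of how the detector might be wired into scrubadub, the helper below registers `AddressDetector` on a `Scrubber` and cleans a string. The helper name `scrub_addresses` is hypothetical, and the sketch assumes the scrubadub 2.x `Scrubber.add_detector` and `Scrubber.clean` methods; check the scrubadub address documentation for the authoritative usage.

```python
def scrub_addresses(text):
    """Return `text` with detected addresses replaced by placeholders.

    Hypothetical helper; assumes scrubadub 2.x and a working libpostal install.
    """
    # Imported lazily so that defining this helper does not require libpostal.
    import scrubadub
    from scrubadub_address.detectors import AddressDetector

    scrubber = scrubadub.Scrubber()
    # Register the address detector explicitly on this scrubber.
    scrubber.add_detector(AddressDetector)
    return scrubber.clean(text)
```

Calling `scrub_addresses(...)` on a sentence that contains a British, American or Canadian address should return the sentence with the address replaced by a placeholder, provided the dependencies described under Installation are present.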

Installation

libpostal must be installed first. Full instructions are in the libpostal documentation; a summary for Debian-based Linux systems (those using apt-get) is given below:

$ sudo apt-get install curl autoconf automake libtool pkg-config
$ git clone https://github.com/openvenues/libpostal
$ cd libpostal
$ ./bootstrap.sh
$ ./configure --prefix=/usr/local/
$ make -j4
$ sudo make install

Once libpostal is installed, the remaining Python dependencies can be installed with pip:

$ pip install pypostal scrubadub_address
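
After the steps above, a quick check along the following lines may help confirm that the Python packages can be found. This is only a sketch: `missing_modules` is a hypothetical helper, and `postal` is assumed to be the import name provided by pypostal.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "postal" is assumed to be the module that the pypostal distribution installs.
REQUIRED = ["postal", "scrubadub", "scrubadub_address"]
print("Missing modules:", missing_modules(REQUIRED) or "none")
```

The check only confirms that the modules can be located; it does not verify that the libpostal shared library itself loads.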

New maintainers

LeapBeyond are excited to be supporting scrubadub with ongoing maintenance and development. Thanks to all of the contributors who made this package a success, but especially @deanmalmgren, IDEO and Datascope.

Download files

Source Distribution

scrubadub_address-2.0.1.tar.gz (10.4 kB view details)

Uploaded Source

Built Distribution

scrubadub_address-2.0.1-py3-none-any.whl (10.2 kB view details)

Uploaded Python 3

File details

Details for the file scrubadub_address-2.0.1.tar.gz.

File metadata

  • Download URL: scrubadub_address-2.0.1.tar.gz
  • Upload date:
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for scrubadub_address-2.0.1.tar.gz
Algorithm    Hash digest
SHA256       4a3e321b0cdf69b646a28ea38752614468598abc15332c4a6dcfb25fb53beb21
MD5          c7a280e484dfb21c54d6f7f9cd564d78
BLAKE2b-256  251b179b649efebd2d66396d174c5bcfbddb4e89449692ec3b151412bf6e6a0a
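
A downloaded file can be compared against the published SHA256 digest with the standard library. The sketch below is one way to do it; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published value."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("scrubadub_address-2.0.1.tar.gz", "4a3e321b0cdf69b646a28ea38752614468598abc15332c4a6dcfb25fb53beb21")` should return True for an unmodified copy of the source distribution.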

File details

Details for the file scrubadub_address-2.0.1-py3-none-any.whl.

File metadata

  • Download URL: scrubadub_address-2.0.1-py3-none-any.whl
  • Upload date:
  • Size: 10.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for scrubadub_address-2.0.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       4b2c8b763902253a10815f6e66b56e28c268998fb8edf137757293d351f9e690
MD5          458dba4ba46a07711f1e4195583a706f
BLAKE2b-256  d973fcb424fbe3b53245f61c05df60290444ec482e1e9bb6ed8b6ae8067a7c25
