Use pyap and libpostal to clean addresses from dirty dirty text.
Project description
scrubadub removes personally identifiable information from text. scrubadub_address is an extension that uses pyap and libpostal to remove addresses from text.
This package contains one extra detector:
scrubadub_address.detectors.AddressDetector - A detector that finds British, American and Canadian addresses.
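The detector plugs into scrubadub's detect-then-replace flow: a detector reports the character spans it considers PII, and the scrubber swaps each span for a placeholder. Below is a minimal, standard-library-only sketch of that flow. It is not how AddressDetector works internally (that relies on pyap and libpostal). The regex is a deliberately crude stand-in for number-first street addresses, and `ToyScrubber`, `ToyAddressDetector` and `Span` are hypothetical names whose `add_detector()`/`clean()` methods only loosely mirror scrubadub's `Scrubber`.

```python
import re
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Span:
    """One detected piece of PII: its kind and [beg, end) character offsets."""
    kind: str
    beg: int
    end: int


class ToyAddressDetector:
    """Hypothetical stand-in, NOT scrubadub_address's AddressDetector.

    A crude regex for number-first street addresses such as
    "1600 Pennsylvania Avenue"; the real detector uses pyap and libpostal.
    """

    kind = "address"
    pattern = re.compile(
        r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+)+(?:Street|St|Avenue|Ave|Road|Rd|Drive|Dr)\b"
    )

    def iter_spans(self, text: str) -> Iterator[Span]:
        # Yield one Span per regex match; finditer never returns overlapping matches.
        for match in self.pattern.finditer(text):
            yield Span(self.kind, match.start(), match.end())


class ToyScrubber:
    """Runs each registered detector and swaps hits for {{KIND}} placeholders."""

    def __init__(self) -> None:
        self.detectors: List[ToyAddressDetector] = []

    def add_detector(self, detector: ToyAddressDetector) -> None:
        self.detectors.append(detector)

    def clean(self, text: str) -> str:
        spans = [span for d in self.detectors for span in d.iter_spans(text)]
        # Replace from the end of the string backwards so earlier offsets stay
        # valid; this toy version assumes spans never overlap.
        for span in sorted(spans, key=lambda s: s.beg, reverse=True):
            placeholder = "{{" + span.kind.upper() + "}}"
            text = text[: span.beg] + placeholder + text[span.end :]
        return text


scrubber = ToyScrubber()
scrubber.add_detector(ToyAddressDetector())
print(scrubber.clean("Mail it to 1600 Pennsylvania Avenue, Washington."))
# prints: Mail it to {{ADDRESS}}, Washington.
```

With the real package installed, the equivalent step is registering `scrubadub_address.detectors.AddressDetector` on a `scrubadub.Scrubber` (scrubadub 2.x provides `Scrubber.add_detector()` for this); see the scrubadub address documentation for the exact usage.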
For more information on how to use this package, see the scrubadub address documentation and the scrubadub repository.
Installation
First, libpostal needs to be installed. Full instructions are in the libpostal documentation; a summary for Linux is given below:
$ sudo apt-get install curl autoconf automake libtool pkg-config
$ git clone https://github.com/openvenues/libpostal
$ cd libpostal
$ ./bootstrap.sh
$ ./configure --prefix=/usr/local/
$ make -j4
$ sudo make install
$ sudo ldconfig   # Linux: refresh the shared-library cache so libpostal can be found
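Before installing the Python packages, it can help to confirm that the freshly built shared library is visible. This is a minimal sketch, assuming the library is named libpostal (so ctypes is asked for "postal"); it uses only the standard library's `ctypes.util.find_library`, whose search strategy differs between platforms. The helper name `find_libpostal` is hypothetical.

```python
import ctypes.util
from typing import Optional


def find_libpostal() -> Optional[str]:
    """Return what ctypes.util.find_library reports for libpostal, or None.

    The search is platform-dependent, so treat None as "probably not visible
    to the loader", not as proof that the build failed.
    """
    return ctypes.util.find_library("postal")


if __name__ == "__main__":
    found = find_libpostal()
    if found:
        print(f"libpostal found: {found}")
    else:
        # On Linux, `sudo ldconfig` refreshes the loader cache after `make install`.
        print("libpostal not found by ctypes.util.find_library")
```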
Once you have installed libpostal, the remaining Python dependencies can be installed:
$ pip install pypostal scrubadub_address
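After the pip step, a quick import test catches the common failure where pypostal is installed but cannot load libpostal. A minimal sketch, assuming the import names are `postal` (for pypostal) and `scrubadub_address`; `check_imports` is a hypothetical helper.

```python
import importlib
from typing import Dict, Iterable

# Assumed import names for the two packages installed above.
DEFAULT_MODULES = ("postal", "scrubadub_address")


def check_imports(modules: Iterable[str] = DEFAULT_MODULES) -> Dict[str, bool]:
    """Try to import each module and report whether the import succeeded."""
    results: Dict[str, bool] = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            # Also covers a compiled extension that cannot load its shared library.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_imports().items():
        print(f"{name}: {'ok' if ok else 'NOT importable'}")
```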
New maintainers
LeapBeyond are excited to be supporting scrubadub with ongoing maintenance and development. Thanks to all of the contributors who made this package a success, and especially to @deanmalmgren, IDEO and Datascope.
Download files
Built Distribution
Hashes for scrubadub_address-2.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 4b2c8b763902253a10815f6e66b56e28c268998fb8edf137757293d351f9e690
MD5 | 458dba4ba46a07711f1e4195583a706f
BLAKE2b-256 | d973fcb424fbe3b53245f61c05df60290444ec482e1e9bb6ed8b6ae8067a7c25
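The digests let you confirm that a downloaded wheel is the file listed here. A minimal sketch using only `hashlib`, assuming the wheel has already been downloaded to a local path; `sha256_of` and `verify_wheel` are hypothetical helper names. The constant is the SHA256 value given for scrubadub_address-2.0.1-py3-none-any.whl, so it will not match the source distribution or any other release.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for scrubadub_address-2.0.1-py3-none-any.whl.
EXPECTED_SHA256 = "4b2c8b763902253a10815f6e66b56e28c268998fb8edf137757293d351f9e690"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `verify_wheel(Path("scrubadub_address-2.0.1-py3-none-any.whl"))` returns True only for an intact copy of that wheel. pip's hash-checking mode (`--hash=sha256:...` entries in a requirements file) performs the same comparison during installation.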