Use the Stanford NER model to clean personally identifiable information from dirty dirty text.

Project description

scrubadub removes personally identifiable information from text. scrubadub_stanford is an extension that uses Stanford’s NER model to remove personal information from text.

This package provides three detectors, each a different way of interfacing with Stanford’s NER models:

  • scrubadub_stanford.detectors.StanfordEntityDetector - A detector that uses the Stanford NER model to find locations, names and organizations. Download size circa 250MB.

  • scrubadub_stanford.detectors.CoreNlpEntityDetector - The same interface as the StanfordEntityDetector, but using Stanza’s CoreNLPClient to interface with the CoreNLP Java Server. Download size circa 510MB.

  • scrubadub_stanford.detectors.StanzaEntityDetector - Similar to the above but using Stanza’s native Python pipelines. Download size circa 210MB. No Java required. This is the recommended detector for speed and footprint.
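
As a rough illustration of how one of the detectors listed above might be plugged into scrubadub, here is a minimal sketch. The helper `build_scrubber` and the tuple `KNOWN_DETECTORS` are hypothetical names introduced for this example. It also assumes that scrubadub exposes `Scrubber`, `Scrubber.add_detector` (accepting a detector class) and `Scrubber.clean`; check the scrubadub documentation for the exact signatures.

```python
import importlib

# Detector class names as listed above; the module path
# scrubadub_stanford.detectors comes from that list.
KNOWN_DETECTORS = (
    "StanfordEntityDetector",
    "CoreNlpEntityDetector",
    "StanzaEntityDetector",
)


def build_scrubber(detector_name: str = "StanzaEntityDetector"):
    """Return a scrubadub Scrubber with one of the three detectors added.

    Hypothetical helper for illustration; not part of either package.
    """
    if detector_name not in KNOWN_DETECTORS:
        raise ValueError(
            f"unknown detector {detector_name!r}; expected one of {KNOWN_DETECTORS}"
        )

    # Imported lazily so the name check above works even when the
    # packages (and their model downloads) are not installed yet.
    scrubadub = importlib.import_module("scrubadub")
    detectors = importlib.import_module("scrubadub_stanford.detectors")

    scrubber = scrubadub.Scrubber()
    # Assumption: add_detector accepts a detector class.
    scrubber.add_detector(getattr(detectors, detector_name))
    return scrubber


# Usage (the first call downloads the model, so it is left commented out):
# scrubber = build_scrubber()
# print(scrubber.clean("My name is Alex and I work at Example Corp."))
```

The default here is StanzaEntityDetector because it is the one that needs no Java; passing one of the other two names selects a Java-backed detector instead.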

Prerequisites

Java Runtime Environment 8 or later is required for StanfordEntityDetector and CoreNlpEntityDetector. Check the installed version by running:

$ java -version

The reported version should be at least 1.8 (Java 8). If it is lower, or Java is not installed, run the following commands:

Linux (Debian/Ubuntu):

$ sudo apt update
$ sudo apt install openjdk-8-jre

macOS:

$ brew tap adoptopenjdk/openjdk
$ brew install adoptopenjdk8-jre
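
The version check described above can also be scripted. The following is a minimal sketch, not part of scrubadub_stanford: `parse_java_major` and `installed_java_major` are hypothetical helper names. It relies on the fact that Java 8 and earlier report themselves as `1.x` (for example `1.8.0_292`), while later releases report the major version first, and that `java -version` writes to stderr.

```python
import re
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Java 8 and earlier use the legacy "1.x" scheme, e.g. "1.8.0_292" is Java 8.
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major() -> Optional[int]:
    """Return the major version of the `java` on PATH, or None if unavailable."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    # `java -version` prints to stderr; stdout is included in case of wrappers.
    return parse_java_major(result.stderr + result.stdout)


def java_is_sufficient(minimum_major: int = 8) -> bool:
    """True when a Java runtime of at least `minimum_major` is installed."""
    major = installed_java_major()
    return major is not None and major >= minimum_major
```

A script could call `java_is_sufficient()` before choosing StanfordEntityDetector or CoreNlpEntityDetector, and fall back to StanzaEntityDetector otherwise.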

For more information on how to use this package see the scrubadub stanford documentation and the scrubadub repository.

New maintainers

LeapBeyond are excited to be supporting scrubadub with ongoing maintenance and development. Thanks to all of the contributors who made this package a success, but especially @deanmalmgren, IDEO and Datascope.

Download files

Source Distribution

scrubadub_stanford-2.1.3.tar.gz (14.3 kB)

Uploaded Source

Built Distribution

scrubadub_stanford-2.1.3-py3-none-any.whl (16.8 kB)

Uploaded Python 3

File details

Details for the file scrubadub_stanford-2.1.3.tar.gz.

File metadata

  • Download URL: scrubadub_stanford-2.1.3.tar.gz
  • Upload date:
  • Size: 14.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.10.4

File hashes

Hashes for scrubadub_stanford-2.1.3.tar.gz
Algorithm    Hash digest
SHA256       a3bebf8bbd3f35d26e173492ed0bf2a4778bf5423742f93dea365ff52bd0d8db
MD5          0f38a579e5bd49411abc43cf15f1cef0
BLAKE2b-256  105809c0be43f078e5d50f90b2da11aec4207f1cb79342d7d929692eb9ec0f68

File details

Details for the file scrubadub_stanford-2.1.3-py3-none-any.whl.

File hashes

Hashes for scrubadub_stanford-2.1.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       3811647f9bc015ea91057306f8a76b4536bbc532547daf6246db734f55d7e8ad
MD5          4512b693e6ab0aa84fbfa5e65aed9174
BLAKE2b-256  c7d2deb7cf69a046e67ac3981ff62d9ac95fae47c50499f15716c870a66be0aa
