Skip to main content

A simple text deidentification tool, built on spacy's state-of-the-art named entity recognition pipeline, now supporting 22 languages.

Project description

pydeidentify

An simple tool for text deidentification, built on spacy's state-of-the-art named entity recognition pipeline

I created this with absolute simplicity in mind, get started deidentifying with a single pip command and 3 lines of python!

Usage

View more detailed examples at https://github.com/Lucasc-99/pydeidentify

DISCLAIMER: this tool is not 100% accurate, and may miss some entities

The model is also case sensitive, and will have decreased accuracy if text is all lower-case

# Basic usage, see examples/long_example.py for more

from pydeidentify import Deidentifier, DeidentifiedText

# Deidentify using this Deidentifier class
d = Deidentifier()
d_text: DeidentifiedText = d.deidentify(
    """My name is Joe Biden, I'm from Scranton, Pennsylvania and I like to create python packages. I was born 12-1-1999."""
)

# View output of deidentification using DeidentifiedText class

print(d_text.original()) # My name is Joe Biden, I'm from Scranton, Pennsylvania and I like to create python packages. I was born 12-1-1999.

print(d_text) # My name is PERSON0, I'm from GPE0, GPE1 and I like to create python packages. I was born DATE0.

print(d_text.encode_mapping) # {'Joe Biden': 'PERSON0', 'Scranton': 'GPE0', 'Pennsylvania': 'GPE1', '12-1-1999': 'DATE0'}
print(d_text.decode_mapping) # {'PERSON0': 'Joe Biden', 'GPE0': 'Scranton', 'GPE1': 'Pennsylvania', 'DATE0': '12-1-1999'}
print(d_text.counts) # {'ORG': 0, 'LOC': 0, 'PERSON': 1, 'GPE': 2, 'DATE': 1, 'FAC': 0}

# Use any spacy model that supports named entity recognition by passing it's name in the spacy_model parameter
# The line below loads the chinese version of the default english model: 'en_core_web_trf'
# see https://spacy.io/models for all models
d_chinese = Deidentifier(spacy_model="zh_core_web_trf") 

See all available langauges and pipelines at https://spacy.io/models

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pydeidentify-0.2.2.tar.gz (4.1 kB view hashes)

Uploaded Source

Built Distribution

pydeidentify-0.2.2-py3-none-any.whl (4.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page