A simple text deidentification tool, built on spaCy's state-of-the-art named entity recognition pipeline, now supporting 22 languages.
pydeidentify
A simple tool for text deidentification, built on spaCy's state-of-the-art named entity recognition pipeline.
I created this with absolute simplicity in mind: get started deidentifying with a single pip command and 3 lines of Python!
Usage
View more detailed examples at https://github.com/Lucasc-99/pydeidentify
DISCLAIMER: this tool is not 100% accurate and may miss some entities.
The model is also case-sensitive and will be less accurate if the text is all lowercase.
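Since accuracy drops on all-lowercase text, it can help to check the casing of an input before deidentifying it. The helper below is a minimal sketch and is not part of pydeidentify: the name `mostly_lowercase` and the 2% threshold are assumptions chosen for illustration.

```python
def mostly_lowercase(text: str, threshold: float = 0.02) -> bool:
    """Return True if fewer than `threshold` of the letters are uppercase.

    Hypothetical helper (not part of pydeidentify); the default
    threshold is an arbitrary assumption, tune it for your data.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        # No letters at all: nothing to say about casing
        return False
    upper = sum(1 for ch in letters if ch.isupper())
    return upper / len(letters) < threshold


sample = "my name is joe biden, i'm from scranton, pennsylvania"
if mostly_lowercase(sample):
    print("Warning: text is mostly lowercase, entities may be missed")
```

A check like this only flags risky input. It does not restore the casing, so flagged text may still need manual review.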
# Basic usage, see examples/long_example.py for more
from pydeidentify import Deidentifier, DeidentifiedText
# Deidentify using this Deidentifier class
d = Deidentifier()
d_text: DeidentifiedText = d.deidentify(
"""My name is Joe Biden, I'm from Scranton, Pennsylvania and I like to create python packages. I was born 12-1-1999."""
)
# View output of deidentification using DeidentifiedText class
print(d_text.original()) # My name is Joe Biden, I'm from Scranton, Pennsylvania and I like to create python packages. I was born 12-1-1999.
print(d_text) # My name is PERSON0, I'm from GPE0, GPE1 and I like to create python packages. I was born DATE0.
print(d_text.encode_mapping) # {'Joe Biden': 'PERSON0', 'Scranton': 'GPE0', 'Pennsylvania': 'GPE1', '12-1-1999': 'DATE0'}
print(d_text.decode_mapping) # {'PERSON0': 'Joe Biden', 'GPE0': 'Scranton', 'GPE1': 'Pennsylvania', 'DATE0': '12-1-1999'}
print(d_text.counts) # {'ORG': 0, 'LOC': 0, 'PERSON': 1, 'GPE': 2, 'DATE': 1, 'FAC': 0}
# Use any spaCy model that supports named entity recognition by passing its name in the spacy_model parameter
# The line below loads 'zh_core_web_trf', the Chinese counterpart of the default English model 'en_core_web_trf'
# see https://spacy.io/models for all models
d_chinese = Deidentifier(spacy_model="zh_core_web_trf")
See all available languages and pipelines at https://spacy.io/models
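The decode_mapping shown in the basic usage example can be used to restore the original text from a deidentified string. The function below is a minimal sketch that works on a plain string and a plain dict. The name `reidentify` is hypothetical and not part of pydeidentify, which may offer its own way to do this.

```python
import re
from typing import Dict


def reidentify(text: str, decode_mapping: Dict[str, str]) -> str:
    """Replace placeholders such as PERSON0 with their original values.

    Hypothetical helper (not part of pydeidentify). It expects a
    mapping shaped like the decode_mapping in the usage example.
    """
    if not decode_mapping:
        return text
    # Try longer placeholders first so PERSON10 is not read as PERSON1 + "0"
    keys = sorted(decode_mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # A single pass avoids re-replacing text that was just restored
    return pattern.sub(lambda m: decode_mapping[m.group(0)], text)


deidentified = (
    "My name is PERSON0, I'm from GPE0, GPE1 and I like to create "
    "python packages. I was born DATE0."
)
mapping = {
    "PERSON0": "Joe Biden",
    "GPE0": "Scranton",
    "GPE1": "Pennsylvania",
    "DATE0": "12-1-1999",
}
restored = reidentify(deidentified, mapping)
print(restored)
```

Matching all placeholders in one regular expression, longest first, keeps the replacement unambiguous once a text has ten or more entities of the same type.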
License