A simple text deidentification tool, built on Hugging Face's named entity recognition (NER) pipeline
pydeidentify
A Python library for easy text deidentification
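This page does not show pydeidentify's own implementation. The sketch below shows one way that NER spans could become the placeholders and mappings shown under Usage. It assumes the entities arrive in the shape that Hugging Face's `transformers` NER pipeline returns when an aggregation strategy is enabled: dicts with `entity_group`, `word`, `start` and `end` keys. The function `deidentify_spans` is hypothetical and is not part of pydeidentify. The entity list is hardcoded in place of a real model call. The four labels PER, ORG, LOC and MISC come from the `counts` output under Usage. Counting distinct entities per type is also an assumption.

```python
from typing import Dict, List, Tuple


def deidentify_spans(
    text: str, entities: List[dict]
) -> Tuple[str, Dict[str, str], Dict[str, str], Dict[str, int]]:
    """Hypothetical sketch: replace tagged spans with per-type placeholders (PER0, LOC1, ...)."""
    encode: Dict[str, str] = {}
    # Assumed label set, taken from the counts output in the Usage example
    counts: Dict[str, int] = {"PER": 0, "ORG": 0, "LOC": 0, "MISC": 0}
    ordered = sorted(entities, key=lambda e: e["start"])

    # Pass 1: number each distinct surface string per entity type, in order of appearance
    for ent in ordered:
        surface = text[ent["start"]:ent["end"]]
        if surface not in encode:
            label = ent["entity_group"]
            n = counts.get(label, 0)
            encode[surface] = f"{label}{n}"
            counts[label] = n + 1

    # Pass 2: splice placeholders in from right to left so earlier offsets stay valid
    masked = text
    for ent in reversed(ordered):
        surface = text[ent["start"]:ent["end"]]
        masked = masked[:ent["start"]] + encode[surface] + masked[ent["end"]:]

    decode = {placeholder: surface for surface, placeholder in encode.items()}
    return masked, encode, decode, counts


# Hardcoded stand-in for NER model output (assumption: no real model is called here)
text = ("My name is Joe Biden, I'm from Scranton, Pennsylvania "
        "and I like to create python packages")


def tag(label: str, surface: str) -> dict:
    start = text.index(surface)
    return {"entity_group": label, "word": surface, "start": start, "end": start + len(surface)}


entities = [tag("PER", "Joe Biden"), tag("LOC", "Scranton"), tag("LOC", "Pennsylvania")]
masked, encode, decode, counts = deidentify_spans(text, entities)
print(masked)  # My name is PER0, I'm from LOC0, LOC1 and I like to create python packages
print(counts)  # {'PER': 1, 'ORG': 0, 'LOC': 2, 'MISC': 0}
```

Splicing from right to left keeps the character offsets of earlier spans valid. The sketch assumes that spans never overlap.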
Usage
View more detailed examples at https://github.com/Lucasc-99/pydeidentify
```python
from pydeidentify import Deidentifier, DeidentifiedText

# Create a Deidentifier and run it on a piece of text
d = Deidentifier()
d_text: DeidentifiedText = d.deidentify("My name is Joe Biden, I'm from Scranton, Pennsylvania and I like to create python packages")

# Inspect the result through the returned DeidentifiedText object
print(d_text.original())  # My name is Joe Biden, I'm from Scranton, Pennsylvania and I like to create python packages
print(d_text)  # My name is PER0, I'm from LOC0, LOC1 and I like to create python packages
print(d_text.encode_mapping)  # {'Joe Biden': 'PER0', 'Scranton': 'LOC0', 'Pennsylvania': 'LOC1'}
print(d_text.decode_mapping)  # {'PER0': 'Joe Biden', 'LOC0': 'Scranton', 'LOC1': 'Pennsylvania'}
print(d_text.counts)  # {'PER': 1, 'ORG': 0, 'LOC': 2, 'MISC': 0}
```
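The `decode_mapping` shown above is enough to reverse the substitution. This page does not say whether pydeidentify ships its own re-identification method, so the helper `reidentify` below is hypothetical. It is a minimal sketch that works only on a plain string and a plain dict shaped like the printed output. With the library itself, the inputs would presumably be `str(d_text)` and `d_text.decode_mapping`, but that is an assumption.

```python
import re
from typing import Dict


def reidentify(masked: str, decode_mapping: Dict[str, str]) -> str:
    """Hypothetical helper: swap placeholders such as PER0 back to the original strings."""
    if not decode_mapping:
        return masked
    # Word boundaries stop PER1 from matching inside PER10, and a single regex pass
    # means restored text is never substituted a second time.
    alternatives = "|".join(
        re.escape(p) for p in sorted(decode_mapping, key=len, reverse=True)
    )
    pattern = re.compile(rf"\b(?:{alternatives})\b")
    return pattern.sub(lambda m: decode_mapping[m.group(0)], masked)


decode_mapping = {"PER0": "Joe Biden", "LOC0": "Scranton", "LOC1": "Pennsylvania"}
masked_text = "My name is PER0, I'm from LOC0, LOC1 and I like to create python packages"
print(reidentify(masked_text, decode_mapping))
# My name is Joe Biden, I'm from Scranton, Pennsylvania and I like to create python packages
```

A one-pass regex is used instead of chained `str.replace` calls. With chained calls, replacing `PER1` first would corrupt any `PER10` in the text. Note that a literal word such as `PER0` already present in the original text would be indistinguishable from a placeholder.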
Contributing
All pull requests are welcome.
License
Download files
Source Distribution
pydeidentify-0.1.1.tar.gz (3.0 kB)
Built Distribution
pydeidentify-0.1.1-py3-none-any.whl
Hashes for pydeidentify-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 1c61f285a37677cb64c9b5778be9ebbca8200bf63c31aa1bd58f20e24f9bbea3
MD5 | e8ad30c894765a0e6d8945d7f9f0ad45
BLAKE2b-256 | 839325e956c754d565cf32bd33823e2d5849f1af7f28a1acb33bf0cb9d5d980c
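The digests listed above can be checked locally before the wheel is installed. The sketch below uses only Python's standard `hashlib`. The helper names `file_digests` and `verify_wheel` are hypothetical. The expected value is the SHA256 digest from the table. BLAKE2b-256 corresponds to `hashlib.blake2b` with a 32-byte digest.

```python
import hashlib
from pathlib import Path
from typing import Dict

# SHA256 digest for pydeidentify-0.1.1-py3-none-any.whl, copied from the table above
EXPECTED_SHA256 = "1c61f285a37677cb64c9b5778be9ebbca8200bf63c31aa1bd58f20e24f9bbea3"


def file_digests(path: Path, chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Compute the three digests listed on the page, reading the file in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_wheel(path: Path, expected_sha256: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return file_digests(path)["SHA256"] == expected_sha256.lower()


# Example (assumes the wheel has been downloaded into the current directory):
# verify_wheel(Path("pydeidentify-0.1.1-py3-none-any.whl"))
```

pip's hash-checking mode can perform the same check at install time. To use it, add a requirements line such as `pydeidentify==0.1.1 --hash=sha256:1c61f285a37677cb64c9b5778be9ebbca8200bf63c31aa1bd58f20e24f9bbea3`. The hash must match the file that pip actually downloads, which here is the wheel.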