A simple text deidentification tool, built on spaCy's state-of-the-art named entity recognition pipeline, now supporting 22 languages.

Project description

pydeidentify

A simple tool for text deidentification, built on spaCy's state-of-the-art named entity recognition pipeline.

I created this with absolute simplicity in mind: get started deidentifying with a single pip command and 3 lines of Python!

Usage

View more detailed examples at https://github.com/Lucasc-99/pydeidentify

DISCLAIMER: this tool is not 100% accurate and may miss some entities.

The model is also case-sensitive and will be less accurate if the text is all lowercase.
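
Since accuracy drops on all-lowercase input, it can help to check for that before deidentifying. The helper below is a minimal sketch and is not part of pydeidentify: the name `is_mostly_lowercase` and the 0.98 threshold are assumptions chosen for illustration only.

```python
def is_mostly_lowercase(text: str, threshold: float = 0.98) -> bool:
    """Return True if nearly all cased letters in `text` are lowercase.

    Hypothetical helper (not part of pydeidentify); the threshold is an
    arbitrary illustrative value.
    """
    # Keep only characters that have a case (skips digits, punctuation, CJK, etc.).
    cased = [ch for ch in text if ch.islower() or ch.isupper()]
    if not cased:
        # No cased letters at all, so the case-sensitivity warning does not apply.
        return False
    lower = sum(1 for ch in cased if ch.islower())
    return lower / len(cased) >= threshold


sample = "my name is joe biden, i'm from scranton, pennsylvania."
if is_mostly_lowercase(sample):
    print("Warning: text has almost no capital letters; entities may be missed.")
```

A check like this only flags risky input. It does not restore capitalization, so flagged text may still need manual review.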

# Basic usage, see examples/long_example.py for more

from pydeidentify import Deidentifier, DeidentifiedText

# Deidentify text using the Deidentifier class
d = Deidentifier()
d_text: DeidentifiedText = d.deidentify(
    """My name is Joe Biden, I'm from Scranton, Pennsylvania and I like to create python packages. I was born 12-1-1999."""
)

# View output of deidentification using DeidentifiedText class

print(d_text.original()) # My name is Joe Biden, I'm from Scranton, Pennsylvania and I like to create python packages. I was born 12-1-1999.

print(d_text) # My name is PERSON0, I'm from GPE0, GPE1 and I like to create python packages. I was born DATE0.

print(d_text.encode_mapping) # {'Joe Biden': 'PERSON0', 'Scranton': 'GPE0', 'Pennsylvania': 'GPE1', '12-1-1999': 'DATE0'}
print(d_text.decode_mapping) # {'PERSON0': 'Joe Biden', 'GPE0': 'Scranton', 'GPE1': 'Pennsylvania', 'DATE0': '12-1-1999'}
print(d_text.counts) # {'ORG': 0, 'LOC': 0, 'PERSON': 1, 'GPE': 2, 'DATE': 1, 'FAC': 0}

# Use any spaCy model that supports named entity recognition by passing its name in the spacy_model parameter
# The default model is the English 'en_core_web_trf'; the line below loads its Chinese counterpart instead
# see https://spacy.io/models for all models
d_chinese = Deidentifier(spacy_model="zh_core_web_trf")

See all available languages and pipelines at https://spacy.io/models
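
The `decode_mapping` shown in the usage example maps each placeholder back to its original text, which suggests a simple way to restore deidentified text. The following is a minimal sketch that assumes placeholders appear verbatim in the deidentified string, as in the example output. `reidentify` is a hypothetical helper, not part of pydeidentify, and the library may provide its own mechanism.

```python
import re
from typing import Dict


def reidentify(text: str, decode_mapping: Dict[str, str]) -> str:
    """Replace placeholders such as PERSON0 with their original values.

    Hypothetical helper (not part of pydeidentify).
    """
    if not decode_mapping:
        return text
    # Try longer placeholders first so that PERSON10 is matched before PERSON1.
    keys = sorted(decode_mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: decode_mapping[m.group(0)], text)


# Values copied from the usage example's printed output.
decode_mapping = {
    "PERSON0": "Joe Biden",
    "GPE0": "Scranton",
    "GPE1": "Pennsylvania",
    "DATE0": "12-1-1999",
}
deidentified = (
    "My name is PERSON0, I'm from GPE0, GPE1 and I like to create "
    "python packages. I was born DATE0."
)
print(reidentify(deidentified, decode_mapping))
# My name is Joe Biden, I'm from Scranton, Pennsylvania and I like to create python packages. I was born 12-1-1999.
```

A single regex pass avoids re-replacing text that was already restored, which chained `str.replace` calls could do if an original value happened to contain a placeholder string.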

License

MIT
