Pteredactyl redacts and masks personally identifiable information (PII) in clinical free text. It can be deployed as an API from a container or used as a Python module.

Project description

Pteredactyl

Pteredactyl uses natural language processing to identify and anonymize clinical personally identifiable information (cPII) in clinical free text. It is built on top of Microsoft's Presidio and supports swapping in different transformer models from Hugging Face.

Features

  • Anonymization of various entities such as names, locations, and phone numbers
  • Support for processing both strings and pandas DataFrames
  • Text highlighting for easy identification of anonymized parts
  • Webapp with Gradio
  • cPII benchmarking test: Clinical_PII_Redaction_Test
  • Production API deployed using Docker and Gradio
  • Hide-in-plain-sight replacement or masking option
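
To illustrate the difference between masking and hide-in-plain-sight replacement, here is a minimal sketch using only the Python standard library. It does not use Pteredactyl's API. The function name `redact`, the regular expressions, and the surrogate values are hypothetical stand-ins for what a trained entity recognizer would provide.

```python
import re

# Toy patterns for illustration only; a real system would rely on an NER model
# rather than regular expressions.
PATTERNS = {
    "PHONE_NUMBER": re.compile(r"\b0\d{10}\b"),
    "PERSON": re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.? [A-Z][a-z]+\b"),
}

# Hypothetical surrogate values used for hide-in-plain-sight replacement.
SURROGATES = {
    "PHONE_NUMBER": "01632960123",
    "PERSON": "Dr Taylor",
}


def redact(text: str, mode: str = "mask") -> str:
    """Return text with matched entities masked or replaced by surrogates."""
    if mode not in ("mask", "replace"):
        raise ValueError("mode must be 'mask' or 'replace'")
    for label, pattern in PATTERNS.items():
        # Masking inserts a visible placeholder; replacement inserts a
        # realistic-looking surrogate so the redaction is not obvious.
        replacement = f"<{label}>" if mode == "mask" else SURROGATES[label]
        text = pattern.sub(replacement, text)
    return text


note = "Dr Smith reviewed the patient; call 02380123456 to follow up."
masked = redact(note, mode="mask")
replaced = redact(note, mode="replace")
print(masked)
print(replaced)
```

Masking makes it clear where information was removed, while surrogate replacement keeps the text readable and makes any missed identifier harder to tell apart from the substituted ones.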

Installation

Pteredactyl can be installed from PyPI using pip:

pip install pteredactyl
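
One way to confirm that the installation succeeded is to look up the installed distribution's version with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of Pteredactyl.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if pteredactyl is installed, otherwise None.
print(installed_version("pteredactyl"))
```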

Guides

Contributions

Interested in contributing? Check out the contributing guidelines.

Please note that this project follows the GitHub code of conduct. By contributing to this project, you agree to abide by its terms.

License

Pteredactyl was created at University Hospital Southampton NHSFT by the Research Data Science Team. It is licensed under the terms of the MIT license.

Download files


Source Distribution

pteredactyl-1.0.2.tar.gz (25.8 kB)

Uploaded Source

Built Distribution

pteredactyl-1.0.2-py3-none-any.whl (30.0 kB)

Uploaded Python 3
