Pteredactyl performs redaction and masking of personally identifiable information (PII) in clinical free text. It can be deployed as an API from a container or used as a Python module.
Pteredactyl
Pteredactyl uses natural language processing to identify and anonymize clinical personally identifiable information (cPII) in clinical free text. It is built on top of Microsoft's Presidio and lets you swap in different transformer models from Hugging Face.
Features
- Anonymization of various entities such as names, locations, and phone numbers
- Support for processing both strings and pandas DataFrames
- Text highlighting for easy identification of anonymized parts
- Webapp with Gradio
- cPII benchmarking test: Clinical_PII_Redaction_Test
- Production API deployed using Docker and Gradio
- Hide-in-plain-sight replacement or masking option
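To make the difference between masking and hide-in-plain-sight replacement concrete, here is a minimal standard-library sketch. It does not use Pteredactyl's API: the `find_span` and `redact` helpers, the entity labels, and the surrogate values are all hypothetical, and the entity spans are located by hand rather than detected by a model.

```python
# Illustrative only: hand-located entity spans stand in for the output of
# an NER model. None of these names come from Pteredactyl itself.

def find_span(text, value, label):
    """Return (start, end, label) for the first occurrence of value in text."""
    start = text.index(value)
    return (start, start + len(value), label)


def redact(text, spans, mode="mask", surrogates=None):
    """Replace each (start, end, label) span in text.

    mode="mask": replace the span with a placeholder such as <PERSON>.
    mode="hips": replace the span with a realistic surrogate for its label,
                 falling back to the placeholder if no surrogate is given.
    """
    surrogates = surrogates or {}
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the span
        placeholder = f"<{label}>"
        if mode == "hips":
            pieces.append(surrogates.get(label, placeholder))
        else:
            pieces.append(placeholder)
        cursor = end
    pieces.append(text[cursor:])  # untouched text after the last span
    return "".join(pieces)


note = "Mr Smith was seen in Southampton."
spans = [
    find_span(note, "Smith", "PERSON"),
    find_span(note, "Southampton", "LOCATION"),
]

masked = redact(note, spans, mode="mask")
hidden = redact(
    note,
    spans,
    mode="hips",
    surrogates={"PERSON": "Jones", "LOCATION": "Winchester"},
)
print(masked)  # Mr <PERSON> was seen in <LOCATION>.
print(hidden)  # Mr Jones was seen in Winchester.
```

Masking makes it obvious where text was removed. Hide-in-plain-sight output still reads like a natural note, so any identifier the detector missed is harder to tell apart from the surrogates.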
Installation
Install from PyPI using pip:
pip install pteredactyl
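After installing, one way to confirm that the package is visible to the current interpreter is to query its installed version with the standard library. This is a small sketch: `installed_version` is a hypothetical helper name, and the only detail taken from the project is the distribution name `pteredactyl` from the pip command.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if the install succeeded, otherwise None.
print(installed_version("pteredactyl"))
```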
Guides
Contributions
Interested in contributing? Check out the contributing guidelines.
Please note that this project follows the GitHub code of conduct. By contributing to this project, you agree to abide by its terms.
License
Pteredactyl was created at University Hospital Southampton NHSFT by the Research Data Science Team. It is licensed under the terms of the MIT license.
Hashes for pteredactyl-1.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9a59fa195a487945c61c655fe53c12fc504c03be22835f9d800c6b124b6070e6
MD5 | 388993c0983ff9025e413c97f0c030a5
BLAKE2b-256 | b830c1109f0763475bdeb2847658d574f95fc0326f3debe7dbd90a2a8373fb35