Redact Text with HuggingFace Models
🤗 Redactions
HuggingFace Redactions (hufr) redacts personally identifiable information (PII) from text using pretrained language models from the HuggingFace model repository. This package wraps token classification models to streamline PII redaction from free text. This project is not associated with the official HuggingFace organization; it is just a fun side project for this individual contributor.
Installation
To install this package, run `pip install hufr`.
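To confirm that the install succeeded, one option is to query the installed version from Python. The sketch below uses only the standard library; the helper name `installed_version` is made up for illustration and is not part of hufr.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Hypothetical helper: return the installed version string, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# Prints a version string when hufr is installed, otherwise None.
print(installed_version("hufr"))
```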
Usage
See below for an example snippet that loads a specific token classification model from the HuggingFace model zoo:
from hufr.models import TokenClassificationTransformer
from hufr.redact import redact_text

# Load a token classification model and its tokenizer from the HuggingFace model zoo
model_path = "dslim/bert-base-NER"
model = TokenClassificationTransformer(
    model=model_path,
    tokenizer=model_path
)

# Redact only person names, replacing each with <PERSON>
text = "Hello! My name is Rob"
redact_text(
    text,
    redaction_map={'PER': '<PERSON>'},
    model=model
)
> `"Hello! My name is <PERSON>"`
If you don't want to instantiate a model and supply a specific token classification model, you can rely on the repository defaults for a quick and simple redaction:
from hufr.redact import redact_text
text = "Hello! My name is Rob"
redact_text(text)
To get the predicted entity for each word in the original text:
from hufr.redact import redact_text
text = "Hello! My name is Rob"
redact_text(text, return_preds=True)
> "Hello! My name is <PERSON>", ['O', 'O', 'O', 'O', 'PER']
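The predictions can be lined up with the original words for inspection. The sketch below is not part of hufr: it hard-codes the label list from the example above so that it runs without downloading a model, and it assumes that splitting the text on whitespace yields the same words the predictions refer to (which holds for this example, with five words and five labels). The helper names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def pair_words_with_labels(text: str, labels: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: pair each whitespace-separated word with its label."""
    words = text.split()
    if len(words) != len(labels):
        # Assumption violated: the predictions do not align with a whitespace split.
        raise ValueError("number of words and labels differ")
    return list(zip(words, labels))


def count_entities(labels: List[str]) -> Counter:
    """Count predicted entity types, ignoring the 'O' (no entity) label."""
    return Counter(label for label in labels if label != "O")


text = "Hello! My name is Rob"
preds = ["O", "O", "O", "O", "PER"]  # copied from the example output above

print(pair_words_with_labels(text, preds))
# [('Hello!', 'O'), ('My', 'O'), ('name', 'O'), ('is', 'O'), ('Rob', 'PER')]
print(count_entities(preds))
# Counter({'PER': 1})
```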
By default, personally identifiable information is predicted by the dslim/bert-base-NER model, and entities are mapped to redactions using the following mapping table:
'PER': '<PERSON>',
'MIS': '<OTHER>',
'ORG': '<ORGANIZATION>',
'LOC': '<LOCATION>'
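To illustrate how a mapping table like this turns word-level predictions into redacted text, here is a minimal sketch. It is not hufr's actual implementation: `apply_redaction_map` is a hypothetical helper, and it assumes one label per whitespace-separated word, as in the `return_preds` example.

```python
from typing import Dict, List, Optional

# Default mapping, copied from the table above.
DEFAULT_REDACTION_MAP: Dict[str, str] = {
    "PER": "<PERSON>",
    "MIS": "<OTHER>",
    "ORG": "<ORGANIZATION>",
    "LOC": "<LOCATION>",
}


def apply_redaction_map(
    words: List[str],
    labels: List[str],
    redaction_map: Optional[Dict[str, str]] = None,
) -> str:
    """Hypothetical helper: replace each labelled word with its placeholder."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    mapping = DEFAULT_REDACTION_MAP if redaction_map is None else redaction_map
    # Labels missing from the mapping (including 'O') leave the word unchanged.
    redacted = [mapping.get(label, word) for word, label in zip(words, labels)]
    return " ".join(redacted)


words = "Hello! My name is Rob".split()
labels = ["O", "O", "O", "O", "PER"]
print(apply_redaction_map(words, labels))
# Hello! My name is <PERSON>
```

Passing a smaller mapping, such as `{'PER': '<PERSON>'}`, leaves every other entity type untouched in this sketch, which mirrors the `redaction_map` argument used in the first example.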