Remove identifiers from data using BERT

bert-deid

Code to fine-tune BERT on a medical note de-identification task.

Install

  • (Recommended) Create a conda environment called deid from the provided environment.yml
    • conda env create -f environment.yml
  • Install the package with pip
    • pip install bert-deid

Download

bert-deid provides a helper script to download the model:

# note: MODEL_DIR environment variable used by download
# by default, we download to bert_deid_model in the current directory
export MODEL_DIR="bert_deid_model"
bert_deid download
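
When the model is later loaded from Python, the same MODEL_DIR convention can be mirrored so that the download location and the load path stay in sync. The following is a minimal sketch, assuming the default directory name given in the comments above; resolve_model_dir is a hypothetical helper for illustration and is not part of bert-deid.

```python
import os


def resolve_model_dir(default='bert_deid_model'):
    """Return MODEL_DIR if it is set and non-empty, else the default.

    Hypothetical helper (not part of bert-deid); the default mirrors
    the directory name used in the download instructions.
    """
    return os.environ.get('MODEL_DIR') or default


# path that could then be passed on when loading the model
model_path = resolve_model_dir()
print(model_path)
```

Falling back on an empty value as well as an unset one avoids accidentally pointing at the current directory when MODEL_DIR is exported as an empty string.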

Usage

The model can be imported and used directly within Python.

from bert_deid.model import Transformer

# load in a trained model
model_path = 'bert_deid_model'
deid_model = Transformer(model_path)

with open('tests/example_note.txt', 'r') as fp:
    text = fp.read()

print(deid_model.apply(text, repl='___'))

# we can also get the original predictions
preds = deid_model.predict(text)

# print out the identified entities
# each prediction holds (probability, label, start offset, stop offset)
for prob, label, start, stop in preds:
    # print the matched text, its label, and the predicted probability
    print(f'{text[start:stop]:15s} {label} ({prob:0.3f})')
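
Because each prediction carries a probability, a label, and character offsets, the same tuples can drive a custom replacement, for example tagging each span with its label instead of a fixed string. The following is a minimal sketch, assuming predictions are (prob, label, start, stop) tuples with non-overlapping spans; redact_with_labels, the min_prob threshold, and the sample label names are hypothetical and not part of bert-deid.

```python
def redact_with_labels(text, preds, min_prob=0.5):
    """Replace each sufficiently confident predicted span with a [LABEL] tag.

    Assumes preds is an iterable of (prob, label, start, stop) tuples
    with non-overlapping character spans. Illustrative helper only;
    it is not part of bert-deid.
    """
    # keep only predictions at or above the probability threshold
    kept = [p for p in preds if p[0] >= min_prob]
    # work from right to left so earlier offsets stay valid after each edit
    for prob, label, start, stop in sorted(kept, key=lambda p: p[2], reverse=True):
        text = text[:start] + f'[{label}]' + text[stop:]
    return text


# hand-written predictions standing in for deid_model.predict(text);
# the label names here are made up for the example
sample_text = 'Seen by Dr. Smith on 2020-01-02.'
sample_preds = [(0.99, 'NAME', 12, 17), (0.40, 'DATE', 21, 31)]
print(redact_with_labels(sample_text, sample_preds))
```

Replacing from the end of the text backwards is the design choice that matters here: the offsets refer to the original string, so editing left to right would shift every later span.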

Download files

  • Source Distribution: bert_deid-0.2.3.tar.gz (33.2 kB)
  • Built Distribution: bert_deid-0.2.3-py3-none-any.whl (40.2 kB, Python 3)
