Skip to main content

Deduce: de-identification method for Dutch medical text

Project description

deduce

tests coverage build documentation pypi version pypi python versions pypi downloads license black

Installation - Versions - Getting Started - Documentation - Contributiong - Authors - License

Deduce 2.0.0 has been released! It includes a 10x speedup, and way more features for customizing and tailoring. Some small changes are needed to keep going from version 1, read more about it here: docs/migrating-to-v2

De-identify clinial text written in Dutch using deduce, a rule-based de-identification method for Dutch clinical text.

The development, principles and validation of deduce were initially described in Menger et al. (2017). De-identification of clinical text is needed for using text data for analysis, to comply with legal requirements and to protect the privacy of patients. By default, our rule-based method removes Protected Health Information (PHI) in the following categories:

  • Person names, including initials
  • Geographical locations smaller than a country
  • Names of institutions that are related to patient treatment
  • Dates (combinations of day, month and year)
  • Ages
  • BSN numbers
  • Identifiers (7+ digits without a specific format, e.g. patient identifiers, AGB, BIG)
  • Telephone numbers
  • E-mail addresses
  • URLs

If you use deduce, please cite the following paper:

Menger, V.J., Scheepers, F., van Wijk, L.M., Spruit, M. (2017). DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text, Telematics and Informatics, 2017, ISSN 0736-5853

Installation

pip install deduce

Versions

For most cases the latest version is suitable, but some specific milestones are:

  • 2.0.0 - Major refactor, with speedups, many new options for customizing, functionally very similar to original
  • 1.0.8 - Small bugfixes compared to original release
  • 1.0.1 - Original release with Menger et al. (2017)

Detailed versioning information is accessible in the changelog.

Getting started

The basic way to use deduce, is to pass text to the deidentify method of a Deduce object:

from deduce import Deduce

deduce = Deduce()

text = (
    "betreft: Jan Jansen, bsn 111222333, patnr 000334433. De patient J. Jansen is 64 jaar oud en woonachtig in "
    "Utrecht. Hij werd op 10 oktober 2018 door arts Peter de Visser ontslagen van de kliniek van het UMCU. "
    "Voor nazorg kan hij worden bereikt via j.JNSEN.123@gmail.com of (06)12345678."
)

doc = deduce.deidentify(text)

The output is available in the Document object:

from pprint import pprint

pprint(doc.annotations)

AnnotationSet({
    Annotation(text="(06)12345678", start_char=272, end_char=284, tag="telefoonnummer"),
    Annotation(text="111222333", start_char=25, end_char=34, tag="bsn"),
    Annotation(text="Peter de Visser", start_char=153, end_char=168, tag="persoon"),
    Annotation(text="j.JNSEN.123@gmail.com", start_char=247, end_char=268, tag="email"),
    Annotation(text="patient J. Jansen", start_char=56, end_char=73, tag="patient"),
    Annotation(text="Jan Jansen", start_char=9, end_char=19, tag="patient"),
    Annotation(text="10 oktober 2018", start_char=127, end_char=142, tag="datum"),
    Annotation(text="64", start_char=77, end_char=79, tag="leeftijd"),
    Annotation(text="000334433", start_char=42, end_char=51, tag="id"),
    Annotation(text="Utrecht", start_char=106, end_char=113, tag="locatie"),
    Annotation(text="UMCU", start_char=202, end_char=206, tag="instelling"),
})

print(doc.deidentified_text)

"""betreft: <PERSOON-1>, bsn <BSN-1>, patnr <ID-1>. De <PERSOON-1> is <LEEFTIJD-1> jaar oud en woonachtig in 
<LOCATIE-1>. Hij werd op <DATUM-1> door arts <PERSOON-2> ontslagen van de kliniek van het <INSTELLING-1>. 
Voor nazorg kan hij worden bereikt via <EMAIL-1> of <TELEFOONNUMMER-1>."""

Aditionally, if the names of the patient are known, they may be added as metadata, where they will be picked up by deduce:

from deduce.person import Person

patient = Person(first_names=["Jan"], initials="JJ", surname="Jansen")
doc = deduce.deidentify(text, metadata={'patient': patient})

print (doc.deidentified_text)

"""betreft: <PATIENT>, bsn <BSN-1>, patnr <ID-1>. De <PATIENT> is <LEEFTIJD-1> jaar oud en woonachtig in 
<LOCATIE-1>. Hij werd op <DATUM-1> door arts <PERSOON-2> ontslagen van de kliniek van het <INSTELLING-1>. 
Voor nazorg kan hij worden bereikt via <EMAIL-1> of <TELEFOONNUMMER-1>."""

As you can see, adding known names keeps references to <PATIENT> in text. It also increases recall, as not all known names are contained in the lookup lists.

Documentation

A more extensive tutorial on using, configuring and modifying deduce is available at: docs/tutorial

Basic documentation and API are available at: docs

Contributing

For setting up the dev environment and contributing guidelines, see: docs/contributing

Authors

  • Vincent Menger - Initial work
  • Jonathan de Bruin - Code review
  • Pablo Mosteiro - Bug fixes, structured annotations

License

This project is licensed under the GNU LGPLv3 license - see the LICENSE.md file for details

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

deduce-2.4.1.tar.gz (5.6 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

deduce-2.4.1-py3-none-any.whl (5.6 MB view details)

Uploaded Python 3

File details

Details for the file deduce-2.4.1.tar.gz.

File metadata

  • Download URL: deduce-2.4.1.tar.gz
  • Upload date:
  • Size: 5.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.0 CPython/3.10.13 Linux/6.2.0-1015-azure

File hashes

Hashes for deduce-2.4.1.tar.gz
Algorithm Hash digest
SHA256 667555e40d40c18a24de43e1493b569335232723e3a7440f25844dc8a2fdb011
MD5 e6ab38cb66d70a8de884e570274d3a07
BLAKE2b-256 c041774e2d8eb2d93aa24c3879069ed365e387ff58064cee9ad0c4b815c03fc2

See more details on using hashes here.

File details

Details for the file deduce-2.4.1-py3-none-any.whl.

File metadata

  • Download URL: deduce-2.4.1-py3-none-any.whl
  • Upload date:
  • Size: 5.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.0 CPython/3.10.13 Linux/6.2.0-1015-azure

File hashes

Hashes for deduce-2.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 279a5f44d1e4deb13426d49abe7e5ae42614330dc781431fab428020c21b624b
MD5 319123d65074a8058e031f1d050baa9d
BLAKE2b-256 afb5c908b9088708a65a0a20a7c2dbd26b45a6bef93ece627491c55dcf08e867

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page