Deduce: de-identification method for Dutch medical text

deduce

Deduce 3.0.0 is out! It is considerably more accurate, and faster too. It is fully backward compatible, but some functionality is scheduled for removal; read more about it here: docs/migrating-to-v3

:sparkles: Remove sensitive information from clinical text written in Dutch
:mag: Rule-based logic for detecting e.g. names, locations, institutions, identifiers, phone numbers
:triangular_ruler: Useful out of the box, but customization highly recommended
:seedling: Originally validated in Menger et al. (2017), but further optimized since

:exclamation: Deduce is useful out of the box, but please validate and customize on your own data before using it in a critical environment. Remember that de-identification is almost never perfect, and that clinical text often contains other specific details that can link it to a specific person. Be aware that de-identification should primarily be viewed as a way to mitigate risk of identification, rather than a way to obtain anonymous data.

Currently, deduce can remove the following types of Protected Health Information (PHI):

:bust_in_silhouette: person names, including prefixes and initials
:earth_americas: geographical locations smaller than a country
:hospital: names of hospitals and healthcare institutions
:calendar: dates (combinations of day, month and year)
:birthday: ages
:1234: BSN numbers
:1234: identifiers (7+ digits without a specific format, e.g. patient identifiers, AGB, BIG)
:phone: phone numbers
:e-mail: e-mail addresses
:link: URLs
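
To illustrate the difference between the BSN and generic identifier categories above: a BSN is commonly validated with the so-called eleven test (elfproef), while other long digit strings can only be treated as generic identifiers. The sketch below is a minimal standalone illustration of that check, not deduce's actual implementation; the function name `passes_eleven_test` is hypothetical.

```python
def passes_eleven_test(candidate: str) -> bool:
    """Return True if a 9-digit string satisfies the BSN eleven test.

    Hypothetical helper for illustration only; it is not part of deduce.
    """
    if len(candidate) != 9 or not candidate.isdigit():
        return False
    # The first eight digits are weighted 9 down to 2, the last digit -1.
    weights = (9, 8, 7, 6, 5, 4, 3, 2, -1)
    total = sum(int(digit) * weight for digit, weight in zip(candidate, weights))
    # A valid BSN has a weighted sum that is divisible by 11.
    return total % 11 == 0


bsn_like = passes_eleven_test("111222333")   # weighted sum is 66
id_like = passes_eleven_test("000334433")    # weighted sum is 64
```

With the sample values used in the Getting started section, 111222333 passes the check and 000334433 does not, which is consistent with the `bsn` and `id` tags in the example output there.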

Citing

If you use deduce, please cite the following paper:

Menger, V.J., Scheepers, F., van Wijk, L.M., Spruit, M. (2017). DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text. Telematics and Informatics. ISSN 0736-5853.

Installation

pip install deduce

Getting started

The basic way to use deduce is to pass text to the deidentify method of a Deduce object:

from deduce import Deduce

deduce = Deduce()

text = (
    "betreft: Jan Jansen, bsn 111222333, patnr 000334433. De patient J. Jansen is 64 jaar oud en woonachtig in "
    "Utrecht. Hij werd op 10 oktober 2018 door arts Peter de Visser ontslagen van de kliniek van het UMCU. "
    "Voor nazorg kan hij worden bereikt via j.JNSEN.123@gmail.com of (06)12345678."
)

doc = deduce.deidentify(text)

The output is available in the Document object:

from pprint import pprint

pprint(doc.annotations)

AnnotationSet({
    Annotation(text="(06)12345678", start_char=272, end_char=284, tag="telefoonnummer"),
    Annotation(text="111222333", start_char=25, end_char=34, tag="bsn"),
    Annotation(text="Peter de Visser", start_char=153, end_char=168, tag="persoon"),
    Annotation(text="j.JNSEN.123@gmail.com", start_char=247, end_char=268, tag="email"),
    Annotation(text="patient J. Jansen", start_char=56, end_char=73, tag="patient"),
    Annotation(text="Jan Jansen", start_char=9, end_char=19, tag="patient"),
    Annotation(text="10 oktober 2018", start_char=127, end_char=142, tag="datum"),
    Annotation(text="64", start_char=77, end_char=79, tag="leeftijd"),
    Annotation(text="000334433", start_char=42, end_char=51, tag="id"),
    Annotation(text="Utrecht", start_char=106, end_char=113, tag="locatie"),
    Annotation(text="UMCU", start_char=202, end_char=206, tag="instelling"),
})

print(doc.deidentified_text)

"""betreft: [PERSOON-1], bsn [BSN-1], patnr [ID-1]. De [PERSOON-1] is [LEEFTIJD-1] jaar oud en woonachtig in 
[LOCATIE-1]. Hij werd op [DATUM-1] door arts [PERSOON-2] ontslagen van de kliniek van het [INSTELLING-1]. 
Voor nazorg kan hij worden bereikt via [EMAIL-1] of [TELEFOONNUMMER-1]."""
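
The relationship between the annotations (character offsets plus a tag) and the de-identified text can be sketched in plain Python. This is a simplified illustration, not deduce's own replacement logic: the `Span` type and the `redact` function are hypothetical, and spans are numbered by exact surface text only, whereas deduce also links name variants such as "Jan Jansen" and "J. Jansen".

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an annotation: offsets and a tag."""
    start_char: int
    end_char: int
    tag: str


def redact(text: str, spans: list[Span]) -> str:
    """Replace each span with a numbered placeholder such as [BSN-1]."""
    seen_per_tag: dict[str, dict[str, int]] = {}
    labels: dict[Span, str] = {}
    # Assign numbers in reading order; identical text under the same tag
    # reuses the same number.
    for span in sorted(spans):
        seen = seen_per_tag.setdefault(span.tag, {})
        surface = text[span.start_char:span.end_char]
        index = seen.setdefault(surface, len(seen) + 1)
        labels[span] = f"[{span.tag.upper()}-{index}]"
    # Substitute from right to left so earlier offsets remain valid.
    for span in sorted(spans, reverse=True):
        text = text[:span.start_char] + labels[span] + text[span.end_char:]
    return text


sample = "bsn 111222333, patnr 000334433, bsn 111222333"
spans = [Span(4, 13, "bsn"), Span(21, 30, "id"), Span(36, 45, "bsn")]
redacted = redact(sample, spans)
# redacted == "bsn [BSN-1], patnr [ID-1], bsn [BSN-1]"
```

Replacing from the end of the text backwards avoids having to recompute offsets after each substitution, since placeholders usually differ in length from the text they replace.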

Additionally, if the patient's names are known, they can be passed as metadata so that deduce picks them up:

from deduce.person import Person

patient = Person(first_names=["Jan"], initials="JJ", surname="Jansen")
doc = deduce.deidentify(text, metadata={'patient': patient})

print(doc.deidentified_text)

"""betreft: [PATIENT], bsn [BSN-1], patnr [ID-1]. De [PATIENT] is [LEEFTIJD-1] jaar oud en woonachtig in 
[LOCATIE-1]. Hij werd op [DATUM-1] door arts [PERSOON-2] ontslagen van de kliniek van het [INSTELLING-1]. 
Voor nazorg kan hij worden bereikt via [EMAIL-1] of [TELEFOONNUMMER-1]."""

As you can see, adding known names keeps references to [PATIENT] in the text. It also increases recall, as not all known names are contained in the lookup lists.
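
When validating on your own data, one way to check claims about recall is to compare predicted spans with hand-labelled ones. The following is a minimal sketch under stated assumptions: the `gold` set is hypothetical labelled data, `precision_recall` is a hypothetical helper, and spans are compared as exact `(start_char, end_char, tag)` tuples, which is stricter than partial-overlap scoring.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Exact-match precision and recall over (start_char, end_char, tag) tuples."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical hand-labelled spans for one document.
gold = {(9, 19, "patient"), (25, 34, "bsn"), (153, 168, "persoon")}

# Spans as they might be collected from the start_char, end_char and tag
# fields shown in the annotation output above.
predicted = {(9, 19, "patient"), (25, 34, "bsn"), (106, 113, "locatie")}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Running the same comparison with and without patient metadata on a labelled sample gives a rough indication of how much the known names contribute.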

Versions

For most cases the latest version is suitable, but some specific milestones are:

3.0.0 - Many optimizations in accuracy, smaller refactors, further speedups
2.0.0 - Major refactor, with speedups, many new options for customizing, functionally very similar to original
1.0.8 - Small bugfixes compared to original release
1.0.1 - Original release with Menger et al. (2017)

Detailed versioning information is accessible in the changelog.

Documentation

All documentation, including a more extensive tutorial on using, configuring and modifying deduce, and its API, is available at: docs/tutorial

Contributing

For setting up the dev environment and contributing guidelines, see: docs/contributing

Authors

Vincent Menger - Initial work
Jonathan de Bruin - Code review
Pablo Mosteiro - Bug fixes, structured annotations

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE.md file for details.

Release history

3.0.6 - Aug 11, 2025 (this version)
3.0.5 - Jun 18, 2025
3.0.4 - May 6, 2025
3.0.3 - Jul 16, 2024
3.0.2 - Feb 15, 2024
3.0.1 - Dec 20, 2023
3.0.0 - Dec 20, 2023 (yanked: config bug)
2.5.0 - Nov 28, 2023
2.4.4 - Nov 22, 2023
2.4.3 - Nov 22, 2023
2.4.2 - Nov 21, 2023
2.4.1 - Nov 15, 2023
2.4.0 - Nov 15, 2023
2.3.1 - Nov 1, 2023
2.3.0 - Oct 25, 2023
2.2.0 - Sep 28, 2023
2.1.0 - Aug 7, 2023
2.0.3 - Apr 6, 2023
2.0.2 - Mar 28, 2023
2.0.1 - Dec 9, 2022
2.0.0 - Dec 5, 2022
1.0.8 - Dec 23, 2021
1.0.7 - Nov 3, 2021
1.0.6 - Oct 6, 2021
1.0.5 - Oct 5, 2021 (yanked: contains bug)
1.0.4 - Oct 5, 2021 (yanked: contains bug)
1.0.3 - Jul 7, 2021
1.0.2 - Jan 12, 2020

Download files

Source Distribution

deduce-3.0.6.tar.gz (1.9 MB)

Uploaded Aug 11, 2025 Source

Built Distribution

deduce-3.0.6-py3-none-any.whl (1.9 MB)

Uploaded Aug 11, 2025 Python 3

File details

Details for the file deduce-3.0.6.tar.gz.

File metadata

Download URL: deduce-3.0.6.tar.gz
Upload date: Aug 11, 2025
Size: 1.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.4 CPython/3.10.18 Linux/6.11.0-1018-azure

File hashes

Hashes for deduce-3.0.6.tar.gz
Algorithm	Hash digest
SHA256	`6aefc5750732f11b9a37078f600a93fbcc812c63e6e7b22a2b378ac85c08b810`
MD5	`6ff1332faee1cb84ca9449c9f195f05a`
BLAKE2b-256	`d9eac419c08b4497d6d82c6f17dcc1854de17ef3ff28cc9103fb09897d1459e2`

File details

Details for the file deduce-3.0.6-py3-none-any.whl.

File metadata

Download URL: deduce-3.0.6-py3-none-any.whl
Upload date: Aug 11, 2025
Size: 1.9 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.4 CPython/3.10.18 Linux/6.11.0-1018-azure

File hashes

Hashes for deduce-3.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8f927455e4a9e88d4af815cf1abb78f652db0a9899539d982c38503c30ea3c4c`
MD5	`7ad1ff61524131e077c7ec8b7d54b5ce`
BLAKE2b-256	`3018a27bf50502d83f8891f283eb7eb2b7a62d6b7c67aafa5e85bca6b45f25c8`
