Rule-based de-identification for Belgian clinical text
Belgian Deduce
belgian_deduce is a rule-based de-identification package for Belgian clinical text.
It ships as a standalone Python package, uses Belgian lookup data and defaults, and is
built on top of docdeid.
- Remove names, places, institutions, dates, ages, identifiers, phone numbers, e-mail addresses, and URLs from Belgian medical text
- Tune behavior through config, lookup structures, and custom processors
- Use Belgian defaults for postal codes, phone numbers, and national register numbers
belgian_deduce started from the original deduce project. This repository now maintains its own package identity, configuration, documentation, and Belgian-specific defaults.
De-identification is never perfect. Validate and adapt the package on your own data before using it in a critical environment.
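One way to act on this advice is to score the output against a small hand-annotated sample of your own notes. The sketch below is not part of belgian_deduce: the `Span` type and the `score_spans` helper are hypothetical names, the offsets are made-up sample data, and it assumes you can express both the gold and the predicted annotations as `(start, end, tag)` character offsets. It counts exact span matches only.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A character-offset annotation (hypothetical helper type)."""

    start: int
    end: int
    tag: str


def score_spans(gold: set, predicted: set) -> dict:
    """Exact-match precision and recall between gold and predicted spans."""
    true_positives = len(gold & predicted)
    # An empty side is treated as a perfect score to avoid dividing by zero.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return {"precision": precision, "recall": recall}


# Made-up sample data: one span found correctly, one missed, one spurious.
gold = {Span(9, 21, "person"), Span(43, 58, "national_register_number")}
predicted = {Span(9, 21, "person"), Span(60, 65, "id")}

scores = score_spans(gold, predicted)
print(scores)  # {'precision': 0.5, 'recall': 0.5}
```

For de-identification, recall is usually the figure to watch first, because every missed span is identifying text left in the output.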
Citing
If you use belgian_deduce, cite the original DEDUCE paper for the underlying method
and reference this repository and version in your implementation notes:
Installation
Install the latest release from PyPI:
pip install belgian-deduce
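Since the citing guidance asks you to record the version you used, it can help to look it up programmatically. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and the distribution name is taken from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "belgian-deduce") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
print(found if found is not None else "belgian-deduce is not installed")
```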
Getting Started
from belgian_deduce import Deduce
model = Deduce()
text = (
    "betreft: Jan Janssens, rijksregisternummer 85.07.30-033.28, patnr 000334433. "
    "De patient J. Janssens is 64 jaar oud en woont in Leuven. Hij werd op "
    "10 oktober 2018 door arts Peter de Smet ontslagen uit UZ Leuven. "
    "Voor nazorg kan hij worden bereikt via j.janssens.123@gmail.com of "
    "0470 12 34 56."
)
doc = model.deidentify(text)
print(doc.deidentified_text)
betreft: [PERSON-1], rijksregisternummer [NATIONAL_REGISTER_NUMBER-1], patnr [ID-1]. De patient [PERSON-1] is [AGE-1] jaar oud en woont in [LOCATION-1]. Hij werd op [DATE-1] door arts [PERSON-2] ontslagen uit [HOSPITAL-1]. Voor nazorg kan hij worden bereikt via [EMAIL-1] of [PHONE_NUMBER-1].
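The national register number in this example satisfies the public modulo-97 check: the last two digits equal 97 minus the remainder of the first nine digits divided by 97, with a `2` prefixed to those nine digits for people born in 2000 or later. The sketch below illustrates that rule only. `is_valid_rrn` is a hypothetical helper, and nothing here describes how belgian_deduce itself detects these numbers.

```python
import re


def is_valid_rrn(value: str) -> bool:
    """Check the modulo-97 control digits of a Belgian national register number."""
    digits = re.sub(r"\D", "", value)  # drop dots, dashes, and spaces
    if len(digits) != 11:
        return False
    base, check = digits[:9], int(digits[9:])
    # Born before 2000: use the nine digits as they are.
    # Born in 2000 or later: prefix the nine digits with "2".
    return any(97 - int(prefix + base) % 97 == check for prefix in ("", "2"))


print(is_valid_rrn("85.07.30-033.28"))  # True
print(is_valid_rrn("85.07.30-033.29"))  # False
```

A check like this can reduce false positives when an 11-digit string is not actually a register number, but it should not be the only signal: mistyped numbers in clinical notes still identify a patient.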
If patient metadata is known, pass it explicitly:
from belgian_deduce import Deduce, Person
model = Deduce()
patient = Person(first_names=["Jan"], initials="JJ", surname="Janssens")
doc = model.deidentify(text, metadata={"patient": patient})
Metadata can also be used for more than the primary patient. The pipeline supports:
- persons: one or more additional Person objects treated as regular people
- addresses: one or more Address objects that should be tagged as location
- entities: arbitrary exact metadata matches via MetadataEntity
- Person.birth_date, Person.aliases, and Person.addresses
from datetime import date
from belgian_deduce import Address, Deduce, MetadataEntity, Person
deduce = Deduce()
metadata = {
    "patient": Person(
        first_names=["Jan"],
        surname="Jansen",
        birth_date=date(1980, 3, 12),
        addresses=[
            Address(
                street="Kerkstraat",
                house_number="12A",
                postal_code="9000",
                city="Gent",
            )
        ],
    ),
    "persons": [
        Person(
            first_names=["Peter"],
            surname="de Visser",
            aliases=["Dr. Peter de Visser"],
        )
    ],
    "entities": [
        MetadataEntity(text="UZ Gent", tag="hospital"),
        MetadataEntity(text="ABC-12345", tag="id"),
    ],
}
doc = deduce.deidentify(text, metadata=metadata)
French-speaking notes can be handled through the same API. A practical path is to provide metadata for names, birth dates, addresses, and institutions:
from datetime import date
from belgian_deduce import Address, Deduce, MetadataEntity, Person
deduce = Deduce()
text = (
    "Patient Jean Dupont, né le 12 mars 1980, habite Rue de la Loi 12, "
    "1000 Bruxelles. Sophie Martin consulte à Hôpital Erasme."
)
metadata = {
    "patient": Person(
        first_names=["Jean"],
        surname="Dupont",
        birth_date=date(1980, 3, 12),
        addresses=[
            Address(
                street="Rue de la Loi",
                house_number="12",
                postal_code="1000",
                city="Bruxelles",
            )
        ],
    ),
    "persons": [Person(first_names=["Sophie"], surname="Martin")],
    "entities": [MetadataEntity(text="Hôpital Erasme", tag="hospital")],
}
doc = deduce.deidentify(text, metadata=metadata)
print(doc.deidentified_text)
Patient [PATIENT], né le [DATE-1], habite [LOCATION-1]. [PERSON-1] consulte à [HOSPITAL-1].
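In this example, `birth_date=date(1980, 3, 12)` lines up with the written form `12 mars 1980` in the note. The sketch below shows, purely as an illustration, how a date can be rendered into a few numeric, Dutch, and French spellings for exact matching. `date_variants` and the `MONTHS` table are hypothetical and do not describe the package's internal matching.

```python
from datetime import date

# Month names per language, indexed by month number minus one.
MONTHS = {
    "nl": [
        "januari", "februari", "maart", "april", "mei", "juni",
        "juli", "augustus", "september", "oktober", "november", "december",
    ],
    "fr": [
        "janvier", "février", "mars", "avril", "mai", "juin",
        "juillet", "août", "septembre", "octobre", "novembre", "décembre",
    ],
}


def date_variants(value: date) -> set:
    """Render a date in a few numeric, Dutch, and French written forms."""
    day, month, year = value.day, value.month, value.year
    variants = {
        f"{day:02d}/{month:02d}/{year}",  # zero-padded, slash-separated
        f"{day:02d}-{month:02d}-{year}",  # zero-padded, dash-separated
        f"{day}/{month}/{year}",          # unpadded, slash-separated
    }
    for names in MONTHS.values():
        variants.add(f"{day} {names[month - 1]} {year}")
    return variants


variants = date_variants(date(1980, 3, 12))
print(sorted(variants))
```

Real notes contain many more spellings (abbreviated months, two-digit years, ordinals such as `1er`), so a list like this is a starting point for checking coverage on your own data, not an exhaustive set.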
Documentation
The project documentation lives in docs/source/tutorial.md and docs/source/migrating.md.
Contributing
Contribution guidance is available in CONTRIBUTING.md.
Versions
- 4.1.0 - Improved francophone coverage in Wallonia and Brussels, especially for healthcare institutions and locations
- 4.0.1 - Polished the published package page and stabilized the tag-driven release workflow
- 4.0.0 - First standalone belgian_deduce release with Belgian defaults and independent docs/tooling
- Earlier entries in CHANGELOG.md predate the standalone release and are preserved for provenance
Authors
- Vincent Menger - original DEDUCE implementation
- Stig Hellemans - Belgian standalone package and maintenance
License
This project remains licensed under LGPL-3.0-or-later. See LICENSE.md.