Rule-based de-identification for Belgian clinical text

Belgian Deduce

belgian_deduce is a rule-based de-identification package for Belgian clinical text. It ships as a standalone Python package, uses Belgian lookup data and defaults, and is built on top of docdeid.

  • Remove names, places, institutions, dates, ages, identifiers, phone numbers, e-mail addresses, and URLs from Belgian medical text
  • Tune behavior through config, lookup structures, and custom processors
  • Use Belgian defaults for postal codes, phone numbers, and national register numbers
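To make the "national register numbers" default more concrete, here is a minimal sketch of the publicly documented mod-97 check on Belgian national register numbers. It is an illustration only and does not claim to mirror how belgian_deduce detects these numbers; the function name `is_valid_national_register_number` is hypothetical.

```python
import re


def is_valid_national_register_number(value: str) -> bool:
    """Check the mod-97 control digits of a Belgian national register number.

    Hypothetical helper for illustration; not part of belgian_deduce.
    """
    digits = re.sub(r"\D", "", value)  # drop dots, dashes and spaces
    if len(digits) != 11:
        return False
    base, check = int(digits[:9]), int(digits[9:])
    # Births before 2000 use the nine digits as-is; for births from 2000
    # onwards a "2" is prefixed before taking the remainder.
    candidates = (base, 2_000_000_000 + base)
    return any(97 - (n % 97) == check for n in candidates)


print(is_valid_national_register_number("85.07.30-033.28"))
print(is_valid_national_register_number("85.07.30-033.29"))
```

A check like this can help separate real register numbers from other 11-digit strings when reviewing what a rule did or did not catch.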

belgian_deduce started from the original deduce project. This repository now maintains its own package identity, configuration, documentation, and Belgian-specific defaults.

De-identification is never perfect. Validate and adapt the package on your own data before using it in a critical environment.
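One way to validate on your own data is to compare detected spans with a small hand-annotated sample and look at recall per tag. The sketch below is plain Python and assumes you can express both sets as `(start_char, end_char, tag)` tuples; the helper `recall_by_tag` is hypothetical and not part of the package.

```python
from typing import Dict, Iterable, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, tag)


def recall_by_tag(gold: Iterable[Span], predicted: Iterable[Span]) -> Dict[str, float]:
    """Fraction of gold spans fully covered by any predicted span, per gold tag.

    Hypothetical evaluation helper; the predicted tag is ignored on purpose,
    because for privacy it matters most that the text is covered at all.
    """
    predicted = list(predicted)
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for start, end, tag in gold:
        totals[tag] = totals.get(tag, 0) + 1
        covered = any(p_start <= start and end <= p_end for p_start, p_end, _ in predicted)
        hits[tag] = hits.get(tag, 0) + int(covered)
    return {tag: hits[tag] / totals[tag] for tag in totals}


gold = [(0, 12, "person"), (20, 26, "location"), (30, 40, "person")]
predicted = [(0, 12, "person"), (19, 27, "location")]
print(recall_by_tag(gold, predicted))
```

Low recall on a tag is a signal to adjust the config, lookup structures, or processors for that category before relying on the output.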

Citing

If you use belgian_deduce, cite the original DEDUCE paper for the underlying method and reference this repository and version in your implementation notes:

Menger, V.J., Scheepers, F., van Wijk, L.M., Spruit, M. (2017). DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text, Telematics and Informatics, 2017, ISSN 0736-5853

Installation

Install the latest release from PyPI:

pip install belgian-deduce

Getting Started

from belgian_deduce import Deduce

model = Deduce()

text = (
    "betreft: Jan Janssens, rijksregisternummer 85.07.30-033.28, patnr 000334433. "
    "De patient J. Janssens is 64 jaar oud en woont in Leuven. Hij werd op "
    "10 oktober 2018 door arts Peter de Smet ontslagen uit UZ Leuven. "
    "Voor nazorg kan hij worden bereikt via j.janssens.123@gmail.com of "
    "0470 12 34 56."
)

doc = model.deidentify(text)
print(doc.deidentified_text)

Output:

betreft: [PERSON-1], rijksregisternummer [NATIONAL_REGISTER_NUMBER-1], patnr [ID-1]. De patient [PERSON-1] is [AGE-1] jaar oud en woont in [LOCATION-1]. Hij werd op [DATE-1] door arts [PERSON-2] ontslagen uit [HOSPITAL-1]. Voor nazorg kan hij worden bereikt via [EMAIL-1] of [PHONE_NUMBER-1].
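For a quick overview of what was replaced, the placeholders in the de-identified text can be counted per tag. This is a minimal sketch that only parses the `[TAG-n]` format visible in the output above; the regular expression and the helper `count_placeholders` are assumptions for illustration, not part of the package.

```python
import re
from collections import Counter

# Matches placeholders such as [PERSON-1], [PHONE_NUMBER-1] or [PATIENT].
PLACEHOLDER = re.compile(r"\[([A-Z_]+)(?:-(\d+))?\]")


def count_placeholders(deidentified_text: str) -> Counter:
    """Count distinct placeholders per tag (hypothetical helper).

    [PERSON-1] appearing twice counts once; [PERSON-1] and [PERSON-2] count as two.
    """
    distinct = {match.group(0) for match in PLACEHOLDER.finditer(deidentified_text)}
    return Counter(PLACEHOLDER.fullmatch(item).group(1) for item in distinct)


sample = (
    "De patient [PERSON-1] werd op [DATE-1] door arts [PERSON-2] "
    "ontslagen. [PERSON-1] is bereikbaar via [EMAIL-1]."
)
print(count_placeholders(sample))
```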

If patient metadata is known, pass it explicitly:

from belgian_deduce import Deduce, Person

model = Deduce()
patient = Person(first_names=["Jan"], initials="JJ", surname="Janssens")
doc = model.deidentify(text, metadata={"patient": patient})

Metadata is not limited to the primary patient. The pipeline supports:

  • persons: one or more additional Person objects treated as regular people
  • addresses: one or more Address objects that should be tagged as location
  • entities: arbitrary exact metadata matches via MetadataEntity
  • Person.birth_date, Person.aliases, and Person.addresses: optional fields on each Person object

from datetime import date

from belgian_deduce import Address, Deduce, MetadataEntity, Person

deduce = Deduce()

metadata = {
    "patient": Person(
        first_names=["Jan"],
        surname="Jansen",
        birth_date=date(1980, 3, 12),
        addresses=[
            Address(
                street="Kerkstraat",
                house_number="12A",
                postal_code="9000",
                city="Gent",
            )
        ],
    ),
    "persons": [
        Person(
            first_names=["Peter"],
            surname="de Visser",
            aliases=["Dr. Peter de Visser"],
        )
    ],
    "entities": [
        MetadataEntity(text="UZ Gent", tag="hospital"),
        MetadataEntity(text="ABC-12345", tag="id"),
    ],
}

doc = deduce.deidentify(text, metadata=metadata)

French-language notes can be handled through the same API. A practical approach is to supply metadata for names, birth dates, addresses, and institutions:

from datetime import date

from belgian_deduce import Address, Deduce, MetadataEntity, Person

deduce = Deduce()

text = (
    "Patient Jean Dupont, né le 12 mars 1980, habite Rue de la Loi 12, "
    "1000 Bruxelles. Sophie Martin consulte à Hôpital Erasme."
)

metadata = {
    "patient": Person(
        first_names=["Jean"],
        surname="Dupont",
        birth_date=date(1980, 3, 12),
        addresses=[
            Address(
                street="Rue de la Loi",
                house_number="12",
                postal_code="1000",
                city="Bruxelles",
            )
        ],
    ),
    "persons": [Person(first_names=["Sophie"], surname="Martin")],
    "entities": [MetadataEntity(text="Hôpital Erasme", tag="hospital")],
}

doc = deduce.deidentify(text, metadata=metadata)
print(doc.deidentified_text)

Output:

Patient [PATIENT], né le [DATE-1], habite [LOCATION-1]. [PERSON-1] consulte à [HOSPITAL-1].
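When metadata is available, a simple sanity check is to confirm that none of the known strings survive in the output. The sketch below is plain Python under that assumption; `find_leaks` is a hypothetical helper, and a case-insensitive substring match will miss misspellings and inflected forms, so it complements manual review rather than replacing it.

```python
from typing import List


def find_leaks(deidentified_text: str, sensitive_terms: List[str]) -> List[str]:
    """Return the known sensitive terms still present in the text.

    Hypothetical helper; uses a case-insensitive substring match only.
    """
    lowered = deidentified_text.casefold()
    return [term for term in sensitive_terms if term.casefold() in lowered]


output = (
    "Patient [PATIENT], né le [DATE-1], habite [LOCATION-1]. "
    "[PERSON-1] consulte à [HOSPITAL-1]."
)
known_terms = ["Dupont", "Rue de la Loi", "Bruxelles", "Martin", "Hôpital Erasme"]
print(find_leaks(output, known_terms))
```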

Documentation

The project documentation lives in docs/source/tutorial.md and docs/source/migrating.md.

Contributing

Contribution guidance is available in CONTRIBUTING.md.

Versions

  • 4.1.0 - Improved francophone coverage in Wallonia and Brussels, especially for healthcare institutions and locations
  • 4.0.1 - Polished the published package page and stabilized the tag-driven release workflow
  • 4.0.0 - First standalone belgian_deduce release with Belgian defaults and independent docs/tooling
  • Earlier entries in CHANGELOG.md predate the standalone release and are preserved for provenance

Authors

  • Vincent Menger - original DEDUCE implementation
  • Stig Hellemans - Belgian standalone package and maintenance

License

This project remains licensed under LGPL-3.0-or-later. See LICENSE.md.
