Rule-based de-identification for Belgian clinical text

Belgian Deduce

belgian_deduce is a rule-based de-identification package for Belgian clinical text. It ships as a standalone Python package, uses Belgian lookup data and defaults, and is built on top of docdeid.

  • Remove names, places, institutions, dates, ages, identifiers, phone numbers, e-mail addresses, and URLs from Belgian medical text
  • Tune behavior through config, lookup structures, and custom processors
  • Use Belgian defaults for postal codes, phone numbers, and national register numbers
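
As a rough illustration of why identifiers such as the national register number suit rule-based detection, the sketch below checks the publicly documented mod-97 control digits. It is not taken from belgian_deduce: the function name is hypothetical, the date part is not validated, and the package's own rules may differ.

```python
import re


def is_valid_national_register_number(value: str) -> bool:
    """Check the mod-97 control digits of a Belgian national register number.

    Illustrative helper only; not part of belgian_deduce.
    """
    digits = re.sub(r"\D", "", value)  # drop dots, dashes, and spaces
    if len(digits) != 11:
        return False
    base, check = int(digits[:9]), int(digits[9:])
    # Births before 2000 use the nine digits as-is; births from 2000 onward
    # prefix them with "2" before taking the remainder.
    candidates = (base, 2_000_000_000 + base)
    return any(97 - (n % 97) == check for n in candidates)


print(is_valid_national_register_number("85.07.30-033.28"))  # True
print(is_valid_national_register_number("85.07.30-033.29"))  # False
```

A check like this can cut down false positives on arbitrary 11-digit strings, at the cost of missing numbers that contain a typo.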

belgian_deduce started from the original deduce project. This repository now maintains its own package identity, configuration, documentation, and Belgian-specific defaults.

De-identification is never perfect. Validate and adapt the package on your own data before using it in a critical environment.
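
One way to validate on your own data is to compare predicted spans against a small hand-annotated sample. The sketch below assumes annotations are available as `(start, end, tag)` tuples and scores exact matches only; the function name and the offsets in the example are made up for illustration and are not part of the package.

```python
Span = tuple[int, int, str]  # (start offset, end offset, tag)


def span_scores(gold: set[Span], predicted: set[Span]) -> dict[str, float]:
    """Exact-match precision and recall over annotation spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical spans for a single document.
gold = {(9, 21, "person"), (43, 58, "national_register_number"), (138, 144, "location")}
predicted = {(9, 21, "person"), (43, 58, "national_register_number"), (100, 104, "date")}

print(span_scores(gold, predicted))
```

For de-identification, recall usually matters most, because a missed span leaks identifying information while a false positive only removes extra text.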

Citing

If you use belgian_deduce, cite the original DEDUCE paper for the underlying method and reference this repository and version in your implementation notes:

Menger, V.J., Scheepers, F., van Wijk, L.M., Spruit, M. (2017). DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text. Telematics and Informatics. ISSN 0736-5853.

Installation

Install the latest release from PyPI:

pip install belgian-deduce

For unreleased changes, install directly from GitHub:

pip install "git+https://github.com/stighellemans/belgian-deduce.git"

For reproducible environments, pin to a commit:

pip install "git+https://github.com/stighellemans/belgian-deduce.git@<commit-sha>"

Getting Started

from belgian_deduce import Deduce

model = Deduce()

text = (
    "betreft: Jan Janssens, rijksregisternummer 85.07.30-033.28, patnr 000334433. "
    "De patient J. Janssens is 64 jaar oud en woont in Leuven. Hij werd op "
    "10 oktober 2018 door arts Peter de Smet ontslagen uit UZ Leuven. "
    "Voor nazorg kan hij worden bereikt via j.janssens.123@gmail.com of "
    "0470 12 34 56."
)

doc = model.deidentify(text)
print(doc.deidentified_text)

betreft: [PERSON-1], rijksregisternummer [NATIONAL_REGISTER_NUMBER-1], patnr [ID-1]. De patient [PERSON-1] is [AGE-1] jaar oud en woont in [LOCATION-1]. Hij werd op [DATE-1] door arts [PERSON-2] ontslagen uit [HOSPITAL-1]. Voor nazorg kan hij worden bereikt via [EMAIL-1] of [PHONE_NUMBER-1].
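
The output reuses a placeholder number when the same entity recurs. The sketch below illustrates that numbering idea in isolation. It is not the package's implementation: the function name is hypothetical, and it only links mentions with identical text, so it would not link "Jan Janssens" to "J. Janssens" the way the output above does.

```python
import re


def replace_with_placeholders(text: str, annotations: list[tuple[int, int, str]]) -> str:
    """Replace non-overlapping (start, end, tag) spans with numbered placeholders."""
    ordered = sorted(annotations)
    counters: dict[str, int] = {}
    seen: dict[tuple[str, str], int] = {}
    labels = []
    # First pass: number entities in reading order, reusing numbers for repeats.
    for start, end, tag in ordered:
        key = (tag, text[start:end].lower())
        if key not in seen:
            counters[tag] = counters.get(tag, 0) + 1
            seen[key] = counters[tag]
        labels.append(f"[{tag.upper()}-{seen[key]}]")
    # Second pass: substitute right to left so earlier offsets stay valid.
    for (start, end, _), label in reversed(list(zip(ordered, labels))):
        text = text[:start] + label + text[end:]
    return text


sample = "Jan belde Piet. Later belde Jan opnieuw."
spans = [(m.start(), m.end(), "person") for m in re.finditer(r"Jan|Piet", sample)]
print(replace_with_placeholders(sample, spans))
# [PERSON-1] belde [PERSON-2]. Later belde [PERSON-1] opnieuw.
```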

If patient metadata is known, pass it explicitly:

from belgian_deduce import Deduce, Person

model = Deduce()
patient = Person(first_names=["Jan"], initials="JJ", surname="Janssens")
doc = model.deidentify(text, metadata={"patient": patient})

Metadata can also be used for more than the primary patient. The pipeline supports:

  • persons: one or more additional Person objects treated as regular people
  • addresses: one or more Address objects that should be tagged as location
  • entities: arbitrary exact metadata matches via MetadataEntity
  • extra Person fields: birth_date, aliases, and addresses

from datetime import date

from belgian_deduce import Address, Deduce, MetadataEntity, Person

deduce = Deduce()

metadata = {
    "patient": Person(
        first_names=["Jan"],
        surname="Jansen",
        birth_date=date(1980, 3, 12),
        addresses=[
            Address(
                street="Kerkstraat",
                house_number="12A",
                postal_code="9000",
                city="Gent",
            )
        ],
    ),
    "persons": [
        Person(
            first_names=["Peter"],
            surname="de Visser",
            aliases=["Dr. Peter de Visser"],
        )
    ],
    "entities": [
        MetadataEntity(text="UZ Gent", tag="hospital"),
        MetadataEntity(text="ABC-12345", tag="id"),
    ],
}

doc = deduce.deidentify(text, metadata=metadata)
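
To make "exact metadata matches" concrete, here is a minimal sketch of literal string matching with word-boundary guards. It is an assumption about what exact matching could look like, not the MetadataEntity implementation: the function name is hypothetical, and this version is case-sensitive.

```python
import re


def find_exact_matches(text: str, entities: list[tuple[str, str]]) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, tag) spans for literal occurrences of each entity."""
    spans = []
    for entity_text, tag in entities:
        # re.escape keeps characters such as "-" or "." literal; the lookarounds
        # stop matches that sit inside a longer word or identifier.
        pattern = r"(?<!\w)" + re.escape(entity_text) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), tag))
    return sorted(spans)


note = "Opname in UZ Gent, dossier ABC-12345. Controle in UZ Gent."
for start, end, tag in find_exact_matches(note, [("UZ Gent", "hospital"), ("ABC-12345", "id")]):
    print(tag, repr(note[start:end]))
```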

French-speaking notes can be handled through the same API. A practical path is to provide metadata for names, birth dates, addresses, and institutions:

from datetime import date

from belgian_deduce import Address, Deduce, MetadataEntity, Person

deduce = Deduce()

text = (
    "Patient Jean Dupont, né le 12 mars 1980, habite Rue de la Loi 12, "
    "1000 Bruxelles. Sophie Martin consulte à Hôpital Erasme."
)

metadata = {
    "patient": Person(
        first_names=["Jean"],
        surname="Dupont",
        birth_date=date(1980, 3, 12),
        addresses=[
            Address(
                street="Rue de la Loi",
                house_number="12",
                postal_code="1000",
                city="Bruxelles",
            )
        ],
    ),
    "persons": [Person(first_names=["Sophie"], surname="Martin")],
    "entities": [MetadataEntity(text="Hôpital Erasme", tag="hospital")],
}

doc = deduce.deidentify(text, metadata=metadata)
print(doc.deidentified_text)

Patient [PATIENT], né le [DATE-1], habite [LOCATION-1]. [PERSON-1] consulte à [HOSPITAL-1].
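
A known birth date helps because a single date can be written in several ways in free text. The sketch below lists a few French renderings of one date. It is purely illustrative: the helper name and the chosen formats are assumptions, it ignores forms such as "1er", and it does not describe how belgian_deduce matches dates internally.

```python
from datetime import date

FRENCH_MONTHS = [
    "janvier", "février", "mars", "avril", "mai", "juin",
    "juillet", "août", "septembre", "octobre", "novembre", "décembre",
]


def birth_date_variants(d: date) -> list[str]:
    """List a few common written forms of a date in French clinical notes."""
    return [
        f"{d.day} {FRENCH_MONTHS[d.month - 1]} {d.year}",
        f"{d.day:02d}/{d.month:02d}/{d.year}",
        f"{d.day:02d}-{d.month:02d}-{d.year}",
        f"{d.day:02d}.{d.month:02d}.{d.year}",
    ]


print(birth_date_variants(date(1980, 3, 12)))
# ['12 mars 1980', '12/03/1980', '12-03-1980', '12.03.1980']
```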

Documentation

The project documentation lives in docs/source/tutorial.md and docs/source/migrating.md.

Contributing

Contribution guidance is available in CONTRIBUTING.md.

Versions

  • 4.0.0 - First standalone belgian_deduce release with Belgian defaults and independent docs/tooling
  • Earlier entries in CHANGELOG.md predate the standalone release and are preserved for provenance

Authors

  • Vincent Menger - original DEDUCE implementation
  • Stig Hellemans - Belgian standalone package and maintenance

License

This project remains licensed under LGPL-3.0-or-later. See LICENSE.md.
