Rule-based de-identification for Belgian clinical text

Belgian Deduce

belgian_deduce is a rule-based de-identification package for Belgian clinical text. It ships as a standalone Python package, uses Belgian lookup data and defaults, and is built on top of docdeid.

  • Remove names, places, institutions, dates, ages, identifiers, phone numbers, e-mail addresses, and URLs from Belgian medical text
  • Tune behavior through config, lookup structures, and custom processors
  • Use Belgian defaults for postal codes, phone numbers, and national register numbers
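To give a feel for what a Belgian-specific rule can involve, here is a minimal standard-library sketch that checks the control number of a national register number. It applies the publicly documented modulo-97 rule. The function name `is_valid_national_register_number` is hypothetical and is not part of belgian_deduce, whose own detection logic may differ.

```python
import re


def is_valid_national_register_number(value: str) -> bool:
    """Check the modulo-97 control number of a Belgian national register number.

    Hypothetical helper for illustration; not the package's implementation.
    """
    digits = re.sub(r"\D", "", value)  # drop dots, dashes, and spaces
    if len(digits) != 11:
        return False
    base, check = digits[:9], int(digits[9:])
    # Born before 2000: control number = 97 - (first nine digits mod 97).
    if 97 - int(base) % 97 == check:
        return True
    # Born in or after 2000: a "2" is prepended before taking the modulus.
    return 97 - int("2" + base) % 97 == check


print(is_valid_national_register_number("85.07.30-033.28"))  # True
print(is_valid_national_register_number("85.07.30-033.29"))  # False
```

A checksum like this can help separate real register numbers from other 11-digit strings, but it does not replace pattern-based detection.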

belgian_deduce started from the original deduce project. This repository now maintains its own package identity, configuration, documentation, and Belgian-specific defaults.

De-identification is never perfect. Validate and adapt the package on your own data before using it in a critical environment.
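One way to validate on your own data is to compare predicted annotation spans against a hand-labelled gold set. The sketch below is a minimal, standard-library illustration of exact-match precision and recall. The `Span` type, the `precision_recall` function, and the sample offsets are all hypothetical and are not part of belgian_deduce.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical annotation span: character offsets plus a tag."""
    start: int
    end: int
    tag: str


def precision_recall(predicted: set[Span], gold: set[Span]) -> tuple[float, float]:
    """Exact-match precision and recall over annotation spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


# Made-up offsets for illustration only.
gold = {
    Span(9, 21, "person"),
    Span(42, 57, "national_register_number"),
    Span(100, 106, "location"),
}
predicted = {
    Span(9, 21, "person"),
    Span(42, 57, "national_register_number"),
    Span(70, 75, "id"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
# Gold spans that were not predicted are potential leaks to review first.
print(sorted(gold - predicted))
```

For de-identification, recall usually deserves the closest attention, because every missed span is identifying text left in the output.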

Citing

If you use belgian_deduce, cite the original DEDUCE paper for the underlying method and reference this repository and version in your implementation notes:

Menger, V.J., Scheepers, F., van Wijk, L.M., Spruit, M. (2017). DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text. Telematics and Informatics. ISSN 0736-5853.

Installation

Install the latest release from PyPI:

pip install belgian-deduce

Getting Started

from belgian_deduce import Deduce

model = Deduce()

text = (
    "betreft: Jan Janssens, rijksregisternummer 85.07.30-033.28, patnr 000334433. "
    "De patient J. Janssens is 64 jaar oud en woont in Leuven. Hij werd op "
    "10 oktober 2018 door arts Peter de Smet ontslagen uit UZ Leuven. "
    "Voor nazorg kan hij worden bereikt via j.janssens.123@gmail.com of "
    "0470 12 34 56."
)

doc = model.deidentify(text)
print(doc.deidentified_text)

Output:

betreft: [PERSON-1], rijksregisternummer [NATIONAL_REGISTER_NUMBER-1], patnr [ID-1]. De patient [PERSON-1] is [AGE-1] jaar oud en woont in [LOCATION-1]. Hij werd op [DATE-1] door arts [PERSON-2] ontslagen uit [HOSPITAL-1]. Voor nazorg kan hij worden bereikt via [EMAIL-1] of [PHONE_NUMBER-1].
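The numbered placeholders in the output above can be understood with a small sketch. The hypothetical function `replace_with_placeholders` below assigns one index per distinct surface string within each tag, so that repeated mentions share a placeholder. It is a simplified illustration under that assumption, not the package's code. The real pipeline evidently does more, since it maps both "Jan Janssens" and "J. Janssens" to [PERSON-1].

```python
def replace_with_placeholders(text: str, annotations: list[tuple[int, int, str]]) -> str:
    """Replace non-overlapping (start, end, tag) spans with numbered placeholders."""
    counters: dict[str, int] = {}            # tag -> last index handed out
    seen: dict[tuple[str, str], str] = {}    # (tag, surface text) -> placeholder
    pieces = []
    cursor = 0
    for start, end, tag in sorted(annotations):
        key = (tag, text[start:end])
        if key not in seen:
            counters[tag] = counters.get(tag, 0) + 1
            seen[key] = f"[{tag.upper()}-{counters[tag]}]"
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(seen[key])           # placeholder instead of the span
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


sample = "Jan belde Peter. Jan woont in Leuven."
spans = [(0, 3, "person"), (10, 15, "person"), (17, 20, "person"), (30, 36, "location")]
result = replace_with_placeholders(sample, spans)
print(result)
# [PERSON-1] belde [PERSON-2]. [PERSON-1] woont in [LOCATION-1].
```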

If patient metadata is known, pass it explicitly:

from belgian_deduce import Deduce, Person

model = Deduce()
patient = Person(first_names=["Jan"], initials="JJ", surname="Janssens")
doc = model.deidentify(text, metadata={"patient": patient})

Metadata is not limited to the primary patient. The pipeline also supports:

  • persons: one or more additional Person objects, treated as regular people
  • addresses: one or more Address objects, tagged as locations
  • entities: arbitrary exact-match metadata, supplied via MetadataEntity
  • the Person fields birth_date, aliases, and addresses

A combined example:

from datetime import date

from belgian_deduce import Address, Deduce, MetadataEntity, Person

deduce = Deduce()

metadata = {
    "patient": Person(
        first_names=["Jan"],
        surname="Jansen",
        birth_date=date(1980, 3, 12),
        addresses=[
            Address(
                street="Kerkstraat",
                house_number="12A",
                postal_code="9000",
                city="Gent",
            )
        ],
    ),
    "persons": [
        Person(
            first_names=["Peter"],
            surname="de Visser",
            aliases=["Dr. Peter de Visser"],
        )
    ],
    "entities": [
        MetadataEntity(text="UZ Gent", tag="hospital"),
        MetadataEntity(text="ABC-12345", tag="id"),
    ],
}

doc = deduce.deidentify(text, metadata=metadata)
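To illustrate the idea behind exact metadata matches, the following standard-library sketch finds known strings in a note and returns tagged spans. `ExactEntity` and `find_exact_matches` are hypothetical stand-ins written for this example. They are not the package's MetadataEntity or its matching logic, which may handle tokenization and overlaps differently.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExactEntity:
    """Hypothetical stand-in: a known string and the tag it should receive."""
    text: str
    tag: str


def find_exact_matches(text: str, entities: list[ExactEntity]) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, tag) spans for every exact occurrence."""
    matches = []
    for entity in entities:
        # Guards on both sides so "ABC-12345" does not match inside "ABC-123456".
        pattern = r"(?<!\w)" + re.escape(entity.text) + r"(?!\w)"
        for m in re.finditer(pattern, text):
            matches.append((m.start(), m.end(), entity.tag))
    return sorted(matches)


note = "Opname in UZ Gent, dossier ABC-12345. Vorig dossier: ABC-123456."
entities = [ExactEntity("UZ Gent", "hospital"), ExactEntity("ABC-12345", "id")]
matches = find_exact_matches(note, entities)
for start, end, tag in matches:
    print(tag, repr(note[start:end]))
```

The longer identifier ABC-123456 is left alone here because only whole occurrences of the supplied string count as matches.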

French-speaking notes can be handled through the same API. A practical path is to provide metadata for names, birth dates, addresses, and institutions:

from datetime import date

from belgian_deduce import Address, Deduce, MetadataEntity, Person

deduce = Deduce()

text = (
    "Patient Jean Dupont, né le 12 mars 1980, habite Rue de la Loi 12, "
    "1000 Bruxelles. Sophie Martin consulte à Hôpital Erasme."
)

metadata = {
    "patient": Person(
        first_names=["Jean"],
        surname="Dupont",
        birth_date=date(1980, 3, 12),
        addresses=[
            Address(
                street="Rue de la Loi",
                house_number="12",
                postal_code="1000",
                city="Bruxelles",
            )
        ],
    ),
    "persons": [Person(first_names=["Sophie"], surname="Martin")],
    "entities": [MetadataEntity(text="Hôpital Erasme", tag="hospital")],
}

doc = deduce.deidentify(text, metadata=metadata)
print(doc.deidentified_text)

Output:

Patient [PATIENT], né le [DATE-1], habite [LOCATION-1]. [PERSON-1] consulte à [HOSPITAL-1].
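A known birth date can appear in French text in several written forms. The sketch below generates a few common variants from a `date` and checks which of them occur in a note. `FRENCH_MONTHS` and `french_date_variants` are hypothetical names introduced for this illustration and do not describe how belgian_deduce matches birth dates internally.

```python
from datetime import date

FRENCH_MONTHS = [
    "janvier", "février", "mars", "avril", "mai", "juin",
    "juillet", "août", "septembre", "octobre", "novembre", "décembre",
]


def french_date_variants(d: date) -> list[str]:
    """Return a few common written forms of a date in French text."""
    month = FRENCH_MONTHS[d.month - 1]
    day = "1er" if d.day == 1 else str(d.day)  # the first of the month is written "1er"
    return [
        f"{day} {month} {d.year}",
        f"{d.day:02d}/{d.month:02d}/{d.year}",
        f"{d.day:02d}-{d.month:02d}-{d.year}",
        f"{d.day:02d}.{d.month:02d}.{d.year}",
    ]


note = "Patient Jean Dupont, né le 12 mars 1980, habite Rue de la Loi 12, 1000 Bruxelles."
found = [v for v in french_date_variants(date(1980, 3, 12)) if v in note]
print(found)  # ['12 mars 1980']
```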

Documentation

The project documentation lives in docs/source/tutorial.md and docs/source/migrating.md.

Contributing

Contribution guidance is available in CONTRIBUTING.md.

Versions

  • 4.0.1 - Polished the published package page and stabilized the tag-driven release workflow
  • 4.0.0 - First standalone belgian_deduce release with Belgian defaults and independent docs/tooling
  • Earlier entries in CHANGELOG.md predate the standalone release and are preserved for provenance

Authors

  • Vincent Menger - original DEDUCE implementation
  • Stig Hellemans - Belgian standalone package and maintenance

License

This project remains licensed under LGPL-3.0-or-later. See LICENSE.md.

