A Hide-In-Plain-Sight (HIPS) module for Dutch medical text.

🇳🇱 dutch-med-hips

dutch-med-hips is a Python package for anonymizing Dutch medical reports using the Hide-In-Plain-Sight (HIPS) methodology. It replaces sensitive personal data with realistic surrogates while preserving the readability and overall structure of the text.


🚀 Features

  • Replaces personally identifiable information (PII) with synthetic, context-aware surrogates.
  • Supports replacement of names, dates, locations, hospitals, study names, phone numbers, IDs, and more.
  • Uses real-world statistical distributions (e.g. for age and character frequency) to generate natural-looking output.
  • Adds a disclaimer to the final anonymized report.
  • Can be configured through JSON weight configuration files.
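
To illustrate what sampling from a statistical distribution can look like, here is a minimal sketch that draws surrogate ages from a Gaussian mixture model (the Supported Tags table describes the <LEEFTIJD> replacement as GMM-based). The component weights, means, and standard deviations are made-up values, and `sample_age` is a hypothetical helper, not a function of the package.

```python
import random

# Hypothetical mixture components: (weight, mean age, standard deviation).
# These numbers are illustrative only and are not taken from dutch-med-hips.
AGE_COMPONENTS = [
    (0.2, 8.0, 4.0),
    (0.5, 45.0, 12.0),
    (0.3, 72.0, 9.0),
]


def sample_age(rng: random.Random, components=AGE_COMPONENTS) -> int:
    """Draw one age from a Gaussian mixture, clamped to a plausible range."""
    weights = [weight for weight, _, _ in components]
    # Pick a component according to its weight, then sample from its Gaussian.
    _, mean, std = rng.choices(components, weights=weights, k=1)[0]
    age = round(rng.gauss(mean, std))
    return max(0, min(110, age))


rng = random.Random(42)  # seeded so repeated runs give the same ages
ages = [sample_age(rng) for _ in range(5)]
print(ages)
```

Clamping keeps the occasional extreme draw from producing a negative or implausibly high age.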

📦 Installation

Create a fresh conda environment:

conda create -n dutch-med-hips python=3.11
conda activate dutch-med-hips

Install the package via pip:

pip install dutch-med-hips

Or from source:

git clone https://github.com/DIAGNijmegen/dutch-med-hips.git
cd dutch-med-hips
pip install .

🛠️ Usage

CLI

After installation, use the CLI to anonymize a report:

hips \
  --input_file path/to/input_report.txt \
  --output_file path/to/output_report.txt \
  --seed 42

Arguments:

  • --input_file: Path to the file containing the original report.
  • --output_file: Path to write the anonymized report.
  • --seed: (Optional) Seed for reproducibility. Default is 42.
  • --ner_labels: (Optional) List of NER labels used for offset adjustment. This option is currently disabled in the CLI.
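
When many reports need to be processed, the CLI call above can be scripted. The following is a sketch only: it uses the `hips` command and the three flags documented above, while `build_hips_command` and `anonymize_folder` are hypothetical helper names introduced for this example. With `dry_run=True` the commands are built but not executed.

```python
import subprocess
from pathlib import Path


def build_hips_command(input_file, output_file, seed=42):
    """Build the argument list for one `hips` CLI call."""
    return [
        "hips",
        "--input_file", str(input_file),
        "--output_file", str(output_file),
        "--seed", str(seed),
    ]


def anonymize_folder(in_dir, out_dir, seed=42, dry_run=True):
    """Build (and optionally run) one `hips` call per .txt file in in_dir."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    commands = []
    for src in sorted(in_dir.glob("*.txt")):
        cmd = build_hips_command(src, out_dir / src.name, seed)
        commands.append(cmd)
        if not dry_run:
            # Requires the `hips` CLI to be installed and on the PATH.
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands


print(build_hips_command("in.txt", "out.txt"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.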

Python API

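from dutch_med_hips.hips_functions import HideInPlainSight

# Input text in which PII is marked with tags such as <PERSOON>.
report = "<PERSOON> had a consultation on <DATUM> at <TIJD>."

# Create the anonymizer and replace each tag with a realistic surrogate.
hips = HideInPlainSight()
anonymized_report = hips.apply_hips(report)
print(anonymized_report)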

📄 Supported Tags

The following tags in the report will be replaced:

Tag                    Replacement
<PERSOON>              Realistic person names
<DATUM>                Randomized date
<TIJD>                 Randomized time
<TELEFOONNUMMER>       Synthetic phone number
<PATIENTNUMMER>        Synthetic patient ID
<ZNUMMER>              Synthetic Z-number
<PLAATS>               Dutch city
<RAPPORT-ID.*>         Custom report ID
<PHINUMMER>            Synthetic PHI number
<LEEFTIJD>             Realistic age (GMM-based)
<PERSOONAFKORTING>     Name abbreviation
<ZIEKENHUIS>           Dutch hospital
<ACCREDATIE_NUMMER>    Accreditation number
<STUDIE-NAAM>          Study name (optionally with UZR code)
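
The general idea of tag replacement can be sketched with a regular expression and a seeded random generator. This is an illustration under stated assumptions, not the package's implementation: `SURROGATES`, `TAG_PATTERN`, and `replace_tags` are hypothetical names, the lookup lists are tiny stand-ins for the ones the package ships, and only three of the tags are handled.

```python
import random
import re

# Tiny hypothetical lookup lists; the package ships its own, larger lists.
SURROGATES = {
    "PERSOON": ["Jan de Vries", "Sanne Bakker", "Pieter Jansen"],
    "PLAATS": ["Utrecht", "Nijmegen", "Groningen"],
    "ZIEKENHUIS": ["Radboudumc", "UMC Utrecht"],
}

# Matches simple tags such as <PERSOON>; group 1 is the tag name.
TAG_PATTERN = re.compile(r"<([A-Z_-]+)>")


def replace_tags(text: str, seed: int = 42) -> str:
    """Replace each known tag with a randomly chosen surrogate."""
    rng = random.Random(seed)  # seeded for reproducible output

    def substitute(match: re.Match) -> str:
        options = SURROGATES.get(match.group(1))
        # Unknown tags are left untouched so they stay visible for review.
        return rng.choice(options) if options else match.group(0)

    return TAG_PATTERN.sub(substitute, text)


report = "<PERSOON> was seen in <ZIEKENHUIS> in <PLAATS> at <TIJD>."
print(replace_tags(report))
```

In this sketch <TIJD> has no lookup list, so it is left in place. Leaving unhandled tags visible makes them easy to spot when reviewing the output.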

⚙️ Configuration

The package loads a configuration file and several lookup lists from its config/ directory. Modify these files to adjust behavior without changing the code. In particular, config/config.json contains the weights for the various replacement strategies. Adjust these to fit the needs of your dataset.
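
As a rough illustration of how weights can steer replacement strategies, the sketch below picks a strategy with probability proportional to its weight. The JSON fragment is hypothetical: the key `person_name_format` and the strategy names are assumptions made for this example, and the actual schema of config/config.json may differ. `pick_strategy` is likewise a hypothetical helper.

```python
import json
import random

# Hypothetical config fragment; the real keys in config/config.json may differ.
CONFIG_TEXT = """
{
  "person_name_format": {
    "full_name": 0.6,
    "initial_and_surname": 0.3,
    "surname_only": 0.1
  }
}
"""


def pick_strategy(weights: dict, rng: random.Random) -> str:
    """Choose one strategy name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


config = json.loads(CONFIG_TEXT)
rng = random.Random(42)
picks = [pick_strategy(config["person_name_format"], rng) for _ in range(1000)]

# Print how often each strategy was chosen across 1000 draws.
for name in config["person_name_format"]:
    print(name, picks.count(name))
```

Raising one weight relative to the others makes that strategy appear more often in the output. The weights do not need to sum to 1 for `random.choices`.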


🤝 Contributing

Want to help improve dutch-med-hips?

  1. Fork the repository.
  2. Create your feature branch.
  3. Submit a pull request with tests if applicable.
