A Hide-In-Plain-Sight (HIPS) module for Dutch medical text.

🇳🇱 dutch-med-hips

dutch-med-hips is a Python package for anonymizing Dutch medical reports using the Hide-In-Plain-Sight (HIPS) methodology. It replaces sensitive personal data with realistic surrogates while preserving the readability and overall structure of the text.


🚀 Features

  • Replaces personally identifiable information (PII) with synthetic, context-aware surrogates.
  • Handles names, dates, locations, hospitals, study names, phone numbers, IDs, and more.
  • Uses real-world statistical distributions (e.g. for age and character frequency) to generate natural-looking output.
  • Adds a disclaimer to the final anonymized report.
  • Lets you configure its behavior through JSON weight configuration files.
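
To give a feel for what sampling from a statistical distribution can look like, here is a minimal sketch that draws ages from a two-component Gaussian mixture using only the standard library. The component weights, means, and standard deviations are made-up illustration values, not the package's fitted parameters, and `sample_age` is a hypothetical helper rather than part of dutch-med-hips.

```python
import random

# Hypothetical mixture components: (weight, mean, standard deviation).
# These numbers are illustrative only, not taken from dutch-med-hips.
AGE_COMPONENTS = [
    (0.3, 35.0, 10.0),
    (0.7, 68.0, 12.0),
]


def sample_age(rng: random.Random, lo: int = 0, hi: int = 110) -> int:
    """Draw one age from the mixture, clamped to the range [lo, hi]."""
    weights = [w for w, _, _ in AGE_COMPONENTS]
    # Pick a component according to its weight, then sample from its Gaussian.
    _, mean, std = rng.choices(AGE_COMPONENTS, weights=weights, k=1)[0]
    age = round(rng.gauss(mean, std))
    return max(lo, min(hi, age))


rng = random.Random(42)  # a fixed seed makes the draws reproducible
ages = [sample_age(rng) for _ in range(5)]
print(ages)
```

Passing an explicit `random.Random` instance keeps the draws reproducible without touching global random state.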

📦 Installation

Create a fresh conda environment:

conda create -n dutch-med-hips python=3.11
conda activate dutch-med-hips

Install the package via pip:

pip install dutch-med-hips

Or from source:

git clone https://github.com/DIAGNijmegen/dutch-med-hips.git
cd dutch-med-hips
pip install .

🛠️ Usage

CLI

After installation, use the CLI to anonymize a report:

hips \
  --input_file path/to/input_report.txt \
  --output_file path/to/output_report.txt \
  --seed 42

Arguments:

  • --input_file: Path to the file containing the original report.
  • --output_file: Path to write the anonymized report.
  • --seed: (Optional) Seed for reproducibility. Default is 42.
  • --ner_labels: (Optional) List of NER labels for offset adjustment. This option is currently disabled in the CLI.
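
To anonymize many files, the CLI can be driven from a short script. The sketch below uses only the flags documented above; `build_hips_command` and `anonymize_files` are hypothetical helper names, and it assumes the `hips` executable is on your PATH and that reports are stored as .txt files.

```python
import subprocess
from pathlib import Path


def build_hips_command(input_file: Path, output_file: Path, seed: int = 42) -> list[str]:
    """Assemble the argument list for the `hips` CLI from the documented flags."""
    return [
        "hips",
        "--input_file", str(input_file),
        "--output_file", str(output_file),
        "--seed", str(seed),
    ]


def anonymize_files(input_dir: Path, output_dir: Path, seed: int = 42) -> None:
    """Run the CLI once per .txt file in input_dir (requires `hips` on PATH)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(input_dir.glob("*.txt")):
        cmd = build_hips_command(src, output_dir / src.name, seed)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


# Building a command does not run anything; it only shows the arguments.
cmd = build_hips_command(Path("in/report.txt"), Path("out/report.txt"), seed=7)
print(cmd)
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems with paths that contain spaces.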

Python API

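from dutch_med_hips.hips_functions import HideInPlainSight

# Input text in which sensitive values appear as tags such as <PERSOON>.
report = "<PERSOON> had a consultation on <DATUM> at <TIJD>."

hips = HideInPlainSight()
# Replace each supported tag with a surrogate value.
anonymized_report = hips.apply_hips(report)
print(anonymized_report)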
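
When several reports need processing, the same call can be applied in a loop. The following is a small sketch, not part of the package: `anonymize_reports` is a hypothetical helper that accepts any callable mapping text to text, so `HideInPlainSight().apply_hips` can be passed in when the package is installed. A stand-in anonymizer is used here so the example runs on its own.

```python
from typing import Callable, Iterable


def anonymize_reports(reports: Iterable[str], anonymize: Callable[[str], str]) -> list[str]:
    """Apply one anonymizer callable to every report, preserving order."""
    return [anonymize(report) for report in reports]


# With the package installed, the callable could be the documented method:
#     from dutch_med_hips.hips_functions import HideInPlainSight
#     results = anonymize_reports(reports, HideInPlainSight().apply_hips)

# Stand-in anonymizer so this sketch runs without the package installed.
def fake_anonymizer(text: str) -> str:
    return text.replace("<PERSOON>", "J. Jansen")


results = anonymize_reports(["<PERSOON> was seen today."], fake_anonymizer)
print(results)  # ['J. Jansen was seen today.']
```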

📄 Supported Tags

The following tags in the report will be replaced:

Tag                  Replacement
<PERSOON>            Realistic person name
<DATUM>              Randomized date
<TIJD>               Randomized time
<TELEFOONNUMMER>     Synthetic phone number
<PATIENTNUMMER>      Synthetic patient ID
<ZNUMMER>            Synthetic Z-number
<PLAATS>             Dutch city
<RAPPORT-ID.*>       Custom report ID
<PHINUMMER>          Synthetic PHI number
<LEEFTIJD>           Realistic age (GMM-based)
<PERSOONAFKORTING>   Name abbreviation
<ZIEKENHUIS>         Dutch hospital
<ACCREDATIE_NUMMER>  Accreditation number
<STUDIE-NAAM>        Study name (optionally with UZR code)
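
A quick sanity check after anonymization is to scan the output for tags that were left in place. The sketch below builds a regular expression from the tag names listed above. `find_remaining_tags` is a hypothetical helper, and treating `<RAPPORT-ID.*>` as "RAPPORT-ID followed by any characters up to the closing bracket" is an assumption about how that pattern is meant to match.

```python
import re

# Tag names copied from the table above; RAPPORT-ID may carry a suffix
# (assumed here to be any run of characters other than ">").
TAG_PATTERN = re.compile(
    r"<(?:PERSOONAFKORTING|PERSOON|DATUM|TIJD|TELEFOONNUMMER|PATIENTNUMMER|"
    r"ZNUMMER|PLAATS|RAPPORT-ID[^>]*|PHINUMMER|LEEFTIJD|ZIEKENHUIS|"
    r"ACCREDATIE_NUMMER|STUDIE-NAAM)>"
)


def find_remaining_tags(text: str) -> list[str]:
    """Return every supported tag that still appears in text, in order."""
    return TAG_PATTERN.findall(text)


leftover = find_remaining_tags(
    "Seen by <PERSOON> on 3 May in <PLAATS>. Ref <RAPPORT-ID-12>."
)
print(leftover)  # ['<PERSOON>', '<PLAATS>', '<RAPPORT-ID-12>']
```

An empty list means none of the supported tags remain in the text.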

⚙️ Configuration

The package loads a configuration file and several lookup lists from its config/ directory. Edit these files to change behavior without touching the code. In particular, config/config.json holds the weights for the various replacement strategies; adjust them to suit your dataset.
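
To show how weights in a JSON file can steer a choice between strategies, here is a minimal sketch. The `date_formats` key and its values are invented for illustration, since the real structure of config/config.json is not described here, and `pick_weighted` is a hypothetical helper.

```python
import json
import random

# Hypothetical config content; the real keys in config/config.json may differ.
EXAMPLE_CONFIG = """
{
  "date_formats": {"%d-%m-%Y": 0.6, "%d %B %Y": 0.3, "%Y-%m-%d": 0.1}
}
"""


def pick_weighted(options: dict[str, float], rng: random.Random) -> str:
    """Choose one key with probability proportional to its weight."""
    keys = list(options)
    return rng.choices(keys, weights=[options[k] for k in keys], k=1)[0]


config = json.loads(EXAMPLE_CONFIG)
rng = random.Random(42)  # seeded for reproducible choices
chosen = pick_weighted(config["date_formats"], rng)
print(chosen)
```

In a scheme like this, raising one weight relative to the others makes that strategy appear more often in the output, and a weight of zero disables it.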


🤝 Contributing

Want to help improve dutch-med-hips?

  1. Fork the repository.
  2. Create your feature branch.
  3. Submit a pull request with tests if applicable.

Download files

Source Distribution

dutch_med_hips-0.1.1.tar.gz (3.3 MB view details)

Uploaded Source

Built Distribution

dutch_med_hips-0.1.1-py3-none-any.whl (3.5 MB view details)

Uploaded Python 3

File details

Details for the file dutch_med_hips-0.1.1.tar.gz.

File metadata

  • Download URL: dutch_med_hips-0.1.1.tar.gz
  • Size: 3.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for dutch_med_hips-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       414073a43c1e6c8c4865db54a96b8c682a0fd5423b89c1579abce1bb265d03b1
MD5          f6a0646dd9ebb7bc9d67b1268f632557
BLAKE2b-256  fb6179ddc9ccfeadb51f810eadc9af7e7be50de4debf96f7b8070a5ca2a7c498
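
A downloaded archive can be checked against the published SHA256 digest with the standard library. This is a minimal sketch: `sha256_of` and `matches_published_hash` are hypothetical helper names, and the example call assumes the archive was saved to the working directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with the published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Published SHA256 for dutch_med_hips-0.1.1.tar.gz, copied from the listing above.
EXPECTED = "414073a43c1e6c8c4865db54a96b8c682a0fd5423b89c1579abce1bb265d03b1"
# Example call, assuming the archive is in the working directory:
#     matches_published_hash(Path("dutch_med_hips-0.1.1.tar.gz"), EXPECTED)
```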

File details

Details for the file dutch_med_hips-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: dutch_med_hips-0.1.1-py3-none-any.whl
  • Size: 3.5 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for dutch_med_hips-0.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       bf405cb5db2aa5fa61e56cbbeb9751e93e0c08df69c2ad6b0dd0f96c26c174a7
MD5          f567182658fd407d8d08bc9668890051
BLAKE2b-256  45791fbc505aa748226059d817abc0c2532786f72f2f1f7de0229cc52ab9005d
