A hiding-in-plain-sight module for Dutch medical text.
🇳🇱 dutch-med-hips
dutch-med-hips is a Python package for anonymizing Dutch medical reports using the Hide-In-Plain-Sight (HIPS) methodology. It replaces sensitive personal data with realistic surrogates while preserving the readability and overall structure of the text.
🚀 Features
- Replace personally identifiable information (PII) with synthetic, context-aware surrogates.
- Supports replacement for names, dates, locations, hospitals, study names, phone numbers, IDs, and more.
- Uses real-world statistical distributions (e.g. for age, character frequency) to generate natural-looking output.
- Adds a disclaimer to the final anonymized report.
- Configurable behavior via JSON weight configuration files.
📦 Installation
Create a fresh conda environment:
conda create -n dutch-med-hips python=3.11
conda activate dutch-med-hips
Install the package via pip:
pip install dutch-med-hips
Or from source:
git clone https://github.com/DIAGNijmegen/dutch-med-hips.git
cd dutch-med-hips
pip install .
🛠️ Usage
CLI
After installation, use the CLI to anonymize a report:
hips \
--input_file path/to/input_report.txt \
--output_file path/to/output_report.txt \
--seed 42
Arguments:
- --input_file: Path to the file containing the original report.
- --output_file: Path to write the anonymized report.
- --seed: (Optional) Seed for reproducibility. Default is 42.
- --ner_labels: (Optional) List of NER labels for offset adjustment; currently disabled via the CLI.
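To anonymize a whole folder of reports, the CLI call can be scripted. The following is a minimal sketch that uses only the documented flags (--input_file, --output_file, --seed); the helper names `build_hips_command` and `anonymize_directory` and the `*.txt` file pattern are illustrative assumptions, not part of the package.

```python
import subprocess
from pathlib import Path


def build_hips_command(input_file, output_file, seed=42):
    """Build the argument list for one `hips` CLI call (documented flags only)."""
    return [
        "hips",
        "--input_file", str(input_file),
        "--output_file", str(output_file),
        "--seed", str(seed),
    ]


def anonymize_directory(input_dir, output_dir, seed=42):
    """Run the `hips` CLI once per .txt file in input_dir (hypothetical helper)."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for report_path in sorted(Path(input_dir).glob("*.txt")):
        command = build_hips_command(report_path, output_dir / report_path.name, seed)
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces. Note that this sketch reuses the same seed for every file; vary it per file if that is not what you want.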
Python API
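from dutch_med_hips.hips_functions import HideInPlainSight

# Input report in which sensitive data is already marked with tags such as <PERSOON>.
report = "<PERSOON> had a consultation on <DATUM> at <TIJD>."

# Replace every supported tag with a realistic surrogate.
hips = HideInPlainSight()
anonymized_report = hips.apply_hips(report)
print(anonymized_report)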
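After anonymizing, it can be useful to check that no tag was left behind. The helper below is a small sketch for that purpose; `find_remaining_tags` and the tag pattern are assumptions made for illustration and are not provided by the package.

```python
import re

# Assumption: a tag starts with "<" and an uppercase letter, continues with
# uppercase letters, digits, "_" or "-", and may carry a free-form suffix
# before the closing ">" (as in the <RAPPORT-ID.*> family).
TAG_PATTERN = re.compile(r"<[A-Z][A-Z0-9_-]*[^<>]*>")


def find_remaining_tags(text):
    """Return every substring of text that still looks like a tag."""
    return TAG_PATTERN.findall(text)
```

Calling `find_remaining_tags(anonymized_report)` should then return an empty list when every tag in the report was one of the supported tags.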
📄 Supported Tags
The following tags in the report will be replaced:
| Tag | Replacement |
|---|---|
| <PERSOON> | Realistic person names |
| <DATUM> | Randomized date |
| <TIJD> | Randomized time |
| <TELEFOONNUMMER> | Synthetic phone number |
| <PATIENTNUMMER> | Synthetic patient ID |
| <ZNUMMER> | Synthetic Z-number |
| <PLAATS> | Dutch city |
| <RAPPORT-ID.*> | Custom report ID |
| <PHINUMMER> | Synthetic PHI number |
| <LEEFTIJD> | Realistic age (GMM-based) |
| <PERSOONAFKORTING> | Name abbreviation |
| <ZIEKENHUIS> | Dutch hospital |
| <ACCREDATIE_NUMMER> | Accreditation number |
| <STUDIE-NAAM> | Study name (optionally with UZR code) |
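The idea behind tag replacement can be illustrated with a toy example. This is not the package's implementation: the surrogate pools, the `toy_hips` function, and the simplified tag pattern are all made up for this sketch, and only two of the tags above are handled.

```python
import random
import re

# Toy surrogate pools that stand in for the package's lookup lists (assumption).
TOY_SURROGATES = {
    "PERSOON": ["Jan de Vries", "Sanne Jansen", "Pieter Bakker"],
    "PLAATS": ["Utrecht", "Nijmegen", "Groningen"],
}

# Simplified pattern: matches tags such as <PERSOON> or <STUDIE-NAAM>.
TAG = re.compile(r"<([A-Z_-]+)>")


def toy_hips(text, seed=42):
    """Replace known tags with a random surrogate; leave unknown tags untouched."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible

    def substitute(match):
        pool = TOY_SURROGATES.get(match.group(1))
        return rng.choice(pool) if pool else match.group(0)

    return TAG.sub(substitute, text)
```

Because the surrogate is drawn from a seeded generator, running `toy_hips` twice with the same text and seed gives the same result, which mirrors the purpose of the CLI's --seed option.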
⚙️ Configuration
The package loads a configuration file and multiple lookup lists from its config/ directory.
Modify these files to adjust behavior without changing the code.
In particular, config/config.json contains weights for various replacement strategies. Adjust these to fit your dataset needs.
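As a rough illustration of how weights can steer a replacement strategy, the sketch below draws one option from a weighted mapping. The `date_formats` key and its values are hypothetical; check the actual config/config.json for the keys the package really uses.

```python
import json
import random

# Hypothetical config fragment; the real config/config.json may use other keys.
EXAMPLE_CONFIG = json.loads("""
{
  "date_formats": {"%d-%m-%Y": 0.6, "%d %B %Y": 0.3, "%Y-%m-%d": 0.1}
}
""")


def pick_strategy(weights, rng):
    """Draw one key from a {strategy: weight} mapping, proportionally to its weight."""
    strategies = list(weights)
    return rng.choices(strategies, weights=[weights[s] for s in strategies], k=1)[0]
```

With these example weights, `pick_strategy(EXAMPLE_CONFIG["date_formats"], random.Random(42))` would return the day-month-year format most often; raising a weight makes the corresponding option more likely.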
🤝 Contributing
Want to help improve Dutch Med HIPS?
- Fork the repository.
- Create your feature branch.
- Submit a pull request with tests if applicable.