EHDS-Article-cited anonymization toolkit for secondary-use health data (FHIR + tabular)
ehds-anon-kit
EHDS-Article-cited anonymization for secondary-use health data.
A Python CLI that de-identifies FHIR R4 bundles and tabular EHR data for Regulation (EU) 2025/327 (EHDS) Chapter IV secondary-use data permits, and emits a manifest that cites the exact Article and Recital mandating each transformation.
What and why
Regulation (EU) 2025/327, Chapter IV (Art. 64-72), published in the Official Journal on 5 March 2025, establishes the EHDS secondary-use framework. Commission implementing acts specifying technical anonymization standards for HealthData@EU are expected in H1-H2 2026. Health data access bodies (HDABs) are already processing permit applications under the existing Article text.
Existing open-source tools (Synthea, ARX, academic libraries) do not:
- Emit a per-transformation regulatory citation tied to EHDS Art. 64-72
- Implement the Art. 72 pseudonymisation key custody chain of evidence
- Target the HealthData@EU secondary-use submission workflow
ehds-anon-kit fills that gap. Every transformation is traceable to its legal basis.
Install
pip install ehds-anon-kit
With tabular (CSV) support:
pip install "ehds-anon-kit[tabular]"
Quickstart
ehds-anon \
--fhir-bundle data/bundle.json \
--profile ehds-secondary-default \
--key-custody key-custody.yaml \
--out output/
With tabular data:
ehds-anon \
--fhir-bundle data/bundle.json \
--tabular data/patients.csv \
--profile ehds-secondary-default \
--key-custody key-custody.yaml \
--out output/
key-custody.yaml (choose one key source):
# Option 1: environment variable (recommended)
env_var: EHDS_PSEUDO_KEY
# Option 2: HashiCorp Vault
# vault_path: vault://ehds-keys/patient-key
# Option 3: in-process (triggers Art. 72 warning — disclose to HDAB)
# inline_key: "your-secret-key"
Outputs
| File | Description |
|---|---|
| `bundle_anon.json` | Anonymized FHIR R4 bundle |
| `tabular_anon.csv` | k-anonymized EHR table (if `--tabular` given) |
| `ehds_evidence.json` | Machine-readable EHDS Art. 64-72 evidence manifest |
| `ehds_evidence.md` | Human-readable manifest for DPO / HDAB submission |
| `audit.sha256` | Tamper-evident hash chain over all inputs + outputs |
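The on-disk format of `audit.sha256` is not described here. As a rough illustration of what a tamper-evident hash chain over files can look like, the sketch below links each entry to the previous one, so that changing any earlier file changes every later link. The function names `chain_entries` and `verify_chain`, the `GENESIS` value, and the record layout are all hypothetical and should not be read as the tool's actual format.

```python
import hashlib

GENESIS = "0" * 64  # assumed starting value; the tool's real format may differ


def chain_entries(items):
    """Build a hash chain over (name, content_bytes) pairs.

    Each link is the SHA-256 of the previous link, the file name, and the
    file's own SHA-256 digest, joined with colons.
    """
    prev = GENESIS
    chain = []
    for name, content in items:
        digest = hashlib.sha256(content).hexdigest()
        link = hashlib.sha256(f"{prev}:{name}:{digest}".encode("utf-8")).hexdigest()
        chain.append({"name": name, "sha256": digest, "link": link})
        prev = link
    return chain


def verify_chain(items, chain):
    """Recompute the chain from the given contents and compare it to the stored one."""
    return chain_entries(items) == chain


# Illustrative in-memory "files"; a real run would read bytes from disk.
files = [
    ("bundle.json", b'{"resourceType": "Bundle"}'),
    ("bundle_anon.json", b'{"resourceType": "Bundle", "entry": []}'),
]
chain = chain_entries(files)
print(verify_chain(files, chain))

# Modifying the first input invalidates the stored chain.
tampered = [("bundle.json", b'{"resourceType": "Bundle", "x": 1}'), files[1]]
print(verify_chain(tampered, chain))
```

Because each link depends on the one before it, a verifier only needs the final link to detect a change anywhere in the sequence.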
Anonymization profiles
| Profile | k-anonymity | Date-shift | Postal code | Target use |
|---|---|---|---|---|
| `ehds-secondary-default` | k=5 | ±90 days | 3 chars (NUTS-3) | Most EHDS Chapter IV permits |
| `ehds-research-strict` | k=10 | ±180 days | 2 chars | High-sensitivity / HealthData@EU cross-border |
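To make the profile parameters concrete, here is a minimal sketch of the three operations they control: postal-code truncation, bounded random date shifting, and a k-anonymity check over quasi-identifiers. The numbers come from the table above; the `PROFILES` dict layout and all function names are assumptions for illustration, not the package's API. Whether the tool uses one date offset per patient (the usual way to preserve intervals between events) is not stated here, so the sketch shifts a single date only.

```python
import random
from collections import Counter
from datetime import date, timedelta

# Values taken from the profile table; the dict layout itself is an assumption.
PROFILES = {
    "ehds-secondary-default": {"k": 5, "shift_days": 90, "postal_chars": 3},
    "ehds-research-strict": {"k": 10, "shift_days": 180, "postal_chars": 2},
}


def generalise_postal(code, chars):
    """Keep only the leading characters of a postal code."""
    return code.strip()[:chars]


def shift_date(d, max_days, rng):
    """Shift a date by a random whole-day offset in [-max_days, +max_days]."""
    return d + timedelta(days=rng.randint(-max_days, max_days))


def smallest_group(rows, quasi_identifiers):
    """Return the size of the smallest equivalence class over the quasi-identifiers."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(counts.values())


profile = PROFILES["ehds-secondary-default"]

# Seven synthetic records; two distinct postal codes collapse to the same prefix.
raw = [("10115", 1980)] * 5 + [("10117", 1980)] * 2
rows = [
    {"postal": generalise_postal(p, profile["postal_chars"]), "birth_year": y}
    for p, y in raw
]

# The table is k-anonymous for this profile if every group has at least k rows.
print(smallest_group(rows, ["postal", "birth_year"]) >= profile["k"])

rng = random.Random(42)  # seeded only to make the example repeatable
original = date(2024, 3, 1)
shifted = shift_date(original, profile["shift_days"], rng)
print(abs((shifted - original).days) <= profile["shift_days"])
```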
FHIR transformations (with citations)
| Resource | Field | Action | Citation |
|---|---|---|---|
| Patient | `identifier` | Replace with pseudonym | Art. 72; Rec. 66 |
| Patient | `name` | Remove | Art. 65; Rec. 65 |
| Patient | `birthDate` | Truncate to year | Art. 65; Rec. 71 |
| Patient | `address` | Generalise to 3-char postal | Art. 65 |
| Observation | `effectiveDateTime` | Date-shift ±90d | Rec. 71 |
| Encounter | `period` | Date-shift ±90d | Rec. 71 |
| Encounter | `participant.individual` | Pseudonymise practitioner | Art. 65 |
See docs/ehds-citation-map.md for the full transformation-to-Article mapping.
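The Patient rows of the table can be sketched on a plain FHIR JSON dict as follows. This is an illustration under stated assumptions, not the package's code: `deidentify_patient` and `pseudonym` are hypothetical names, and HMAC-SHA256 is an assumed pseudonymisation scheme that the source does not confirm.

```python
import copy
import hashlib
import hmac


def pseudonym(value, key):
    """Keyed pseudonym via HMAC-SHA256, truncated to 16 hex chars (assumed scheme)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def deidentify_patient(patient, key, postal_chars=3):
    """Apply the four Patient transformations to a copy of a FHIR Patient dict."""
    out = copy.deepcopy(patient)

    # identifier: replace each value with a keyed pseudonym
    for ident in out.get("identifier", []):
        if "value" in ident:
            ident["value"] = pseudonym(ident["value"], key)

    # name: remove entirely
    out.pop("name", None)

    # birthDate: truncate to year (a FHIR date may be a bare YYYY)
    if "birthDate" in out:
        out["birthDate"] = out["birthDate"][:4]

    # address: keep only a generalised postal code; drop the field if none remains
    generalised = [
        {"postalCode": a["postalCode"][:postal_chars]}
        for a in out.get("address", [])
        if "postalCode" in a
    ]
    if generalised:
        out["address"] = generalised
    else:
        out.pop("address", None)
    return out


patient = {
    "resourceType": "Patient",
    "identifier": [{"system": "urn:example:mrn", "value": "12345"}],
    "name": [{"family": "Example", "given": ["Pat"]}],
    "birthDate": "1980-04-12",
    "address": [{"city": "Berlin", "postalCode": "10115"}],
}
anon = deidentify_patient(patient, b"demo-key-not-for-production")
print(anon["birthDate"], anon["address"], "name" in anon)
```

A keyed hash gives the same pseudonym for the same identifier across runs with the same key, which keeps records linkable without exposing the original value.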
Art. 72 key custody
| Key source | Art. 72 disclosure required |
|---|---|
| `hsm://...` | No (hardware isolation) |
| `vault://...` | No (isolated vault) |
| `env:VAR` | No (operator-managed) |
| `inline` | YES (must disclose to HDAB) |
The key source and custody chain are recorded in ehds_evidence.json.
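As an illustration of how the table above could be turned into a manifest entry, the sketch below classifies a key-source string and builds a small custody record. The function name `classify_key_source` and the record fields (`kind`, `disclosure_required`) are hypothetical; the actual schema of `ehds_evidence.json` is not shown in this README.

```python
import json

# Prefix -> (kind, disclosure required), following the key custody table.
_PREFIX_RULES = [
    ("hsm://", "hsm", False),
    ("vault://", "vault", False),
    ("env:", "env", False),
]


def classify_key_source(source):
    """Return a custody record for a key-source string, or raise on unknown input."""
    for prefix, kind, disclose in _PREFIX_RULES:
        if source.startswith(prefix):
            return {"kind": kind, "disclosure_required": disclose}
    if source == "inline":
        # An in-process key is the only case the table marks as requiring disclosure.
        return {"kind": "inline", "disclosure_required": True}
    raise ValueError(f"unknown key source: {source!r}")


record = classify_key_source("env:EHDS_PSEUDO_KEY")
print(json.dumps(record, sort_keys=True))
print(classify_key_source("inline")["disclosure_required"])
```

Note that the record stores only the kind of source, never the key material itself.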
Known gaps
These limitations are documented honestly. The tool is an MVP targeting the most common EHDS secondary-use use case.
- Parquet not implemented: tabular anonymization reads/writes CSV only. Parquet support requires `pyarrow` or `fastparquet` and is planned for v0.2.
- HSM/Vault stubs only: `hsm://` and `vault://` key sources emit a warning and fall back to a placeholder key. Full PKCS#11 and Vault integration is planned for v0.2.
- FHIR resource coverage: only Patient, Observation, and Encounter are de-identified. Other resource types (Condition, MedicationRequest, DiagnosticReport, etc.) are passed through unchanged.
- No differential privacy: the tool does not implement DP-style noise injection.
- No t-closeness: only k-anonymity and l-diversity are reported for tabular data.
- Commission implementing acts pending: the Art. 65-72 implementing acts specifying exact technical standards are expected H2 2026. All citations in `data/ehds_text.yaml` are marked `excerpt_type: paraphrase`; the tool will be updated when implementing acts are published in the OJ.
- Not a legal determination: this tool produces an engineering evidence artifact. It does not constitute a formal GDPR anonymization determination. Review by a DPO or legal counsel is required before HDAB submission.
Citations
Regulation (EU) 2025/327 of the European Parliament and of the Council of 11 February 2025 on the European Health Data Space. Official Journal of the European Union, L 2025/327, 5 March 2025. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32025R0327
License
MIT. See LICENSE.
Contributing
Issues and PRs welcome. Before contributing, please:
- Run `ruff check src/ tests/` and `mypy --strict src/`
- Ensure `pytest` passes with no failures
- Reference the relevant EHDS Article in any citation-related change