
EHDS-Article-cited anonymization toolkit for secondary-use health data (FHIR + tabular)


ehds-anon-kit

EHDS-Article-cited anonymization for secondary-use health data.

A Python CLI that de-identifies FHIR R4 bundles and tabular EHR data for Regulation (EU) 2025/327 (EHDS) Chapter IV secondary-use data permits — and emits a manifest that cites the exact Article and Recital mandating each transformation.



What and why

Regulation (EU) 2025/327, published in the Official Journal on 5 March 2025, establishes the EHDS secondary-use framework in Chapter IV (Art. 64-72). Commission implementing acts specifying technical anonymization standards for HealthData@EU are expected during 2026. Health data access bodies (HDABs) are already processing permit applications under the existing Article text.

Existing open-source tools (Synthea, ARX, academic libraries) do not:

  • Emit a per-transformation regulatory citation tied to EHDS Art. 64-72
  • Implement the Art. 72 pseudonymisation key custody chain of evidence
  • Target the HealthData@EU secondary-use submission workflow

ehds-anon-kit fills that gap. Every transformation is traceable to its legal basis.


Install

pip install ehds-anon-kit

With tabular (CSV) support:

pip install "ehds-anon-kit[tabular]"

Quickstart

ehds-anon \
  --fhir-bundle data/bundle.json \
  --profile ehds-secondary-default \
  --key-custody key-custody.yaml \
  --out output/

With tabular data:

ehds-anon \
  --fhir-bundle data/bundle.json \
  --tabular data/patients.csv \
  --profile ehds-secondary-default \
  --key-custody key-custody.yaml \
  --out output/

key-custody.yaml (choose one key source):

# Option 1: environment variable (recommended)
env_var: EHDS_PSEUDO_KEY

# Option 2: HashiCorp Vault (stub in v0.1.0; see Known gaps)
# vault_path: vault://ehds-keys/patient-key

# Option 3: in-process (triggers an Art. 72 warning; disclose to the HDAB)
# inline_key: "your-secret-key"
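The three key sources above could be resolved along the following lines. This is a minimal sketch, not the package's implementation: the function name `resolve_pseudo_key`, the `ResolvedKey` type, the placeholder value, and the use of an already-parsed dict instead of the YAML file are all assumptions. The disclosure flag follows the Art. 72 key custody table in this README, and the Vault branch imitates the v0.1.0 stub behaviour listed under Known gaps (warning plus placeholder key).

```python
import os
import warnings
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolvedKey:
    key: bytes
    source: str                # label that would be recorded in the evidence manifest
    disclosure_required: bool  # Art. 72: must key handling be disclosed to the HDAB?


PLACEHOLDER_KEY = b"placeholder-key"  # hypothetical stand-in for the v0.1.0 stub


def resolve_pseudo_key(custody: Mapping[str, str]) -> ResolvedKey:
    """Hypothetical resolver for an already-parsed key-custody.yaml mapping."""
    if "env_var" in custody:
        name = custody["env_var"]
        value = os.environ.get(name)
        if value is None:
            raise KeyError(f"environment variable {name!r} is not set")
        return ResolvedKey(value.encode("utf-8"), f"env:{name}", False)
    if "vault_path" in custody:
        # Mirrors the documented stub: warn and fall back to a placeholder key.
        warnings.warn("vault:// key source is a stub; using a placeholder key")
        return ResolvedKey(PLACEHOLDER_KEY, custody["vault_path"], False)
    if "inline_key" in custody:
        # In-process keys trigger the Art. 72 warning and require HDAB disclosure.
        warnings.warn("inline key in use: disclose to the HDAB (Art. 72)")
        return ResolvedKey(custody["inline_key"].encode("utf-8"), "inline", True)
    raise ValueError("set one of env_var, vault_path or inline_key")
```

Raising on an unset environment variable, rather than silently falling back, keeps a missing key from producing output pseudonymised with an unintended key.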

Outputs

| File | Description |
|---|---|
| bundle_anon.json | Anonymized FHIR R4 bundle |
| tabular_anon.csv | k-anonymized EHR table (if --tabular given) |
| ehds_evidence.json | Machine-readable EHDS Art. 64-72 evidence manifest |
| ehds_evidence.md | Human-readable manifest for DPO / HDAB submission |
| audit.sha256 | Tamper-evident hash chain over all inputs + outputs |
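A tamper-evident hash chain such as audit.sha256 can be built by folding each file's SHA-256 digest into a running digest, so that altering, dropping, or reordering any file changes every later link. The sketch below assumes this particular chaining scheme, the all-zero starting value, and the function names; the actual layout of audit.sha256 is not documented in this README.

```python
import hashlib
from collections.abc import Iterable
from pathlib import Path

GENESIS = "0" * 64  # hypothetical starting value for the chain


def chain_digests(items: Iterable[tuple[str, bytes]]) -> list[tuple[str, str, str]]:
    """Return (name, file_sha256, chained_sha256) for each item, in order."""
    prev = GENESIS
    rows: list[tuple[str, str, str]] = []
    for name, data in items:
        file_digest = hashlib.sha256(data).hexdigest()
        # Each link commits to the previous link, the file name and the file digest.
        link = hashlib.sha256(f"{prev}:{name}:{file_digest}".encode("utf-8")).hexdigest()
        rows.append((name, file_digest, link))
        prev = link
    return rows


def chain_files(paths: Iterable[Path]) -> list[tuple[str, str, str]]:
    """Convenience wrapper that reads each file from disk."""
    return chain_digests((p.name, p.read_bytes()) for p in paths)
```

A verifier only needs the same ordered file list and the final link: recomputing the chain and comparing the last value detects any change to inputs or outputs.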

Anonymization profiles

| Profile | k-anonymity | Date shift | Postal code | Target use |
|---|---|---|---|---|
| ehds-secondary-default | k=5 | ±90 days | 3 chars (NUTS-3) | Most EHDS Chapter IV permits |
| ehds-research-strict | k=10 | ±180 days | 2 chars | High-sensitivity / HealthData@EU cross-border |
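The two profiles map naturally onto small configuration objects. The sketch below is hypothetical (the `Profile` class and its field names are not the package's API) and adds one common way to apply a ±N-day date shift: derive the offset deterministically from an HMAC of the patient pseudonym, so all of one patient's dates move by the same amount and the intervals between their events are preserved. Whether ehds-anon-kit uses per-patient or per-record offsets is an assumption.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Profile:
    name: str
    k: int                # minimum equivalence-class size
    date_shift_days: int  # maximum absolute shift, in days
    postal_chars: int     # number of postal-code characters kept


PROFILES = {
    "ehds-secondary-default": Profile("ehds-secondary-default", 5, 90, 3),
    "ehds-research-strict": Profile("ehds-research-strict", 10, 180, 2),
}


def shift_days(key: bytes, pseudonym: str, max_days: int) -> int:
    """Deterministic per-patient offset in [-max_days, +max_days]."""
    digest = hmac.new(key, pseudonym.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % (2 * max_days + 1) - max_days


def shift_date(d: date, key: bytes, pseudonym: str, profile: Profile) -> date:
    """Shift one date by the patient's offset under the given profile."""
    return d + timedelta(days=shift_days(key, pseudonym, profile.date_shift_days))
```

Keying the offset with the pseudonymisation secret means it cannot be recomputed from the pseudonym alone, while repeated runs with the same key stay reproducible.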

FHIR transformations (with citations)

| Resource | Field | Action | Citation |
|---|---|---|---|
| Patient | identifier | Replace with pseudonym | Art. 72; Rec. 66 |
| Patient | name | Remove | Art. 65; Rec. 65 |
| Patient | birthDate | Truncate to year | Art. 65; Rec. 71 |
| Patient | address | Generalise to 3-char postal code | Art. 65 |
| Observation | effectiveDateTime | Date-shift ±90d | Rec. 71 |
| Encounter | period | Date-shift ±90d | Rec. 71 |
| Encounter | participant.individual | Pseudonymise practitioner | Art. 65 |
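For the Patient rows of the table, a minimal sketch over a FHIR R4 Patient resource parsed into a Python dict. Assumptions: pseudonyms are HMAC-SHA256 of the original identifier value (a common keyed-pseudonymisation choice, not confirmed for this package), addresses keep only the postal-code prefix and country, and the names `anonymize_patient`, `pseudonymise`, and the `urn:ehds-anon:pseudonym` system URI are hypothetical. The element names (`identifier`, `name`, `birthDate`, `address`, `postalCode`, `country`) are standard FHIR R4 Patient elements.

```python
import copy
import hashlib
import hmac
from typing import Any

PSEUDONYM_SYSTEM = "urn:ehds-anon:pseudonym"  # hypothetical identifier system URI


def pseudonymise(key: bytes, value: str) -> str:
    """Keyed, deterministic pseudonym (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_patient(patient: dict[str, Any], key: bytes, postal_chars: int = 3) -> dict[str, Any]:
    """Return a de-identified copy; the input resource is not modified."""
    out = copy.deepcopy(patient)
    # Art. 72; Rec. 66: replace identifiers with pseudonyms.
    if "identifier" in patient:
        out["identifier"] = [
            {"system": PSEUDONYM_SYSTEM, "value": pseudonymise(key, ident.get("value", ""))}
            for ident in patient["identifier"]
        ]
    # Art. 65; Rec. 65: remove names.
    out.pop("name", None)
    # Art. 65; Rec. 71: truncate birthDate (YYYY-MM-DD) to the year.
    if "birthDate" in out:
        out["birthDate"] = out["birthDate"][:4]
    # Art. 65: keep only a postal-code prefix (plus country) from each address.
    if "address" in patient:
        generalised = []
        for addr in patient["address"]:
            kept = {}
            if "postalCode" in addr:
                kept["postalCode"] = addr["postalCode"].replace(" ", "")[:postal_chars]
            if "country" in addr:
                kept["country"] = addr["country"]
            if kept:  # FHIR does not allow empty objects
                generalised.append(kept)
        if generalised:
            out["address"] = generalised
        else:
            out.pop("address", None)
    return out
```

The default `postal_chars=3` matches the ehds-secondary-default profile; the strict profile would pass 2.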

See docs/ehds-citation-map.md for the full transformation-to-Article mapping.


Art. 72 key custody

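| Key source | Art. 72 disclosure required |
|---|---|
| hsm://... | No (hardware isolation; stub in v0.1.0, see Known gaps) |
| vault://... | No (isolated vault; stub in v0.1.0, see Known gaps) |
| env:VAR | No (operator-managed) |
| inline | Yes (must disclose to HDAB) |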

The key source and custody chain are recorded in ehds_evidence.json.
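The schema of ehds_evidence.json is not specified in this README. Purely as a hypothetical illustration, a manifest could pair each applied transformation with its citation from the FHIR transformation table and record the key custody outcome; every class and field name below is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TransformationRecord:
    resource: str  # FHIR resource type, e.g. "Patient"
    element: str   # element path, e.g. "birthDate"
    action: str
    citation: str  # e.g. "Art. 65; Rec. 71"


@dataclass
class EvidenceManifest:
    profile: str
    key_source: str
    art72_disclosure_required: bool
    transformations: list[TransformationRecord] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts the nested dataclasses recursively.
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)


if __name__ == "__main__":
    manifest = EvidenceManifest(
        profile="ehds-secondary-default",
        key_source="env:EHDS_PSEUDO_KEY",
        art72_disclosure_required=False,
        transformations=[
            TransformationRecord("Patient", "birthDate", "Truncate to year", "Art. 65; Rec. 71"),
        ],
    )
    print(manifest.to_json())
```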


Known gaps

These limitations are documented honestly. The tool is an MVP targeting the most common EHDS secondary-use scenario.

  1. Parquet not implemented: tabular anonymization reads/writes CSV only. Parquet support requires pyarrow or fastparquet and is planned for v0.2.
  2. HSM/Vault stubs only: hsm:// and vault:// key sources emit a warning and fall back to a placeholder key. Full PKCS#11 and Vault integration is planned for v0.2.
  3. FHIR resource coverage: only Patient, Observation, and Encounter are de-identified. Other resource types (Condition, MedicationRequest, DiagnosticReport, etc.) are passed through unchanged.
  4. No differential privacy: the tool does not implement DP-style noise injection.
  5. No t-closeness: only k-anonymity and l-diversity are reported for tabular data.
  6. Commission implementing acts pending: the Art. 65-72 implementing acts specifying exact technical standards are expected H2 2026. All citations in data/ehds_text.yaml are marked excerpt_type: paraphrase; the tool will be updated when implementing acts are published in the OJ.
  7. Not a legal determination: this tool produces an engineering evidence artifact. It does not constitute a formal GDPR anonymization determination. Review by a DPO or legal counsel is required before HDAB submission.
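Known gap 5 notes that k-anonymity and l-diversity are reported for tabular data. As a sketch of what those two metrics measure (not the package's implementation; the function name and the sample column names are hypothetical): k is the size of the smallest group of rows sharing the same quasi-identifier values, and distinct l is the smallest number of different sensitive values found within any such group.

```python
import csv
import io
from collections import defaultdict
from collections.abc import Iterable, Mapping, Sequence


def k_and_l(
    rows: Iterable[Mapping[str, str]], quasi_ids: Sequence[str], sensitive: str
) -> tuple[int, int]:
    """Return (k, distinct l) over equivalence classes of the quasi-identifiers."""
    classes: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_ids)].append(row[sensitive])
    if not classes:
        raise ValueError("no rows to evaluate")
    k = min(len(values) for values in classes.values())
    l_div = min(len(set(values)) for values in classes.values())
    return k, l_div


if __name__ == "__main__":
    sample = """birth_year,postal3,diagnosis
1980,101,I10
1980,101,E11
1980,101,I10
1975,102,J45
1975,102,J45
"""
    rows = csv.DictReader(io.StringIO(sample))
    print(k_and_l(rows, ["birth_year", "postal3"], "diagnosis"))  # -> (2, 1)
```

The sample table has k=2, so it would not meet the k=5 of ehds-secondary-default without further generalisation or suppression, and l=1 shows that one group reveals its diagnosis outright.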

Citations

Regulation (EU) 2025/327 of the European Parliament and of the Council of 12 February 2025 on the European Health Data Space. Official Journal of the European Union, L 2025/327, 5 March 2025. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32025R0327


License

MIT. See LICENSE.


Contributing

Issues and PRs welcome. Before contributing, please:

  1. Run ruff check src/ tests/ and mypy --strict src/
  2. Ensure pytest passes with no failures
  3. Reference the relevant EHDS Article in any citation-related change



Download files


Source Distribution

ehds_anon_kit-0.1.0.tar.gz (38.8 kB)

Built Distribution


ehds_anon_kit-0.1.0-py3-none-any.whl (29.9 kB)

File details

Details for the file ehds_anon_kit-0.1.0.tar.gz.

File metadata

  • Download URL: ehds_anon_kit-0.1.0.tar.gz
  • Size: 38.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for ehds_anon_kit-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d1885cfede5246b17ded04a61bb956a5c55e998da60fd64e8ce1fff3c74dabd6
MD5 8f1e3f2ca82a5be2212d051e7de29361
BLAKE2b-256 8ed5c7ed86d510339bb09291130594a4792cf10594685494ebfd7f8c6fb4d162


File details

Details for the file ehds_anon_kit-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ehds_anon_kit-0.1.0-py3-none-any.whl
  • Size: 29.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for ehds_anon_kit-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 123d85491b4413680442e96be3df0f42742f789795ff74ef12a60bf582aa1bdd
MD5 73ac7699ad998201b0a4275682d1031b
BLAKE2b-256 afe6bf71614ccb000d3da18c648913a8069b5b5c7315d090a223361bef74d9a9
