Tools for encoding FHIR terminology concepts for machine learning
Project description
FHIR Terminology Encoder
This is a scikit-learn compatible encoder that uses a FHIR terminology server to encode ontological features.
It currently supports subsumption relationships and properties.
You supply a scope in the form of a FHIR ValueSet URI, and a FHIR terminology endpoint.
The result is a multi-hot encoded vector delivered as a sparse matrix, suitable for input into most models and estimators.
Installation
pip install fhir-tx-encoder
Usage
from fhir_tx import FhirTerminologyEncoder
import numpy as np
encoder = FhirTerminologyEncoder(
# Ancestors of the SNOMED CT concept "Malignant neoplastic disease" (363346000)
scope="http://snomed.info/sct?fhir_vs=ecl/(%3E%3E%20363346000)",
# Include "Associated morphology" (116676008) as a property
properties=["116676008"]
)
# Encode two SNOMED CT concepts:
# - "Neoplasm and/or hamartoma" (399981008)
# - "Malignant neoplastic disease" (363346000)
result = encoder.fit_transform(np.array([["399981008", "363346000"]]))
# Print out the result and its shape.
print(f"result.shape: {result.shape}")
print(f"result:\n{result.toarray()}")
# Print out the feature names.
print(f"encoder.feature_names_: {encoder.feature_names_}")
Which would output:
Expanding value set: http://snomed.info/sct?fhir_vs=ecl/(%3E%3E%20363346000)
Expanding (6 items, offset 0, total 6)
Expansion complete
Generating one-hot encoding... (6, 6)
Creating index... 6 items
Applying transitive closure...
Batch 1 of 1, 6 items... 15 pairs added
Subsumption encoding complete: (6, 6)
Encoding properties... (6, 9)
result.shape: (2, 9)
result:
[[1. 1. 0. 1. 0. 1. 0. 0. 1.]
[1. 1. 1. 1. 1. 1. 0. 1. 0.]]
encoder.feature_names_: ['404684003', '64572001', '363346000', '399981008', '55342001', '138875005', '609096000.116676008=108369006', '609096000.116676008=1240414004', '609096000.116676008=400177003']
Important note
This software is currently in alpha. It is not yet ready for production use.
Copyright © 2023, Commonwealth Scientific and Industrial Research Organisation (CSIRO) ABN 41 687 119 230. Licensed under the Apache License, version 2.0.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for fhir_tx_encoder-1.0.2-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | ac27aca40dee9546269a0bd3d91c23f14c45fd0bcce750bb811cfc9f5c94972d |
|
MD5 | 110725d028f943139f90941ae5217920 |
|
BLAKE2b-256 | b905972a101dcc8c5dd97ad99808d652d591e773f06acc3d788235ad19b31a44 |