Tools for encoding FHIR terminology concepts for machine learning
FHIR Terminology Encoder
This is a scikit-learn compatible encoder that uses a FHIR terminology server to encode ontological features.
It currently supports subsumption relationships and properties.
You supply a scope in the form of a FHIR ValueSet URI, and a FHIR terminology endpoint.
The result is a multi-hot encoded vector delivered as a sparse matrix, suitable for input into most models and estimators.
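For SNOMED CT, one common form of scope is an implicit value set defined by an Expression Constraint Language (ECL) expression, where the expression is percent-encoded and appended to the `http://snomed.info/sct?fhir_vs=ecl/` prefix. The following is a minimal sketch of building such a URI with the Python standard library. The helper name `ecl_scope` is hypothetical and is not part of fhir-tx-encoder.

```python
from urllib.parse import quote

# Prefix for SNOMED CT implicit value sets defined by an ECL expression.
SNOMED_ECL_PREFIX = "http://snomed.info/sct?fhir_vs=ecl/"


def ecl_scope(ecl: str) -> str:
    """Hypothetical helper: wrap an ECL expression in parentheses,
    percent-encode it, and append it to the implicit value set prefix."""
    # Parentheses are kept literal; '>' and spaces are percent-encoded.
    return SNOMED_ECL_PREFIX + quote(f"({ecl})", safe="()")


# ">> 363346000" selects the concept 363346000 together with all of its ancestors.
scope = ecl_scope(">> 363346000")
print(scope)
```

The resulting string can be passed as the `scope` argument of the encoder.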
Installation
pip install fhir-tx-encoder
Usage
from fhir_tx import FhirTerminologyEncoder
import numpy as np
encoder = FhirTerminologyEncoder(
    # The SNOMED CT concept "Malignant neoplastic disease" (363346000) and all of its ancestors
    scope="http://snomed.info/sct?fhir_vs=ecl/(%3E%3E%20363346000)",
    # Include "Associated morphology" (116676008) as a property
    properties=["116676008"],
)
# Encode two SNOMED CT concepts:
# - "Neoplasm and/or hamartoma" (399981008)
# - "Malignant neoplastic disease" (363346000)
result = encoder.fit_transform(np.array([["399981008", "363346000"]]))
# Print out the result and its shape.
print(f"result.shape: {result.shape}")
print(f"result:\n{result.toarray()}")
# Print out the feature names.
print(f"encoder.feature_names_: {encoder.feature_names_}")
This would output:
Expanding value set: http://snomed.info/sct?fhir_vs=ecl/(%3E%3E%20363346000)
Expanding (6 items, offset 0, total 6)
Expansion complete
Generating one-hot encoding... (6, 6)
Creating index... 6 items
Applying transitive closure...
Batch 1 of 1, 6 items... 15 pairs added
Subsumption encoding complete: (6, 6)
Encoding properties... (6, 9)
result.shape: (2, 9)
result:
[[1. 1. 0. 1. 0. 1. 0. 0. 1.]
 [1. 1. 1. 1. 1. 1. 0. 1. 0.]]
encoder.feature_names_: ['404684003', '64572001', '363346000', '399981008', '55342001', '138875005', '609096000.116676008=108369006', '609096000.116676008=1240414004', '609096000.116676008=400177003']
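The first six columns of the result can be reproduced by hand to see what the subsumption encoding means: each row has a 1 for the concept itself and for every one of its ancestors within the scope. The sketch below is illustrative only and is not the library's implementation, which obtains the hierarchy from the terminology server. The `PARENT` map is an assumption inferred from the output above (a single chain of six concepts); real SNOMED CT concepts can have several parents.

```python
# Column order copied from the first six entries of encoder.feature_names_ above.
FEATURES = ["404684003", "64572001", "363346000", "399981008", "55342001", "138875005"]

# ASSUMPTION: child -> parent links inferred from the output shown above.
PARENT = {
    "363346000": "55342001",
    "55342001": "399981008",
    "399981008": "64572001",
    "64572001": "404684003",
    "404684003": "138875005",
}


def ancestors_or_self(code):
    """Walk the parent links upwards, collecting the concept and its ancestors."""
    found = [code]
    while found[-1] in PARENT:
        found.append(PARENT[found[-1]])
    return set(found)


def multi_hot(code):
    """Return one row: 1 where the column is the concept or one of its ancestors."""
    active = ancestors_or_self(code)
    return [1 if feature in active else 0 for feature in FEATURES]


rows = [multi_hot("399981008"), multi_hot("363346000")]
for row in rows:
    print(row)

# Number of (concept, strict ancestor) pairs across the whole scope.
pairs = sum(len(ancestors_or_self(code)) - 1 for code in FEATURES)
print(pairs)
```

Under these assumptions the two rows match the first six columns of the result above, and the pair count is consistent with the "15 pairs added" line in the log. The remaining three columns are property features, named in the form shown in `encoder.feature_names_`.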
Important note
This software is currently in alpha. It is not yet ready for production use.
Copyright © 2023, Commonwealth Scientific and Industrial Research Organisation (CSIRO) ABN 41 687 119 230. Licensed under the Apache License, version 2.0.
File details
Details for the file fhir_tx_encoder-1.1.0.tar.gz.
File metadata
- Download URL: fhir_tx_encoder-1.1.0.tar.gz
- Upload date:
- Size: 10.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 623fcad188a489d200635650d6e3a264a7d363fac5f47c36686d40e0cf7f2a85 |
| MD5 | 0e6d7a14412b602ed38fbbe1daa651b4 |
| BLAKE2b-256 | 08adcd90bc48085d090390d3e291918ab403a28818f5a89762d10183fe63cd90 |
File details
Details for the file fhir_tx_encoder-1.1.0-py3-none-any.whl.
File metadata
- Download URL: fhir_tx_encoder-1.1.0-py3-none-any.whl
- Upload date:
- Size: 12.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 29516429b036336074b99e632203214ad44940c223b85d9d9c38df0f884f4fbb |
| MD5 | 9899edee3b4ee1b3b8149e679eb2f5ef |
| BLAKE2b-256 | 33a00eb55a7e173a002b812d29c03387c7f14ed1e2e38dbeaeb84c4fa9bfab04 |