Tools for encoding FHIR terminology concepts for machine learning

FHIR Terminology Encoder

This is a scikit-learn compatible encoder that uses a FHIR terminology server to encode ontological features.

It currently supports subsumption relationships and properties.

You supply a scope, in the form of a FHIR ValueSet URI, and the URL of a FHIR terminology endpoint.
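The scope URI used in the Usage section follows the SNOMED CT implicit value set pattern, where an Expression Constraint Language (ECL) expression is percent-encoded after `fhir_vs=ecl/`. The following is a minimal sketch of building such a URI with the Python standard library. The helper name `ecl_scope` is hypothetical and is not part of this package.

```python
from urllib.parse import quote

SNOMED_BASE = "http://snomed.info/sct"


def ecl_scope(ecl: str) -> str:
    """Build an implicit ValueSet URI for an ECL expression (hypothetical helper)."""
    # Percent-encode every reserved character in the expression, then wrap it
    # in literal parentheses, matching the form used in the Usage section.
    return f"{SNOMED_BASE}?fhir_vs=ecl/({quote(ecl, safe='')})"


# ">> 363346000" selects the concept and all of its ancestors.
scope = ecl_scope(">> 363346000")
print(scope)
```

The resulting string can be passed as the `scope` argument shown in the Usage section.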

The result is a multi-hot encoded vector delivered as a sparse matrix, suitable for input into most models and estimators.
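To illustrate what a multi-hot subsumption encoding looks like, here is a small sketch in plain Python. It is not the package's implementation: it assumes a local, made-up hierarchy instead of a terminology server, and it returns dense lists instead of a sparse matrix. All codes and function names are hypothetical.

```python
# Toy is-a hierarchy with made-up codes (not real terminology content).
# Each key maps a code to its direct parents.
PARENTS = {
    "root": [],
    "disease": ["root"],
    "neoplasm": ["disease"],
    "malignant": ["neoplasm"],
}


def ancestors_or_self(code, parents):
    """Return the code plus every code reachable through is-a links."""
    seen = set()
    stack = [code]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def multi_hot(codes, parents):
    """Encode each code as a row with a 1 for itself and for each ancestor."""
    features = sorted(parents)  # fixed, reproducible column order
    rows = []
    for code in codes:
        active = ancestors_or_self(code, parents)
        rows.append([1 if name in active else 0 for name in features])
    return features, rows


features, rows = multi_hot(["neoplasm", "malignant"], PARENTS)
print(features)
for row in rows:
    print(row)
```

Because each row marks the concept and all of its ancestors, two related concepts share most of their active columns, which is what lets a model exploit the hierarchy.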

Installation

pip install fhir-tx-encoder

Usage

from fhir_tx import FhirTerminologyEncoder
import numpy as np

encoder = FhirTerminologyEncoder(
    # The SNOMED CT concept "Malignant neoplastic disease" (363346000) and all of its ancestors
    scope="http://snomed.info/sct?fhir_vs=ecl/(%3E%3E%20363346000)",
    # Include "Associated morphology" (116676008) as a property
    properties=["116676008"]
)

# Encode two SNOMED CT concepts:
# - "Neoplasm and/or hamartoma" (399981008)
# - "Malignant neoplastic disease" (363346000)
result = encoder.fit_transform(np.array([["399981008", "363346000"]]))

# Print out the result and its shape.
print(f"result.shape: {result.shape}")
print(f"result:\n{result.toarray()}")

# Print out the feature names.
print(f"encoder.feature_names_: {encoder.feature_names_}")

This would produce the following output:

Expanding value set: http://snomed.info/sct?fhir_vs=ecl/(%3E%3E%20363346000)
Expanding (6 items, offset 0, total 6)
Expansion complete
Generating one-hot encoding... (6, 6)
Creating index... 6 items
Applying transitive closure...
Batch 1 of 1, 6 items... 15 pairs added
Subsumption encoding complete: (6, 6)
Encoding properties... (6, 9)
result.shape: (2, 9)
result:
[[1. 1. 0. 1. 0. 1. 0. 0. 1.]
 [1. 1. 1. 1. 1. 1. 0. 1. 0.]]
encoder.feature_names_: ['404684003', '64572001', '363346000', '399981008', '55342001', '138875005', '609096000.116676008=108369006', '609096000.116676008=1240414004', '609096000.116676008=400177003']
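Each column of the result lines up with the entry at the same position in `encoder.feature_names_`. The sketch below copies the values printed above and lists the active features for each row. It assumes, based only on the names shown, that entries containing `=` are property features and the rest are subsumption features. The helper `active_features` is hypothetical.

```python
# Values copied from the printed output above.
feature_names = [
    "404684003", "64572001", "363346000", "399981008", "55342001", "138875005",
    "609096000.116676008=108369006",
    "609096000.116676008=1240414004",
    "609096000.116676008=400177003",
]
rows = [
    [1, 1, 0, 1, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 0, 1, 0],
]
input_codes = ["399981008", "363346000"]


def active_features(row, names):
    """Return the feature names whose column is set in this row."""
    return [name for name, value in zip(names, row) if value]


decoded = [active_features(row, feature_names) for row in rows]
for code, active in zip(input_codes, decoded):
    # Assumption: names containing "=" are property features.
    concepts = [name for name in active if "=" not in name]
    properties = [name for name in active if "=" in name]
    print(f"{code}: concepts={concepts} properties={properties}")
```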

Important note

This software is currently in alpha. It is not yet ready for production use.

Copyright © 2023, Commonwealth Scientific and Industrial Research Organisation (CSIRO) ABN 41 687 119 230. Licensed under the Apache License, version 2.0.
