FHIR to pandas.dataframe for AI and ML
:fire: fhiry - FHIR to pandas dataframe for data analytics, AI and ML
Virtual flattened view of FHIR Bundle / ndjson / FHIR server / BigQuery!
:fire: FHIRy is a Python package that facilitates health data analytics and machine learning by converting a folder of FHIR bundles/ndjson from a bulk data export into a pandas dataframe for analysis. You can import the dataframe into ML packages such as TensorFlow and PyTorch. FHIRy also supports FHIR server search and FHIR tables on BigQuery.
UPDATE
Recently added support for LLM-based natural language queries of FHIR bundles/ndjson using llama-index. Install the llm extras as follows. Be mindful of the privacy issues with publicly hosted LLMs. Feedback is highly appreciated. See usage!
pip install fhiry[llm]
Test this with the synthea sample or the downloaded ndjson from the SMART Bulk data server. Use the 'Discussions' tab above for feature requests.
:sparkles: Check out this template for multimodal machine learning in healthcare!
Installation
Stable
pip install fhiry
Latest dev version
pip install git+https://github.com/dermatologist/fhiry.git
Usage
1. Import FHIR bundles (JSON) from folder to pandas dataframe
import fhiry.parallel as fp
df = fp.process('/path/to/fhir/resources')
print(df.info())
Example source data set: Synthea
Jupyter notebook example: notebooks/synthea.ipynb
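To give a feel for what a "flattened view" of a Bundle means, here is a minimal sketch that uses only pandas. It is not FHIRy's implementation, and the tiny bundle below is hand-written illustrative data rather than Synthea output. It only shows how nested FHIR JSON can become dotted column names like those listed under Columns.

```python
import pandas as pd

# A tiny hand-written FHIR Bundle (illustrative data, not from Synthea).
bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {
            "fullUrl": "urn:uuid:patient-1",
            "resource": {
                "resourceType": "Patient",
                "id": "patient-1",
                "gender": "female",
            },
        },
        {
            "fullUrl": "urn:uuid:condition-1",
            "resource": {
                "resourceType": "Condition",
                "id": "condition-1",
                "subject": {"reference": "urn:uuid:patient-1"},
            },
        },
    ],
}

# json_normalize flattens nested dicts into dotted column names,
# producing one row per Bundle entry.
df = pd.json_normalize(bundle["entry"])
print(sorted(df.columns))
```

Fields that a resource does not carry (for example `resource.gender` on the Condition row) end up as missing values in that row.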
2. Import NDJSON from folder to pandas dataframe
import fhiry.parallel as fp
df = fp.ndjson('/path/to/fhir/ndjson/files')
print(df.info())
Example source data set: SMART Bulk Data Server Export
Jupyter notebook example: notebooks/ndjson.ipynb
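For readers new to the format: an NDJSON bulk export stores one FHIR resource per line. The sketch below, which assumes nothing about FHIRy's internals, reads such lines with the standard library and flattens them with pandas. The helper name `read_ndjson` and the two sample patients are hypothetical, and the column names it produces may differ from the ones FHIRy emits.

```python
import io
import json

import pandas as pd

# Two hand-written resources, one JSON document per line (illustrative data).
ndjson_text = (
    '{"resourceType": "Patient", "id": "p1", "gender": "male"}\n'
    '{"resourceType": "Patient", "id": "p2", "gender": "female"}\n'
)


def read_ndjson(handle):
    """Parse each non-empty line as JSON and flatten the result (hypothetical helper)."""
    resources = [json.loads(line) for line in handle if line.strip()]
    return pd.json_normalize(resources)


# io.StringIO stands in for an opened .ndjson file.
df = read_ndjson(io.StringIO(ndjson_text))
print(df)
```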
3. Import FHIR Search results to pandas dataframe
Fetch and import resources from FHIR Search API results to pandas dataframe.
Documentation: fhir-search.md
Example: Import all conditions with a certain code from FHIR Server
Fetch and import all Condition resources with the SNOMED (code system http://snomed.info/sct) code 39065001 in the FHIR element Condition.code (resource-type-specific FHIR search parameter code) to a pandas dataframe:
from fhiry.fhirsearch import Fhirsearch
fs = Fhirsearch(fhir_base_url = "http://fhir-server:8080/fhir")
my_fhir_search_parameters = {
"code": "http://snomed.info/sct|39065001",
}
df = fs.search(resource_type = "Condition", search_parameters = my_fhir_search_parameters)
print(df.info())
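The search parameters above follow the FHIR token syntax `system|code`. As a rough illustration of the request this corresponds to, the sketch below builds the search URL with the standard library only. The helper `build_search_url` is hypothetical and not part of FHIRy, which may construct and page through requests differently.

```python
from urllib.parse import urlencode


def build_search_url(base_url, resource_type, search_parameters):
    """Build a FHIR search URL (hypothetical helper, for illustration only)."""
    # urlencode percent-encodes reserved characters such as ':', '/' and '|'.
    query = urlencode(search_parameters)
    return f"{base_url.rstrip('/')}/{resource_type}?{query}"


url = build_search_url(
    "http://fhir-server:8080/fhir",
    "Condition",
    {"code": "http://snomed.info/sct|39065001"},
)
print(url)
```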
4. Import Google BigQuery FHIR dataset
from fhiry.bqsearch import BQsearch
bqs = BQsearch()
df = bqs.search("SELECT * FROM `bigquery-public-data.fhir_synthea.patient` LIMIT 20") # can be a path to .sql file
Filters
Pass a config JSON to any of the constructors or processing functions via config_json:
- config_json can be a JSON string (as in the examples here) or a path to a JSON file.
df = fp.process('/path/to/fhir/resources', config_json='{ "REMOVE": ["resource.text.div"], "RENAME": { "resource.id": "id" } }')
fs = Fhirsearch(fhir_base_url = "http://fhir-server:8080/fhir", config_json = '{ "REMOVE": ["resource.text.div"], "RENAME": { "resource.id": "id" } }')
bqs = BQsearch('{ "REMOVE": ["resource.text.div"], "RENAME": { "resource.id": "id" } }')
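The effect of the REMOVE and RENAME keys can be approximated with plain pandas. The sketch below is an assumption about the resulting behavior, not FHIRy's code: `apply_config` is a hypothetical helper and the three-column dataframe is made-up sample data.

```python
import json

import pandas as pd

# The same config string used in the examples above.
config = json.loads(
    '{ "REMOVE": ["resource.text.div"], "RENAME": { "resource.id": "id" } }'
)

# Made-up sample frame with flattened FHIR-style column names.
df = pd.DataFrame(
    {
        "resource.id": ["p1", "p2"],
        "resource.text.div": ["<div>..</div>", "<div>..</div>"],
        "resource.gender": ["male", "female"],
    }
)


def apply_config(frame, cfg):
    """Drop REMOVE columns, then apply RENAME (hypothetical helper)."""
    # errors="ignore" skips columns that are not present in this frame.
    frame = frame.drop(columns=cfg.get("REMOVE", []), errors="ignore")
    return frame.rename(columns=cfg.get("RENAME", {}))


filtered = apply_config(df, config)
print(list(filtered.columns))
```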
Columns
- see df.columns
patientId
fullUrl
resource.resourceType
resource.id
resource.name
resource.telecom
resource.gender
...
...
...
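Once the columns are known, a dataframe like this can be turned into a numeric array for ML packages. The following is a minimal sketch on made-up data, assuming a `resource.gender` column as listed above; real feature engineering will depend on your dataset.

```python
import pandas as pd

# Made-up sample rows using column names from the list above.
df = pd.DataFrame(
    {
        "patientId": ["p1", "p2", "p3"],
        "resource.gender": ["male", "female", "female"],
    }
)

# One-hot encode the categorical column and cast to float32,
# a dtype commonly used for model inputs.
features = pd.get_dummies(df["resource.gender"], prefix="gender").astype("float32")

# A plain NumPy array that ML frameworks can consume.
X = features.to_numpy()
print(features.columns.tolist(), X.shape)
```

The array `X` can then be handed to a framework, for example with `torch.from_numpy(X)` in PyTorch or `tf.convert_to_tensor(X)` in TensorFlow.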
Documentation
Give us a star ⭐️
If you find this project useful, give us a star. It helps others discover the project.
Contributors
- Bell Eapen
- Markus Mandalka
- PRs are welcome; please see CONTRIBUTING.md
- using CC
Built Distribution
File details
Details for the file fhiry-4.0.0-py2.py3-none-any.whl.
File metadata
- Download URL: fhiry-4.0.0-py2.py3-none-any.whl
- Upload date:
- Size: 13.4 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.19
File hashes
Algorithm | Hash digest
---|---
SHA256 | eba1b09aa329c509914055899d256100af91c4e37da49ccecd744faad1e35ba4
MD5 | 2e295aad36e5b9dc26546ba8a786a3a3
BLAKE2b-256 | cdccecaadc9f9c6c09b9e637b626b3808a18c48a3e21acabacb90d796b5b8431