Mapping the GDC and Cellosaurus schemas to the FHIR schema.
fhirizer
Project overview:
Transforms and harmonizes data from Genomic Data Commons (GDC), Cellosaurus cell-lines, and International Cancer Genome Consortium (ICGC) repositories into 🔥 FHIR (Fast Healthcare Interoperability Resources) format.
[Figure: GDC study simplified FHIR graph]
Usage
Installation
- from source
git clone repo
cd fhirizer
# create a virtual environment, e.g.:
# NOTE: the package_data folders must be on the Python path inside the virtual environment
python -m venv venv-fhirizer
source venv-fhirizer/bin/activate
pip install .
- Dockerfile
(sudo) docker build -t <tag-name>:latest .
(sudo) docker run -it --mount type=bind,source=<path-to-input-ndjson>,target=/opt/data --rm <tag-name>:latest
- Singularity
singularity build fhirizer.sif docker://quay.io/ohsu-comp-bio/fhirizer
singularity shell fhirizer.sif
Convert and Generate
Detailed step-by-step guide on FHIRizing data for a project's study can be found in the project's directory overview.
GDC
- convert GDC schema keys to the FHIR mapping
- generate FHIR object model ndjson files in a directory
Example run for patients (replace the paths with your own ndjson files or directories):
fhirizer convert --name case --in_path ./projects/<my-project>/cases.ndjson --out_path ./projects/<my-project>/cases_key.ndjson --verbose True
fhirizer generate --name case --out_dir ./projects/<my-project>/META --entity_path ./projects/<my-project>/cases_key.ndjson
- to generate document references for the patients:
fhirizer convert --name file --in_path ./projects/<my-project>/files.ndjson --out_path ./projects/<my-project>/files_key.ndjson --verbose True
fhirizer generate --name file --out_dir ./projects/<my-project>/META --entity_path ./projects/<my-project>/files_key.ndjson
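The convert and generate steps above always come in pairs, so they can be assembled programmatically. The following is a minimal sketch, not part of fhirizer itself: the `build_commands` helper and the `projects/my-project` path are hypothetical, and the `<stem>_key.ndjson` naming simply mirrors the examples above. It only uses the flags shown in this section and prints the commands rather than running them.

```python
import shlex
from pathlib import Path
from typing import List


def build_commands(project_dir: str, name: str, source_file: str) -> List[List[str]]:
    """Return the convert and generate commands for one entity type.

    Hypothetical helper: it only assembles the CLI arguments shown above.
    """
    project = Path(project_dir)
    in_path = project / source_file                     # e.g. cases.ndjson
    key_path = project / f"{in_path.stem}_key.ndjson"   # e.g. cases_key.ndjson
    convert = [
        "fhirizer", "convert", "--name", name,
        "--in_path", str(in_path),
        "--out_path", str(key_path),
        "--verbose", "True",
    ]
    generate = [
        "fhirizer", "generate", "--name", name,
        "--out_dir", str(project / "META"),
        "--entity_path", str(key_path),
    ]
    return [convert, generate]


if __name__ == "__main__":
    # Placeholder project path; replace with your own project directory.
    for name, source in (("case", "cases.ndjson"), ("file", "files.ndjson")):
        for cmd in build_commands("projects/my-project", name, source):
            print(shlex.join(cmd))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` once fhirizer is installed, so that a failed convert step stops the run before generate is attempted.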
Cellosaurus
- The Cellosaurus ndjson input follows the JSON format returned by the Cellosaurus GET API
fhirizer generate --name cellosaurus --out_dir ./projects/<my-project>/META --entity_path ./projects/<my-project>/<cellosaurus-celllines-ndjson>
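Because the inputs are ndjson (one JSON object per line), it can help to check a file before passing it to `fhirizer generate`. The sketch below is an assumption-laden illustration rather than part of fhirizer: `iter_ndjson` is a hypothetical helper, and the sample records use placeholder fields, not real Cellosaurus API fields.

```python
import io
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line, failing on the first bad line."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_number}: invalid JSON ({err.msg})") from err
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        yield record


if __name__ == "__main__":
    # Placeholder records for illustration only.
    sample = io.StringIO('{"id": "example-1"}\n{"id": "example-2"}\n')
    print(sum(1 for _ in iter_ndjson(sample)), "records parsed")
```

For a real file, open it with `open(path, encoding="utf-8")` and pass the file object to `iter_ndjson`; the reported line number points at the first line that is not a valid JSON object.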
ICGC
fhirizer generate --name icgc --icgc <ICGC_project_name> --has_files
CLI commands for constructing GDC maps
Initialize the base structure of the project, case, or file maps before adding mappings:
fhirizer project_init
# to update mappings, run the associated labels script, e.g. ./labels/project.py
fhirizer case_init
fhirizer file_init
Testing
pytest --cov
fhirizer structure:
Data directories included in package data:
- resources: data resources generated or used in mappings
- mapping: JSON data maps produced by the fhirizer pydantic schema maps
fhirizer/
|-- fhirizer/
|   |-- __init__.py
|   |-- labels/
|   |   |-- __init__.py
|   |   |-- files.py
|   |   |-- case.py
|   |   └── project.py
|   |
|   |-- schema.py
|   |-- entity2fhir.py
|   |-- mapping.py
|   |-- utils.py
|   └── cli.py
|
|-- mapping/
|   |-- project.json
|   |-- case.json
|   └── file.json
|
|-- resources/
|   |-- gdc_resources/
|   |   |-- content_annotations/
|   |   |-- data_dictionary/
|   |   └── fields/
|   └── fhir_resources/
|
|-- tests/
|   |-- __init__.py
|   |-- unit/
|   |   |-- __init__.py
|   |   └── test_mapping.py
|   |-- integration/
|   |   |-- __init__.py
|   |   |-- test_generate.py
|   |   └── test_convert.py
|   └── fixtures/
|
|-- projects/
|   |-- GDC/
|   |   └── TCGA-STUDY/
|   |       |-- cases.ndjson
|   |       |-- files.ndjson
|   |       └── META/
|   └── ICGC/
|       └── ICGC-STUDY/
|           |-- data/
|           └── META/
|-- README.md
└── setup.py
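Before running the GDC convert and generate steps on a study directory laid out as in the tree above, a quick check that the input files are present can save a failed run. This is a minimal sketch under stated assumptions: `missing_inputs` is a hypothetical helper, and the expected file names are taken from the GDC example paths in this README rather than from fhirizer's own code.

```python
from pathlib import Path
from typing import List

# Input files used by the GDC examples in this README (an assumption,
# not a list enforced by fhirizer).
EXPECTED_GDC_INPUTS = ("cases.ndjson", "files.ndjson")


def missing_inputs(study_dir: str) -> List[str]:
    """Return the expected input files that are absent from study_dir."""
    study = Path(study_dir)
    return [name for name in EXPECTED_GDC_INPUTS if not (study / name).is_file()]


if __name__ == "__main__":
    # Placeholder study path following the tree above.
    missing = missing_inputs("projects/GDC/TCGA-STUDY")
    if missing:
        print("missing input files:", ", ".join(missing))
    else:
        print("all expected input files found")
```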