A toolkit for CARE-SM data transformation.
CARE-SM Toolkit
CSV datatable toolkit for CARE semantic model implementation
Implementing the Clinical And Registry Entries (CARE) Semantic Model for CSV data is a multi-step workflow. It combines CARE-SM, YARRRML templates, and a curation step executed by the CARE-SM toolkit to generate RDF-based, CDE-oriented patient data.
The toolkit serves as a module dedicated to performing a curation step prior to the conversion of data into RDF. The primary transformations carried out by the toolkit include:
- Quality control of column names.
- Adding every domain-specific ontological term required to define each instance of the model. These terms are specific to each data element.
- Splitting the column labeled `value` into distinct datatypes. This enables YARRRML to interpret each datatype differently, which facilitates subsequent processing.
- Quality control across the `age/date`, `startdate` and `enddate` columns to ensure data consistency and validity.
- Eliminating any row that lacks the minimal required data, to minimize the generation of incomplete RDF transformations.
- Creating a column called `uniqid` that assigns a unique identifier to each observation. This prevents the RDF instances from overlapping with one another.
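To make three of these steps concrete (dropping incomplete rows, splitting `value` by datatype, and adding `uniqid`), here is a minimal sketch using only the Python standard library. It is not the toolkit's actual code: the typed column names (`value_integer`, `value_float`, `value_date`, `value_string`), the `pid` column, the "non-empty value" rule, and the use of UUIDs for `uniqid` are all assumptions made for illustration. It needs Python 3.7 or later because of `date.fromisoformat`.

```python
import csv
import io
import uuid
from datetime import date

# Hypothetical names for the typed output columns; the toolkit's real
# output columns may be named differently.
TYPED_COLUMNS = ("value_integer", "value_float", "value_date", "value_string")


def classify(value):
    """Return the (hypothetical) typed column a raw string value belongs in."""
    try:
        int(value)
        return "value_integer"
    except ValueError:
        pass
    try:
        float(value)
        return "value_float"
    except ValueError:
        pass
    try:
        date.fromisoformat(value)
        return "value_date"
    except ValueError:
        return "value_string"


def curate(rows):
    """Drop rows without a value, split `value` by datatype, add `uniqid`."""
    curated = []
    for row in rows:
        raw = (row.get("value") or "").strip()
        if not raw:
            continue  # assumed minimal requirement: a non-empty value
        out = {k: v for k, v in row.items() if k != "value"}
        out.update({col: "" for col in TYPED_COLUMNS})
        out[classify(raw)] = raw
        out["uniqid"] = uuid.uuid4().hex  # assumed identifier scheme
        curated.append(out)
    return curated


# Made-up sample data; `pid` is a hypothetical patient identifier column.
sample = io.StringIO(
    "pid,value\n"
    "p1,42\n"
    "p1,36.6\n"
    "p2,2020-01-15\n"
    "p2,\n"
    "p3,positive\n"
)
result = curate(csv.DictReader(sample))
```

In this sketch the row with the empty value is discarded, and each remaining row has exactly one typed column filled in, so a YARRRML mapping could attach a different datatype to each column.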
Requirements
To use the CARE-SM Toolkit:
- All CSV files MUST be named according to the data tags described in the CARE-SM glossary, documented at CARE-SM implementation. E.g.: `Diagnosis.csv`, `Birthdate.csv`.
- All your CSV data content MUST be compatible with the CARE-SM glossary, documented at CARE-SM implementation.
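A quick pre-flight check of the file naming rule could look like the sketch below. The helper name `unrecognized_csv_files` is hypothetical, and the tag set contains only the two tags mentioned as examples above; the complete list has to be taken from the CARE-SM glossary.

```python
from pathlib import Path

# Only the two tags named as examples in this README. The full list lives
# in the CARE-SM glossary and is not reproduced here.
EXAMPLE_TAGS = {"Diagnosis", "Birthdate"}


def unrecognized_csv_files(folder, allowed_tags=EXAMPLE_TAGS):
    """Return sorted names of CSV files whose stem is not an allowed tag."""
    return sorted(
        p.name for p in Path(folder).glob("*.csv") if p.stem not in allowed_tags
    )
```

Calling `unrecognized_csv_files("./location/of/your/data")` before starting the transformation lists the files that would need renaming; an empty list means every CSV file name matches one of the supplied tags.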
Dockerized implementation
There's a Docker-based implementation controlled via API (using FastAPI) that you can use for mounting this data transformation step as a part of your CARE-SM implementation.
You can edit the docker-compose.yaml to control the volume folder in order to pass your CSV-based patient data:
volumes:
  - ./location/of/your/data:/code/data
Note that the IP and port can also be customized in docker-compose.yaml.
Run docker compose to start the containers:
docker compose up -d
Once it's running, you can open the OpenAPI documentation at http://localhost:8080/docs in your browser to inspect all the possible requests and trigger the execution.
Alternatively, you can trigger the data transformation from the terminal:
curl -X POST http://localhost:8080/toolkit
Congrats! You will find your transformed data, stored as CARE.csv, in the folder you defined as the volume above.
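If you prefer to trigger the transformation from a Python script rather than curl, the same request can be sketched with the standard library. The function names below are hypothetical, the URL is the default from the curl example, and the timeout value is an arbitrary assumption; the sketch also assumes the endpoint needs no request body, as the curl call suggests.

```python
import urllib.request

# Default address from the curl example; adjust it if you changed the
# IP or port in docker-compose.yaml.
TOOLKIT_URL = "http://localhost:8080/toolkit"


def build_toolkit_request(url=TOOLKIT_URL):
    """Build a bodiless POST request, mirroring `curl -X POST`."""
    return urllib.request.Request(url, method="POST")


def trigger_toolkit(url=TOOLKIT_URL, timeout=300):
    """Send the request and return the HTTP status code."""
    request = build_toolkit_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `trigger_toolkit()` while the containers are up sends the POST request and returns the status code; `urllib` raises `urllib.error.HTTPError` for error responses and `urllib.error.URLError` if the service is unreachable.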
To stop and remove the implementation, do the following:
docker compose down
Local implementation
If you are not interested in running our Docker image, you can install the Python module for local implementation.
Installation
Python 3.5 or later is needed. The script depends on standard libraries, plus the ones declared in requirements.txt.
To install the dependencies you need the pip and venv Python modules.
- pip is available in many Linux distributions (Ubuntu package python-pip, CentOS EPEL package python-pip), and also as pip Python package.
- venv is also available in many Linux distributions (Ubuntu package python3-venv). In some of these distributions venv is integrated into the Python 3.5 (or later) installation.
To create a virtual environment and install the dependencies in it, run:
python3 -m venv envCARESM
source envCARESM/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
Remember to deactivate your Python environment after using it.
Execution
Then change the folder path inside the trial.py script and run it:
python3 trial.py