A toolkit for CARE-SM data transformation.

CARE-SM Toolkit

CSV datatable toolkit for CARE semantic model implementation

Implementing the Clinical And Registry Entries (CARE) Semantic Model (CARE-SM) for CSV data follows a multi-step workflow. It combines the CARE-SM itself, YARRRML templates, and a curation step executed by the CARE-SM toolkit to generate RDF-based, CDE-oriented patient data that is robust, accurate, and reliable.

The toolkit serves as a module dedicated to performing a curation step prior to the conversion of data into RDF. The primary transformations carried out by the toolkit include:

  • Quality control for column names.

  • Adding every domain-specific ontological term required to define each instance of the model; these terms are specific to each data element.

  • Splitting the column labeled as value into distinct datatypes. This enables YARRRML to interpret each datatype differently, facilitating the subsequent processing.

  • Performing quality control on the age/date, startdate and enddate columns to ensure data consistency and validity.

  • Eliminating any row that lacks the minimal required data to minimize the generation of incomplete RDF transformations.

  • Creating a column called uniqid that assigns a unique identifier to each observation. This prevents RDF instances from overlapping with one another, keeping them distinct.
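As an illustration only, the kind of curation listed above can be sketched with the Python standard library. This is not the toolkit's code: the required columns (pid, value), the output column names (value_integer, value_float, value_date, value_string), the decision to drop rows whose startdate is later than enddate, and the helper names split_value and curate are all assumptions made for this example.

```python
import csv
import io
import uuid
from datetime import date

# Hypothetical minimal columns a row must fill in to be kept.
REQUIRED = ("pid", "value")


def split_value(raw):
    """Return (column_name, text) for a raw value, chosen by datatype."""
    text = raw.strip()
    for column, parser in (
        ("value_integer", int),
        ("value_float", float),
        ("value_date", date.fromisoformat),
    ):
        try:
            parser(text)
            return column, text
        except ValueError:
            continue
    return "value_string", text


def curate(rows):
    """Apply the illustrative curation steps to an iterable of dict rows."""
    curated = []
    for row in rows:
        # Column-name quality control: trim whitespace and lower-case names.
        row = {
            key.strip().lower(): (val or "").strip()
            for key, val in row.items()
            if key is not None
        }
        # Drop rows that lack the minimal required data.
        if any(not row.get(col) for col in REQUIRED):
            continue
        # Assumed date check: ISO dates compare correctly as strings,
        # and a row whose startdate is after its enddate is dropped.
        start, end = row.get("startdate", ""), row.get("enddate", "")
        if start and end and start > end:
            continue
        # Split the generic value column into a datatype-specific column.
        column, value = split_value(row.pop("value"))
        row[column] = value
        # Give every observation its own identifier.
        row["uniqid"] = uuid.uuid4().hex
        curated.append(row)
    return curated


SAMPLE = """pid,Value ,startdate,enddate
p1,42,2020-01-01,2020-02-01
p2,,2020-01-01,2020-02-01
p3,fever,2021-05-01,2021-04-01
p4,3.5,,
"""

result = curate(csv.DictReader(io.StringIO(SAMPLE)))
for row in result:
    print(row)
```

In this sample, the row for p2 is dropped because its value is empty and the row for p3 is dropped because its dates are inconsistent, so two curated rows remain.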

Requirements

To use the CARE-SM Toolkit functionality:

  • All CSV files MUST be named according to the data tags described in the CARE-SM glossary, documented at CARE-SM implementation, e.g. Diagnosis.csv, Birthdate.csv.

  • All your CSV data content MUST be compatible with the CARE-SM glossary, documented at CARE-SM implementation.
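A quick pre-flight check of file names could look like the sketch below. The KNOWN_TAGS set contains only the two tags mentioned above and is a placeholder; the full list of valid tags lives in the CARE-SM glossary. The function name unrecognised_csv_files is hypothetical.

```python
from pathlib import Path

# Placeholder subset: only the two tags named in this README.
# Replace it with the complete tag list from the CARE-SM glossary.
KNOWN_TAGS = {"Diagnosis", "Birthdate"}


def unrecognised_csv_files(filenames, known_tags=KNOWN_TAGS):
    """Return the CSV file names whose stem is not a known data tag."""
    return sorted(
        name
        for name in filenames
        if Path(name).suffix.lower() == ".csv"
        and Path(name).stem not in known_tags
    )


print(unrecognised_csv_files(
    ["Diagnosis.csv", "Birthdate.csv", "diagnoses.csv", "notes.txt"]
))
# prints ['diagnoses.csv']
```

The comparison is case-sensitive on purpose, on the assumption that file names must match the glossary tags exactly.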

Dockerized implementation

A Docker-based implementation, controlled through an API built with FastAPI, lets you run this data transformation step as part of your CARE-SM implementation.

You can edit the docker-compose.yaml to control the volume folder in order to pass your CSV-based patient data:

    volumes:
      - ./location/of/your/data:/code/data

Note: the IP and port can also be customized in docker-compose.yaml.

Run docker compose to start the containers:

docker compose up -d

Once it's running, you can open the OpenAPI documentation at http://localhost:8080/docs in your browser to inspect all the possible requests and trigger the execution.

Alternatively, you can trigger the data transformation from the terminal with the following command:

curl -X POST http://localhost:8080/toolkit
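The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes the default address used above (http://localhost:8080) and that the endpoint needs no request body, as in the curl call; the function names build_trigger_request and trigger are hypothetical.

```python
import urllib.request


def build_trigger_request(base_url="http://localhost:8080"):
    """Build a body-less POST request to the /toolkit endpoint."""
    return urllib.request.Request(f"{base_url}/toolkit", method="POST")


def trigger(base_url="http://localhost:8080", timeout=60):
    """Send the request and return (HTTP status, response text)."""
    request = build_trigger_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# Usage, once the containers are up:
#   status, body = trigger()
#   print(status, body)
```

If you changed the IP or port in docker-compose.yaml, pass the matching base_url.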

Congrats! You will find your transformed data, stored as CARE.csv, in the folder you defined as the volume above.

To stop and remove the implementation, do the following:

docker compose down

Local implementation

If you are not interested in running our Docker image, you can install the Python module for local implementation.

Installation

Python 3.5 or later is needed. The script depends on standard libraries, plus the ones declared in requirements.txt.

To install the dependencies you need the pip and venv Python modules:

  • pip is available in many Linux distributions (Ubuntu package python-pip, CentOS EPEL package python-pip), and also as the pip Python package.

  • venv is also available in many Linux distributions (Ubuntu package python3-venv). In some of these distributions venv is integrated into the Python 3.5 (or later) installation.

To create a virtual environment and install the dependencies in it, run:

python3 -m venv envCARESM
source envCARESM/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Remember to deactivate your Python environment after using it.

Execution

Then change the folder path inside the trial.py script and run it:

python3 trial.py

Download files


Source Distribution

care_sm_toolkit-0.1.7.tar.gz (9.8 kB)

File hashes

Hashes for care_sm_toolkit-0.1.7.tar.gz:

  • SHA256: ff7a43f58658b3e1c7bd33d4300317b1c527f8a3502a001f7afd0fa118e049ed
  • MD5: 5c2cfe49e8e35fb4444b62134d5d5263
  • BLAKE2b-256: 4f7f69e91060f4ff42c90c468d53a26aca999d90f666ab42fd8aa6bbca9b7caf
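If you download the source distribution manually, you can compare its SHA256 digest with the value listed above. The sketch below uses Python's hashlib; the function names sha256_of and matches_expected are hypothetical, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib

# SHA256 digest published for care_sm_toolkit-0.1.7.tar.gz.
EXPECTED_SHA256 = (
    "ff7a43f58658b3e1c7bd33d4300317b1c527f8a3502a001f7afd0fa118e049ed"
)


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected one."""
    return sha256_of(path) == expected.lower()


# Usage, assuming the archive is in the current directory:
#   print(matches_expected("care_sm_toolkit-0.1.7.tar.gz"))
```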
