A toolkit for CARE-SM data transformation.

Project description

CARE-SM Toolkit

CSV datatable toolkit for CARE semantic model implementation

Implementing the Clinical And Registry Entries (CARE) Semantic Model for CSV data is a multi-step workflow. CSV tables are first curated by the CARE-SM toolkit and then converted, using the CARE-SM YARRRML templates, into RDF-based, CDE-oriented patient data. The curation step is what makes the generated RDF robust, accurate, and reliable.

The toolkit is a module that performs a curation step before the data is converted into RDF. Its main transformations are:

  • Quality control of column names.

  • Adding the domain-specific ontological terms required to define every instance of the model; these terms are specific to each data element.

  • Splitting the column labeled value into distinct datatype-specific columns. This lets YARRRML interpret each datatype differently, which simplifies subsequent processing.

  • Quality control across the age/date, startdate and enddate columns to ensure data consistency and validity.

  • Removing any row that lacks the minimal required data, to avoid generating incomplete RDF.

  • Creating a column called uniqid that assigns a unique identifier to each observation. This prevents RDF instances from overlapping with one another, keeping each one distinct.
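
Several of the steps above can be sketched in plain pandas. This is a minimal illustration, not the toolkit's actual code: the required columns pid and model, the output column names value_float, value_date and value_string, the ISO date format, and the function name curate are all assumptions made for this example. Only value and uniqid come from the description above. The ontological-term and date quality-control steps are left out because their rules are not described here.

```python
import uuid

import pandas as pd

# Hypothetical minimal columns, assumed for this sketch only.
REQUIRED = ["pid", "model", "value"]


def curate(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative curation pass; not the CARE-SM toolkit implementation."""
    out = df.copy()

    # Column-name quality control: trim whitespace and lowercase the headers.
    out.columns = [str(c).strip().lower() for c in out.columns]

    # Drop rows that lack any of the (assumed) minimal required fields.
    out = out.dropna(subset=REQUIRED).reset_index(drop=True)

    # Split "value" into datatype-specific columns (assumed output names).
    as_text = out["value"].astype(str).str.strip()
    numeric = pd.to_numeric(as_text, errors="coerce")
    dates = pd.to_datetime(as_text, format="%Y-%m-%d", errors="coerce")
    is_date = dates.notna()
    is_number = numeric.notna() & ~is_date
    out["value_float"] = numeric.where(is_number)
    out["value_date"] = dates.dt.strftime("%Y-%m-%d").where(is_date)
    out["value_string"] = as_text.where(~is_number & ~is_date)

    # One unique identifier per observation (row).
    out["uniqid"] = [uuid.uuid4().hex for _ in range(len(out))]
    return out
```

Each row ends up with exactly one populated value_* column, which is the property that would let a mapping template treat every datatype separately.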

Dockerized implementation

A Docker-based implementation, controlled through an API built with FastAPI, lets you mount this data transformation step as part of your CARE-SM implementation. Use our Docker Compose file to set the Docker image, the ports where it is exposed, and the volumes used to pass in your CSV-based patient data:

version: "3.3"

services:
  api:
    image: pabloalarconm/care-sm-toolkit:latest # check for latest version
    ports:
      - "8000:8000"
    volumes:
      - ./data:/code/data
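
Because the compose file mounts the host folder ./data at /code/data inside the container, a CSV has to be placed in ./data before the service can see it. The helper below is a small sketch of that staging step; the function name stage_csv is hypothetical, and the file name the API expects is not stated here, so the file keeps its original name.

```python
import shutil
from pathlib import Path


def stage_csv(source: str, data_dir: str = "data") -> Path:
    """Copy a CSV into the host folder that the compose file mounts at /code/data."""
    src = Path(source)
    if src.suffix.lower() != ".csv":
        raise ValueError(f"expected a .csv file, got {src.name}")

    # Create ./data next to the compose file if it does not exist yet.
    target_dir = Path(data_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # copy2 preserves file metadata and returns the destination path.
    return Path(shutil.copy2(src, target_dir / src.name))
```

For example, stage_csv("toolkit/exemplar_data/preCARE.csv") would copy the exemplar table into ./data, assuming the script runs from the directory that holds the compose file.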

Local implementation

If you do not want to run the Docker image, you can install the Python module for a local implementation.

Installation:

pip install CARE-SM-Toolkit

Requirements:

Test:

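import pandas as pd
from main import Toolkit

test = Toolkit()

# Run the full curation step on the exemplar input table.
test_done = test.whole_quality_control(input_data="toolkit/exemplar_data/preCARE.csv")

# Save the curated table so it can be converted to RDF afterwards.
test_done.to_csv("toolkit/exemplar_data/CARE.csv", index=False, header=True)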
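
After a run like the one above, it can be useful to confirm that the curated table has the properties described earlier. The check below is only a sketch: check_curated is a hypothetical helper that is not part of the toolkit, and it relies solely on the uniqid column named in the description.

```python
import pandas as pd


def check_curated(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found in a curated table (empty if none)."""
    problems = []
    if "uniqid" not in df.columns:
        problems.append("missing uniqid column")
    elif df["uniqid"].duplicated().any():
        problems.append("duplicated uniqid values")
    if df.empty:
        problems.append("no rows survived curation")
    return problems
```

It could be called as check_curated(pd.read_csv("toolkit/exemplar_data/CARE.csv")); an empty list means none of these basic problems were found.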

Download files

Source Distribution

care_sm_toolkit-0.1.5.tar.gz (9.0 kB)

Uploaded Source

File details

Details for the file care_sm_toolkit-0.1.5.tar.gz.

File metadata

  • Download URL: care_sm_toolkit-0.1.5.tar.gz
  • Upload date:
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.32.3 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.10.12

File hashes

Hashes for care_sm_toolkit-0.1.5.tar.gz

  • SHA256: 4717f97f545a49f3d24f356013fdd698b896740bcde99104abb0a06a5323c95d
  • MD5: cfe3ed7f2a9ab87bfb8f2eeaedcc83cd
  • BLAKE2b-256: 2771d67d310ede13cba3b0e6b3692191fee320920e708397653928e48b6d83ff
