transmart-packer

Data transformation jobs for TranSMART

These details have been verified by PyPI

Maintainers

alepev ewelina gijskant jochemb spayralbe

Project description

Run data transformation jobs for TranSMART.

Install

First create a virtual environment with Python 3.7 or later, then install the package:

pip install transmart-packer

Or from source:

git clone https://github.com/thehyve/transmart-packer.git
cd transmart-packer
pip install .

Dependencies

A Redis server running on localhost. To use a different server, set REDIS_URL or update packer/config.py.

Running

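From the project root directory, run the following commands. Each one starts a long-running process, so use a separate terminal for each: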

redis-server

celery -A packer.tasks worker --loglevel=info

transmart-packer

Environment variables:

Variable	Description
TRANSMART_URL	The URL of the TranSMART API server
KEYCLOAK_SERVER_URL	Keycloak server URL, e.g., https://keycloak-dwh-test.thehyve.net/auth
KEYCLOAK_REALM	The Keycloak realm (default: transmart)
KEYCLOAK_CLIENT_ID	The Keycloak client ID (default: transmart-client)
KEYCLOAK_OFFLINE_TOKEN	The Keycloak offline token.
REDIS_URL	Redis server URL (default: redis://localhost:6379)
DATA_DIR	Directory to write export data (default: /tmp/packer/)
LOG_CFG	Logging configuration (default: packer/logging.yaml)
CLIENT_ORIGIN_URL	URLs to restrict cross-origin requests to (CORS) (default: *)

An optional variable VERIFY_CERT can be used to specify the path of a certificate collection file (.pem) used to verify HTTP requests.
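The variables and defaults above can be collected in one place before starting the services. The following is a minimal sketch, not the actual contents of packer/config.py: the helper names `load_settings` and `missing_settings` are hypothetical, and treating the three variables without a documented default as required is an assumption.

```python
import os

# Defaults copied from the table above. Variables without a documented
# default fall back to None.
DEFAULTS = {
    "TRANSMART_URL": None,
    "KEYCLOAK_SERVER_URL": None,
    "KEYCLOAK_REALM": "transmart",
    "KEYCLOAK_CLIENT_ID": "transmart-client",
    "KEYCLOAK_OFFLINE_TOKEN": None,
    "REDIS_URL": "redis://localhost:6379",
    "DATA_DIR": "/tmp/packer/",
    "LOG_CFG": "packer/logging.yaml",
    "CLIENT_ORIGIN_URL": "*",
    "VERIFY_CERT": None,  # optional path to a .pem certificate collection
}

# Assumption: variables with no documented default must be set explicitly.
ASSUMED_REQUIRED = ("TRANSMART_URL", "KEYCLOAK_SERVER_URL", "KEYCLOAK_OFFLINE_TOKEN")


def load_settings(environ=None):
    """Return a dict of settings, using the documented default when unset."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}


def missing_settings(settings):
    """Return the assumed-required variable names that have no value."""
    return [name for name in ASSUMED_REQUIRED if not settings.get(name)]


if __name__ == "__main__":
    settings = load_settings()
    print("Missing:", missing_settings(settings))
```

Running the script prints which of the assumed-required variables are still unset in the current shell.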

KEYCLOAK_OFFLINE_TOKEN should be generated for a system user that has the following roles:

realm role offline_access – to be able to get the offline token.
client role impersonation on the realm-management client – to support running TranSMART queries on behalf of task users.

To get the token, run:

KEYCLOAK_CLIENT_ID=transmart-client
SYSTEM_USERNAME=system
SYSTEM_PASSWORD=choose-a-strong-system-password # CHANGE ME
KEYCLOAK_SERVER_URL=https://keycloak.example.com/auth
KEYCLOAK_REALM=example
curl -f --no-progress-meter \
  -d "client_id=${KEYCLOAK_CLIENT_ID}" \
  -d "username=${SYSTEM_USERNAME}" \
  -d "password=${SYSTEM_PASSWORD}" \
  -d "grant_type=password" \
  -d "scope=offline_access" \
  "${KEYCLOAK_SERVER_URL}/realms/${KEYCLOAK_REALM}/protocol/openid-connect/token" | jq -r '.refresh_token'

The value of the refresh_token field in the response is the offline token.
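The same token request can be made from Python using only the standard library. This is a rough equivalent of the curl call above, offered as a sketch: the function names `build_token_request` and `fetch_offline_token` are hypothetical, and the form fields and endpoint path are taken directly from the curl command.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(server_url, realm, client_id, username, password):
    """Build the POST request that the curl command above sends."""
    url = f"{server_url}/realms/{realm}/protocol/openid-connect/token"
    form = urllib.parse.urlencode({
        "client_id": client_id,
        "username": username,
        "password": password,
        "grant_type": "password",
        "scope": "offline_access",
    }).encode("ascii")
    return urllib.request.Request(url, data=form, method="POST")


def fetch_offline_token(request):
    """Send the request and return the refresh_token field of the response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["refresh_token"]


if __name__ == "__main__":
    request = build_token_request(
        "https://keycloak.example.com/auth",  # KEYCLOAK_SERVER_URL
        "example",                            # KEYCLOAK_REALM
        "transmart-client",                   # KEYCLOAK_CLIENT_ID
        "system",                             # SYSTEM_USERNAME
        "choose-a-strong-system-password",    # SYSTEM_PASSWORD, CHANGE ME
    )
    print(fetch_offline_token(request))
```

Building the request separately from sending it makes it possible to inspect the URL and form body without contacting the Keycloak server.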

To run the stack using docker-compose, run the commands below:

# Downloads redis image and creates image with project dependencies.
docker-compose build

# After build is complete, start via:
docker-compose up

When the code changes, the web server restarts automatically, but the Celery workers do not. After updating the Celery task logic, you need to restart the Docker container.

Usage

Available handlers:

Path	Description
GET /jobs	List all known jobs for this user.
POST /jobs/create	Create a new job from the given job_type and job_parameters; returns a task_id.
GET /jobs/status/<task_id>	Get status details for a specific task.
GET /jobs/cancel/<task_id>	Cancel a scheduled task or abort a running one.
GET /jobs/data/<task_id>	Download the data that this task produced.
WS /jobs/subscribe	Open websocket connection to get live updates on job progress.

To start the toy job “add” on a local instance, make a request to http://localhost:8999/jobs/create?job_type=add&job_parameters={%22x%22:500,%22y%22:1501}.
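Hand-encoding the JSON in job_parameters is error-prone, so the URL can be assembled programmatically. The sketch below only builds the URL; the helper name `job_create_url` is hypothetical, and any authentication headers the server may expect are not covered here.

```python
import json
import urllib.parse


def job_create_url(base_url, job_type, job_parameters):
    """Return the /jobs/create URL with job_parameters encoded as JSON."""
    query = urllib.parse.urlencode({
        "job_type": job_type,
        # Compact JSON keeps the query string short.
        "job_parameters": json.dumps(job_parameters, separators=(",", ":")),
    })
    return f"{base_url}/jobs/create?{query}"


url = job_create_url("http://localhost:8999", "add", {"x": 500, "y": 1501})
print(url)
```

The resulting URL percent-encodes more characters than the hand-written one above (for example the braces and colons), but it decodes to the same job_type and job_parameters values.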

Development

Components

Testing

To run the test suite, first start redis-server and the Celery workers with the commands above. Then run:

python setup.py test

tests/csr_observation.json contains test data retrieved from TranSMART using the following API call:

curl -X POST -H 'Content-type: application/json' -H 'Accept: application/json' -d \
'{
    "type":"clinical",
    "constraint": {
        "type":"study_name",
        "studyId":"CSR"
    }
}' \
'<transmart_api_url>/v2/observations'

The current file was created from the clinical test data of python_csr2transmart, with ontology_config.json and sources_config.json as configuration. Note: do not edit csr_observation.json manually.

Extending

To add a new job, add a new Celery function to the jobs folder and register it in the jobs registry. See packer/jobs/example.py for an example.

Existing jobs

Basic export job

Exports the observation dataframe from the TranSMART API client to a TSV file.

{
    "job_type":"basic_export",
    "job_parameters": {
        "constraint": {
            "type":"study_name",
            "studyId":"CSR"
        },
        "custom_name":"name of the export"
    }
}

CSR export

The Central Subject Registry (CSR) data model specific export. The model contains individual, diagnosis, biosource, biomaterial, radiology and study entities. Sample data follows the hierarchy patient > diagnosis > biosource > biomaterial. Studies are orthogonal to samples, i.e., patients are linked to studies independently of samples. Radiology, like samples, is linked to a patient, but can optionally also be linked to a diagnosis.

The entity IDs form the first 6 columns of the export file; the remaining columns are concepts. Higher-level concepts (e.g., Age, which is specific to the Patient level) are distributed to all rows at lower levels (e.g., Diagnosis).
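The distribution of higher-level concepts to lower-level rows can be illustrated with a small pure-Python sketch. This is not the code in packer/table_transformations; it only models the described behaviour, under the assumption that each row is identified by a key tuple such as (patient, diagnosis) and that a value recorded at a shorter key applies to every row whose key starts with it. The function name `distribute_down` and the concept "Tumor type" are hypothetical.

```python
def distribute_down(rows):
    """Copy values from higher-level rows into the rows below them.

    rows maps a key tuple, e.g. ("P1",) or ("P1", "D1"), to a dict of
    concept values recorded at exactly that level.
    """
    result = {}
    for key, values in rows.items():
        merged = {}
        # Walk from the shortest prefix to the full key, so that values
        # recorded at a more specific level are applied last.
        for length in range(1, len(key) + 1):
            merged.update(rows.get(key[:length], {}))
        result[key] = merged
    return result


rows = {
    ("P1",): {"Age": 42},                    # patient-level concept
    ("P1", "D1"): {"Tumor type": "T1"},      # diagnosis-level concept
    ("P1", "D2"): {"Tumor type": "T2"},
}
distributed = distribute_down(rows)
print(distributed[("P1", "D1")])
```

In this toy input, both diagnosis rows of patient P1 end up carrying the patient-level Age value next to their own diagnosis-level value.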

See the CSR test study for an example, or the latest sources dataset, which can be used for end-to-end testing.

{
    "job_type":"csr_export",
    "job_parameters": {
        "constraint": {
            "type":"study_name",
            "studyId":"CSR"
        },
        "custom_name":"name of the export",
        "row_filter": {
            "type":"patient_set",
            "subjectIds": ["P2", "P6"]
        }
    }
}

where:

job_parameters.constraint - any TranSMART v2 API constraint, or composition of constraints, used to get data from TranSMART.
job_parameters.custom_name (optional) - name of the export job and the output TSV file.
job_parameters.row_filter (optional) - any TranSMART v2 API constraint, or composition of constraints, used to fetch the keys ([[[[patient], diagnosis], biosource], biomaterial]) of the rows that make it into the end result. E.g., given the CSR study and the query above, only rows specific to patients P2 and P6, such as the row P2, D2, BS2, BM2, …, end up in the result table. Note that keys do not have to be equal in length: a row is also selected if only part of the keys matches, e.g. P1 vs P1, D1.
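The partial key matching described for row_filter can be sketched as a prefix comparison. This is one reading of the rule above, not the project's implementation: it assumes keys are tuples ordered patient, diagnosis, biosource, biomaterial, and that two keys match when they agree on every position they share. The names `key_matches` and `filter_rows` are hypothetical.

```python
def key_matches(row_key, filter_key):
    """Return True if the two keys agree on all positions they share."""
    shared = min(len(row_key), len(filter_key))
    return row_key[:shared] == filter_key[:shared]


def filter_rows(row_keys, filter_keys):
    """Keep the row keys that match at least one key from the row filter."""
    return [
        key for key in row_keys
        if any(key_matches(key, wanted) for wanted in filter_keys)
    ]


row_keys = [
    ("P1", "D1"),
    ("P2", "D2", "BS2", "BM2"),
    ("P6",),
    ("P3", "D3"),
]
# Keys as a patient_set filter on P2 and P6 would produce them.
selected = filter_rows(row_keys, [("P2",), ("P6",)])
print(selected)
```

Under this reading, a filter key of P1 selects the row P1, D1, and a filter key of P1, D1 also selects a row that only has the key P1.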

Adding new entity to CSR data model:

When the CSR data model is extended with new sample-related entities, the export transformation code has to be changed as well, so that a column with the ID of the new entity is included as one of the identifying columns.

To do this, modify packer/table_transformations/csr_transformations.py: extend the ID_COLUMN_MAPPING map with the dimension name of the new entity as key and the column name that should appear in the export as value.

If the new entity is not part of the sample hierarchy, but only linked to one or more entities, merging logic has to be added to the transform_obs_df function in packer/table_transformations/csr_transformations.py (see the example of the Radiology and Sample entities).

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.


Release history

Version	Release date
0.7.1 (this version)	Jul 13, 2022
0.7.0	May 23, 2022
0.6.1	Jan 26, 2022
0.6.0	Jan 11, 2022
0.5.0	Apr 21, 2021
0.4.4	Apr 21, 2021
0.4.3	Jul 8, 2020
0.4.2	Nov 26, 2019
0.4.1	Nov 26, 2019
0.4.0	Nov 26, 2019
0.3.0	Oct 18, 2019
0.2.1	Sep 23, 2019
0.2.0	Sep 23, 2019
0.1.4	Aug 27, 2019
0.1.3	May 14, 2019
0.1.2	May 14, 2019
0.1.1	May 14, 2019
0.1.0	May 14, 2019

Download files

Source Distribution

transmart-packer-0.7.1.tar.gz (34.1 kB view details)

Uploaded Jul 13, 2022 Source

Built Distribution

transmart_packer-0.7.1-py3-none-any.whl (35.5 kB view details)

Uploaded Jul 13, 2022 Python 3

File details

Details for the file transmart-packer-0.7.1.tar.gz.

File metadata

Download URL: transmart-packer-0.7.1.tar.gz
Upload date: Jul 13, 2022
Size: 34.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.7.1

File hashes

Hashes for transmart-packer-0.7.1.tar.gz
Algorithm	Hash digest
SHA256	`f0b6ca52578dc7dfa1ac1d6a01b6541aac0e799dcb4aa59afb13b7509a0a8e2b`
MD5	`4e5f6259e5bb8ec1738a12c1399a2258`
BLAKE2b-256	`c4a116c46336ccd739ec08d823557773396f9d9f2d2a281411a040ba0bcbb798`
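A downloaded file can be checked against the SHA256 digest listed above with the Python standard library. This is a minimal sketch: the function names `sha256_of` and `verify` are hypothetical, and the file is assumed to be in the current directory.

```python
import hashlib

# SHA256 digest of transmart-packer-0.7.1.tar.gz, copied from the table above.
EXPECTED_SHA256 = "f0b6ca52578dc7dfa1ac1d6a01b6541aac0e799dcb4aa59afb13b7509a0a8e2b"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    print(verify("transmart-packer-0.7.1.tar.gz"))
```

Reading in chunks keeps memory use constant regardless of file size.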


File details

Details for the file transmart_packer-0.7.1-py3-none-any.whl.

File metadata

Download URL: transmart_packer-0.7.1-py3-none-any.whl
Upload date: Jul 13, 2022
Size: 35.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.7.1

File hashes

Hashes for transmart_packer-0.7.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`be2c84251320ee953f2fb44e2418f1cc8c046eabcfbe861e4206c76313aa4703`
MD5	`d038121c1efb26d836477bad960c01f2`
BLAKE2b-256	`0ae7b1c8b1c8f7c656939e16348a57a200c4bfaca273b6acee1d8b7c2ce458aa`

