
A toolkit for ETL curation for the tranSMART data warehouse.


The tranSMART curation toolkit (tmtk) can be used to edit and validate studies prior to loading them with transmart-batch.

For general documentation visit readthedocs.


Clone the repo

$ git clone
$ cd tmtk

Initialize a virtualenv

$ pip install virtualenv
$ virtualenv -p /path/to/python3.x/installation env
$ source env/bin/activate

For macOS users, this will most likely be:

$ pip install virtualenv
$ virtualenv -p python3 env
$ source env/bin/activate

or do this using virtualenvwrapper.
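Before installing anything, it can help to confirm that the environment is actually active. The following is a minimal sketch, not part of tmtk: the helper name `in_virtualenv` is hypothetical, and the check assumes the usual behaviour where an activated environment makes `sys.prefix` differ from the base interpreter prefix (older virtualenv releases set `sys.real_prefix` instead).

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter appears to live in a virtual environment."""
    # Older virtualenv releases expose sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases set sys.base_prefix to the base interpreter.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return base_prefix != sys.prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

Run it with the `python3` that is on your path after `source env/bin/activate`; if it reports that no environment is active, the activation step likely did not take effect in the current shell.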


To install tmtk and all dependencies into your Python environment, and enable the Arborist Jupyter notebook extension, run:

$   pip3 install tmtk


Alternatively, to install from the cloned source:

$   pip3 install -r requirements.txt
$   python3 setup.py install

or if you want to run the tool from code in development mode:

$   pip3 install -r requirements.txt
$   python3 setup.py develop
$   jupyter-nbextension install --py tmtk.arborist
$   jupyter-serverextension enable tmtk.arborist


These dependencies will have to be installed:
  • pandas>=0.19.2
  • ipython>=5.3.0
  • jupyter>=1.0.0
  • jupyter-client>=5.0.0
  • jupyter-core>=4.3.0
  • jupyter-console>=5.1.0
  • notebook>=4.2.0
  • requests>=2.13.0
  • tqdm>=4.11.0
  • mygene>=3.0.0
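To see whether an environment already satisfies the minimum versions listed above, a short check along these lines may be useful. This is only a sketch under stated assumptions: it is not part of tmtk, the helper names (`parse_version`, `at_least`, `check_dependencies`) are hypothetical, it relies on `importlib.metadata` (Python 3.8 or newer), and its version comparison is a simplified numeric one that ignores pre-release suffixes.

```python
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "pandas": "0.19.2",
    "ipython": "5.3.0",
    "jupyter": "1.0.0",
    "jupyter-client": "5.0.0",
    "jupyter-core": "4.3.0",
    "jupyter-console": "5.1.0",
    "notebook": "4.2.0",
    "requests": "2.13.0",
    "tqdm": "4.11.0",
    "mygene": "3.0.0",
}


def parse_version(text):
    """Turn '5.3.0' into (5, 3, 0), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first part with no leading number
        parts.append(int(digits))
    return tuple(parts)


def at_least(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_dependencies(minimums=MINIMUMS, lookup=installed_version):
    """Return {package: problem} for every missing or outdated package."""
    problems = {}
    for name, minimum in minimums.items():
        found = lookup(name)
        if found is None:
            problems[name] = "not installed"
        elif not at_least(found, minimum):
            problems[name] = f"{found} is older than {minimum}"
    return problems


if __name__ == "__main__":
    for name, problem in check_dependencies().items():
        print(f"{name}: {problem}")
```

An empty result means every listed package was found at or above its minimum. The `lookup` parameter exists only so the check can be exercised without touching the real environment.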



Download files

Source distribution: tmtk-0.3.2.tar.gz (409.1 kB)
