Trailblazer is a tool to manage and track the state of analyses

Automate, monitor, and simplify running the MIP analysis pipeline

Trailblazer is a tool that aims to provide:

  • a Python interface to interact with MIP in an automated fashion
  • a limited command line interface to simplify running MIP using an opinionated setup

A simple web UI for Trailblazer is also available; it helps you keep track of the status of multiple runs.

Todo

  • fetch all job ids from sacct status log and offer to kill all jobs
  • display statistics like which steps most analyses fail at

Roadmap

Trailblazer's scope will be reduced over the coming months, and it will become SLURM+. That is, it will become a web UI tool that monitors pipelines to help you keep track of the status of multiple analyses.

We have chosen to have a pipeline export its SLURM job ids in the form of a file, one id per line:

123145
123146
123147
123148
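
A consumer of such a file could read it roughly as follows. This is a minimal sketch, assuming one integer job id per line as in the example above; parse_job_ids is a hypothetical helper and not part of Trailblazer.

```python
def parse_job_ids(text):
    """Return the SLURM job ids found in `text`, assuming one id per line."""
    job_ids = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        job_ids.append(int(line))
    return job_ids


# Hypothetical usage with the example file content shown above.
example = "123145\n123146\n123147\n123148\n"
print(parse_job_ids(example))
```

Converting each line with int() makes a malformed file fail early with a ValueError instead of passing a bad id on to later steps.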

The plan is formulated in issue/39.

Installation

Trailblazer is written in Python 3.6+ and is available on the Python Package Index (PyPI).

pip install trailblazer

If you would like to install the latest development version:

git clone https://github.com/Clinical-Genomics/trailblazer
cd trailblazer
pip install --editable .

Files are formatted with Black automatically on each push to GitHub. If you would like to format your commits with Black automatically on your local machine:

pre-commit install

Contributing

Trailblazer uses the GitHub flow branching model as described in our development manual.

Documentation

This is a brief overview of the documentation. Trailblazer functionality can be accessed from the command line interface (CLI), the monitoring web interface, the supporting REST API, and the Python API.

Command line interface

Config file

Trailblazer supports a simple config file written in YAML. You can always provide the same options on the command line, but it's recommended to store commonly used values in the config.

The following options are supported:

---
database: mysql+pymysql://userName:passWord@domain.com/database
script: /path/to/MIP/mip.pl
mip_config: /path/to/global/MIP_config.yaml
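
The interplay between config values and command line options can be pictured with a small sketch. It assumes that a value given on the command line takes precedence over the same key in the config, which is not stated here; merge_options is a hypothetical helper, not Trailblazer's own code.

```python
def merge_options(config, cli_options):
    """Combine config file values with command line options.

    Assumption: an option given on the command line (not None)
    overrides the value stored under the same key in the config.
    """
    merged = dict(config)  # copy so the caller's config is left untouched
    for key, value in cli_options.items():
        if value is not None:
            merged[key] = value
    return merged


# Hypothetical usage: only the database is given on the command line.
config = {"script": "/path/to/MIP/mip.pl", "database": "sqlite:///default.sqlite3"}
cli_options = {"database": "sqlite:///tb.sqlite3", "script": None}
print(merge_options(config, cli_options))
```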

Tip: set up a Bash alias in your ~/.bashrc to always point to your config automatically:

alias trailblazer="trailblazer --config /path/to/trailblazer/config.yaml"

Command: trailblazer init

Set up (or reset) a Trailblazer database. The command simply creates all the tables in the database. You can reset an existing database by using the --reset option.

trailblazer --database "sqlite:///tb.sqlite3" init --reset
Delete existing tables? [analysis, info, job, user] [y/N]: y
Success! New tables: analysis, info, job, user

Command: trailblazer user

This command can be used both to add a new user to the database (and give them access to the web interface) and view information about an existing user.

# add a new user
trailblazer user --name "Paul Anderson" paul.anderson@magnolia.com
New user added: paul.anderson@magnolia.com (2)

# check an existing user
trailblazer user paul.anderson@magnolia.com
{'created_at': datetime.datetime(2017, 6, 22, 8, 49, 44, 685977), 'google_id': None, 'name': 'Paul Anderson', 'email': 'paul.anderson@magnolia.com', 'avatar': None, 'id': 2}

Command: trailblazer log

Logs the status of a run to the supporting database. You need to point to the analysis config of a specific run.

trailblazer log path/to/family/analysis/family_config.yaml

You can point to the same analysis multiple times. Trailblazer will detect whether the same analysis has been added before and skip it if no information has been updated. If an analysis has previously been added as "running" or "pending", those entries will automatically be removed as soon as the same analysis is logged as either "completed" or "failed".
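
The bookkeeping described above can be sketched with a plain list in place of the database. This is an illustration only: it assumes an analysis is identified by its family id, and log_analysis and the "family"/"status" keys are hypothetical names rather than Trailblazer's actual schema.

```python
TEMPORARY = {"running", "pending"}
FINAL = {"completed", "failed"}


def log_analysis(entries, new_entry):
    """Return an updated copy of `entries` after logging `new_entry`.

    Each entry is a dict with the hypothetical keys "family" and "status".
    """
    if new_entry in entries:
        # Same analysis with no updated information: skip it.
        return list(entries)
    if new_entry["status"] in FINAL:
        # A final status replaces earlier running/pending entries.
        entries = [
            entry
            for entry in entries
            if not (
                entry["family"] == new_entry["family"]
                and entry["status"] in TEMPORARY
            )
        ]
    return list(entries) + [new_entry]


# Hypothetical usage: a running analysis that later completes.
log = log_analysis([], {"family": "family4", "status": "running"})
log = log_analysis(log, {"family": "family4", "status": "completed"})
print(log)
```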

Trailblazer will automatically find the additional files used for logging the analysis status, namely the sampleinfo file (e.g. family_qc_sample_info.yaml) and the sacct status file (e.g. mip.pl_2017-06-17T12:11:42.log.status), unless you explicitly point to them using the --sampleinfo and --sacct flags. If either file is missing, Trailblazer will simply skip adding a status for that analysis.
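
To make the lookup concrete, here is a rough sketch of how the two files might be located. It assumes both files live somewhere below the analysis directory and follow the naming patterns of the examples above; find_status_files is a hypothetical helper, and Trailblazer's real lookup may work differently.

```python
from pathlib import Path


def find_status_files(analysis_dir):
    """Return (sampleinfo, sacct) paths, or None if either file is missing.

    Assumption: file names end in "_qc_sample_info.yaml" and ".log.status".
    """
    root = Path(analysis_dir)
    sampleinfo = sorted(root.rglob("*_qc_sample_info.yaml"))
    sacct = sorted(root.rglob("*.log.status"))
    if not sampleinfo or not sacct:
        return None  # mirrors "skip adding a status for that analysis"
    # Pick the last match in sorted order; for timestamped names in the
    # same directory this is the most recent file.
    return sampleinfo[-1], sacct[-1]
```

Returning None when either file is absent keeps the caller's decision simple: no pair of files, no status update.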

Command: trailblazer scan

Convenience command to scan an entire directory structure for all analyses and update their status in one go. Assumes the base directory consists of individual family folders:

trailblazer scan /path/to/analyses/dir/

This command can easily be set up in a crontab to run e.g. every hour and keep the analysis statuses up to date.
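
The directory walk behind such a scan could look roughly like this. The sketch assumes each family folder holds its config at analysis/<family>_config.yaml, following the path used in the trailblazer log example; find_analysis_configs is a hypothetical helper, not the command's actual implementation.

```python
from pathlib import Path


def find_analysis_configs(base_dir):
    """Return the analysis config of every family folder below `base_dir`.

    Assumed layout: <base_dir>/<family>/analysis/<family>_config.yaml
    """
    configs = []
    for family_dir in sorted(Path(base_dir).iterdir()):
        if not family_dir.is_dir():
            continue  # ignore stray files in the base directory
        config = family_dir / "analysis" / f"{family_dir.name}_config.yaml"
        if config.is_file():
            configs.append(config)
    return configs
```

Each returned path could then be handed to trailblazer log, one analysis at a time.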

Command: trailblazer ls

Prints the family ids of the most recently completed analyses to the console. This is useful for tying in downstream tools that might want to do something with the data from completed runs.

trailblazer ls
F0013487
F0013362
F0006106
17083
F0013469
17085
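
A downstream tool written in Python could pick up these ids as sketched below. The sketch assumes the command prints one family id per line on standard output, as in the example above; parse_family_ids and completed_families are hypothetical helpers.

```python
import subprocess


def parse_family_ids(output):
    """Split the console output into family ids, ignoring blank lines."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def completed_families():
    """Run `trailblazer ls` and return the family ids it prints."""
    result = subprocess.run(
        ["trailblazer", "ls"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout as text (works on Python 3.6)
        check=True,  # raise CalledProcessError if the command fails
    )
    return parse_family_ids(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible to test the parsing without a Trailblazer installation.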

Command: trailblazer delete

Deletes an analysis log from the database. The input is the unique analysis id, which is printed once the analysis is initially logged. It's also displayed in the web interface.

trailblazer delete 4

Command: trailblazer start

Start MIP from Trailblazer. It's only a thin wrapper around the MIP command line. It removes some complexity like having to provide the global MIP config if it is already defined in the Trailblazer config. It also logs a started analysis as "pending" until the first job has been completed and the status can be evaluated (creates the sacct status file).

trailblazer start family4 --priority high

This version

9.1.0