Mycobacterium tuberculosis genomic analysis from Nanopore sequencing data
TBpore
Synopsis
tbpore is a tool with two main goals.
The first is to process Nanopore Mycobacterium tuberculosis sequencing data to describe variants with respect to the canonical TB strain H37Rv and to predict antibiotic resistance (command tbpore process). Variants are described by decontaminating reads, calling variants with bcftools, and filtering the calls. Antibiotic resistance is predicted with mykrobe.
Second, tbpore can be used to cluster TB samples based on their genotypes and a given distance threshold (command tbpore cluster).
Installation
conda
Prerequisite: conda (and bioconda channel correctly set up)
$ conda install tbpore
pip
The Python components of tbpore are available to install through PyPI.
pip install tbpore
However, you will need to install the following dependencies, which cannot be installed through PyPI.
Dependencies
- rasusa
- psdm version 0.1
- samtools version 1.13
- bcftools version 1.13
- mykrobe version ≥ 0.12
- minimap2 version 2.22
- seqkit version 2.0
We make no guarantees about the performance of tbpore with versions other than those specified above. In particular, the bcftools version is very important. The latest versions of the other dependencies can likely be used.
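Because these tools are installed separately from the Python package, it can help to confirm they are reachable before the first run. The following is a minimal sketch, not part of tbpore: the tool names come from the list above, while the helper `missing_tools` is hypothetical. It only checks that each executable is on `PATH` and does not check versions.

```python
import shutil

# Tool names taken from the dependency list above.
REQUIRED_TOOLS = [
    "rasusa", "psdm", "samtools", "bcftools", "mykrobe", "minimap2", "seqkit",
]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All external dependencies were found on PATH.")
```

Since the bcftools version matters most, a version check would be the natural next step; it is left out here to avoid assuming the exact output format of each tool.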
Container
Docker images are provided through biocontainers.
singularity
Prerequisite: singularity
$ URI="docker://quay.io/biocontainers/tbpore:<tag>"
$ singularity exec "$URI" tbpore --help
See here for valid values for <tag>.
docker
Prerequisite: Docker
$ docker pull quay.io/biocontainers/tbpore:<tag>
$ docker run quay.io/biocontainers/tbpore:<tag> tbpore --help
See here for valid values for <tag>.
Configuring the decontamination database index
When you run tbpore process for the first time, you will get this error:
ERROR | Decontamination DB index tbpore/data/decontamination_db/tbpore.remove_contam.fa.gz.map-ont.mmi does not
exist, please follow the instructions at https://github.com/mbhall88/tbpore#configuring-the-decontamination-database-index
to download and configure it before running tbpore
This means you need to download the minimap2 decontamination database index before proceeding. You can download this index here or by running:
wget https://figshare.com/ndownloader/files/36708444 -O tbpore.remove_contam.fa.gz.map-ont.mmi.gz
Once the download is complete, you can:
- Ensure that the compressed index was transferred correctly by checking its md5sum:
  md5sum tbpore.remove_contam.fa.gz.map-ont.mmi.gz
  82d050e0f1cba052f0c94f16fcb32f7b tbpore.remove_contam.fa.gz.map-ont.mmi.gz
- Decompress the index:
  gunzip tbpore.remove_contam.fa.gz.map-ont.mmi.gz
- Check the md5sum of the decompressed index:
  md5sum tbpore.remove_contam.fa.gz.map-ont.mmi
  810c5c09eaf9421128e4e52cdf2fa32a tbpore.remove_contam.fa.gz.map-ont.mmi
- Move the decompressed index to <tbpore_dir>/data/decontamination_db/tbpore.remove_contam.fa.gz.map-ont.mmi
  - Note: you can also keep this index at a different path and specify it to tbpore using the --db option.
Once the four steps above are done, you should be able to run tbpore on an example isolate by going into the tbpore directory and running:
just test-run
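The checksum and decompression steps above can also be scripted. The sketch below is an illustration under stated assumptions rather than anything shipped with tbpore: the two checksums are the ones quoted above, while the helper names `md5sum` and `verify_and_decompress` are hypothetical. Unlike `gunzip`, it leaves the compressed file in place.

```python
import gzip
import hashlib
import shutil
from pathlib import Path

# Checksums quoted in the steps above.
COMPRESSED_MD5 = "82d050e0f1cba052f0c94f16fcb32f7b"
DECOMPRESSED_MD5 = "810c5c09eaf9421128e4e52cdf2fa32a"


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_decompress(gz_path, expected_gz_md5, expected_md5):
    """Check the compressed file, decompress it alongside, and check the result."""
    gz_path = Path(gz_path)
    if md5sum(gz_path) != expected_gz_md5:
        raise ValueError(f"checksum mismatch for {gz_path}")
    out_path = gz_path.with_suffix("")  # drop only the trailing .gz
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if md5sum(out_path) != expected_md5:
        raise ValueError(f"checksum mismatch for {out_path}")
    return out_path


# Example call (assumes the download from the wget command above is present):
# verify_and_decompress(
#     "tbpore.remove_contam.fa.gz.map-ont.mmi.gz", COMPRESSED_MD5, DECOMPRESSED_MD5
# )
```

The returned path can then be moved into place or passed to tbpore with the --db option.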
Performance
tbpore process
Benchmarked on 151 TB ONT samples with 1 thread:
- Runtime: 2103 s avg, 4048 s max (s = seconds);
- RAM: 12.4 GB avg, 13.1 GB max (GB = Gigabytes);
tbpore cluster
Clustering 151 TB ONT samples:
- Runtime: 286 s;
- RAM: <1 GB;
Usage
General usage
Usage: tbpore [OPTIONS] COMMAND [ARGS]...

Options:
  -h, --help     Show this message and exit.
  -V, --version  Show the version and exit.
  -v, --verbose  Turns on debug-level logger. Option is mutually exclusive
                 with quiet.
  -q, --quiet    Turns off all logging except errors. Option is mutually
                 exclusive with verbose.

Commands:
  cluster  Cluster consensus sequences
  process  Single-sample TB genomic analysis from Nanopore sequencing data
process subcommand
Usage: tbpore process [OPTIONS] [INPUTS]...

  Single-sample TB genomic analysis from Nanopore sequencing data

  INPUTS: Fastq file(s) and/or a directory containing fastq files. All files
  will be joined into a single fastq file, so ensure they're all part of the
  same sample/isolate.

Options:
  -h, --help                      Show this message and exit.
  -r, --recursive                 Recursively search INPUTS for fastq files
  -S, --name TEXT                 Name of the sample. By default, will use the
                                  first INPUT file with any extensions
                                  stripped
  -A, --report_all_mykrobe_calls  Report all mykrobe calls (turn on flag -A,
                                  --report_all_calls when calling mykrobe)
  --db PATH                       Path to the decontaminaton database
                                  [default: /Users/michaelhall/Projects/tbpore
                                  /data/decontamination_db/tbpore.remove_conta
                                  m.fa.gz.map-ont.mmi]
  -m, --metadata PATH             Path to the decontaminaton database metadata
                                  file  [default: /Users/michaelhall/Projects/
                                  tbpore/data/decontamination_db/remove_contam
                                  .tsv.gz]
  -o, --outdir DIRECTORY          Directory to place output files  [default:
                                  .]
  --tmp DIRECTORY                 Specify where to write all (tbpore)
                                  temporary files. [default: <outdir>/.tbpore]
  -t, --threads INTEGER           Number of threads to use in multithreaded
                                  tools  [default: 1]
  -d, --cleanup / -D, --no-cleanup
                                  Remove all temporary files on *successful*
                                  completion  [default: no-cleanup]
  --cache DIRECTORY               Path to use for the cache  [default:
                                  /Users/michaelhall/.cache]
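When processing many isolates, it may be convenient to drive tbpore process from a script. The following is a hedged sketch: it uses only options that appear in the help text above (--outdir, --threads, --name, --db, --recursive), but the wrapper functions `build_process_command` and `run_process` are hypothetical and not part of tbpore.

```python
import subprocess


def build_process_command(inputs, outdir=".", threads=1, name=None, db=None,
                          recursive=False):
    """Assemble a `tbpore process` argument list from options in the help text."""
    cmd = ["tbpore", "process", "--outdir", str(outdir), "--threads", str(threads)]
    if name is not None:
        cmd += ["--name", name]
    if db is not None:
        cmd += ["--db", str(db)]
    if recursive:
        cmd.append("--recursive")
    # All inputs are joined into one sample by tbpore, so pass one isolate per call.
    return cmd + [str(path) for path in inputs]


def run_process(inputs, **options):
    """Run tbpore; raises subprocess.CalledProcessError on a non-zero exit status."""
    subprocess.run(build_process_command(inputs, **options), check=True)
```

Building the argument list separately from running it makes the command easy to log or inspect before anything is executed.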
cluster subcommand
Usage: tbpore cluster [OPTIONS] [INPUTS]...

  Cluster consensus sequences

  Preferably input consensus sequences previously generated with tbpore
  process.

  INPUTS: Two or more consensus fasta sequences. Use glob patterns to input
  several easily (e.g. output/sample_*/*.consensus.fa).

Options:
  -h, --help                      Show this message and exit.
  -T, --threshold INTEGER         Clustering threshold  [default: 6]
  -o, --outdir DIRECTORY          Directory to place output files  [default:
                                  .]
  --tmp DIRECTORY                 Specify where to write all (tbpore)
                                  temporary files. [default: <outdir>/.tbpore]
  -t, --threads INTEGER           Number of threads to use in multithreaded
                                  tools  [default: 1]
  -d, --cleanup / -D, --no-cleanup
                                  Remove all temporary files on *successful*
                                  completion  [default: no-cleanup]
  --cache DIRECTORY               Path to use for the cache  [default:
                                  /Users/michaelhall/.cache]
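To give an intuition for what a clustering threshold does, here is a small sketch of one common approach: samples are linked whenever their pairwise distance is at or below the threshold, and each connected group forms a cluster. This is an assumption made for illustration. The help text does not state the distance metric, the linkage rule, or whether the threshold is inclusive, so this should not be read as tbpore's actual algorithm. The sample names and distances are invented.

```python
from collections import defaultdict


def cluster_by_threshold(samples, distances, threshold=6):
    """Group samples into connected components of the 'distance <= threshold' graph.

    `distances` maps a pair (a, b) of sample names to their distance.
    """
    parent = {sample: sample for sample in samples}

    def find(x):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        if dist <= threshold:  # inclusive threshold: an assumption
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for sample in samples:
        groups[find(sample)].append(sample)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical samples and pairwise distances.
samples = ["s1", "s2", "s3", "s4"]
distances = {
    ("s1", "s2"): 3, ("s2", "s3"): 6, ("s1", "s3"): 9,
    ("s1", "s4"): 45, ("s2", "s4"): 42, ("s3", "s4"): 40,
}
print(cluster_by_threshold(samples, distances))
# prints [['s1', 's2', 's3'], ['s4']]
```

Note that s1 and s3 land in the same cluster even though their direct distance (9) exceeds the threshold, because both are within the threshold of s2. Under this linkage rule, clusters can chain through intermediate samples.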