Mycobacterium tuberculosis genomic analysis from Nanopore sequencing data

TBpore

Synopsis

tbpore has two main goals. The first is to process Nanopore Mycobacterium tuberculosis sequencing data, describing variants with respect to the canonical TB strain H37Rv and predicting antibiotic resistance (command tbpore process). Variants are described by decontaminating the reads, calling variants with bcftools, and filtering the calls. Antibiotic resistance is predicted with mykrobe. The second is to cluster TB samples based on their genotypes and a given distance threshold (command tbpore cluster).

Installation

conda

Prerequisite: conda (and bioconda channel correctly set up)

$ conda install tbpore

pip

The Python components of tbpore are available to install through PyPI.

pip install tbpore

However, you will need to install the following dependencies, which cannot be installed through PyPI.

Dependencies

We make no guarantees about the performance of tbpore with versions other than those specified above. In particular, the bcftools version is very important. The latest versions of the other dependencies can likely be used.

Container

Docker images are provided through biocontainers.

singularity

Prerequisite: singularity

$ URI="docker://quay.io/biocontainers/tbpore:<tag>"
$ singularity exec "$URI" tbpore --help

See the tags listed for quay.io/biocontainers/tbpore for valid values of <tag>.

docker

Prerequisite: Docker

$ docker pull quay.io/biocontainers/tbpore:<tag>
$ docker run quay.io/biocontainers/tbpore:<tag> tbpore --help

See the tags listed for quay.io/biocontainers/tbpore for valid values of <tag>.

Configuring the decontamination database index

When you run your first tbpore process, you will get this error:

ERROR    | Decontamination DB index tbpore/data/decontamination_db/tbpore.remove_contam.fa.gz.map-ont.mmi does not
exist, please follow the instructions at https://github.com/mbhall88/tbpore#configuring-the-decontamination-database-index
to download and configure it before running tbpore

This means you need to download the minimap2 decontamination database index before proceeding. You can download this index by running:

wget https://figshare.com/ndownloader/files/36708444 -O tbpore.remove_contam.fa.gz.map-ont.mmi.gz

Once the download is complete, you can:

  1. Ensure that the compressed index was transferred correctly by checking its md5sum:
       md5sum tbpore.remove_contam.fa.gz.map-ont.mmi.gz
       82d050e0f1cba052f0c94f16fcb32f7b  tbpore.remove_contam.fa.gz.map-ont.mmi.gz
  2. Decompress the index:
       gunzip tbpore.remove_contam.fa.gz.map-ont.mmi.gz
  3. Check the md5sum of the decompressed index:
       md5sum tbpore.remove_contam.fa.gz.map-ont.mmi
       810c5c09eaf9421128e4e52cdf2fa32a  tbpore.remove_contam.fa.gz.map-ont.mmi
  4. Move the decompressed index to <tbpore_dir>/data/decontamination_db/tbpore.remove_contam.fa.gz.map-ont.mmi
    • Note: you can also keep this index at a different path and pass it to tbpore using the --db option.
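
The two checksum checks can also be scripted. The following is a minimal Python sketch, assuming the index files sit in the current working directory. The helper names md5_of and verify are illustrative and are not part of tbpore; the expected digests are the ones listed in the steps above.

```python
import hashlib
from pathlib import Path

# Expected digests, copied from the steps above.
EXPECTED_MD5 = {
    "tbpore.remove_contam.fa.gz.map-ont.mmi.gz": "82d050e0f1cba052f0c94f16fcb32f7b",
    "tbpore.remove_contam.fa.gz.map-ont.mmi": "810c5c09eaf9421128e4e52cdf2fa32a",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to keep memory low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of(path) == expected.lower()


if __name__ == "__main__":
    # Only files that are present are checked; after gunzip the .gz file is gone.
    for name, expected in EXPECTED_MD5.items():
        if Path(name).exists():
            status = "OK" if verify(name, expected) else "MISMATCH"
            print(f"{name}: {status}")
```

Reading in chunks matters here because the index is large; hashing it in one read would hold the whole file in memory.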

Once these four steps above are done, you should be able to run tbpore on an example isolate by going into the tbpore dir and running:

just test-run

Performance

tbpore process

Benchmarked on 151 TB ONT samples with 1 thread:

  • Runtime: 2103 s avg, 4048 s max (s = seconds)
  • RAM: 12.4 GB avg, 13.1 GB max (GB = gigabytes)
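
These figures can be used for rough capacity planning. The sketch below is only back-of-the-envelope arithmetic on the benchmark numbers above; the function names and the 64 GB machine in the example are hypothetical, and it assumes that other datasets behave like the benchmarked samples and that enough CPU cores are available for each single-threaded job.

```python
import math

AVG_RUNTIME_S = 2103  # average runtime per sample, from the benchmark above
MAX_RUNTIME_S = 4048  # slowest sample in the benchmark
MAX_RAM_GB = 13.1     # peak RAM of a single run in the benchmark


def concurrent_jobs(total_ram_gb, per_job_gb=MAX_RAM_GB):
    """Number of single-threaded runs that fit in memory at the same time."""
    return int(total_ram_gb // per_job_gb)


def sequential_hours(n_samples, avg_s=AVG_RUNTIME_S):
    """Expected wall time in hours when samples are processed one after another."""
    return n_samples * avg_s / 3600


def parallel_hours_upper_bound(n_samples, jobs, max_s=MAX_RUNTIME_S):
    """Conservative wall time in hours when running `jobs` samples per wave,
    assuming no sample is slower than the benchmark maximum."""
    if jobs < 1:
        raise ValueError("at least one concurrent job is required")
    waves = math.ceil(n_samples / jobs)
    return waves * max_s / 3600


if __name__ == "__main__":
    jobs = concurrent_jobs(64)  # hypothetical 64 GB machine
    print(jobs, round(sequential_hours(151), 1))
    print(round(parallel_hours_upper_bound(151, jobs), 1))
```

For the 151 benchmarked samples, sequential processing comes to 151 × 2103 s = 317553 s, or about 88 hours.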

tbpore cluster

Clustering 151 TB ONT samples:

  • Runtime: 286 s
  • RAM: <1 GB

Usage

General usage

Usage: tbpore [OPTIONS] COMMAND [ARGS]...

Options:
  -h, --help     Show this message and exit.
  -V, --version  Show the version and exit.
  -v, --verbose  Turns on debug-level logger. Option is mutually exclusive
                 with quiet.
  -q, --quiet    Turns off all logging except errors. Option is mutually
                 exclusive with verbose.

Commands:
  cluster  Cluster consensus sequences
  process  Single-sample TB genomic analysis from Nanopore sequencing data

process subcommand

Usage: tbpore process [OPTIONS] [INPUTS]...

  Single-sample TB genomic analysis from Nanopore sequencing data

  INPUTS: Fastq file(s) and/or a directory containing fastq files. All files
  will be joined into a single fastq file, so ensure they're all part of the
  same sample/isolate.

Options:
  -o, --outdir DIRECTORY          Directory to place output files  [default:
                                  tbpore_out]
  -r, --recursive                 Recursively search INPUTS for fastq files
  --tmp DIRECTORY                 Specify where to write all (tbpore)
                                  temporary files. [default: <outdir>/.tbpore]
  -S, --name TEXT                 Name of the sample. By default, will use the
                                  first INPUT file with any extensions
                                  stripped
  -t, --threads INTEGER           Number of threads to use in multithreaded
                                  tools  [default: 1]
  -A, --report_all_mykrobe_calls  Report all mykrobe calls (turn on flag -A,
                                  --report_all_calls when calling mykrobe)
  -d, --cleanup / -D, --no-cleanup
                                  Remove all temporary files on *successful*
                                  completion  [default: no-cleanup]
  --db PATH                       Path to the decontaminaton database
                                  [default: <project_root_dir>/data
                                  /decontamination_db
                                  /tbpore.remove_contam.fa.gz.map-ont.mmi]
  -m, --metadata PATH             Path to the decontaminaton database metadata
                                  file [default: <project_root_dir>/data
                                  /decontamination_db/remove_contam.tsv.gz]
  --help                          Show this message and exit.
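
When processing many samples it can help to assemble the command line programmatically. The following is a minimal sketch that uses only options shown in the help text above; the function name build_process_command and the example paths are hypothetical and are not part of tbpore.

```python
import shlex


def build_process_command(inputs, outdir="tbpore_out", name=None, threads=1,
                          db=None, recursive=False, cleanup=False):
    """Build the argument list for one `tbpore process` run.

    All files in `inputs` are treated by tbpore as a single sample.
    """
    if not inputs:
        raise ValueError("at least one fastq file or directory is required")
    cmd = ["tbpore", "process", "--outdir", str(outdir), "--threads", str(threads)]
    if name:
        cmd += ["--name", name]
    if db:
        # Index kept outside the default location (see the --db option).
        cmd += ["--db", str(db)]
    if recursive:
        cmd.append("--recursive")
    cmd.append("--cleanup" if cleanup else "--no-cleanup")
    cmd += [str(path) for path in inputs]
    return cmd


if __name__ == "__main__":
    # Hypothetical sample name and paths, for illustration only.
    cmd = build_process_command(
        ["reads/sample1.fastq.gz"], outdir="out/sample1", name="sample1", threads=4
    )
    print(shlex.join(cmd))
    # To execute: subprocess.run(cmd, check=True)
```

Returning a list rather than a string avoids shell quoting problems when file names contain spaces.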

cluster subcommand

Usage: tbpore cluster [OPTIONS] [INPUTS]...

  Cluster consensus sequences

  Preferably input consensus sequences previously generated with tbpore
  process.

  INPUTS: Two or more consensus fasta sequences. Use glob patterns to input
  several easily (e.g. output/sample_*/*.consensus.fa).

Options:
  -T, --threshold INTEGER         Clustering threshold  [default: 6]
  -o, --outdir DIRECTORY          Directory to place output files  [default:
                                  cluster_out]
  --tmp DIRECTORY                 Specify where to write all (tbpore)
                                  temporary files. [default: <outdir>/.tbpore]
  -t, --threads INTEGER           Number of threads to use in multithreaded
                                  tools  [default: 1]
  -d, --cleanup / -D, --no-cleanup
                                  Remove all temporary files on *successful*
                                  completion  [default: no-cleanup]
  --help                          Show this message and exit.
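
To illustrate what clustering with a distance threshold means, here is a minimal sketch of one common approach: link every pair of samples whose distance is within the threshold and report the connected groups. This is an assumption made for illustration, not necessarily how tbpore cluster is implemented. In particular, treating a distance equal to the threshold as linked and reporting unlinked samples as singleton groups are choices made here, and the sample names and distances are invented.

```python
from itertools import combinations


def cluster_by_threshold(samples, distance, threshold=6):
    """Group samples into connected components, linking any pair whose
    distance is at most `threshold`."""
    parent = {sample: sample for sample in samples}

    def find(x):
        # Follow parent links to the group representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if distance(a, b) <= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for sample in samples:
        groups.setdefault(find(sample), []).append(sample)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Invented pairwise distances between four hypothetical samples.
    dist = {("A", "B"): 3, ("B", "C"): 5, ("A", "C"): 8,
            ("A", "D"): 40, ("B", "D"): 42, ("C", "D"): 39}

    def lookup(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    print(cluster_by_threshold(["A", "B", "C", "D"], lookup))
    # prints [['A', 'B', 'C'], ['D']]
```

Note that A and C end up in the same group even though their distance (8) exceeds the threshold, because both are within the threshold of B. This transitive behaviour is a property of the linking approach sketched here.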
