
MIGHT: MRSN Integrated Genome Handling Tool for bacterial clinical isolates



Introduction

MIGHT was developed as a way to automate many of the standard bioinformatics tasks that the MRSN performs as part of its surveillance mission.

Brief summary of the workflow:

  1. Run bcl2fastq to demultiplex Illumina paired-end read data from MiSeq/NextSeq runs
  2. Run Kraken2 to get species ID and identify possible sample contamination
  3. Preprocess reads using bbduk for short-read data and/or filtlong for long-read data
  4. Run the Unicycler assembler (with or without long read data)
  5. Run QUAST to gather assembly statistics
  6. Run Andale, a hybrid read/assembly AMR gene identification tool

Installation

This script is designed to be installed and run using conda.

Conda Installation

Usage

MIGHT can be run either on a single isolate using Might.py or on all of the samples of an Illumina run using AllMight.py. From an input perspective, the primary difference is that Might.py processes a single sample, for which you provide 1) the sample name and 2) the location(s) of the relevant input files. AllMight.py instead takes a user-provided SampleSheet.csv to determine which samples to include in the run, and then runs the specified analyses on each sample as parallel implementations of the analysis methods found in Might.py.
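The sample-sheet-driven mode described above can be pictured with a small sketch. This is not AllMight.py's actual code: it assumes the standard Illumina layout in which a `[Data]` section is followed by a header row containing a `Sample_ID` column, and the helper names `read_sample_ids` and `run_per_sample` are hypothetical.

```python
import csv
from concurrent.futures import ThreadPoolExecutor


def read_sample_ids(sample_sheet_path):
    """Return the Sample_ID values found in the [Data] section of a sample sheet."""
    with open(sample_sheet_path, newline="") as handle:
        rows = list(csv.reader(handle))
    # The "[Data]" marker row is followed by the column-name row.
    start = next(i for i, row in enumerate(rows) if row and row[0].strip() == "[Data]")
    header = [name.strip() for name in rows[start + 1]]
    id_column = header.index("Sample_ID")
    sample_ids = []
    for row in rows[start + 2:]:
        if row and row[0].startswith("["):
            break  # a new section begins; the sample table is over
        if len(row) > id_column and row[id_column].strip():
            sample_ids.append(row[id_column].strip())
    return sample_ids


def run_per_sample(sample_ids, analyse, workers=2):
    """Apply `analyse` to every sample concurrently; results are keyed by sample ID."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(sample_ids, pool.map(analyse, sample_ids)))
```

Here `analyse` stands in for whatever per-sample work is requested. A thread pool is used only to keep the sketch short; the real tool may parallelise differently.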

For a single isolate:




          .___  ___.  __    _______  __    __  .__________.
          |   \/   | |  |  /  _____||  |  |  | |          |
          |  \  /  | |  | |  |  __  |  |__|  | `---|  |---`
          |  |\/|  | |  | |  | |_ | |   __   |     |  |     
          |  |  |  | |  | |  |__| | |  |  |  |     |  |     
          |__|  |__| |__|  \______| |__|  |__|     |__|     



usage: Might.py --output OUTPUT [--sample-name SAMPLE_NAME] [--fastq FASTQ]
              [--fasta FASTA] [--all] [--kraken2] [--assembly]
              [--amr {combination,reads,contigs,summary}] [--mlst]
              [--plasmidfinder] [--kraken2-database KRAKEN2_DATABASE]
              [--adapter-file ADAPTER_FILE] [--ramdisk RAMDISK] [--update]
              [--force] [--cores CORES] [--verbosity VERBOSITY] [-h]

MIGHT! MRSN Integrated Genome Handling Tool

Required arguments:
--output OUTPUT       path to the directory where output is/will be stored

Input arguments:
--sample-name SAMPLE_NAME
                      Name of the sample to be analyzed.
--fastq FASTQ         path to the directory containing the read files for
                      this sample [output/reads/raw_reads]
--fasta FASTA         path to the directory containing the assembly file for
                      this sample [output/assembly]

Analysis arguments:
--all                 run all analysis options
--kraken2             run Kraken2 on read files to determine species ID and
                      potentially detect contamination
--assembly            trim and filter reads using bbduk, then perform
                      assembly using Unicycler
--amr {combination,reads,contigs,summary}
                      run Andale using one of the four setting choices
--mlst                perform MLST assignments for samples using MLST
--plasmidfinder       run Plasmidfinder on contig files to identify rep gene
                      content

Resource arguments:
--kraken2-database KRAKEN2_DATABASE
                      Path to the kraken2 database. Required for kraken2
                      analysis
--adapter-file ADAPTER_FILE
                      Path to the adapter.fa file required for adapter
                      trimming of Illumina reads
--ramdisk RAMDISK     Path to the ramdisk for speeding up kraken2

Optional arguments:
--update              update AMRFinderPlus and MLST databases
--force               force overwrite of existing data/output related to
                      this sample
--cores CORES         the MAXIMUM number of CPUs to use in the analysis [1]
--verbosity VERBOSITY
                      the level of reporting done to the terminal window [1]

Help:
-h, --help            show this help message and exit
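
As an illustration of how the options above combine, here is a minimal sketch that assembles a Might.py command line from Python. It uses only flags listed in the help text. The function name `build_might_command`, the example paths, and the assumption that `Might.py` is on the PATH after the conda install are all hypothetical.

```python
import shlex

# The four choices accepted by --amr, as listed in the help text.
AMR_MODES = ("combination", "reads", "contigs", "summary")


def build_might_command(output, sample_name, fastq=None, kraken2_database=None,
                        assembly=False, amr=None, cores=1):
    """Return an argument list for a single-isolate Might.py run."""
    command = ["Might.py", "--output", output, "--sample-name", sample_name]
    if fastq:
        command += ["--fastq", fastq]
    if kraken2_database:
        # The help text states the database path is required for Kraken2 analysis,
        # so this sketch only enables --kraken2 when a database is supplied.
        command += ["--kraken2", "--kraken2-database", kraken2_database]
    if assembly:
        command.append("--assembly")
    if amr is not None:
        if amr not in AMR_MODES:
            raise ValueError(f"--amr must be one of {AMR_MODES}, got {amr!r}")
        command += ["--amr", amr]
    command += ["--cores", str(cores)]
    return command


if __name__ == "__main__":
    # Hypothetical sample name and paths, for illustration only.
    cmd = build_might_command("results/isolate_01", "isolate_01",
                              fastq="reads/isolate_01",
                              kraken2_database="/path/to/kraken2_db",
                              assembly=True, amr="combination", cores=8)
    print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` to launch the analysis.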

For an Illumina run:



            .___  ___.  __    _______  __    __  .__________.
            |   \/   | |  |  /  _____||  |  |  | |          |
            |  \  /  | |  | |  |  __  |  |__|  | `---|  |---`
            |  |\/|  | |  | |  | |_ | |   __   |     |  |     
            |  |  |  | |  | |  |__| | |  |  |  |     |  |     
            |__|  |__| |__|  \______| |__|  |__|     |__|     



usage: AllMight.py --output OUTPUT [--bcl2fastq]
                   [--run-directory RUN_DIRECTORY]
                   [--sample-sheet SAMPLE_SHEET] [--all] [--kraken2]
                   [--assembly] [--amr {combination,reads,contigs,summary}]
                   [--mlst] [--plasmidfinder]
                   [--kraken2-database KRAKEN2_DATABASE]
                   [--adapter-file ADAPTER_FILE] [--ramdisk RAMDISK]
                   [--update] [--force] [--cores CORES]
                   [--verbosity VERBOSITY] [-h]

MIGHT! MRSN Integrated Genome Handling Tool

Required arguments:
  --output OUTPUT       path to the directory where output is/will be stored

bcl2fastq2 arguments:
  --bcl2fastq           Run bcl2fastq2 to generate demultiplexed fastq files
                        from the bcl files
  --run-directory RUN_DIRECTORY
                        Path to the run directory to be analyzed
  --sample-sheet SAMPLE_SHEET
                        Path to the Illumina sample sheet file for the run
                        being analyzed

Analysis arguments:
  --all                 run all analysis options
  --kraken2             run Kraken2 on read files to determine species ID and
                        potentially detect contamination
  --assembly            trim and filter reads using bbduk, then perform
                        assembly using Unicycler
  --amr {combination,reads,contigs,summary}
                        run Andale using one of the four setting choices
  --mlst                perform MLST assignments for samples using MLST
  --plasmidfinder       run Plasmidfinder on contig files to identify rep gene
                        content

Resource arguments:
  --kraken2-database KRAKEN2_DATABASE
                        Path to the kraken2 database. Required for kraken2
                        analysis
  --adapter-file ADAPTER_FILE
                        Path to the adapter.fa file required for adapter
                        trimming of Illumina reads
  --ramdisk RAMDISK     Path to the ramdisk for speeding up kraken2

Optional arguments:
  --update              update AMRFinderPlus and MLST databases
  --force               force overwrite of existing data/output related to
                        this sample
  --cores CORES         the MAXIMUM number of CPUs to use in the analysis [1]
  --verbosity VERBOSITY
                        the level of reporting done to the terminal window [1]

Help:
  -h, --help            show this help message and exit
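
A whole-run invocation can be sketched in the same spirit. The example below is an assumption-laden illustration, not part of MIGHT: `build_allmight_command` is a hypothetical helper, the paths are placeholders, and it passes `--run-directory` and `--sample-sheet` together with `--bcl2fastq` simply because the help text groups them. Whether either is strictly required is not stated.

```python
# Flag-only analysis switches documented in the AllMight.py help text.
ANALYSIS_FLAGS = {"--all", "--kraken2", "--assembly", "--mlst", "--plasmidfinder"}


def build_allmight_command(output, run_directory, sample_sheet,
                           analysis_flags=("--all",), cores=1):
    """Return an argument list that demultiplexes a run and analyses every sample."""
    unknown = set(analysis_flags) - ANALYSIS_FLAGS
    if unknown:
        raise ValueError(f"undocumented analysis flags: {sorted(unknown)}")
    command = [
        "AllMight.py",
        "--output", str(output),
        "--bcl2fastq",
        "--run-directory", str(run_directory),
        "--sample-sheet", str(sample_sheet),
    ]
    command += list(analysis_flags)
    command += ["--cores", str(cores)]
    return command


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    print(build_allmight_command("results/run_01", "/path/to/run_directory",
                                 "/path/to/SampleSheet.csv", cores=16))
```

Checking flags against the documented set before launching catches typos early, which matters for a run that may take hours.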

