MIGHT: MRSN Integrated Genome Handling Tool for bacterial clinical isolates
Introduction
MIGHT was developed as a way to automate many of the standard bioinformatics tasks that the MRSN performs as part of its surveillance mission.
Brief summary of the workflow:
- Run bcl2fastq to demultiplex Illumina paired-end read data from MiSeq/NextSeq runs
- Run Kraken2 to get species ID and identify possible sample contamination
- Preprocess reads using bbduk (short reads) and/or filtlong (long reads)
- Run the Unicycler assembler (with or without long read data)
- Run QUAST to gather assembly statistics
- Run Andale, a hybrid read/assembly AMR gene identification tool
Installation
This script is designed to be installed and run using conda.
Conda Installation
Usage
MIGHT can be run either on a single isolate using Might.py or on all of the samples of an Illumina run using AllMight.py. From an input perspective, the primary difference is that Might.py assumes you are processing a single sample, for which you provide 1) the sample name and 2) the location(s) of the relevant input files. AllMight.py instead takes a user-provided SampleSheet.csv to determine which samples should be included in the run. It then runs the specified analyses on each sample, as parallel implementations of the analysis methods found in Might.py.
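To make the role of SampleSheet.csv more concrete, here is a minimal sketch of how sample names could be pulled from an Illumina sample sheet. It assumes the common layout in which a `[Data]` section holds a header row containing a `Sample_ID` column. The function name `sample_ids_from_sheet` and the example sheet are hypothetical, and this is not necessarily how AllMight.py reads the file internally.

```python
import csv
import io


def sample_ids_from_sheet(sheet_text):
    """Return the Sample_ID values listed under the [Data] section.

    Assumption: the sheet follows the usual Illumina layout, where the
    line after "[Data]" is a CSV header that includes "Sample_ID".
    """
    lines = sheet_text.splitlines()

    # Find the [Data] section header; the CSV header row follows it.
    start = None
    for index, line in enumerate(lines):
        if line.strip().rstrip(",").lower() == "[data]":
            start = index + 1
            break
    if start is None:
        raise ValueError("no [Data] section found in sample sheet")

    # Collect non-empty rows until the next section header, if any.
    data_lines = []
    for line in lines[start:]:
        if line.startswith("["):
            break
        if line.strip().strip(","):
            data_lines.append(line)

    reader = csv.DictReader(io.StringIO("\n".join(data_lines)))
    return [row["Sample_ID"] for row in reader if row.get("Sample_ID")]


# Hypothetical sheet contents, for illustration only.
example_sheet = """[Header]
Experiment Name,example_run
[Data]
Sample_ID,Sample_Name,index,index2
isolate_A,isolate_A,ATCACG,TTAGGC
isolate_B,isolate_B,CGATGT,TGACCA
"""
sample_ids = sample_ids_from_sheet(example_sheet)
print(sample_ids)
```

Each name returned this way corresponds to one sample that would be analyzed in the run.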
For a single isolate:
.___ ___. __ _______ __ __ .__________.
| \/ | | | / _____|| | | | | |
| \ / | | | | | __ | |__| | `---| |---`
| |\/| | | | | | |_ | | __ | | |
| | | | | | | |__| | | | | | | |
|__| |__| |__| \______| |__| |__| |__|
usage: Might.py --output OUTPUT [--sample-name SAMPLE_NAME] [--fastq FASTQ]
[--fasta FASTA] [--all] [--kraken2] [--assembly]
[--amr {combination,reads,contigs,summary}] [--mlst]
[--plasmidfinder] [--kraken2-database KRAKEN2_DATABASE]
[--adapter-file ADAPTER_FILE] [--ramdisk RAMDISK] [--update]
[--force] [--cores CORES] [--verbosity VERBOSITY] [-h]
MIGHT! MRSN Integrated Genome Handling Tool
Required arguments:
--output OUTPUT path to the directory where output is/will be stored
Input arguments:
--sample-name SAMPLE_NAME
Name of the sample to be analyzed.
--fastq FASTQ path to the directory containing the read files for
this sample [output/reads/raw_reads]
--fasta FASTA path to the directory containing the assembly file for
this sample [output/assembly]
Analysis arguments:
--all run all analysis options
--kraken2 run Kraken2 on read files to determine species ID and
potentially detect contamination
--assembly trim and filter reads using bbduk, then perform
assembly using Unicycler
--amr {combination,reads,contigs,summary}
run Andale using one of the four setting choices
--mlst perform MLST assignments for samples using MLST
--plasmidfinder run Plasmidfinder on contig files to identify rep gene
content
Resource arguments:
--kraken2-database KRAKEN2_DATABASE
Path to the kraken2 database. Required for kraken2
analysis
--adapter-file ADAPTER_FILE
Path to the adapter.fa file required for adapter
trimming of Illumina reads
--ramdisk RAMDISK Path to the ramdisk for speeding up kraken2
Optional arguments:
--update update AMRFinderPlus and MLST databases
--force force overwrite of existing data/output related to
this sample
--cores CORES the MAXIMUM number of CPUs to use in the analysis [1]
--verbosity VERBOSITY
the level of reporting done to the terminal window [1]
Help:
-h, --help show this help message and exit
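As an illustration of the options above, the following sketch assembles a Might.py command line from Python. It uses only flags listed in the help text. The helper name `build_might_command`, the paths, and the sample name are hypothetical, and the sketch assumes Might.py is available on the PATH of the active conda environment. The check on the Kraken2 database reflects the note that `--kraken2-database` is required for Kraken2 analysis.

```python
import shlex


def build_might_command(output, sample_name, fastq=None, kraken2=False,
                        kraken2_database=None, assembly=False, cores=1):
    """Assemble an argument list for Might.py from documented flags."""
    if kraken2 and kraken2_database is None:
        # The help text states the database path is required for Kraken2.
        raise ValueError("--kraken2 requires --kraken2-database")

    command = ["Might.py", "--output", output, "--sample-name", sample_name]
    if fastq is not None:
        command += ["--fastq", fastq]
    if kraken2:
        command += ["--kraken2", "--kraken2-database", kraken2_database]
    if assembly:
        command.append("--assembly")
    command += ["--cores", str(cores)]
    return command


# Hypothetical paths and sample name, for illustration only.
command = build_might_command(
    output="results/isolate_A",
    sample_name="isolate_A",
    fastq="reads/isolate_A",
    kraken2=True,
    kraken2_database="/path/to/kraken2_db",
    assembly=True,
    cores=8,
)
command_line = " ".join(shlex.quote(part) for part in command)
print(command_line)
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the paths point at real data.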
For an Illumina run:
.___ ___. __ _______ __ __ .__________.
| \/ | | | / _____|| | | | | |
| \ / | | | | | __ | |__| | `---| |---`
| |\/| | | | | | |_ | | __ | | |
| | | | | | | |__| | | | | | | |
|__| |__| |__| \______| |__| |__| |__|
usage: AllMight.py --output OUTPUT [--bcl2fastq]
[--run-directory RUN_DIRECTORY]
[--sample-sheet SAMPLE_SHEET] [--all] [--kraken2]
[--assembly] [--amr {combination,reads,contigs,summary}]
[--mlst] [--plasmidfinder]
[--kraken2-database KRAKEN2_DATABASE]
[--adapter-file ADAPTER_FILE] [--ramdisk RAMDISK]
[--update] [--force] [--cores CORES]
[--verbosity VERBOSITY] [-h]
MIGHT! MRSN Integrated Genome Handling Tool
Required arguments:
--output OUTPUT path to the directory where output is/will be stored
bcl2fastq2 arguments:
--bcl2fastq Run bcl2fastq2 to generate demultiplexed fastq files
from the bcl files
--run-directory RUN_DIRECTORY
Path to the run directory to be analyzed
--sample-sheet SAMPLE_SHEET
Path to the Illumina sample sheet file for the run
being analyzed
Analysis arguments:
--all run all analysis options
--kraken2 run Kraken2 on read files to determine species ID and
potentially detect contamination
--assembly trim and filter reads using bbduk, then perform
assembly using Unicycler
--amr {combination,reads,contigs,summary}
run Andale using one of the four setting choices
--mlst perform MLST assignments for samples using MLST
--plasmidfinder run Plasmidfinder on contig files to identify rep gene
content
Resource arguments:
--kraken2-database KRAKEN2_DATABASE
Path to the kraken2 database. Required for kraken2
analysis
--adapter-file ADAPTER_FILE
Path to the adapter.fa file required for adapter
trimming of Illumina reads
--ramdisk RAMDISK Path to the ramdisk for speeding up kraken2
Optional arguments:
--update update AMRFinderPlus and MLST databases
--force force overwrite of existing data/output related to
this sample
--cores CORES the MAXIMUM number of CPUs to use in the analysis [1]
--verbosity VERBOSITY
the level of reporting done to the terminal window [1]
Help:
-h, --help show this help message and exit
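A whole-run invocation can be sketched the same way. The example below builds an AllMight.py command that demultiplexes a run and then runs Andale in one of the four documented `--amr` modes. The helper name `build_allmight_command` and all paths are hypothetical, and the sketch assumes AllMight.py is on the PATH of the active conda environment.

```python
import shlex

# The four --amr choices listed in the help text.
AMR_MODES = ("combination", "reads", "contigs", "summary")


def build_allmight_command(output, run_directory, sample_sheet,
                           demultiplex=True, amr_mode=None, cores=1):
    """Assemble an argument list for AllMight.py from documented flags."""
    if amr_mode is not None and amr_mode not in AMR_MODES:
        raise ValueError("amr_mode must be one of: " + ", ".join(AMR_MODES))

    command = ["AllMight.py", "--output", output]
    if demultiplex:
        # Ask AllMight.py to run bcl2fastq2 before the analyses.
        command.append("--bcl2fastq")
    command += ["--run-directory", run_directory,
                "--sample-sheet", sample_sheet]
    if amr_mode is not None:
        command += ["--amr", amr_mode]
    command += ["--cores", str(cores)]
    return command


# Hypothetical paths, for illustration only.
run_command = build_allmight_command(
    output="results/run_01",
    run_directory="/path/to/run_directory",
    sample_sheet="/path/to/SampleSheet.csv",
    amr_mode="combination",
    cores=16,
)
run_command_line = " ".join(shlex.quote(part) for part in run_command)
print(run_command_line)
```

Validating the `--amr` value before launching avoids starting a long run that the argument parser would reject anyway.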