Variant detection for AMR prediction

mutAMR

Motivation

Why oh why another variant detection tool? I hear you ask. There are many high quality tools for reporting variants from microbial paired-end sequencing, including but not limited to snippy and gatk. If you require SNP calling for phylogenetics or core genome analysis, I recommend that you use these tools.

However, there are cases where a simple vcf is all that is required, in particular for identification of acquired AMR mechanisms. In addition, many tools which identify SNPs or deletions for AMR are part of large-scale tools which, whilst high quality and extremely useful, can be complex to install (due to dependencies) and to run.

mutAMR was written to address a very simple need - generation of a single output file that can be used for identification of variants for AMR. It is designed to be a very lightweight tool that simply and specifically generates a vcf file from paired-end Illumina reads. It is a stripped-down tool, using bwa-mem, freebayes, delly and samtools, and is inspired by snippy.

Further functions may be introduced over time (for example variant calling from amplicon-based sequencing or for specific genes) if others do not write a more useful tool!!

What mutAMR is NOT?

mutAMR is NOT a

  • Pipeline to provide hands-off interpretations for AMR. If you require a tool like that - please use tbtAMR, mykrobe or TB-Profiler
  • Tool designed for generation of alignments suitable for phylogenetic or core-genome analysis. If you require a tool like this - please use snippy

Assumptions

When designing mutAMR I have made some assumptions about the setup, inputs and user requirements.

  1. Paired-end fastq files

    a. It is assumed that these reads are generated from the species from which you supply a reference genome.

    b. It is assumed that the reads are of sufficient quality for generation of alignments.

  2. The user does not want to retain any intermediary files, such as .bam (you can retain these files if you like - see below for how to use).

  3. If running from the command line, mutAMR is being run on a per-sample basis. If you want to run it on more than one sample:

    a. Use a workflow language such as nextflow or snakemake - recommended.

    b. Use parallel (see below for suggested format).

    c. Use a for-loop to iterate over your collection.

  4. delly is installed properly and you want to detect large deletions in your sequences. If not - only small deletions will be detected by freebayes - which is capable of accurately recovering deletions up to ~50-75 bp.

  5. snpEff is installed properly with available configs. If not - no annotation will occur and you will need to annotate your vcf separately.

Dependencies

mutAMR is a Python package that runs:

  • bwa-mem to align reads to reference genome
  • freebayes to identify variants. Note variants will be identified down to the minimum fraction designated by the user (default 0.1), see Running mutAMR.
  • If installed, delly will be used to identify large deletions. If not installed - then small deletions will be reported as detected by freebayes. A combined vcf file will be generated, combining the variants detected by freebayes and delly.
  • Annotation will be undertaken using snpEff, to allow for simple integration with the WHO M. tuberculosis catalogue V2.

Validation

mutAMR has been validated for detection of SNPs in M. tuberculosis for the purposes of AMR mechanism detection. The default settings for mutAMR are:

  • Default min depth for base calling in mutAMR is 20 reads. This is higher than the more commonly used 10 reads because, when calling lower frequency mutations (<90% allele frequency), a minimum of 10 reads resulted in more false positive variant calls, which can potentially lead to false calling of resistance.

  • Min allele frequency is set to 0.1 in order to capture low frequency mutations. Allele frequencies lower than 0.1 also resulted in false positive SNPs.
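
To see how the two defaults interact, the sketch below applies a depth check and an allele-fraction check to a single site. It only illustrates the arithmetic; it is not the filtering code used by mutAMR or freebayes, and the function name `passes_thresholds` and its parameters are hypothetical.

```python
def passes_thresholds(depth, alt_reads, mindepth=20, minfrac=0.1):
    """Return True if a site meets both the depth and allele-fraction cut-offs.

    Illustrative only: depth is the total read depth at the site and
    alt_reads is the number of reads supporting the alternate allele.
    """
    if depth == 0 or depth < mindepth:
        return False
    return alt_reads / depth >= minfrac


# With a minimum depth of 10, a single alternate read reaches a fraction of 0.1.
print(passes_thresholds(10, 1, mindepth=10))  # True
# With the default minimum depth of 20, one alternate read is not enough ...
print(passes_thresholds(20, 1))  # False
# ... but two alternate reads are.
print(passes_thresholds(20, 2))  # True
```

At a depth of 10 and a minimum fraction of 0.1, one read (possibly a sequencing error) is enough to support a call, which is consistent with the additional false positives described above.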

Validation results are to be published (details TBC).

Installation

Conda - recommended

It is highly recommended to install mutAMR using conda in order to prevent dependency clashes and other issues that may arise - especially if using a shared computing resource.

mutAMR can be installed as a conda package with all dependencies.

conda create -n mutamr mutamr

Or you can download the environment.yml file from the root of this repository and

conda env create -f environment.yml

Manual installation

At a minimum you need to make sure that the required dependencies have been installed. The versions specified below have all been confirmed to work together and not cause any installation issues or unexpected behaviour. If you decide to use other versions - please be aware that behaviour may not be as described. For example samtools version 1.21 can cause issues, whilst version 1.20 does not.

Required

  • python ==3.10
  • samtools ==1.20
  • bcftools ==1.20
  • freebayes ==1.3.8
  • bwa mem ==0.7.18

Optional

  • delly ==1.2.8
  • snpEff ==5.2
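
After a manual installation, a quick way to confirm that the tools are reachable is to look them up on your PATH. The sketch below does this with the standard library only. The executable names (`bwa`, `snpEff` and so on) are assumptions about how the tools are installed on your system, and the check covers presence only, not versions.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED = ["samtools", "bcftools", "freebayes", "bwa"]
OPTIONAL = ["delly", "snpEff"]


def find_tools(names, which=shutil.which):
    """Map each executable name to its path on PATH, or None if not found."""
    return {name: which(name) for name in names}


def missing(found):
    """Return the sorted names of tools that were not found."""
    return sorted(name for name, path in found.items() if path is None)


print("Missing required tools:", missing(find_tools(REQUIRED)))
print("Missing optional tools:", missing(find_tools(OPTIONAL)))
```

An empty list for the required tools means every executable was found. Missing optional tools only mean that large deletion detection (delly) or annotation (snpEff) will be skipped.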

Using mutAMR

mutAMR can be used from the command line - or as an importable package to run as part of another Python package.

Import

Below is an example of using mutAMR as part of another Python script or tool.

Required arguments

  • read1
  • read2

Optional

  • reference (if you are using mutAMR for M. tuberculosis you can simply set mtb=True, no need to use this argument)
    • in fasta format
  • annotation - this is the species for snpEff (if you are using mutAMR for M. tuberculosis you can simply set mtb=True, no need to use this argument)
  • threads
    • default = 8
  • ram
    • default = 8
  • keep - boolean argument - if you would like to keep all intermediary files (including bam) set to True
    • default = False
  • mtb - boolean argument - if set to True reference and annotation species will be automatically set
    • default = False
  • mindepth - the minimum depth required for base calling. The default is higher than standard (10) to improve the performance of base calling at low minfrac
    • default = 20
  • minfrac - the lowest allele frequency to call a SNP.
    • default = 0.1
  • force - if the output folder already exists - mutAMR will stop to prevent accidental overwriting of data. If you would like to override existing mutAMR outputs set force=True
    • default = False
  • tmp - the tmp directory for samtools - use of this can improve performance
    • default = /tmp/username

Example for running M. tuberculosis

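from mutamr import Fastq2vcf

# Paths to the paired-end reads
read1 = "/path/read1.fastq.gz"
read2 = "/path/read2.fastq.gz"
seq_id = "sample_name"
mtb = True   # reference and annotation species are set automatically
keep = True  # retain intermediary files (including bam)

V = Fastq2vcf.Fastq2Vcf(
    read1=read1,
    read2=read2,
    seq_id=seq_id,
    keep=keep,
    mtb=mtb,
)
# run() returns the path to the vcf file as a string
vcf = V.run()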

This will generate a vcf file at sample_name/sample_name.annot.vcf.gz. The variable vcf is a string containing the path to that file, for input into other tools or functions.
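
As one example of passing the returned path on to another function, the sketch below reads the first few columns of each record from a gzip-compressed vcf using only the standard library. The function name `read_variants` is hypothetical and not part of mutAMR. It relies only on the standard tab-delimited VCF column order (CHROM, POS, ID, REF, ALT) and does not parse the snpEff annotation.

```python
import gzip


def read_variants(vcf_path):
    """Yield (chrom, pos, ref, alts) for each record in a gzip-compressed vcf."""
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            # Skip header lines and blank lines
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            # ALT may hold several comma-separated alleles
            yield chrom, int(pos), ref, alt.split(",")


# Example use, where `vcf` is the path returned by V.run():
# for chrom, pos, ref, alts in read_variants(vcf):
#     print(chrom, pos, ref, ",".join(alts))
```

For anything beyond a quick look, a dedicated vcf library is likely to be a better choice than hand-rolled parsing.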

CLI

mutAMR can also be run from the command line.

Example for running M. tuberculosis

mutamr wgs -1 /path/read1.fastq.gz -2 /path/read2.fastq.gz -s sample_name --mtb --keep

This will generate the same files as above, keeping all intermediary files.

parallel

As stated above, if you would like to run mutAMR on a batch of sequences, ideally use a workflow language or parallel. An example using parallel is below.

parallel --colsep '\t' -j 8 mutamr wgs -1 {2} -2 {3} -s {1} --mtb :::: reads.txt

where reads.txt is a tab-delimited file containing 3 columns

  1. Sequence ID
  2. Path to R1
  3. Path to R2
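
If parallel is not available, the same reads.txt file can be used with a simple loop, as mentioned in the assumptions. The sketch below builds one `mutamr wgs` command per row, using only the flags shown above, and runs them one after another. The helper names `build_commands` and `run_all` are hypothetical, and samples are processed sequentially rather than in parallel.

```python
import csv
import subprocess


def build_commands(reads_file, mtb=True):
    """Return one mutamr command (as an argument list) per row of reads_file."""
    commands = []
    with open(reads_file, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 3:
                continue  # skip blank or incomplete lines
            seq_id, r1, r2 = row[:3]
            cmd = ["mutamr", "wgs", "-1", r1, "-2", r2, "-s", seq_id]
            if mtb:
                cmd.append("--mtb")
            commands.append(cmd)
    return commands


def run_all(reads_file):
    """Run mutamr for every sample, stopping at the first failure."""
    for cmd in build_commands(reads_file):
        subprocess.run(cmd, check=True)


# Example use: run_all("reads.txt")
```

Because `check=True` is set, a non-zero exit from any sample raises an error and stops the loop; remove it if you would rather continue with the remaining samples.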
