rmseq

RM-seq is a bioinformatics tool for assessing resistance mutations from paired-end (PE) short reads.

Project description

A bioinformatic analysis pipeline for high-throughput identification and quantification of large repertoires of resistance-conferring mutations. RM-seq is an amplicon-based, deep-sequencing technique using single-molecule barcoding. We have adapted this method to identify and quantify mutations that confer resistance to a given antibiotic.

RM-seq both corrects read errors generated during sequencing and accurately quantifies mutations by correcting the PCR amplification bias generated during sequencing library preparation. During the first step of amplicon library preparation, a linear PCR (primer extension) with a primer comprising a tail of degenerate bases (all possible bases) introduces a unique barcode to each DNA template molecule. A barcode is therefore assigned not just to all the molecules from a given sample (indexing), but to every individual molecule being amplified and sequenced. The RM-seq pipeline uses these barcodes to generate an error-corrected consensus sequence of the initial template variant. Counting the barcodes introduced before exponential PCR amplification of the template makes it possible to accurately quantify each genetic variant in genomic DNA extracted from a complex population of resistant clones (e.g. pools of 10,000 resistant colonies selected by an antibiotic in vitro).
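The barcode logic described above can be illustrated with a minimal Python sketch. It is not the pipeline's actual implementation (RM-seq relies on external tools such as pear, cd-hit and EMBOSS). The function names, the assumption that the barcode sits at the start of each merged read, and the simple per-position majority vote are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, barcode_len=16, min_reads=5):
    """Group reads by their leading barcode and build a majority-vote consensus.

    Assumption: each read is a merged amplicon whose first `barcode_len`
    bases are the molecular barcode. Returns {barcode: (consensus, n_reads)}.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[read[:barcode_len]].append(read[barcode_len:])

    result = {}
    for barcode, inserts in groups.items():
        if len(inserts) < min_reads:
            continue  # too few reads to out-vote sequencing errors
        # Simplification: compare position by position up to the shortest read.
        length = min(len(seq) for seq in inserts)
        consensus = "".join(
            Counter(seq[i] for seq in inserts).most_common(1)[0][0]
            for i in range(length)
        )
        result[barcode] = (consensus, len(inserts))
    return result


def count_variants(consensus_map):
    """Count barcodes (template molecules), not reads, per consensus sequence."""
    return Counter(seq for seq, _ in consensus_map.values())
```

Because each barcode is counted once regardless of how many reads carry it, a template that was over-amplified by PCR contributes the same weight as one that was amplified poorly.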

A complete description of the RM-seq method will be available soon (article submitted).

Is this the right tool for me?

To use this pipeline, you need a sequenced amplicon library with molecular barcodes.
It only supports paired-end FASTQ reads (including .gz-compressed FASTQ files).
It needs paired reads that overlap.
It needs the bwa aligner and EMBOSS to be installed.
It needs reference FASTA sequences of the sequenced gene (DNA and protein).
It’s written in Python and Perl.

Installation

Install the RM-seq pipeline:

pip3 install rmseq

If you installed with pip3 install rmseq --user, create a symlink to the packaged data using:

ln -s $HOME/.local/lib/python3.6/site-packages/RMseq/test_data/ $HOME/.local/bin/
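The command above hardcodes a python3.6 path, which may not match your interpreter. As a hedged alternative, the following Python sketch locates the user site-packages directory with `site.getusersitepackages()` and creates the same link. The helper names are hypothetical, and the `RMseq/test_data` layout is taken from the command above rather than verified against the package.

```python
import site
from pathlib import Path


def packaged_test_data(user_site=None):
    """Return the expected location of RMseq/test_data under user site-packages."""
    base = Path(user_site) if user_site else Path(site.getusersitepackages())
    return base / "RMseq" / "test_data"


def link_test_data(bin_dir, user_site=None):
    """Create <bin_dir>/test_data pointing at the packaged test data."""
    target = packaged_test_data(user_site)
    link = Path(bin_dir) / "test_data"
    # Leave any existing file or link untouched.
    if not link.exists() and not link.is_symlink():
        link.symlink_to(target, target_is_directory=True)
    return link


if __name__ == "__main__":
    # Only reports the path; call link_test_data(Path.home() / ".local" / "bin")
    # to actually create the link.
    print(packaged_test_data())
```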

Dependencies

RM-seq has the following package dependencies:

* EMBOSS >= 6.6 for clustalo, cons, getorf, diffseq
* clustal-omega >= 1.2.1
* bwa >= 0.7.15
* samtools >= 1.3
* pear >= 0.9.10
* cd-hit >= 4.7
* trimmomatic >= 0.36
* python modules: plumbum, Biopython

If you are using the Homebrew (macOS) or Linuxbrew packaging system:

brew tap homebrew/science
brew tap tseemann/bioinformatics-linux
brew install EMBOSS
brew install clustal-omega
brew install bwa
brew install samtools
brew install pear
brew install cd-hit
brew install trimmomatic
pip3 install plumbum
pip3 install biopython
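Before rmseq itself is installed, a quick way to see which external tools are still missing is to look them up on the PATH. The sketch below uses only the standard library. The executable names are assumptions derived from the dependency list and may differ on your system (for example, trimmomatic is sometimes shipped only as a jar file).

```python
import shutil

# Assumed executable names; adjust to match your installation.
EXECUTABLES = [
    "bwa", "samtools", "pear", "cd-hit", "clustalo",
    "trimmomatic", "cons", "getorf", "diffseq",
]


def missing_executables(names=EXECUTABLES, which=shutil.which):
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed executables were found.")
```

This only checks presence, not the minimum versions listed above.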

Quick start

rmseq

Help

usage: rmseq [-h]  ...

Run RM-seq pipeline.

optional arguments:
  -h, --help  show this help message and exit

Commands:

    run       Run the pipeline.
    version   Print version.
    check     Check pipeline dependencies
    test      Run the test data set.

To check dependencies are installed

rmseq check

To run the test dataset

rmseq test

To run the analysis pipeline, follow the usage printed by:

rmseq run -h
usage: rmseq run [options]

Run the pipeline

positional arguments:
  R1                    Path to read pair 1
  R2                    Path to read pair 2
  refnuc                Reference gene that will be used for premapping
                        filtering (fasta).
  refprot               Reference protein that will be use for annotating
                        variants (fasta).
  outdir                Output directory.

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug_on        Switch on debug mode.
  -f, --force           Force overwite of existing.
  -b BARLEN, --barlen BARLEN
                        Length of barcode (default 16)
  -m MINFREQ, --minfreq MINFREQ
                        Minimum barcode frequency to keep (default 5)
  -c CPUS, --cpus CPUS  Number of CPUs to use (default 72)
  -r MINSIZE, --minsize MINSIZE
                        Minimum ORF size in bp used when annotating variants
                        (default 200)
  -w WSIZE, --wsize WSIZE
                        Word-size option to pass to diffseq for comparison
                        with reference sequence (default 5)
  -s SUBSAMPLE, --subsample SUBSAMPLE
                        Only examine this many reads.
  -k, --keepfiles       Keep the intermediate files. Default is to remove
                        intermediate files
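As an illustration of how the positional arguments and options above fit together, the following Python sketch assembles an rmseq run command line. The helper name and all file names are hypothetical; only the flags and the argument order come from the help text above. The sketch prints the command rather than running it.

```python
import shlex


def build_rmseq_command(r1, r2, refnuc, refprot, outdir,
                        barlen=None, minfreq=None, cpus=None, keepfiles=False):
    """Assemble an `rmseq run` command as a list of arguments."""
    cmd = ["rmseq", "run"]
    if barlen is not None:
        cmd += ["--barlen", str(barlen)]
    if minfreq is not None:
        cmd += ["--minfreq", str(minfreq)]
    if cpus is not None:
        cmd += ["--cpus", str(cpus)]
    if keepfiles:
        cmd.append("--keepfiles")
    # Positional arguments, in the order given by the help text.
    cmd += [r1, r2, refnuc, refprot, outdir]
    return cmd


if __name__ == "__main__":
    # Hypothetical input files and output folder.
    cmd = build_rmseq_command(
        "sample_R1.fastq.gz", "sample_R2.fastq.gz",
        "gene.fna", "gene.faa", "Rifampicin1",
        cpus=8,
    )
    print(" ".join(shlex.quote(part) for part in cmd))
    # To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Since the help text lists a default of 72 CPUs, passing --cpus explicitly is probably sensible on smaller machines.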

To check the version

rmseq version

Outputs

RM-seq produces a tab-separated output file called amplicons.effect with the following columns:

Column	Example	Description
barcode	GACACAACTGAGATTA	The sequence of the barcode
sample	Rifampicin1	The output folder name
aa_mutation	H481N	The annotation of the amino acid change (Histidine residue 481 substituted by Asparagine)
start	481	Start coordinate of the mutation
end	481	End coordinate of the mutation
orf	VRPPDKNNRFVGLYCTLV…	The protein sequence of the consensus sequence
dna	GGTTAGACCACCCGATAA…	The DNA sequence of the consensus sequence
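The table above suggests a simple way to summarise the results: count how many barcodes carry each amino acid change. The Python sketch below does this with the standard csv module. It assumes the columns appear in the order listed and that each row corresponds to one barcode. Whether the file starts with a header row is not stated here, so the sketch skips one if it is present.

```python
import csv
from collections import Counter

# Column order as listed in the table above (assumed to match the file).
COLUMNS = ["barcode", "sample", "aa_mutation", "start", "end", "orf", "dna"]


def count_mutations(handle, columns=COLUMNS):
    """Count rows (barcodes) per aa_mutation in a tab-separated handle."""
    columns = list(columns)
    idx = columns.index("aa_mutation")
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) <= idx or row[:len(columns)] == columns:
            continue  # skip short or blank lines and an optional header row
        counts[row[idx]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_mutations.py path/to/amplicons.effect
    with open(sys.argv[1], newline="") as handle:
        for mutation, n in count_mutations(handle).most_common():
            print(mutation, n, sep="\t")
```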

The other files produced by RM-seq are:

File name	Description
amplicons.nuc	Multi-FASTA file containing all the consensus nucleotide sequences (the header of each sequence is the barcode)
amplicons.orf	Multi-FASTA file containing all the consensus protein sequences (the header of each sequence is the barcode)
amplicons.barcodes	Table with the count of each barcode sequence
amplicons.cdhit	Multi-FASTA file containing all the unique consensus nucleotide sequences (the header of each sequence is the barcode)
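Because the multi-FASTA outputs use the barcode as the sequence header, they can be loaded into a barcode-to-sequence mapping. The following is a minimal pure-Python sketch; the function name is hypothetical, and it assumes standard FASTA formatting with the barcode as the first word of each header line. Biopython's `SeqIO.parse` would work equally well.

```python
def read_multifasta(handle):
    """Parse a multi-FASTA handle into {header_id: sequence}."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may span several lines
    return {key: "".join(chunks) for key, chunks in records.items()}


if __name__ == "__main__":
    import sys

    # Usage: python read_multifasta.py path/to/amplicons.nuc
    with open(sys.argv[1]) as handle:
        sequences = read_multifasta(handle)
    print(len(sequences), "consensus sequences loaded")
```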

Issues

Please report problems to the Issues Page.

Author

Romain Guerillot | Torsten Seemann | Mark B Schultz (github: schultzm)

Project details

Release history

0.0.57

Jan 16, 2019

0.0.56

Sep 18, 2018

0.0.53b0 pre-release

Sep 18, 2018

0.0.51b0 pre-release

Nov 30, 2017

0.0.47b0 pre-release

Oct 4, 2017

0.0.46b0 pre-release

Oct 4, 2017

0.0.45b0 pre-release

Oct 3, 2017

0.0.43b0 pre-release

Oct 3, 2017

0.0.42b0 pre-release

Oct 3, 2017

0.0.41b0 pre-release

Oct 3, 2017

0.0.28b0 pre-release

Oct 2, 2017

0.0.27b0 pre-release

Oct 2, 2017

This version

0.0.26b0 pre-release

Oct 2, 2017

0.0.25b0 pre-release

Oct 2, 2017

0.0.24b0 pre-release

Oct 2, 2017

0.0.23b0 pre-release

Oct 2, 2017

0.0.22b0 pre-release

Oct 2, 2017

0.0.21b0 pre-release

Sep 28, 2017

0.0.20b0 pre-release

Sep 28, 2017

0.0.19b0 pre-release

Sep 28, 2017

0.0.18b0 pre-release

Sep 28, 2017

0.0.17b0 pre-release

Sep 28, 2017

0.0.16b0 pre-release

Sep 28, 2017

0.0.15b0 pre-release

Sep 28, 2017

0.0.14b0 pre-release

Sep 28, 2017

0.0.13b0 pre-release

Sep 28, 2017

0.0.12b0 pre-release

Sep 28, 2017

0.0.10b0 pre-release

Sep 28, 2017

0.0.9b0 pre-release

Sep 28, 2017

0.0.8b0 pre-release

Sep 28, 2017

0.0.6b0 pre-release

Sep 28, 2017

0.0.5b0 pre-release

Sep 28, 2017

0.0.3b0 pre-release

Sep 28, 2017

0.0.2b0 pre-release

Sep 27, 2017

0.0.1b0 pre-release

Sep 27, 2017

Download files


Source Distribution

rmseq-0.0.26b0.tar.gz (3.9 MB)

Uploaded Oct 2, 2017 Source

Hashes for rmseq-0.0.26b0.tar.gz

Algorithm	Hash digest
SHA256	`8ee700dbd2022f70e04c1d53e6596604bb600823e696d54a58d88785eeee35df`
MD5	`3fd286d3cc26d37502adf716c6c93ac0`
BLAKE2b-256	`22bef81bba4b59defbc4cacd7684550cbb1a0c109773289af57d9259a386e194`