A metagenomics pipeline to estimate relative cell periods.

Menace
======

This bundle of software is a basic implementation of the algorithm for
extracting peak-to-trough ratios (PTRs) from metagenomic data, as first
described in `(Korem et al., Science,
2015) <http://science.sciencemag.org/content/349/6252/1101>`__.
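
As a rough illustration of the quantity being estimated: the peak-to-trough
ratio compares sequencing coverage near the replication origin (the peak)
with coverage near the terminus (the trough). The sketch below computes it
from a list of binned coverage values after a circular moving average. The
function names, the window size and the toy data are illustrative
assumptions and are not part of Menace.

```python
def circular_moving_average(values, window):
    """Smooth values with a moving average that wraps around the ends,
    since bacterial chromosomes are typically circular."""
    n = len(values)
    half = window // 2
    smoothed = []
    for i in range(n):
        # Indices wrap modulo n so bins at both ends see each other.
        neighbourhood = [values[(i + k) % n] for k in range(-half, half + 1)]
        smoothed.append(sum(neighbourhood) / len(neighbourhood))
    return smoothed


def peak_to_trough_ratio(coverage, window=5):
    """Ratio of the highest to the lowest smoothed coverage bin."""
    smoothed = circular_moving_average(coverage, window)
    trough = min(smoothed)
    if trough <= 0:
        raise ValueError("trough coverage must be positive")
    return max(smoothed) / trough


# Toy coverage profile: highest at bin 0 (origin), lowest at bin 10 (terminus).
coverage = [20 - abs(10 - ((i + 10) % 20)) for i in range(20)]
ptr = peak_to_trough_ratio(coverage, window=3)
```

A ratio close to 1 means coverage is flat along the genome, while larger
values indicate more reads near the origin than near the terminus.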

Installation:
-------------

Pip
~~~

Make sure that ``pip`` is the pip command of your *python2* installation,
then run:

.. code:: bash

    pip install menace

Git
~~~

.. code:: bash

    git clone git@github.com:zertan/Menace.git
    cd Menace
    python setup.py install

This should install the *python* dependencies listed below. The other
dependencies have to be installed manually (if you have questions about
this, consult your cluster IT help desk).

The software has been tested on the "hebbe" cluster at
`C3SE <http://c3se.chalmers.se>`__, which uses the "slurm" system for resource
management (thus slurm is the only queueing system currently supported).

Dependencies:
~~~~~~~~~~~~~

::

    Python2:
        numpy
        scipy
        pandas
        biopython
        matplotlib
        xmltodict
        configparser
        lmfit
        newick
        Jinja2
        doric
        -e git+https://github.com/PathoScope/PathoScope.git#egg=pathoscope

`samtools <http://www.htslib.org/download/>`__

`bamtools <https://github.com/pezmaster31/bamtools/wiki/Building-and-installing>`__

`bowtie2 <https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.2.9/>`__

`Pathoscope
2.0 <https://sourceforge.net/projects/pathoscope/files/?source=navbar>`__
(should be installed by the above pip command, but make sure that
'pathoscope ID' is accessible in the shell, i.e. that it is on the system path)

`parallel <http://www.gnu.org/software/parallel/>`__

`DoriC <http://tubic.tju.edu.cn/doric/download.php>`__ is a database of
chromosome origin locations (OriCs). It is an optional but recommended
dependency for the pipeline. Please visit the link and enter your e-mail
to download it.

Usage
-----

You can get an overview of the menace functionality by running
``menace -h``.

1. Initialize a project in the current directory by running ``menace init``.
   Identify a set of NCBI genome reference accession numbers and put
   them in "./searchStrings" (or use the default one, which includes a
   *minimal* set of references to bacteria common in the human gut).

2. Identify a metagenomic cohort of interest (download manually or add
   URLs as described below) and add it to the Data folder. Supported input:
   raw/gzipped/bzipped ".fastq" files.

3. Add information to the ``project.conf`` file.

4. Edit ``loadmodules.sh`` to include the **python2** module of the
   cluster (or comment out the lines if python2 is accessible by
   default).

5. Run ``menace full`` (use "nohup {cmd} &" to keep it alive after logout
   if on a cluster login node).

6. Wait for the job to complete, then run ``menace collect`` in the project
   directory.
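
Step 2 accepts raw, gzipped and bzipped ".fastq" files. As a quick sanity
check before submitting a job, the following sketch opens any of the three
variants transparently and counts the reads. It is a standalone helper, not
part of Menace; the ".gz" and ".bz2" suffixes and the four-lines-per-record
layout are assumptions about how the files are named and formatted.

```python
import bz2
import gzip


def open_fastq(path):
    """Open a raw, gzipped or bzipped FASTQ file in text mode,
    choosing the opener from the file suffix."""
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def count_reads(path):
    """Count records, assuming the common four-lines-per-record layout."""
    with open_fastq(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4
```

For paired data, comparing ``count_reads`` on both mate files is a cheap way
to spot a truncated download.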

Notes
~~~~~

The menace script is a common utility for all parts of the pipeline,
including downloading references and metagenomic data, building a
reference index, setting up the necessary file structure and submitting
to slurm. Hence, all configuration is intended to be set up in
``project.conf`` (please see ``bin/project.conf.example`` for an example).
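
To show the general idea of keeping all paths in one configuration file,
here is a minimal sketch that parses an INI-style text with ``configparser``
and checks that the expected paths are present. The section name
``[Project]`` and the key names are hypothetical placeholders chosen for
this example; consult ``bin/project.conf.example`` for the actual layout.

```python
import configparser

# Hypothetical layout: section and key names are placeholders,
# not the real contents of project.conf.
EXAMPLE_CONF = """
[Project]
data_path = ./Data
ref_path = ./References
doric_path = ./DoriC
output_path = ./Output
"""

REQUIRED_KEYS = ("data_path", "ref_path", "doric_path", "output_path")


def load_paths(text):
    """Parse INI text and return the required path settings as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["Project"]
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise KeyError("missing keys in [Project]: " + ", ".join(missing))
    return {key: section[key] for key in REQUIRED_KEYS}


paths = load_paths(EXAMPLE_CONF)
```

Failing early on a missing key is cheaper than discovering it after a
cluster job has been queued.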

The default 'searchStrings' is only an example and will most probably
not fit your purposes. A more comprehensive reference library will yield
higher coverage and more accurate values. A more comprehensive list of
human gut bacteria is available at 'extra/referenceACClong.txt'.

Directory structure (*example*)
-------------------------------

With the above usage example, the directory structure will look something
like this:

::

    $DATA_PATH
    ├ "Sample01"                  (eg. ERR525688)
    .   ├ {sample01_1.fastq.gz}
    .   └ {sample01_2.fastq.gz}   paired metagenomic reads
    .

    $REF_PATH
    ├ Index
    |   └ {REF_NAME.*.bt2l}       bowtie2 index files
    ├ Fasta
    |   └ {accession.fasta}
    ├ Headers
    |   └ {accession.xml}         xml files containing extra genome references info
    └ taxIDs.txt

    $DORIC_PATH
    ├ bacteria_record.dat
    └ bacteria_seq.fas

    $OUTPUT_PATH
    ├ "Sample01"
    .   ├ depth
    .   |   └ {accession.depth}           coverage files for each reference
    .   ├ log
        |   └ {accession.log}             output logs from piecewiseFit
        ├ npy
        |   └ {accession_OriC_TerC.npy}   numpy files with origin/terminus locations and relative C periods
        ├ png
        |   └ {accession_fit.png}         images of piecewise fit of the smoothed coverage
        └ accession-sam-report.tsv        Pathoscope2 reassignment report
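
Given the layout above, a short script can list which references produced a
depth file in each sample. This is a minimal sketch that assumes one
directory per sample under the output path, each with a ``depth``
subdirectory holding ``*.depth`` files; the function name is illustrative.

```python
from pathlib import Path


def depth_files_by_sample(output_path):
    """Map each sample directory name to its sorted list of depth files."""
    result = {}
    for sample_dir in sorted(Path(output_path).iterdir()):
        depth_dir = sample_dir / "depth"
        # Skip plain files and sample directories without depth output.
        if depth_dir.is_dir():
            result[sample_dir.name] = sorted(depth_dir.glob("*.depth"))
    return result
```

The file stem of each returned path (``path.stem``) then gives the
accession, following the ``{accession.depth}`` naming shown in the tree.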

Contents
--------

Below follows a description of the main scripts in the package.

jobscript
~~~~~~~~~

A submit script for sending a batch job to slurm for parallel processing
on a computing cluster.

**input:** none

**output:** directory structure as specified in "project.conf"

mainBuild.sh
~~~~~~~~~~~~

The main build script with commands intended to be executed on the
cluster.

**input:** none

**output:** temporary paths and files on compute nodes

PTRMatrix.py
~~~~~~~~~~~~

Traverses the specified directory generated by mainBuild.sh and
assembles information from each sample into tabular form (e.g. averages
origin locations from many samples for a better estimate).

**input:** $OUTPUT\_PATH, $DORIC\_PATH, $REF\_PATH, bin/accLoc.csv

**output:** Abundance.csv, PTR.csv, DoublingTime.csv, Header.csv
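
The kind of pivot this step performs can be sketched with the standard
library alone: per-sample values keyed by accession are turned into one row
per accession and one column per sample. The column layout, the accession
labels ``ACC_A``/``ACC_B`` and the numbers are made up for illustration and
do not describe the actual format of PTR.csv.

```python
import csv
import io


def ptr_table(ptr_by_sample):
    """Pivot {sample: {accession: ptr}} into a header and accession rows."""
    samples = sorted(ptr_by_sample)
    accessions = sorted(
        {acc for values in ptr_by_sample.values() for acc in values}
    )
    header = ["accession"] + samples
    rows = []
    for acc in accessions:
        # Missing sample/accession pairs stay empty rather than becoming 0.
        rows.append([acc] + [ptr_by_sample[s].get(acc, "") for s in samples])
    return header, rows


def to_csv(header, rows):
    """Serialize the table as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example values.
example = {
    "Sample01": {"ACC_A": 1.5, "ACC_B": 1.2},
    "Sample02": {"ACC_A": 1.8},
}
header, rows = ptr_table(example)
csv_text = to_csv(header, rows)
```

Leaving missing cells empty keeps "not detected in this sample" distinct
from a measured value.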

piecewiseFit.py
~~~~~~~~~~~~~~~

Implements the piecewise linear fit and prior checks on the generated
depth files to select those instances in which enough data was
generated to produce a reliable coverage signal for estimating
replication origins. Once the origins have been estimated using the full
cohort, this data can be used to produce PTR values for each
sample.

**input:** {reference.depth}

**output:** {reference\_OriC.npy}, {reference\_TerC.npy},
{reference\_coverage.png}, {reference\_fit.log}
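
To make the idea of a piecewise linear fit concrete, the sketch below fits a
two-segment "tent" to log2 coverage on a circular genome by grid search over
candidate origin positions. It is a deliberately simplified stand-in, not
the code in piecewiseFit.py: it assumes the terminus lies exactly opposite
the origin, uses ordinary least squares, and all function names are
illustrative.

```python
def circular_distance(x, origin, length):
    """Shortest distance between two positions on a circular genome."""
    d = abs(x - origin) % length
    return min(d, length - d)


def fit_tent(positions, log_cov, length, n_candidates=100):
    """Grid-search the origin of a two-segment (tent-shaped) linear model.

    Model: log_cov ~ intercept + slope * circular_distance(position, origin).
    Returns (origin, intercept, slope, sse) for the best candidate, or None.
    """
    best = None
    n = len(positions)
    mean_y = sum(log_cov) / n
    for k in range(n_candidates):
        origin = k * length / n_candidates
        d = [circular_distance(x, origin, length) for x in positions]
        mean_d = sum(d) / n
        sxx = sum((di - mean_d) ** 2 for di in d)
        if sxx == 0:
            continue
        sxy = sum((di - mean_d) * (yi - mean_y) for di, yi in zip(d, log_cov))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_d
        sse = sum(
            (yi - intercept - slope * di) ** 2 for di, yi in zip(d, log_cov)
        )
        # Keep only fits where coverage falls away from the origin.
        if slope < 0 and (best is None or sse < best[3]):
            best = (origin, intercept, slope, sse)
    return best


def ptr_from_fit(slope, length):
    """PTR implied by the fit: 2 ** (log2 coverage drop, origin to terminus)."""
    return 2 ** (-slope * length / 2)


# Synthetic noise-free profile: origin at 300 on a genome of length 1000,
# log2 coverage dropping by 1.0 from origin to terminus (PTR of 2).
length = 1000
positions = list(range(0, length, 10))
log_cov = [5 - 0.002 * circular_distance(x, 300, length) for x in positions]
origin, intercept, slope, sse = fit_tent(positions, log_cov, length)
ptr = ptr_from_fit(slope, length)
```

On real data the coverage would first be binned and smoothed, and a fit
with a large residual would be a reason to discard that reference for the
sample rather than report a PTR.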

fetchSeq.py
~~~~~~~~~~~

This utility can be used to download '.fasta' reference files from the
NCBI servers.

**input:** searchStrings.txt

**output:** {reference.fasta}, {reference.xml}, taxIDs.txt
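
For orientation, one common way to retrieve a FASTA record from NCBI is the
E-utilities ``efetch`` endpoint. The sketch below only builds the request
URL for a nucleotide accession; whether fetchSeq.py uses this endpoint, and
with which parameters, is an assumption, and the helper name is
illustrative.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an efetch URL requesting a nucleotide record as FASTA text."""
    query = urlencode({
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    })
    return EFETCH_URL + "?" + query


# NC_000913.3 is the RefSeq accession of Escherichia coli K-12 MG1655.
url = efetch_fasta_url("NC_000913.3")
```

The resulting URL can be opened with ``urllib.request.urlopen`` or any HTTP
client; when fetching many references, keep to NCBI's request rate limits.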
