
A metagenomics pipeline to estimate relative cell periods.



This bundle of software is a basic implementation of the algorithm for
extracting Peak-to-Trough Ratios (PTRs) from metagenomic data, as first
described in `(Korem et al., Science,
2015) <>`__.
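As a rough illustration of the quantity being estimated: the peak-to-trough ratio compares read coverage near the replication origin (the peak) with coverage near the terminus (the trough). The sketch below is a simplified stand-in and not the package's implementation. The names ``smooth_circular`` and ``peak_to_trough_ratio``, the moving-average smoothing, and the window size are all assumptions made for illustration.

```python
def smooth_circular(coverage, window):
    """Moving average over a circular genome; window is an odd number of bins."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    n = len(coverage)
    half = window // 2
    return [
        sum(coverage[(i + k) % n] for k in range(-half, half + 1)) / float(window)
        for i in range(n)
    ]


def peak_to_trough_ratio(coverage, window=3):
    """Return max/min of the smoothed coverage as a naive PTR estimate."""
    smoothed = smooth_circular(coverage, window)
    trough = min(smoothed)
    if trough <= 0:
        raise ValueError("trough coverage must be positive")
    return max(smoothed) / trough


# Toy binned coverage: highest near bin 0 (origin), lowest near bin 4 (terminus).
toy_coverage = [20, 18, 16, 14, 10, 14, 16, 18]
ptr = peak_to_trough_ratio(toy_coverage, window=1)
```

A ratio close to 1 suggests little ongoing replication, while larger values suggest that more cells in the population are actively copying their chromosome.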



Make sure that "pip" is the pip command of your *python2* installation,
then install from PyPI:

.. code:: bash

    pip install menace


.. code:: bash

    git clone
    cd Menace
    python install

This should install the below *python* dependencies. The other
dependencies have to be installed manually (if you have questions about
this I suggest you consult your cluster IT help desk).

The software has been tested on the "hebbe" cluster at
`C3SE <>`__ which uses the "slurm" system for resource
management (thus slurm is the only queueing system currently supported).



-e git+

`samtools <>`__

`bamtools <>`__

`bowtie2 <>`__

`PathoScope 2.0 <>`__
(should be installed by the above pip command, but make sure 'pathoscope
ID' is accessible in the shell, i.e. is on the system path)

`parallel <>`__
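Before starting a run it can help to confirm that the external command-line tools listed above are on the system path. The following is a minimal sketch using the Python 3 standard library (``shutil.which`` is not available in python2, so run it separately from the pipeline). The executable names in ``REQUIRED_TOOLS`` are assumptions and may differ on your cluster.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ["samtools", "bamtools", "bowtie2", "pathoscope", "parallel"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on the system path."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```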

`DoriC <>`__ is a database of
chromosome origin locations (OriCs) and a recommended, optional
dependency of the pipeline. Please visit the link and enter your e-mail
to download.


You can get an overview of the menace functionality by running
``menace -h``.

1. Initialize a project in the current directory by running ``menace init``.
   Identify a set of NCBI genome reference accession numbers and put
   them in "./searchStrings" (or use the default one, which includes a
   *minimal* set of references to bacteria common in the human gut).

2. Identify a metagenomic cohort of interest (download manually or add
   URLs as described below) and add it to the Data folder. Supported input:
   raw/gzipped/bzipped ".fastq" files.

3. Add information to the ``project.conf`` file.

4. Edit ```` to include the **python2** module of the
   cluster (or comment out the lines if python2 is accessible by

5. Run ``menace full`` (use "nohup {cmd} &" to keep it alive after logout
   if on a cluster login node).

6. Wait for the job to complete. Run ``menace collect`` in project
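Step 2 accepts raw, gzipped, or bzipped ".fastq" files. As a minimal sketch of how such input could be opened uniformly, the compression can be chosen from the file extension. This is not taken from the menace source: ``open_fastq`` and ``count_reads`` are hypothetical helpers, and the read count assumes the common four-lines-per-record FASTQ layout.

```python
import bz2
import gzip


def open_fastq(path):
    """Open a plain, gzip-compressed, or bzip2-compressed FASTQ file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def count_reads(path):
    """Count records, assuming four lines per read."""
    with open_fastq(path) as handle:
        return sum(1 for _ in handle) // 4
```

A quick read count per file is a cheap sanity check that paired files (``_1`` and ``_2``) contain the same number of records before submitting a long cluster job.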


The menace script is a common utility for all parts of the pipeline,
including downloading references and metagenomic data, building a
reference index, setting up the necessary file structure and submitting
to slurm. Hence, all configuration is intended to be set up in
project.conf (please see ``bin/project.conf.example`` for an example).

The default 'searchStrings' is only an example and will most probably
not fit your purposes. A more comprehensive reference library will yield
higher coverage and more accurate values. A more comprehensive list of
human gut bacteria is available at 'extra/referenceACClong.txt'.

Directory structure (*example*)

With the above usage example the path structure(s) will look something
like below::

    ├ "Sample01" (eg. ERR525688)
    .   ├ {sample01_1.fastq.gz}
    .   └ {sample01_2.fastq.gz}    paired metagenomic reads

    ├ Index
    |   └ {REF_NAME.*.bt2l}        bowtie2 index files
    ├ Fasta
    |   └ {accession.fasta}
    ├ Headers
    |   └ {accession.xml}          xml files containing extra genome references info
    └ taxIDs.txt

    ├ bacteria_record.dat
    └ bacteria_seq.fas

    ├ "Sample01"
    .   ├ depth
    .   |   └ {accession.depth}            coverage files for each reference
    .   ├ log
    .   |   └ {accession.log}              output logs from piecewiseFit
    .   ├ npy
    .   |   └ {accession_OriC_TerC.npy}    numpy files with origin/terminus locations and relative C periods
    .   ├ png
    .   |   └ {accession_fit.png}          images of piecewise fit of the smoothed coverage
    .   └ accession-sam-report.tsv         Pathoscope2 reassignment report
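Assuming the output layout sketched above (one folder per sample, each with a ``depth`` subfolder holding ``{accession}.depth`` files), the per-sample coverage files can be enumerated as follows. ``list_depth_files`` is a hypothetical helper and not part of menace.

```python
import os


def list_depth_files(output_path):
    """Map each sample folder name to the sorted accessions with a .depth file."""
    result = {}
    for sample in sorted(os.listdir(output_path)):
        depth_dir = os.path.join(output_path, sample, "depth")
        if not os.path.isdir(depth_dir):
            continue  # skip anything that is not a sample folder
        result[sample] = sorted(
            name[: -len(".depth")]
            for name in os.listdir(depth_dir)
            if name.endswith(".depth")
        )
    return result
```

Comparing the accessions found per sample gives a quick view of which references received coverage output in which samples.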


Below follows a description of the main scripts in the package.


A submit script for sending a batch job to slurm for parallel processing
on a computing cluster.

**input:** none

**output:** directory structure as specified in "project.conf"

The main build script with commands intended to be executed on the
compute nodes.

**input:** none

**output:** temporary paths and files on compute nodes

Traverses the specified directory generated by and
assembles information from each sample into tabular form (eg. averages
origin locations from many samples for a better estimate).

**input:** $OUTPUT\_PATH, $DORIC\_PATH, $REF\_PATH, bin/accLoc.csv

**output:** Abundance.csv, PTR.csv, DoublingTime.csv, Header.csv
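The collection step is described as averaging origin locations over many samples. On a circular chromosome a plain arithmetic mean misbehaves when the estimates straddle coordinate 0, so one option is a circular mean. The sketch below illustrates that idea only: whether menace uses a circular mean, a plain mean, or a median is not stated here, and ``circular_mean_position`` is a hypothetical name.

```python
import math


def circular_mean_position(positions, genome_length):
    """Average positions (in bp) on a circular genome of the given length."""
    angles = [2.0 * math.pi * p / genome_length for p in positions]
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_angle = math.atan2(mean_sin, mean_cos) % (2.0 * math.pi)
    return mean_angle * genome_length / (2.0 * math.pi)


# Two estimates on either side of coordinate 0 of a 1000 bp toy genome.
estimate = circular_mean_position([990, 10], 1000)
```

For the toy input a plain mean would give 500, the opposite side of the chromosome, whereas the circular mean lands at the coordinate origin (position 0, or equivalently 1000).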

Implements the piecewise linear fit and prior checks on the generated
depth files to keep only those instances in which enough data was
generated to produce a reliable coverage signal for estimating
replication origins. This data can be used further on, once those have
been estimated using the full cohort, to produce PTR values for each
**input:** {reference.depth}

**output:** {reference\_OriC.npy}, {reference\_TerC.npy},
{reference\_coverage.png}, {reference\_fit.log}
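To make the idea of a piecewise linear fit concrete, here is a minimal brute-force sketch under strong simplifying assumptions: coverage is binned and strictly positive, the terminus sits diametrically opposite the origin, and log2 coverage falls linearly with circular distance from the origin. The actual script may parameterise and optimise the fit differently, and all names below are hypothetical.

```python
import math


def circular_distance(i, j, n):
    """Shortest distance between bins i and j on a circle of n bins."""
    d = abs(i - j) % n
    return min(d, n - d)


def fit_origin(coverage):
    """Grid-search the origin bin that best explains log2 coverage.

    For each candidate origin, fit log2(coverage) = a + b * distance by
    ordinary least squares and keep the candidate with the smallest
    residual sum of squares among fits with a negative slope.
    Returns (origin_bin, terminus_bin, ptr_estimate).
    """
    n = len(coverage)
    if n < 3:
        raise ValueError("need at least three coverage bins")
    y = [math.log(c, 2) for c in coverage]
    mean_y = sum(y) / n
    best = None
    for ori in range(n):
        x = [circular_distance(i, ori, n) for i in range(n)]
        mean_x = sum(x) / float(n)
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
        if slope < 0 and (best is None or rss < best[0]):
            best = (rss, ori, slope)
    if best is None:
        raise ValueError("no candidate origin gave a decreasing fit")
    _, ori, slope = best
    ter = (ori + n // 2) % n
    # Fitted log2 coverage drops by -slope per bin over n // 2 bins.
    ptr = 2.0 ** (-slope * (n // 2))
    return ori, ter, ptr
```

Requiring a negative slope prevents the mirror solution, in which the terminus is mistaken for the origin, from winning the search.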

This utility can be used to download '.fasta' reference files from the
NCBI servers.

**input:** searchStrings.txt,

**output:** {reference.fasta}, {reference.xml}, taxIDs.txt
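For orientation, FASTA records can be requested from NCBI through the public E-utilities ``efetch`` endpoint. The sketch below only builds the request URL for one accession. It is an assumption that menace uses this endpoint, and ``efetch_fasta_url`` is a hypothetical helper. The accession shown is an example (the *E. coli* K-12 MG1655 RefSeq record).

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an E-utilities URL that returns one nucleotide record as FASTA."""
    query = urlencode({
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    })
    return EFETCH_URL + "?" + query


url = efetch_fasta_url("NC_000913.3")
```

Fetching the URL, for example with ``urllib.request.urlopen``, needs network access and should respect NCBI's usage guidelines on request rates.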

