A tool for combining bed regions from multiple bed files in a probabilistically principled manner.

Installation

We recommend installing mumerge inside a virtual environment or with a package manager, e.g. venv or conda. Because bedtools must be available at the command line, we recommend creating a new conda environment and installing bedtools from bioconda, as follows:

(base) $ conda create -n mumerge_env
(base) $ conda activate mumerge_env
(mumerge_env) $ conda install -c bioconda bedtools

To confirm installation, check the bedtools version:

(mumerge_env) $ bedtools --version
bedtools v2.30.0

With bedtools available in your environment, you can install mumerge in one of the following ways.

Via pip

The simplest way of installing mumerge within your virtual environment is using pip. Be sure to use the appropriate version of Python if you have multiple versions installed. mumerge can then be installed with one of the following commands.

From PyPI:

$ python -m pip install mumerge

From GitHub:

$ python -m pip install git+https://github.com/Dowell-Lab/mumerge

If successful, mumerge should now be callable from the command line.

To upgrade to the latest version of mumerge from a previous one, include --upgrade in either of the previous pip commands.

Via git clone

Alternatively, you can download mumerge and all supporting files by cloning the GitHub repository to your local machine using git:

$ git clone https://github.com/Dowell-Lab/mumerge.git

If you clone the repo, you may want to add the directory mumerge/mumerge to your system PATH variable (how to do this depends on your platform/OS) so that you can run mumerge directly from the command line.

Dependencies

NumPy will be installed automatically when using pip to install mumerge. However, bedtools must be installed manually and made available in your system path prior to running mumerge.
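As a quick sanity check before running mumerge, the sketch below looks up both executables on the current PATH. It uses only the Python standard library; the helper name find_tool is hypothetical and not part of mumerge.

```python
import shutil


def find_tool(name):
    """Return the absolute path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    # Both tools need to be reachable from the shell that will run mumerge.
    for tool in ("bedtools", "mumerge"):
        path = find_tool(tool)
        print(f"{tool}: {path if path else 'not found on PATH'}")
```

If either tool is reported as missing, activate the environment created above (or adjust PATH) and run the check again.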

Bedtools

muMerge relies on bedtools in order to group together those bed regions from the input bed files that will be combined by muMerge probabilistically. This grouping is done using the bedtools merge command. A bedtools binary is included as a part of the package, located at /mumerge/bin/bedtools.
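The grouping step can be pictured with a small pure-Python sketch. This is only an illustration of the idea behind bedtools merge with its default settings (overlapping and book-ended intervals are joined), not the code that muMerge or bedtools runs. It assumes all intervals lie on one chromosome and have been pooled from every sample; the function name merge_intervals is hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the current group: extend the group's end.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Gap before this interval: start a new group.
            merged.append([start, end])
    return [tuple(group) for group in merged]


# Regions pooled from several hypothetical sample bed files.
regions = [(100, 200), (150, 260), (400, 500), (500, 520), (900, 950)]
print(merge_intervals(regions))
# [(100, 260), (400, 520), (900, 950)]
```

Each merged interval defines one group of original regions, and it is within such a group that muMerge then combines the calls probabilistically.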

Running demo

To demonstrate the functionality of muMerge, a simple example, including bedfiles and an input file, is included in the package.

Usage

For general usage, use the help flag:

$ mumerge -h

This prints the usage summary and the available options:

usage: mumerge.py [-h] [-H] [-i INPUT] [-o OUTPUT] [-w WIDTH] [-m MERGED] [-r] [-v]

Merges region calls (mu) generated by Tfit, or other peak calling functions across
multiple samples and replicates.

optional arguments:
  -h, --help            show this help message and exit
  -H, --HELP            Verbose help info about the input format.
  -i INPUT, --input INPUT
                        Input file (full path) containing bedfiles, sample ID's and
                        replicate grouping names (tab delimited). Each sample on separate
                        line. First line header, equal to '#file<TAB>sampid<TAB>group',
                        required. 'file' must be full path. 'sampid' can be any string.
                        'group' can be string or integer. See '-H' help flag for more
                        information.
  -o OUTPUT, --output OUTPUT
                        Output file basename (full path, sans extension). WARNING:
                        will overwrite any existing file)
  -w WIDTH, --width WIDTH
                        The ratio of a the sigma for the corresponding probabilty
                        distribution to the bed region (half-width) --- sigma:half-bed
                        (default: 1). The choice for this parameter will depend on the
                        data type as well as how bed regions were inferred from the
                        expression data.
  -m MERGED, --merged MERGED
                        Sorted bedfile (full path) containing the regions over which
                        to combine the sample bedfiles. If not specified, mumerge will
                        generate one directly from the sample bedfiles.
  -r, --remove_singletons
                        Remove calls not present in more than 1 sample
  -v, --verbose         Verbose printing during processing.
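To make the -w option concrete, the sketch below works through the arithmetic implied by the help text: sigma is the width ratio multiplied by the half-width of the bed region. Placing mu at the midpoint of the region is an assumption made here for illustration, and the function name region_to_mu_sigma is hypothetical; consult the mumerge source for the exact definitions it uses.

```python
def region_to_mu_sigma(start, end, width=1.0):
    """Return (mu, sigma) for a bed region under a sigma:half-bed ratio."""
    half_width = (end - start) / 2
    mu = start + half_width        # assumption: mu sits at the region midpoint
    sigma = width * half_width     # sigma:half-bed ratio from the -w help text
    return mu, sigma


print(region_to_mu_sigma(1000, 1200))        # (1100.0, 100.0) with the default ratio of 1
print(region_to_mu_sigma(1000, 1200, 0.5))   # (1100.0, 50.0)
```

Under this reading, a smaller -w value gives a narrower distribution for the same bed region.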

Input file

The <INPUT> file is a tab-delimited text file that lists the paths of the BED files to be merged, along with a sample name and a condition/replicate group for each sample. In the example below, there are four samples in two groups (control and treatment).

#file   sampid  group
/path/to/sample1.bed    sample1 control
/path/to/sample2.bed    sample2 control
/path/to/sample3.bed    sample3 treatment
/path/to/sample4.bed    sample4 treatment
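Because the columns must be separated by tabs, it can be safer to generate the file programmatically than to type it by hand. The following is a minimal sketch that builds the table shown above; the function name format_input_table and the sample paths are placeholders.

```python
def format_input_table(samples):
    """Build the tab-delimited mumerge input table as a single string."""
    lines = ["#file\tsampid\tgroup"]  # required header line
    for bed_path, sample_id, group in samples:
        lines.append(f"{bed_path}\t{sample_id}\t{group}")
    return "\n".join(lines) + "\n"


# Placeholder paths; mumerge expects full paths to real bed files.
samples = [
    ("/path/to/sample1.bed", "sample1", "control"),
    ("/path/to/sample2.bed", "sample2", "control"),
    ("/path/to/sample3.bed", "sample3", "treatment"),
    ("/path/to/sample4.bed", "sample4", "treatment"),
]
table = format_input_table(samples)
print(table, end="")
```

Writing the returned string to a text file gives a file that can be passed to the -i option.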

This format is also described by the -H flag, i.e. running mumerge -H prints the following:

Input file containing bedfiles, sample ID's, and replicate groupings. Input
file (indicated by the '-i' flag) should be of the following (tab delimited)
format:

#file   sampid  group
/full/file/path/filename1.bed   sampid1 A
/full/file/path/filename2.bed   sampid2 B
...

Header line indicated by '#' character must be included and fields must
follow the same order as non-header lines. The order of subsequent lines does
matter. 'group' identifiers should group files that are technical/biological
replicates. Different experimental conditions should recieve different 'group'
identifiers. The 'group' identifier can be of type 'int' or 'str'. If 'sampid'
is not specified, then default sample ID's will be used.

Output files

muMerge returns the merged regions in BED format (project_id_MUMERGE.bed). A log file summarizing the run (project_id.log) is also written, along with intermediate files (project_id_MISCALLS.bed and project_id_BEDTOOLS_MERGE.bed).
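The sketch below assembles a mumerge command line from the flags listed in the help text and lists the output files to expect. It assumes that project_id in the file names above corresponds to the basename given to -o; the function names and the paths are hypothetical.

```python
def build_command(input_file, output_base, width=None, remove_singletons=False):
    """Assemble the argument list for a mumerge run."""
    cmd = ["mumerge", "-i", input_file, "-o", output_base]
    if width is not None:
        cmd += ["-w", str(width)]
    if remove_singletons:
        cmd.append("-r")
    return cmd


def expected_outputs(output_base):
    """List the files a run should produce, assuming project_id is the -o basename."""
    suffixes = ("_MUMERGE.bed", ".log", "_MISCALLS.bed", "_BEDTOOLS_MERGE.bed")
    return [output_base + suffix for suffix in suffixes]


cmd = build_command("/path/to/input.txt", "/path/to/results/project_id", width=1)
print(" ".join(cmd))
for path in expected_outputs("/path/to/results/project_id"):
    print(path)

# To actually run it (requires mumerge and bedtools on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Keep in mind that mumerge overwrites existing files with the same basename, so choose the -o value accordingly.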

Runtime

The overall run time depends on the number of input BED files and the number of regions being merged. A test case in which 8 samples (~30,000 regions) in 6 condition groups were merged took about 12 minutes on a MacBook Pro (2.3 GHz Intel Core i9) running macOS 10.14.6.

Cite

Please cite the following article if you use muMerge: Transcription factor enrichment analysis (TFEA) quantifies the activity of multiple transcription factors from a single experiment <https://doi.org/10.1038/s42003-021-02153-7>

BibTeX citation:

@article{rubin2021transcription,
  title={Transcription factor enrichment analysis (TFEA) quantifies the activity of multiple transcription factors from a single experiment},
  author={Rubin, Jonathan D and Stanley, Jacob T and Sigauke, Rutendo F and Levandowski, Cecilia B and Maas, Zachary L and Westfall, Jessica and Taatjes, Dylan J and Dowell, Robin D},
  journal={Communications biology},
  volume={4},
  number={1},
  pages={1--15},
  year={2021},
  publisher={Nature Publishing Group}
}
