Project description

eCLIP-Peak

Pipeline for using IDR to identify a set of reproducible peaks given an eCLIP dataset with two or three replicates.

Installation

  • For Van Nostrand Lab

    The pipeline has already been installed. Activate its environment by issuing the following command: source /storage/vannostrand/software/eclip/venv/environment.sh.

  • For all others:

    • Install Python (3.6+)
    • Install peak (pip install eclip-peak)
    • Install IDR (2.0.3+)
    • Install Perl (5.10.1+) with the following packages:
      • Statistics::Basic (cpanm Statistics::Basic)
      • Statistics::Distributions (cpanm Statistics::Distributions)
      • Statistics::R (cpanm Statistics::R)
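
    Putting the pieces together, a fresh manual install might look like the sketch below (assuming pip, conda, and cpanm are already on your PATH; bioconda is one common source for IDR, and building IDR from source works equally well):

        # install the peak pipeline itself
        pip install eclip-peak
        # install IDR 2.0.3+ (here via bioconda; a source build is also fine)
        conda install -c bioconda idr
        # install the required Perl modules
        cpanm Statistics::Basic Statistics::Distributions Statistics::R
        # sanity check
        peak -h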

Usage

  • For Van Nostrand Lab

    After activating peak's environment, call peak -h to see the detailed usage.

  • For all others:

    After successfully installing Python, peak, IDR, and Perl (with the required packages), call peak -h in your terminal to see the following detailed usage:

$ peak -h
usage: peak [-h] 
            [--ip_bams IP_BAMS [IP_BAMS ...]] 
            [--input_bams INPUT_BAMS [INPUT_BAMS ...]] 
            [--peak_beds PEAK_BEDS [PEAK_BEDS ...]] 
            [--ids IDS [IDS ...]] 
            [--read_type READ_TYPE] [--outdir OUTDIR] 
            [--species SPECIES] 
            [--l2fc L2FC] [--l10p L10P] [--idr IDR] 
            [--dry_run] [--cores CORES] [--debug]

Pipeline for using IDR to identify a set of reproducible peaks given an eCLIP dataset 
with two or three replicates.

optional arguments:
  -h, --help            show this help message and exit
  --ip_bams IP_BAMS [IP_BAMS ...]
                        Space separated IP bam files (at least 2 files).
  --input_bams INPUT_BAMS [INPUT_BAMS ...]
                        Space separated INPUT bam files (at least 2 files).
  --peak_beds PEAK_BEDS [PEAK_BEDS ...]
                        Space separated peak bed files (at least 2 files).
  --ids IDS [IDS ...]   Optional space separated short IDs (e.g., S1, S2, S3) for datasets.
  --read_type READ_TYPE
                        Read type of eCLIP experiment, either SE or PE.
  --outdir OUTDIR       Path to output directory.
  --species SPECIES     Short code for species, e.g., hg19, mm10.
  --l2fc L2FC           Only consider peaks at or above this l2fc cutoff, default: 3.
  --l10p L10P           Only consider peaks at or above this l10p cutoff, default: 3.
  --idr IDR             Only consider peaks at or above this idr score cutoff, default: 0.01.
  --cores CORES         Maximum number of CPU cores for parallel processing, default: 1.
  --dry_run             Print out steps and inputs/outputs of each step without 
                        actually running the pipeline.
  --debug               Invoke debug mode (for development purposes only).

Outline of workflow

  • Normalize CLIP IP BAM over INPUT for each replicate
  • Peak compression/merging on input-normalized peaks for each replicate
  • Entropy calculation on IP and INPUT read probabilities within each peak for each replicate
  • Run IDR on peaks ranked by entropy
  • Normalize IP BAM over INPUT using new IDR peak regions
  • Identify reproducible peaks within IDR regions
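
For intuition on the entropy step: in the ENCODE eCLIP convention (which this pipeline appears to follow; the exact formula used here is an assumption), each peak i is scored by the relative information of IP versus INPUT reads,

    entropy_i = p_i^{IP} * log2( p_i^{IP} / p_i^{IN} ),
    where p_i^{IP} = n_i^{IP} / N^{IP} and p_i^{IN} = n_i^{IN} / N^{IN}

with n_i the reads falling in peak i and N the total mapped reads; peaks are then ranked by this value before being passed to IDR.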

Examples

  • eCLIP with 2 replicates

    Assuming the eCLIP pipeline ran successfully and generated the following files for species hg19:

    replicate 1:
        IP BAM: ip1.bam
        INPUT BAM: input1.bam
        Peak BED: clip1.peak.clusters.bed
    replicate 2:
        IP BAM: ip2.bam
        INPUT BAM: input2.bam
        Peak BED: clip2.peak.clusters.bed
    

    The pipeline can then be called like this to identify reproducible peaks:

    peak \
        --ip_bams ip1.bam ip2.bam \
        --input_bams input1.bam input2.bam \
        --peak_beds clip1.peak.clusters.bed clip2.peak.clusters.bed \
        --species hg19
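
    The same run with the tunable options written out explicitly might look like this (the cutoff values shown are the documented defaults; the --read_type, --outdir, and --cores values are illustrative):

    peak \
        --ip_bams ip1.bam ip2.bam \
        --input_bams input1.bam input2.bam \
        --peak_beds clip1.peak.clusters.bed clip2.peak.clusters.bed \
        --species hg19 \
        --read_type PE \
        --l2fc 3 --l10p 3 --idr 0.01 \
        --outdir idr_peaks \
        --cores 4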
    
  • eCLIP with 3 replicates

    Assuming the eCLIP pipeline ran successfully and generated the following files for species hg19:

    replicate 1:
        IP BAM: ip1.bam
        INPUT BAM: input1.bam
        Peak BED: clip1.peak.clusters.bed
    replicate 2:
        IP BAM: ip2.bam
        INPUT BAM: input2.bam
        Peak BED: clip2.peak.clusters.bed
    replicate 3:
        IP BAM: ip3.bam
        INPUT BAM: input3.bam
        Peak BED: clip3.peak.clusters.bed
    

    The pipeline can then be called like this to identify reproducible peaks:

    peak \
        --ip_bams ip1.bam ip2.bam ip3.bam \
        --input_bams input1.bam input2.bam input3.bam \
        --peak_beds clip1.peak.clusters.bed clip2.peak.clusters.bed clip3.peak.clusters.bed \
        --species hg19
    

Note:

  • The indentation of the command does not matter; you can also write it on a single line.
  • The order of the BAM and peak files passed to --ip_bams, --input_bams, and --peak_beds DOES matter; make sure you pass them in a consistent order for all three parameters.
  • There are three cutoffs (--l2fc, --l10p, and --idr) that can be set to fine-tune the peak filtering; see the Usage section for more details.
  • If the pipeline fails, check the log to identify the error and make the necessary changes; re-running the pipeline will skip the successfully processed parts and only continue with the failed and unprocessed ones. A dry run previews the remaining steps, as shown below.
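
For example, before resuming a failed run you can preview what would be executed (same inputs as the two-replicate example above; --dry_run only prints the steps and their inputs/outputs):

    peak \
        --ip_bams ip1.bam ip2.bam \
        --input_bams input1.bam input2.bam \
        --peak_beds clip1.peak.clusters.bed clip2.peak.clusters.bed \
        --species hg19 \
        --dry_run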

Output

The peak pipeline writes five different types of files to the current working directory or to a user-specified output directory (via --outdir):

  1. *.bed: a 6-column or 9-column BED file storing peak information.
  2. *.tsv: a TAB-separated text file storing additional information beyond the BED file.
  3. *.txt: a text file storing the mapped read counts.
  4. *.out: a TAB-separated text file generated by IDR.
  5. *.png: a plot generated by IDR.

All output filenames are self-explanatory; only the basename of each peak BED file (after removal of the .peak.clusters.bed suffix) is used as the name of the corresponding replicate, e.g., clip1.peak.clusters.bed yields the replicate name clip1.

The reproducible peaks can be found in *.reproducible.peaks.bed, and additional information can be found in *.reproducible.peaks.custom.tsv. While the former is a 6-column BED file, the latter is a TAB-separated text file with the following columns, in order (a quick sorting example follows the list):

  • IDR region (the entire IDR-identified reproducible region)
  • Peak (the reproducible peak region)
  • Geometric mean of the l2fc values
  • Columns of log2 fold change (2 or 3 columns for experiments with 2 or 3 replicates, respectively)
  • Columns of -log10 p-value (2 or 3 columns for experiments with 2 or 3 replicates, respectively)
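
As a quick post-processing sketch, the custom TSV can be ranked by the geometric-mean fold change (column 3 per the layout above; the filename my_experiment.reproducible.peaks.custom.tsv is hypothetical):

    # show the most enriched reproducible peaks first
    sort -t $'\t' -k3,3gr my_experiment.reproducible.peaks.custom.tsv | head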

Download files


Source Distribution

eclip-peak-1.0.12.tar.gz (23.3 kB, Source)

Built Distribution

eclip_peak-1.0.12-py3-none-any.whl (55.1 kB, Python 3)

File details

Details for the file eclip-peak-1.0.12.tar.gz.

File metadata

  • Download URL: eclip-peak-1.0.12.tar.gz
  • Size: 23.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for eclip-peak-1.0.12.tar.gz:

  • SHA256: 300c437a222648e17bae1d5fbf95d56b545d52abcf1db75a8d734d5fc859acab
  • MD5: bb8de6a8a321de0e982df3c67ed50105
  • BLAKE2b-256: 995ade159f6b88bf8d60d5781fb043a917f652c602527f7f632dbf668efd9093


File details

Details for the file eclip_peak-1.0.12-py3-none-any.whl.

File metadata

  • Download URL: eclip_peak-1.0.12-py3-none-any.whl
  • Size: 55.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for eclip_peak-1.0.12-py3-none-any.whl:

  • SHA256: db73338bb8b9b5c1d6421757e6fc0ec6ea55e9a32517fbde5e161ed15cdb0c2a
  • MD5: 70fe5bfa6a59be959b224c8a0d3f323e
  • BLAKE2b-256: 189d6f5ef8c2506f37ff5c03f375995eb2565e7a73b425f4d336825c60caf2a4

