Robust ATAC-seq Peak Calling for Many Samples via Convex Optimization

ROCCO: [R]obust [O]pen [C]hromatin Detection via [C]onvex [O]ptimization

What

ROCCO is an algorithm for efficiently identifying "consensus peaks" across multiple high-throughput sequencing (HTS) data samples (namely, ATAC-seq): regions where read densities are consistently enriched across samples, or where particularly strong enrichment is observed in a nontrivial subset of samples.

Example Behavior

In the image below, ROCCO is run on a set of ten heterogeneous ATAC-seq samples (lymphoblast) from independent donors (ENCODE).

  • ROCCO consensus peaks are shown in red, where all default parameters are used in the first track, and the parametric-sigmoid transform --use_parsig option is applied to generate the results in the second track.
  • MACS2 (pooled library) consensus peak regions are shown in blue.
  • ENCODE cCREs are included as a rough reference of potentially active regions, but note that these regions are not specific to the data samples used in this analysis, nor are they derived from the same cell type or assay.

logo

How

ROCCO models consensus peak calling as a constrained optimization problem with an upper bound on the total proportion of the genome selected as open/accessible and a fragmentation penalty that promotes spatial consistency in active regions and sparsity elsewhere.
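
As a rough illustration of this trade-off, the toy sketch below assigns a score to each genomic locus, caps the fraction of loci that may be selected, and charges a penalty each time the selection switches between adjacent loci. The function names, the scores, and the parameters `budget` and `gamma` are assumptions made for this example, and the exhaustive search is only a stand-in that works for a handful of loci; it is not ROCCO's solver.

```python
from itertools import product


def objective(scores, x, gamma):
    """Total score of the selected loci minus a penalty per on/off switch."""
    gain = sum(s * xi for s, xi in zip(scores, x))
    switches = sum(abs(a - b) for a, b in zip(x, x[1:]))
    return gain - gamma * switches


def best_selection(scores, budget, gamma):
    """Exhaustively search 0/1 selections (toy sizes only).

    budget: upper bound on the fraction of loci that may be selected.
    gamma:  fragmentation penalty applied to each switch between neighbors.
    """
    n = len(scores)
    max_selected = int(budget * n)
    feasible = (x for x in product((0, 1), repeat=n) if sum(x) <= max_selected)
    return max(feasible, key=lambda x: objective(scores, x, gamma))


# Hypothetical per-locus enrichment scores (illustrative values, not real data).
scores = [-3, 4, -1, 5, -2, -2, 2, -1]

print(best_selection(scores, budget=0.5, gamma=0))  # (0, 1, 0, 1, 0, 0, 1, 0)
print(best_selection(scores, budget=0.5, gamma=2))  # (0, 1, 1, 1, 0, 0, 0, 0)
```

With `gamma=0` the three positive-scoring loci are picked individually. With `gamma=2` the isolated locus is dropped and the gap between the two strong loci is filled, yielding a single contiguous region.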

Why

ROCCO offers several attractive features:

  1. Consideration of enrichment and spatial characteristics of open chromatin signals
  2. Scaling to large sample sizes with an asymptotic time complexity independent of sample size
  3. No need for training data or a heuristically determined set of initial candidate peak regions
  4. No rigid thresholds on the minimum number/width of supporting samples/replicates
  5. Mathematically tractable model with worst-case bounds on runtime and performance

Paper/Citation

If using ROCCO in your research, please cite the original paper in Bioinformatics (DOI: 10.1093/bioinformatics/btad725):

 Nolan H Hamilton, Terrence S Furey, ROCCO: a robust method for detection of open chromatin via convex optimization,
 Bioinformatics, Volume 39, Issue 12, December 2023

Documentation

For additional details, usage, etc. please see ROCCO's documentation: https://nolan-h-hamilton.github.io/ROCCO/

Note that using the module-level functions directly may allow for greater flexibility in applications than using the command-line interface, which is limited in scope.

Installation

PyPI (pip)

pip install rocco

Build from Source

You can also build from source directly if preferred.

  • Clone or download this repository, then build and install the package:

    git clone https://github.com/nolan-h-hamilton/ROCCO.git
    cd ROCCO
    python setup.py sdist bdist_wheel
    pip install -e .
    

ROCCO uses the popular bioinformatics tools Samtools and bedtools. If they are not already available, these system dependencies can be installed with standard macOS or Linux/Unix package managers, e.g., brew install samtools (Homebrew) or sudo apt-get install samtools (APT).
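
A quick way to confirm the setup is sketched below using only the Python standard library. The helper names `installed_version` and `missing_tools` are made up for this example and are not part of ROCCO; the sketch assumes the distribution name is `rocco` (as in the pip command above) and that the system tools are invoked as `samtools` and `bedtools`.

```python
import shutil
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a Python distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def missing_tools(tool_names):
    """Return the executables from tool_names that are not found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


if __name__ == "__main__":
    print("rocco version:", installed_version("rocco"))
    print("missing system tools:", missing_tools(["samtools", "bedtools"]))
```

An empty list of missing tools and a non-`None` version suggest that the installation steps above completed.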

Input/Output

  • Input: Samples' BAM alignments or BigWig tracks
  • Output: BED file of consensus peak regions
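
Because the output is a BED file, it can be inspected with a few lines of standard-library Python. The sketch below assumes only the first three BED columns (chromosome, 0-based start, exclusive end); the helper names and the example intervals are hypothetical and chosen for illustration, not taken from ROCCO output.

```python
def read_bed(lines):
    """Yield (chrom, start, end) tuples from BED-formatted lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and optional track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        yield chrom, int(start), int(end)


def summarize(peaks):
    """Return the peak count, total covered bases, and mean peak width."""
    widths = [end - start for _, start, end in peaks]
    n = len(widths)
    return {
        "n_peaks": n,
        "total_bp": sum(widths),
        "mean_width": sum(widths) / n if n else 0.0,
    }


# Made-up intervals standing in for a consensus peak file.
example = ["chr1\t100\t350", "# a comment line", "chr1\t1000\t1200\tpeak_2"]
print(summarize(read_bed(example)))
# {'n_peaks': 2, 'total_bp': 450, 'mean_width': 225.0}
```

To summarize a real file, pass an open file handle instead of the list, e.g. `summarize(read_bed(open(path)))`, where `path` points to the BED file produced for your samples.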
