Robust ATAC-seq Peak Calling for Many Samples via Convex Optimization

ROCCO: [R]obust [O]pen [C]hromatin Detection via [C]onvex [O]ptimization

What

ROCCO is an efficient algorithm for detecting "consensus peaks" in large datasets with multiple high-throughput sequencing (HTS) samples, namely ATAC-seq. A consensus peak is a region where enrichment in read counts/densities is observed in a nontrivial subset of samples.

Input/Output

  • Input: Samples' BAM alignments or BigWig tracks
  • Output: BED file of consensus peak regions

Note: if BigWig input is used, no preprocessing options can be applied at the alignment level.

How

ROCCO models consensus peak calling as a constrained optimization problem with an upper bound on the total proportion of the genome selected as open/accessible and a fragmentation penalty that promotes spatial consistency in active regions and sparsity elsewhere.
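To make the roles of the budget and the fragmentation penalty concrete, here is a minimal toy sketch. It is not ROCCO's implementation: ROCCO solves a convex formulation, whereas this example uses exact dynamic programming over binary labels, purely for illustration. The function name `select_loci`, the per-locus `scores`, the budget `max_selected`, and the penalty weight `gamma` are all hypothetical names introduced here. The toy objective is to maximize the total score of selected loci minus `gamma` times the number of switches between adjacent labels, with at most `max_selected` loci selected.

```python
def select_loci(scores, max_selected, gamma):
    """Toy illustration only (not ROCCO's solver).

    Chooses binary labels x_i to maximize
        sum(scores[i] * x_i) - gamma * sum(|x_i - x_{i+1}|)
    subject to sum(x_i) <= max_selected.
    """
    n = len(scores)
    if n == 0:
        return []
    neg_inf = float("-inf")
    # value[i][k][x]: best objective over loci 0..i with k loci selected
    # and locus i carrying label x. parent stores the previous label.
    value = [[[neg_inf, neg_inf] for _ in range(max_selected + 1)] for _ in range(n)]
    parent = [[[None, None] for _ in range(max_selected + 1)] for _ in range(n)]
    value[0][0][0] = 0.0
    if max_selected >= 1:
        value[0][1][1] = scores[0]

    for i in range(1, n):
        for k in range(max_selected + 1):
            for x in (0, 1):
                prev_k = k - x
                if prev_k < 0:
                    continue
                for prev_x in (0, 1):
                    prev = value[i - 1][prev_k][prev_x]
                    if prev == neg_inf:
                        continue
                    # Gain from selecting locus i, minus the cost of a label switch.
                    cand = prev + scores[i] * x - gamma * abs(x - prev_x)
                    if cand > value[i][k][x]:
                        value[i][k][x] = cand
                        parent[i][k][x] = prev_x

    # Pick the best final state, then backtrack to recover the labels.
    k, x = max(
        ((k, x) for k in range(max_selected + 1) for x in (0, 1)),
        key=lambda kx: value[n - 1][kx[0]][kx[1]],
    )
    labels = [0] * n
    for i in range(n - 1, -1, -1):
        labels[i] = x
        prev_x = parent[i][k][x]
        k -= x
        x = prev_x
    return labels


scores = [-1, 2, 2, -1, 0.5, -1]
print(select_loci(scores, max_selected=3, gamma=0.0))  # isolated locus 4 is kept
print(select_loci(scores, max_selected=3, gamma=0.4))  # isolated locus 4 is dropped
```

With `gamma=0.0` the weakly enriched, isolated locus is selected on its own; with `gamma=0.4` the cost of the two extra label switches outweighs its score, so only the contiguous block remains. This is the qualitative effect the fragmentation penalty is meant to have.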

Why

ROCCO offers several attractive features:

  1. Consideration of enrichment and spatial characteristics of open chromatin signals
  2. Scaling to large sample sizes (100+) with an asymptotic time complexity independent of sample size
  3. No need for training data or a heuristically determined set of initial candidate peak regions
  4. No rigid thresholds on the minimum number/width of supporting samples/replicates
  5. Mathematically tractable model permitting worst-case analysis of runtime and performance

Example Behavior

Input

  • ENCODE lymphoblastoid data (BEST5, WORST5): 10 real ATAC-seq alignments of varying TSS enrichment (SNR-like)
  • Synthetic noisy data (NOISY5)

We run ROCCO twice: once without the noisy samples and once with them.

rocco -i *.BEST5.bam *.WORST5.bam -g hg38 -o rocco_output_without_noise.bed
rocco -i *.BEST5.bam *.WORST5.bam *.NOISY5.bam -g hg38 -o rocco_output_with_noise.bed

Output

Comparing each output file:

  • ROCCO effectively separates true signal from noise across multiple samples
  • ROCCO is robust to noisy samples (e.g., output unaffected by inclusion of NOISY5 inputs)
  • ROCCO offers high resolution separation of enriched regions
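One way to quantify how similar two BED outputs are is a base-pair Jaccard index (intersection over union of covered bases). The sketch below is a stdlib-only illustration and is not part of ROCCO; the helper names `read_bed`, `merge`, `covered_bp`, and `jaccard` are hypothetical, and it assumes standard BED lines whose first three whitespace-separated fields are chromosome, start, and end.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}."""
    intervals = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals[chrom].append((int(start), int(end)))
    return intervals


def merge(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals):
    """Number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def jaccard(bed_a, bed_b):
    """Base-pair Jaccard index between two parsed BED interval sets."""
    total_a = total_b = total_union = 0
    for chrom in set(bed_a) | set(bed_b):
        a = bed_a.get(chrom, [])
        b = bed_b.get(chrom, [])
        total_a += covered_bp(a)
        total_b += covered_bp(b)
        total_union += covered_bp(a + b)
    # |A ∩ B| = |A| + |B| - |A ∪ B|
    intersection = total_a + total_b - total_union
    return intersection / total_union if total_union else 1.0


# Small in-memory example; real use would pass open file handles instead.
peaks_a = read_bed(["chr1\t100\t200", "chr1\t300\t400"])
peaks_b = read_bed(["chr1\t150\t250", "chr2\t0\t50"])
print(jaccard(peaks_a, peaks_b))
```

For the two runs above, the same functions could be applied to `open("rocco_output_without_noise.bed")` and `open("rocco_output_with_noise.bed")`; a value close to 1 would indicate that including the noisy samples changed the output very little.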

Paper/Citation

If you use ROCCO in your research, please cite the original paper in Bioinformatics (DOI: 10.1093/bioinformatics/btad725):

 Nolan H Hamilton, Terrence S Furey, ROCCO: a robust method for detection of open chromatin via convex optimization,
 Bioinformatics, Volume 39, Issue 12, December 2023

Documentation

For additional details and usage examples, see ROCCO's documentation: https://nolan-h-hamilton.github.io/ROCCO/

Installation

PyPI (pip)

pip install rocco --upgrade

Build from Source

If preferred, ROCCO can easily be built from source:

  • Clone or download this repository

    git clone https://github.com/nolan-h-hamilton/ROCCO.git
    cd ROCCO
    python setup.py sdist bdist_wheel
    pip install -e .
    


