Robust ATAC-seq Peak Calling for Many Samples via Convex Optimization
ROCCO: [R]obust [O]pen [C]hromatin Detection via [C]onvex [O]ptimization
What
ROCCO is an efficient algorithm for detecting "consensus peaks" in large datasets with multiple high-throughput sequencing (HTS) samples (namely, ATAC-seq), where an enrichment in read counts/densities is observed in a nontrivial subset of samples.
Input/Output
- Input: Samples' BAM alignments or BigWig tracks
- Output: BED file of consensus peak regions (default format is BED3: chrom, start, end)
- Note: if BigWig input is used, no preprocessing options can be applied at the alignment level and narrowPeak output cannot be generated.
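As a minimal sketch of working with the default BED3 output, the snippet below parses chrom/start/end records and sums the base pairs they cover. The helper names (`parse_bed3`, `total_bp`) and the example records are hypothetical and not part of ROCCO; the sum assumes the peak regions do not overlap one another.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end); BED coordinates are 0-based, half-open
Interval = Tuple[str, int, int]

def parse_bed3(lines: Iterable[str]) -> List[Interval]:
    """Parse BED3 records, skipping blank lines and header/comment lines."""
    peaks = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        peaks.append((chrom, int(start), int(end)))
    return peaks

def total_bp(peaks: List[Interval]) -> int:
    """Total length of all regions (assumes they do not overlap)."""
    return sum(end - start for _, start, end in peaks)

# Hypothetical records standing in for the contents of an output BED file
example = ["chr1\t100\t250", "chr1\t400\t450", "", "chr2\t0\t1000"]
peaks = parse_bed3(example)
print(len(peaks), total_bp(peaks))
```

To read a real file, pass an open file handle instead of the example list, e.g. `parse_bed3(open("rocco_output.bed"))`, where the file name is whatever was given to `-o`.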
How
ROCCO models consensus peak calling as a constrained optimization problem with an upper bound on the total proportion of the genome selected as open/accessible and a fragmentation penalty to promote spatial consistency in active regions and sparsity elsewhere.
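The interplay between the budget and the fragmentation penalty can be illustrated with a toy example. The sketch below is not ROCCO's solver: it enumerates every 0/1 selection over eight loci by brute force, and the objective's exact form (negative total score plus `gamma` times the number of switches between adjacent loci) as well as the names `scores`, `budget`, and `gamma` are assumptions chosen for illustration. ROCCO itself uses convex optimization, which is what makes genome-scale problems feasible.

```python
from itertools import product
from math import floor

def objective(selection, scores, gamma):
    """Cost to minimize: negative total score of the selected loci plus
    gamma times the number of 0/1 switches between adjacent loci."""
    gain = sum(s * x for s, x in zip(scores, selection))
    switches = sum(abs(a - b) for a, b in zip(selection, selection[1:]))
    return -gain + gamma * switches

def brute_force_select(scores, budget, gamma):
    """Return the feasible 0/1 selection with the lowest cost.
    Feasible means at most floor(budget * n) loci are selected."""
    n = len(scores)
    max_selected = floor(budget * n)
    best = None
    for selection in product((0, 1), repeat=n):
        if sum(selection) > max_selected:
            continue  # violates the upper bound on selected loci
        cost = objective(selection, scores, gamma)
        if best is None or cost < best[0]:
            best = (cost, selection)
    return best[1]

# Hypothetical per-locus enrichment scores: a strong block at loci 1-3
# and a weak isolated bump at locus 6
scores = [-1.0, 2.0, 1.8, 2.2, -0.2, -0.5, 0.9, -0.3]

with_penalty = brute_force_select(scores, budget=0.5, gamma=0.5)
no_penalty = brute_force_select(scores, budget=0.5, gamma=0.0)
print(with_penalty)  # (0, 1, 1, 1, 0, 0, 0, 0)
print(no_penalty)    # (0, 1, 1, 1, 0, 0, 1, 0)
```

With the penalty, the isolated bump at locus 6 is dropped because selecting it adds two switches (cost 1.0) for a gain of only 0.9; without the penalty, every positive-scoring locus that fits the budget is selected.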
Why
ROCCO offers several attractive features:
- Consideration of enrichment and spatial characteristics of open chromatin signals
- Scaling to large sample sizes (100+) with an asymptotic time complexity independent of sample size
- No need for training data or a heuristically determined set of initial candidate peak regions
- No rigid thresholds on the minimum number/width of supporting samples/replicates
- Mathematically tractable model permitting worst-case analysis of runtime and performance
Example Behavior
Input
- ENCODE lymphoblastoid data (BEST5, WORST5): 10 real ATAC-seq alignments of varying TSS enrichment (SNR-like quality measure for ATAC-seq)
- Synthetic noisy data (NOISY5)
We run ROCCO under two conditions, with and without the noisy samples, for comparison:
rocco -i *.BEST5.bam *.WORST5.bam -g hg38 -o rocco_output_without_noise.bed
rocco -i *.BEST5.bam *.WORST5.bam *.NOISY5.bam -g hg38 -o rocco_output_with_noise.bed
Output
Comparing each output file:
- ROCCO is unaffected by the NOISY5 samples and effectively identifies true signal across multiple samples
- ROCCO simultaneously detects both wide and narrow consensus peaks
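One way to quantify how similar two peak sets are is the base-pair Jaccard index (shared bases divided by bases covered by either set). The sketch below is an illustrative helper, not part of ROCCO, and the intervals are hypothetical stand-ins rather than actual output from the two runs.

```python
from collections import defaultdict

def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def covered(intervals):
    """Number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))

def jaccard(peaks_a, peaks_b):
    """Base-pair Jaccard index between two lists of (chrom, start, end)."""
    by_chrom_a, by_chrom_b = defaultdict(list), defaultdict(list)
    for chrom, start, end in peaks_a:
        by_chrom_a[chrom].append((start, end))
    for chrom, start, end in peaks_b:
        by_chrom_b[chrom].append((start, end))
    union = inter = 0
    for chrom in set(by_chrom_a) | set(by_chrom_b):
        a = by_chrom_a.get(chrom, [])
        b = by_chrom_b.get(chrom, [])
        u = covered(a + b)
        union += u
        # inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
        inter += covered(a) + covered(b) - u
    return inter / union if union else 1.0

# Hypothetical peak sets standing in for the two output files
without_noise = [("chr1", 100, 200), ("chr2", 0, 50)]
with_noise = [("chr1", 150, 250), ("chr2", 0, 50)]
score = jaccard(without_noise, with_noise)
print(score)  # 0.5
```

A value near 1.0 for the real output files would indicate that adding the noisy samples changed the called regions very little.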
Paper/Citation
If using ROCCO in your research, please cite the original paper in Bioinformatics (DOI: 10.1093/bioinformatics/btad725):
Nolan H Hamilton, Terrence S Furey, ROCCO: a robust method for detection of open chromatin via convex optimization, Bioinformatics, Volume 39, Issue 12, December 2023, btad725
Documentation
For additional details, usage examples, etc. please see ROCCO's documentation: https://nolan-h-hamilton.github.io/ROCCO/
Installation
PyPI (pip)
python -m pip install rocco --upgrade
If you lack administrative privileges, you may need to append --user to the above command.
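After installing, a quick check from Python can confirm that the distribution is visible to the interpreter in use. This sketch relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name `rocco` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Prints the version if ROCCO is installed for this interpreter, else None
print(installed_version("rocco"))
```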
Build from Source
If preferred, ROCCO can easily be built from source:
- Clone or download this repository
git clone https://github.com/nolan-h-hamilton/ROCCO.git
cd ROCCO
python setup.py sdist bdist_wheel
pip install -e .