Robust ATAC-seq Peak Calling for Many Samples via Convex Optimization
ROCCO: [R]obust [O]pen [C]hromatin Detection via [C]onvex [O]ptimization
Underlying ROCCO is a constrained optimization problem that can be solved efficiently to predict consensus regions of open chromatin across multiple samples.
Features
- Considers both enrichment and spatial characteristics of open chromatin signals to capture the full extent of peaks;
- Mathematically tractable model that permits performance and efficiency guarantees;
- Efficient for large numbers of samples, with an asymptotic time complexity independent of sample size;
- No arbitrary thresholds on the minimum number of supporting samples/replicates;
- No required training data or heuristically determined set of initial candidate peak regions.
Paper
If using ROCCO in your research, please cite the original paper in Bioinformatics.
Installation
pip install rocco
GitHub (Homepage)
https://github.com/nolan-h-hamilton/ROCCO/
Example Usage
ROCCO offers a command-line interface for convenience and also an API for greater programmatic flexibility.
CLI (Command-line Interface)
See rocco --help for a full list of argument descriptions. Wildcards and regular expressions can be used to specify subsets of input files, chromosomes to skip, etc.
Example 1
- BAM input files
- Default chromosome-specific parameters for hg38 (see Rocco.HG38_PARAMS in the code)
rocco --input_files sample1.bam sample2.bam sample3.bam --genome_file genome.sizes --chrom_param_file hg38
Example 2
- BigWig input files (Specified with a wildcard)
- Default chromosome-specific parameters for hg38 (see Rocco.HG38_PARAMS in the code)
rocco --input_files *.bw --genome_file genome.sizes --chrom_param_file hg38
This input format is useful if you have already processed the samples' initial BAM alignments (normalization, smoothing, read extension, etc.) with a tool such as deepTools bamCoverage.
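As a rough illustration of that preprocessing step, the sketch below builds a deepTools bamCoverage command line in Python. The flags used are standard bamCoverage options, but the helper name `bamcoverage_command` and the chosen values (bin size, CPM normalization, smoothing length) are illustrative assumptions, not settings recommended by ROCCO.

```python
import shlex


def bamcoverage_command(bam_path, bigwig_path, bin_size=50,
                        normalize_using="CPM", smooth_length=None,
                        extend_reads=False):
    """Build a bamCoverage argument list (hypothetical helper, not part of ROCCO)."""
    cmd = [
        "bamCoverage",
        "--bam", bam_path,
        "--outFileName", bigwig_path,
        "--outFileFormat", "bigwig",
        "--binSize", str(bin_size),
        "--normalizeUsing", normalize_using,
    ]
    if smooth_length is not None:
        # Smoothing window in bp; deepTools expects it to exceed the bin size.
        cmd += ["--smoothLength", str(smooth_length)]
    if extend_reads:
        # Given without a length, this option is intended for paired-end reads.
        cmd.append("--extendReads")
    return cmd


cmd = bamcoverage_command("sample1.bam", "sample1.bw",
                          smooth_length=150, extend_reads=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once deepTools is installed and the BAM file is indexed.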
Example 3
- BedGraph input files (Specified with a wildcard)
- Default chromosome-specific parameters for hg38
rocco --input_files *.bg --genome_file genome.sizes --chrom_param_file hg38
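Because BedGraph input is expected to have contiguous intervals within each chromosome (see Notes/Miscellaneous), it can help to check a file before running ROCCO. The following is a minimal sketch of such a check, not part of ROCCO itself. The function name `find_bedgraph_gaps` is hypothetical, and the sketch assumes the file is sorted by start position within each chromosome.

```python
def find_bedgraph_gaps(lines):
    """Return (chrom, gap_start, gap_end) for each gap between consecutive intervals.

    `lines` is any iterable of BedGraph lines (for example, an open file).
    Assumes intervals are sorted by start position within each chromosome.
    """
    gaps = []
    last_end = {}  # chrom -> end coordinate of the most recent interval seen
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        if chrom in last_end and start > last_end[chrom]:
            gaps.append((chrom, last_end[chrom], start))
        last_end[chrom] = end
    return gaps


example = [
    "chr1\t0\t50\t1.0",
    "chr1\t50\t100\t2.0",
    "chr1\t150\t200\t0.5",  # positions 100-150 are missing
    "chr2\t0\t50\t3.0",
]
print(find_bedgraph_gaps(example))  # [('chr1', 100, 150)]
```

For a file on disk, pass the open handle directly: `with open("sample1.bg") as fh: find_bedgraph_gaps(fh)`. An empty result means no gaps were found.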
Example 4
- BAM input files
- Default chromosome-specific parameters for hg38
- Scale coverage tracks for each sample individually with --sample_weights
rocco --input_files sample1.bam sample2.bam sample3.bam \
--genome_file genome.sizes --chrom_param_file hg38 \
--sample_weights 1.50 1.0 1.0
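Conceptually, per-sample scaling can be pictured as multiplying each sample's coverage track by its weight. The sketch below illustrates that reading with plain Python lists. It is an assumption about what "scaling" means here, not ROCCO's internal code, and `scale_tracks` is a hypothetical name.

```python
def scale_tracks(tracks, weights):
    """Multiply every value in each sample's coverage track by that sample's weight."""
    if len(tracks) != len(weights):
        raise ValueError("expected one weight per sample")
    return [[w * v for v in track] for track, w in zip(tracks, weights)]


# Three toy coverage tracks, one per sample, with the weights from Example 4.
tracks = [
    [2.0, 4.0, 0.0],
    [1.0, 3.0, 5.0],
    [0.0, 1.0, 2.0],
]
scaled = scale_tracks(tracks, [1.50, 1.0, 1.0])
print(scaled[0])  # [3.0, 6.0, 0.0]
```

Under this reading, a weight of 1.0 leaves a sample's track unchanged, so only sample1 is affected in Example 4.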
Example 5
- Use a custom chromosome parameter file (tests/test_hg38_param_file.csv)
rocco --input_files tests/data/sample1.bw tests/data/sample2.bw \
tests/data/sample3.bw --genome_file tests/test_hg38.sizes \
--chrom_param_file tests/test_hg38_param_file.csv
API (Application Programming Interface)
Example 6
>>> import rocco
>>> bw_files = ['tests/data/sample1.bw', 'tests/data/sample2.bw', 'tests/data/sample3.bw']
>>> rocco_obj = rocco.Rocco(input_files=bw_files, genome_file='tests/test_hg38.sizes', chrom_param_file='tests/test_hg38_param_file.csv')
>>> rocco_obj.run() # genome-wide output stored in BED6 file
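When many input files are involved, the file list can be built with a wildcard from Python rather than typed out. The sketch below assumes the API takes an explicit list of paths, as in Example 6, and uses only the constructor arguments shown there. The helper names `collect_inputs` and `run_rocco` are hypothetical.

```python
import glob


def collect_inputs(pattern):
    """Return the sorted list of paths matching a shell-style wildcard."""
    return sorted(glob.glob(pattern))


def run_rocco(input_files, genome_file, chrom_param_file):
    """Run ROCCO with the same keyword arguments used in Example 6."""
    import rocco  # imported here so collect_inputs works without ROCCO installed

    rocco_obj = rocco.Rocco(input_files=input_files,
                            genome_file=genome_file,
                            chrom_param_file=chrom_param_file)
    rocco_obj.run()  # genome-wide output stored in BED6 file
    return rocco_obj


if __name__ == "__main__":
    bw_files = collect_inputs("tests/data/*.bw")
    if bw_files:  # only run when matching files are actually present
        run_rocco(bw_files,
                  genome_file="tests/test_hg38.sizes",
                  chrom_param_file="tests/test_hg38_param_file.csv")
```

Sorting the matches keeps the sample order reproducible between runs. Note that the wildcard picks up every matching file in the directory, which may be more than the three samples listed in Example 6.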
Documentation
ROCCO's complete documentation is available at https://nolan-h-hamilton.github.io/ROCCO/
Testing ROCCO
Run unit tests
cd tests
pytest -v -rPA -l -k "regular" test_rocco.py
Notes/Miscellaneous
- If using BedGraph or BigWig input, ensure contiguous intervals within each chromosome (no gaps).
- Users may consider tweaking the default chromosome-specific $b,\gamma,\tau$ parameters or filtering peaks by score with the --peak_score_filter argument.
- Peak scores are computed as the number of reads over the given peak region, averaged across samples, divided by the length of the region, and then scaled to units of kilobases. A suitable peak score cutoff can be evaluated by viewing the output histogram of peak scores.
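The peak score description above can be written out as a small calculation. This is a minimal sketch following the wording of the note, not ROCCO's actual implementation. The function name `peak_score` and the read counts are made up for illustration.

```python
def peak_score(reads_per_sample, start, end):
    """Mean read count across samples, per kilobase of the peak region."""
    length = end - start
    if length <= 0:
        raise ValueError("peak region must have positive length")
    if not reads_per_sample:
        raise ValueError("need at least one sample")
    mean_reads = sum(reads_per_sample) / len(reads_per_sample)
    return 1000.0 * mean_reads / length


# Three samples with 120, 90, and 150 reads over a 500 bp peak.
print(peak_score([120, 90, 150], start=10_000, end=10_500))  # 240.0
```

Here the samples average 120 reads over 500 bp, which scales to 240 reads per kilobase.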
Version History
Previous releases can be found at https://github.com/nolan-h-hamilton/ROCCO/tags
Additional dependencies for optional features:
- 'mosek': Commercial-grade solver. Users can instantly obtain a free academic license or a generous trial commercial license at https://www.mosek.com/products/academic-licenses/.
- 'ortools': includes the first-order solver, PDLP.
- 'pytest': allows local execution of the Tests workflow.