Robust ATAC-seq Peak Calling for Many Samples via Convex Optimization

ROCCO: [R]obust [O]pen [C]hromatin Detection via [C]onvex [O]ptimization

Underlying ROCCO is a constrained optimization problem that can be solved efficiently to predict consensus regions of open chromatin across multiple samples.

Features

  1. Consideration of enrichment and spatial characteristics of open chromatin signals to capture the full extent of peaks;
  2. Mathematically tractable model that permits performance and efficiency guarantees;
  3. Efficient for large numbers of samples, with an asymptotic time complexity independent of sample size;
  4. No arbitrary thresholds on the minimum number of supporting samples/replicates;
  5. No required training data or heuristically determined set of initial candidate peak regions.

Paper

If using ROCCO in your research, please cite the original paper in Bioinformatics.

Installation

pip install rocco

GitHub (Homepage)

https://github.com/nolan-h-hamilton/ROCCO/

Example Usage

ROCCO offers a command-line interface for convenience and also an API for greater programmatic flexibility.

CLI (Command-line Interface)

See rocco --help for a full list of argument descriptions. Wildcards and regular expressions can be used to specify subsets of input files, chromosomes to skip, etc.

Example 1

  • BAM input files
  • Default chromosome-specific parameters for hg38 (see Rocco.HG38_PARAMS in the code)
rocco --input_files sample1.bam sample2.bam sample3.bam --genome_file genome.sizes --chrom_param_file hg38

Example 2

  • BigWig input files (Specified with a wildcard)
  • Default chromosome-specific parameters for hg38 (see Rocco.HG38_PARAMS in the code)
rocco --input_files *.bw --genome_file genome.sizes --chrom_param_file hg38

This input format is useful if you have already processed the samples' initial BAM alignments with a tool such as deepTools bamCoverage, e.g., for normalization, smoothing, or read extension.

Example 3

  • BedGraph input files (Specified with a wildcard)
  • Default chromosome-specific parameters for hg38
rocco --input_files *.bg --genome_file genome.sizes --chrom_param_file hg38
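
BedGraph and BigWig input is expected to have contiguous intervals within each chromosome (see Notes/Miscellaneous), so it can help to check a BedGraph file before running ROCCO. The following is a minimal sketch and not part of ROCCO: the helper name find_gaps is hypothetical, and it assumes the standard four-column BedGraph layout (chrom, start, end, value) with intervals sorted by start within each chromosome. It only compares consecutive intervals and does not check where the first interval of a chromosome starts.

```python
def find_gaps(bedgraph_lines):
    """Return (chrom, previous_end, next_start) for each gap between
    consecutive intervals on the same chromosome.

    Hypothetical helper, not part of ROCCO. Assumes sorted BedGraph lines.
    """
    gaps = []
    last_end = {}  # chrom -> end coordinate of the most recent interval
    for line in bedgraph_lines:
        line = line.strip()
        # Skip blank lines and optional header/comment lines
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        # A gap exists if this interval starts after the previous one ended
        if chrom in last_end and start > last_end[chrom]:
            gaps.append((chrom, last_end[chrom], start))
        last_end[chrom] = end
    return gaps


# Toy input: chr1 is missing coverage for positions 100-150
example = [
    "chr1\t0\t50\t1.0",
    "chr1\t50\t100\t2.0",
    "chr1\t150\t200\t0.5",
    "chr2\t0\t50\t3.0",
]
gaps = find_gaps(example)
print(gaps)
```

To check a real file, the lines of an open file handle could be passed in place of the toy list.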

Example 4

  • BAM input files
  • Default chromosome-specific parameters for hg38
  • Scale coverage tracks for each sample individually with --sample_weights
rocco --input_files sample1.bam sample2.bam sample3.bam \
      --genome_file genome.sizes --chrom_param_file hg38 \
      --sample_weights 1.50 1.0 1.0
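
As a rough illustration of what per-sample scaling means, the sketch below multiplies each sample's coverage values by its weight. This is an assumption about the effect of --sample_weights, not ROCCO's actual implementation; the function name scale_tracks and the toy coverage values are hypothetical.

```python
def scale_tracks(tracks, weights):
    """Multiply every value in each sample's coverage track by that sample's weight.

    Hypothetical helper illustrating per-sample scaling; not ROCCO's code.
    """
    if len(tracks) != len(weights):
        raise ValueError("need exactly one weight per sample")
    return [[w * v for v in track] for track, w in zip(tracks, weights)]


# Three toy samples, each with coverage over the same three loci
tracks = [
    [2.0, 4.0, 6.0],
    [1.0, 3.0, 5.0],
    [0.0, 2.0, 2.0],
]
# Same weights as in Example 4: only the first sample is rescaled
scaled = scale_tracks(tracks, [1.50, 1.0, 1.0])
print(scaled)
```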

Example 5

  • Use a custom chromosome parameter file (tests/test_hg38_param_file.csv)
rocco --input_files tests/data/sample1.bw tests/data/sample2.bw \
      tests/data/sample3.bw --genome_file tests/test_hg38.sizes \
      --chrom_param_file tests/test_hg38_param_file.csv

API (Application Programmer Interface)

Example 6

>>> import rocco
>>> bw_files = ['tests/data/sample1.bw', 'tests/data/sample2.bw', 'tests/data/sample3.bw']
>>> rocco_obj = rocco.Rocco(input_files=bw_files, genome_file='tests/test_hg38.sizes', chrom_param_file='tests/test_hg38_param_file.csv')
>>> rocco_obj.run() # genome-wide output stored in BED6 file
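
The BED6 output can be post-processed with a few lines of Python. The sketch below filters peaks by score, assuming the standard BED6 column order (chrom, start, end, name, score, strand). The function name filter_bed6, the example lines, and the cutoff are hypothetical, and the name of the file written by run() is not assumed here.

```python
def filter_bed6(lines, min_score):
    """Keep BED6 lines whose score column (5th field) is at least min_score.

    Hypothetical helper for post-processing; not part of the ROCCO API.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        if float(fields[4]) >= min_score:
            kept.append(line)
    return kept


# Made-up peaks for illustration only
peaks = [
    "chr1\t100\t600\tpeak_1\t240.0\t.",
    "chr1\t900\t1200\tpeak_2\t35.5\t.",
    "chr2\t50\t450\tpeak_3\t120.0\t.",
]
strong = filter_bed6(peaks, min_score=100.0)
print(strong)
```

A similar effect may be available directly through the --peak_score_filter argument mentioned in the notes.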

Documentation

ROCCO's complete documentation is available at https://nolan-h-hamilton.github.io/ROCCO/

Testing ROCCO

Run unit tests

cd tests
pytest -v -rPA -l -k "regular" test_rocco.py

Notes/Miscellaneous

  • If using BedGraph or BigWig input, ensure that the intervals within each chromosome are contiguous (no gaps).

  • Users may consider tweaking the default chromosome-specific $b,\gamma,\tau$ parameters or filtering peaks by score with the --peak_score_filter argument.

  • Peak scores are computed as the number of reads over the given peak region, averaged across samples, divided by the length of the region, and then scaled to units of kilobases. A suitable peak score cutoff can be chosen by viewing the output histogram of peak scores.
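
The peak score described above can be worked through on a small example. This is a minimal sketch of that description, not ROCCO's implementation: the function name peak_score and the read counts are hypothetical, and "scaled to units of kilobases" is interpreted here as multiplying the per-base value by 1000.

```python
def peak_score(reads_per_sample, start, end):
    """Mean read count across samples, expressed per kilobase of peak length.

    Hypothetical helper following the description in the notes.
    """
    length = end - start  # peak length in base pairs
    if length <= 0:
        raise ValueError("peak must have positive length")
    mean_reads = sum(reads_per_sample) / len(reads_per_sample)
    return mean_reads * 1000.0 / length


# Three samples with 120, 90, and 150 reads over a 500 bp peak:
# mean reads = 120, so the score is 120 * 1000 / 500
score = peak_score([120, 90, 150], start=10_000, end=10_500)
print(score)
```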

Version History

Previous releases can be found at https://github.com/nolan-h-hamilton/ROCCO/tags

Additional dependencies for optional features:

  • 'mosek': commercial-grade solver. Users can instantly obtain a free academic license or a generous trial commercial license at https://www.mosek.com/products/academic-licenses/.
  • 'ortools': includes the first-order solver, PDLP.
  • 'pytest': allows local execution of the Tests workflow.
