
Loop-calling and peak-calling for sequencing-based interaction data, including related analysis utilities.


cLoops2: full stack analysis tool for enriched chromatin interaction data



Introduction

cLoops2 is an extension of our previous work, cLoops. It extends loop-calling based on assumption-free clustering into a full suite of analysis tools for 3D genomic interaction data, and it has been adapted specifically for data such as Hi-Trac/Trac-looping, in which interactions are enriched over the genome through experimental steps. cLoops2 still supports Hi-C-like data, in which the interaction signals are evenly distributed at enzyme cutting sites. The changes from cLoops to cLoops2 address the challenges of reaching higher resolution with the next generation of genome architecture mapping technologies.

cLoops2 is designed with reference to bedtools and Samtools for command-line style programming. If you have experience with them, you will find it easy and efficient to use cLoops2, combine its commands, and integrate them as steps in your processing pipeline.
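As a rough illustration of using the sub-commands as pipeline steps, the sketch below assembles a small pre, estEps, callLoops sequence from Python. The flags are copied from the examples printed by cLoops2 -h; the helper names build_pipeline and run_pipeline, the input file name and the output prefix are hypothetical. By default the commands are only printed, so the sketch can be tried without cLoops2 installed.

```python
import shlex
import subprocess


def build_pipeline(bedpe, name):
    """Return argument lists for a small pre -> estEps -> callLoops run.

    Flags mirror the examples shown by `cLoops2 -h`; `bedpe` and `name`
    are placeholders chosen for this sketch.
    """
    return [
        ["cLoops2", "pre", "-f", bedpe, "-o", name],
        ["cLoops2", "estEps", "-d", name, "-o", name + "_estEps_gmm",
         "-p", "10", "-method", "gmm"],
        ["cLoops2", "callLoops", "-d", name, "-eps", "200,500,1000",
         "-minPts", "3", "-filter", "-o", name, "-w", "-j", "-cut", "2000"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for cmd in commands:
        line = shlex.join(cmd)
        print(line)
        printed.append(line)
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
    return printed


if __name__ == "__main__":
    run_pipeline(build_pipeline("test_GM12878_chr21_trac.bedpe", "trac"))
```

Passing dry_run=False runs the steps in order and raises on the first non-zero exit status, which is usually what a pipeline step should do.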

Please refer to our Hi-Trac method manuscript (in preparation) or the cLoops2 manuscript for what cLoops2 can do.

If you use cLoops2 in your research (the idea, the algorithm, the analysis scripts or the supplemental data), please give us a star on the GitHub repo page and cite our paper as follows:

Preprint bioRxiv: Yaqiang Cao et al. "Full-stack analysis for enriched 3D genomic interaction data with cLoops2"



cLoops2 Main Functions

Running cLoops2 or cLoops2 -h shows the main functions of cLoops2 with short descriptions and examples.

An enhanced, accurate and flexible peak/domain/loop-calling and analysis tool 
for 3D genomic interaction data.

Use cLoops2 sub-command -h to see detailed options and examples for sub-commands.
Available sub-commands are: 
    qc: quality control of BEDPE files before analysis.
    pre: preprocess input BEDPE files into cLoops2 data.
    update: update cLoops2 data files locations.
    combine: combine multiple cLoops2 data directories.
    dump: convert cLoops2 data files to others (BEDPE, HIC, washU, bedGraph and
          contact matrix)
    estEps: estimate eps using Gaussian mixture models or k-distance plot.
    estRes: estimate reasonable contact matrix resolution based on signal 
            enrichment.
    estDis: estimate significant interactions distance range.
    estSat: estimate sequencing saturation based on contact matrix.
    estSim: estimate similarities among samples based on contact matrix.
    filterPETs: filter PETs based on peaks, loops, singleton mode or knn mode. 
    samplePETs: sample PETs according to specific target size.
    callPeaks: call peaks for ChIP-seq, ATAC-seq, ChIC-seq and CUT&Tag or the 
               3D genomic data such as Trac-looping, Hi-Trac, HiChIP and more.
    callLoops: call loops for 3D genomic data.
    callDiffLoops: call differentially enriched loops for two datasets. 
    callDomains: call domains for 3D genomic data. 
    plot: plot the interaction matrix, genes, view point plot, 1D tracks, 
          peaks, loops and domains for a specific region. 
    montage: analysis of specific regions, producing Westworld Season 3 -like 
             Rehoboam plot. 
    agg: aggregated feature analysis and plots, features can be peaks, view 
         points, loops and domains.
    quant: quantify peaks, loops and domains.
    anaLoops: annotate loops for target genes.
    findTargets: find target genes of genomic regions through networks from 
                 anaLoops.

Examples:
    cLoops2 qc -f trac_rep1.bedpe.gz,trac_rep2.bedpe,trac_rep3.bedpe.gz \
               -o trac_stat -p 3
    cLoops2 pre -f ../test_GM12878_chr21_trac.bedpe -o trac
    cLoops2 update -d ./trac
    cLoops2 combine -ds ./trac1,./trac2,./trac3 -o trac_combined -keep 1
    cLoops2 dump -d ./trac -o trac -hic
    cLoops2 estEps -d trac -o trac_estEps_gmm -p 10 -method gmm
    cLoops2 estRes -d trac -o trac_estRes -p 10 -bs 25000,5000,1000,200
    cLoops2 estDis -d trac -o trac -plot -bs 1000 
    cLoops2 estSim -ds Trac1,Trac2 -o trac_sim -p 10 -bs 2000 -m pcc -plot
    cLoops2 filterPETs -d trac -peaks trac_peaks.bed -o trac_peaksFiltered -p 10
    cLoops2 samplePETs -d trac -o trac_sampled -t 5000000 -p 10
    cLoops2 callPeaks -d H3K4me3_ChIC -bgd IgG_ChIC -o H3K4me3_cLoops2 -eps 150 \
                      -minPts 10
    cLoops2 callLoops -d Trac -eps 200,500,1000 -minPts 3 -filter -o Trac -w -j \
                      -cut 2000
    cLoops2 callLoops -d HiC -eps 1000,5000,10000 -minPts 10,20,50,100 -w -j \
                      -trans -o HiC_trans 
    cLoops2 callDiffLoops -tloop target_loop.txt -cloop control_loop.txt \
                          -td ./target -cd ./control -o target_diff
    cLoops2 callDomains -d trac -o trac -bs 10000 -ws 200000
    cLoops2 plot -f test/chr21-chr21.ixy -o test -bs 500 -start 34840000 \
                 -end 34895000 -triu -1D -loop test_loops.txt -log \
                 -gtf hg38.gtf -bws ctcf.bw -beds enhancer.bed
    cLoops2 montage -f test/chr21-chr21.ixy -o test -bed test.bed
    cLoops2 agg -d trac -loops trac.loop -peaks trac_peaks.bed \
                -domains hic_domains.bed -bws CTCF.bw,ATAC.bw -p 20 -o trac 
    cLoops2 quant -d trac -peaks trac_peaks.bed -loops trac.loop \
                  -domains trac_domain.txt -p 20 -o trac
    cLoops2 anaLoops -loops test_loop.txt -gtf gene.gtf -net -o test
    cLoops2 findTargets -net test_ep_net.sif -tg test_targets.txt \
                        -bed GWAS.bed -o test 
    More usage and examples are shown when running cLoops2 sub-command -h.


optional arguments:
  -h, --help  show this help message and exit
  -d PREDIR   Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT    Output data directory / file name prefix, default is cLoops2_output.
  -p CPU      CPUs used to run the job, default is 1, set -1 to use all CPUs
              available. Too many CPUs could cause out-of-memory problems if
              there are too many PETs.
  -cut CUT    Distance cutoff to filter cis PETs, only keep PETs with distance
              >=cut. Default is 0, no filtering.
  -mcut MCUT  Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v          Show cLoops2 version number and exit.
  ---         The following are sub-command-specific options. This option only
              shows the version of cLoops2.

Bug reports are welcome and can be filed as issues at the GitHub repo or sent to 
caoyaqiang0410@gmail.com or yaqiang.cao@nih.gov. Thank you.
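
The -cut and -mcut options described above can be pictured with a small sketch. It is not the cLoops2 implementation: the function names are hypothetical, the distance is assumed to be measured between the midpoints of the two PET ends, and any negative mcut is assumed to mean "no upper limit" (the help text only states that the default of -1 disables it). The first six BEDPE columns follow the bedtools layout (chrom1, start1, end1, chrom2, start2, end2).

```python
def parse_bedpe_line(line):
    """Parse the first six tab-separated BEDPE columns of one PET."""
    fields = line.rstrip("\n").split("\t")
    chrom1, start1, end1 = fields[0], int(fields[1]), int(fields[2])
    chrom2, start2, end2 = fields[3], int(fields[4]), int(fields[5])
    return chrom1, start1, end1, chrom2, start2, end2


def cis_distance(pet):
    """Midpoint-to-midpoint distance for a cis PET, None for a trans PET.

    Using midpoints is an assumption made for this sketch.
    """
    chrom1, start1, end1, chrom2, start2, end2 = pet
    if chrom1 != chrom2:
        return None
    mid1 = (start1 + end1) // 2
    mid2 = (start2 + end2) // 2
    return abs(mid2 - mid1)


def keep_pet(distance, cut=0, mcut=-1):
    """Keep a cis PET when cut <= distance and, if mcut >= 0, distance <= mcut."""
    if distance < cut:
        return False
    if mcut >= 0 and distance > mcut:
        return False
    return True


if __name__ == "__main__":
    pet = parse_bedpe_line("chr21\t100\t200\tchr21\t5100\t5200\t.\t.\t+\t-")
    print(cis_distance(pet), keep_pet(cis_distance(pet), cut=2000))
```

With the defaults (cut=0, mcut=-1) every cis PET passes, matching the "no filtering" behaviour stated in the option descriptions.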

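The estEps description mentions a k-distance plot. As a generic illustration of that idea from density-based clustering (not the cLoops2 code), the sketch below computes, for each point, the distance to its k-th nearest neighbour and returns the values sorted. A sharp rise in the sorted curve is commonly read as a candidate eps. The function name k_distances, the use of Euclidean distance and the toy coordinates are assumptions of this sketch, and the brute-force loop is only suitable for small inputs.

```python
import math


def k_distances(points, k):
    """Return the sorted distance from each point to its k-th nearest neighbour.

    `points` is a sequence of (x, y) tuples; Euclidean distance is an
    assumption of this sketch. Runs in O(n^2 log n), fine for toy data.
    """
    if k < 1 or k > len(points) - 1:
        raise ValueError("k must be between 1 and len(points) - 1")
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (0, 2), (10, 10)]
    print(k_distances(toy, 1))
```

In the toy data the three clustered points share a small k-distance while the isolated point has a much larger one, which is the kind of jump a k-distance plot makes visible.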

cLoops2 citations



cLoops2 updates
