
Identify real loops from Hi-C data.



hicpeaks provides a Python CPU-based implementation of BH-FDR and HICCUPS, two peak calling algorithms for Hi-C data proposed by Rao et al. [1].


hicpeaks is developed and tested on UNIX-like operating systems, and the following packages or software are required:

Python requirements:

  1. Python (2.7, not compatible with 3.x for now)
  2. Multiprocess
  3. Numpy
  4. Scipy
  5. Matplotlib
  6. Pandas
  7. Statsmodels
  8. Scikit-Learn
  9. H5py
  10. Cooler

Other requirements:

  • ucsc-fetchchromsizes

conda, an excellent package manager, can be used to install all requirements above.

Install Conda


If you have the Anaconda Distribution installed, you already have it.

Download the latest Linux Miniconda installer for Python 2.7, then in your terminal window type the following and follow the prompts on the installer screens:

$ bash

After that, update the environment variables to finish the Conda installation:

$ source ~/.bashrc

Install Packages through Conda

Conda allows separation of packages into separate repositories, or channels. The main defaults channel has a large number of common packages, including numpy, scipy, pandas, statsmodels, scikit-learn, and h5py listed above. To install these packages, run the following command:

$ conda install numpy scipy matplotlib pandas statsmodels scikit-learn h5py

The other packages, cooler and ucsc-fetchchromsizes, are not available in the defaults channel but are included in the bioconda channel, and multiprocess is included in the conda-forge channel. To make them accessible, you need to add the bioconda channel as well as the other channels bioconda depends on (note that the order is important to guarantee the correct priority):

$ conda config --add channels conda-forge
$ conda config --add channels defaults
$ conda config --add channels r
$ conda config --add channels bioconda

To install these requirements:

$ conda install multiprocess cooler ucsc-fetchchromsizes

Install hicpeaks

Now just download the hicpeaks source code from PyPI, extract it and run the script:

$ python install

hicpeaks has been installed successfully if no exception occurs during the above process.


hicpeaks comes with 4 scripts: toCooler, pyBHFDR, pyHICCUPS and peak-plot.

  • toCooler

    Store TXT/NPZ bin-level Hi-C data into cooler container.

    1. I have included sample data with the hicpeaks source code to illustrate how you should prepare your data in TXT format. It’s quite easy, just remember 3 points:

       a. The file name should follow the pattern “chrom1_chrom2.txt”. Remove the prefix from your chromosome labels, i.e. “chr1” should be “1”, and “chrX” should be “X”.
       b. Each file should contain only 3 columns, corresponding to “bin1” of “chrom1”, “bin2” of “chrom2”, and “contact frequency”. Don’t perform any normalization.
       c. All files at the same resolution should be placed under a single folder.

    2. NPZ format is another bin-level Hi-C data container which can greatly speed up data loading. hicpeaks supports NPZ files generated by runHiC and TADLib.
  • pyBHFDR

    A CPU-based Python implementation of the BH-FDR algorithm. Rao et al. state in their supplementary material that this algorithm is robust enough to obtain all main results of their paper. Compared with HICCUPS, BH-FDR doesn’t use λ-chunking in multiple hypothesis testing, and only considers the donut background region when calculating the expected values. pyBHFDR follows the algorithm pipeline of [1] faithfully, except that it doesn’t implement the greedy clustering algorithm for original peak pixels.


  • pyHICCUPS

    A CPU-based Python implementation of the HICCUPS algorithm. Besides the donut region, HICCUPS also considers the lower-left, vertical and horizontal backgrounds when calculating the expected values, and it uses λ-chunking to overcome several multiple hypothesis testing challenges in Hi-C data. Finally, while BH-FDR has to limit the detected pixels to those near the diagonal (<2Mb), HICCUPS can in theory generalize to any genomic distance. pyHICCUPS keeps all main concepts of the original algorithm except for the following points, which may be addressed in the near future:

    1. pyHICCUPS doesn’t implement additional filtering of peak pixels based on local enrichment thresholds.
    2. pyHICCUPS doesn’t cluster original nearby peak pixels into a single peak call.
    3. I haven’t implemented the function to combine peak annotations at different resolutions.
    4. Due to computational complexity, you should still limit the genomic distance between two loci to some degree (5Mb/10Mb).

    Despite these differences, peaks returned by pyHICCUPS are quite consistent with our visual inspection, and generally follow the typical loop interaction patterns.

  • peak-plot

    Visualize peaks (or loops) detected by pyBHFDR or pyHICCUPS on a heatmap. Just provide a cooler file and a loop annotation file, and specify your region of interest (chrom, start, end); peak-plot will export the figure in PNG format.
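
The statistical step described under pyBHFDR above can be sketched in a few lines. This is only an illustration, assuming a Poisson model for pixel counts as in [1]; it is not the code used by hicpeaks. The function names and the toy observed/expected values are hypothetical, and the expected values stand in for what would be derived from the donut background.

```python
import math

def poisson_sf(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)  # P(X = 0)
    cdf = 0.0
    for k in range(observed):
        cdf += term                  # add P(X = k)
        term *= expected / (k + 1)   # advance to P(X = k + 1)
    return max(0.0, 1.0 - cdf)

def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda idx: pvalues[idx])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / float(rank))
        qvalues[idx] = running_min
    return qvalues

# Toy data (hypothetical): raw counts and their expected background values.
observed = [12, 3, 25, 5]
expected = [4.0, 3.5, 6.0, 5.2]

pvalues = [poisson_sf(o, e) for o, e in zip(observed, expected)]
qvalues = bh_qvalues(pvalues)
# Indices of pixels passing a 10% FDR threshold (threshold chosen for the demo).
peaks = [idx for idx, q in enumerate(qvalues) if q < 0.1]
```

In this toy case only the two strongly enriched pixels survive the correction. A real run tests many pixels per chromosome, so the correction matters far more than it does here.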


This tutorial will guide you through the basic usage of all scripts distributed with hicpeaks.


If you have already created a cooler file for your Hi-C data, skip to the next section, pyBHFDR and pyHICCUPS; otherwise, read on.

First, you should store your TXT/NPZ bin-level Hi-C data into a cooler file by using toCooler. Let’s begin with our sample data below. Suppose you are still in the hicpeaks distribution root folder: change your current working directory to the sub-folder example:

$ cd example
$ ls -lh *

-rw-r--r--  1 xtwang  staff    18B Aug 21 19:46 datasets
-rw-r--r--  1 xtwang  staff   293B Aug 23 20:53 hg38.chromsizes

total 11608
-rw-r--r--  1 xtwang  staff   2.7M Aug 21 19:44 21_21.txt
-rw-r--r--  1 xtwang  staff   2.9M Aug 21 19:44 22_22.txt

There is one sub-directory called 40K, which contains Hi-C data of two chromosomes in the K562 cell line at 40K resolution, and one metadata file, datasets, which we can pass directly to toCooler:

$ cd 40K
$ head -5 21_21.txt

250 251     1
250 258     1
250 259     1
250 260     4
250 261     2

$ cd ..
$ cat datasets


You should construct your TXT files (no header, no footer) with 3 columns, which indicate “bin1 of the 1st chromosome”, “bin2 of the 2nd chromosome” and “contact frequency” respectively. See Overview above.
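
As a minimal sketch of this layout, the hypothetical helper below writes the nonzero raw counts of a small intra-chromosomal matrix to a file named after the “chrom1_chrom2.txt” pattern. The function name, the tab delimiter and the choice to keep only bin1 <= bin2 are assumptions made for illustration, not part of hicpeaks; compare the result against the bundled sample data before relying on it.

```python
import os
import tempfile

def write_contacts_txt(matrix, chrom1, chrom2, folder):
    """Write nonzero raw counts as 3 columns: bin1, bin2, contact frequency.

    Hypothetical helper (not part of hicpeaks). `matrix` is a list of rows
    holding raw, unnormalized counts.
    """
    # Strip the "chr" prefix, as the naming rule requires ("chr1" -> "1").
    labels = [c[3:] if c.startswith('chr') else c for c in (chrom1, chrom2)]
    path = os.path.join(folder, '{0}_{1}.txt'.format(*labels))
    intra = (labels[0] == labels[1])
    with open(path, 'w') as out:
        for i, row in enumerate(matrix):
            for j, count in enumerate(row):
                if count == 0:
                    continue
                # Assumption: for intra-chromosomal data keep bin1 <= bin2 only.
                if intra and j < i:
                    continue
                out.write('{0}\t{1}\t{2}\n'.format(i, j, count))
    return path

# Toy symmetric matrix of raw counts (3 bins).
demo = [[0, 4, 1],
        [4, 0, 2],
        [1, 2, 3]]
folder = tempfile.mkdtemp()  # stands in for a per-resolution folder such as 40K
path = write_contacts_txt(demo, 'chr21', 'chr21', folder)
with open(path) as handle:
    lines = handle.read().splitlines()
```

All files written this way for one resolution go into the same folder, which is the folder the metadata file points to.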

To transform this data to cooler format, just run the command below:

$ toCooler -O -d datasets --assembly hg38 --nproc 2

By default, toCooler fetches the size of each chromosome from UCSC using the provided genome assembly name (here hg38). However, if your reference genome is not hosted at UCSC, you can also build a file like “hg38.chromsizes” in the current working directory, and pass the file path to the argument “--chromsizes-file”.

Type toCooler with no arguments on your terminal to print detailed help information for each parameter.

For this dataset, toCooler will create a cooler file named “”, and your data will be stored under the URI “”.

This tutorial only illustrates a very simple case. In fact, the metadata file may contain a list of resolutions (if you have data at different resolutions in the same cell line) and the corresponding folder paths (both relative and absolute paths are accepted; if your data are in NPZ format, this path should point to the NPZ file):




Then toCooler will generate a single cooler file storing all the specified data under different cooler URIs: “specified_cooler_path::10000”, “specified_cooler_path::20000” and “specified_cooler_path::40000”.
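
The URI pattern above joins the cooler file path and the resolution with “::”. The helpers below are a small illustration of that pattern only; their names are hypothetical and they belong neither to hicpeaks nor to cooler.

```python
def make_uri(cool_path, resolution):
    """Join a cooler file path and a resolution into 'path::resolution'."""
    return '{0}::{1}'.format(cool_path, resolution)

def split_uri(uri):
    """Split 'path::resolution' back into (path, resolution as int)."""
    path, _, resolution = uri.partition('::')
    return path, int(resolution)

# One URI per resolution stored in the same cooler file.
uris = [make_uri('specified_cooler_path', res) for res in (10000, 20000, 40000)]
```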


With cooler URI, you can perform peak annotation by pyBHFDR or pyHICCUPS:

$ pyBHFDR -O K562-MboI-BHFDR-loops.txt -p -C 21 22 --pw 1 --ww 3


$ pyHICCUPS -O K562-MboI-HICCUPS-loops.txt -p --pw 1 --ww 3

Type pyBHFDR or pyHICCUPS on your terminal to print detailed help information for each parameter.
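
To give a feel for the two width parameters, the sketch below enumerates the donut background around a pixel, assuming that --pw and --ww correspond to the peak width p and the donut width w of [1] (check the help output to confirm). It follows the geometry described in [1]: a (2w+1) x (2w+1) window, minus the central (2p+1) x (2p+1) square, minus the row and column through the pixel. The function names are hypothetical, and the plain mean used here is a simplification of the expected-value calculation, not what hicpeaks computes.

```python
def donut_offsets(pw, ww):
    """Return (di, dj) offsets of the donut background around a pixel."""
    offsets = []
    for di in range(-ww, ww + 1):
        for dj in range(-ww, ww + 1):
            inside_peak = abs(di) <= pw and abs(dj) <= pw
            on_cross = (di == 0 or dj == 0)
            if not inside_peak and not on_cross:
                offsets.append((di, dj))
    return offsets

def donut_mean(matrix, i, j, pw, ww):
    """Mean count over the donut around (i, j); simplified background."""
    size = len(matrix)
    # The whole window must fit inside the matrix (no wrap-around indexing).
    if min(i, j) - ww < 0 or max(i, j) + ww >= size:
        raise ValueError('pixel is too close to the matrix edge')
    values = [matrix[i + di][j + dj] for di, dj in donut_offsets(pw, ww)]
    return sum(values) / float(len(values))

# Toy 7x7 matrix: flat background of 2 with one enriched pixel in the centre.
toy = [[2] * 7 for _ in range(7)]
toy[3][3] = 100
background = donut_mean(toy, 3, 3, pw=1, ww=3)
n_pixels = len(donut_offsets(1, 3))
```

With pw=1 and ww=3 the donut covers 32 pixels, and the enriched centre pixel does not contribute to its own background estimate.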

Before moving on to the next section, let’s list the contents of the current working directory again:

$ ls -lh

total 2360
drwxr-xr-x  5 xtwang  staff   160B Aug 25 23:18 40K
-rw-r--r--  1 xtwang  staff   3.4K Aug 25 23:19 BHFDR.log
-rw-r--r--  1 xtwang  staff   7.3K Aug 25 23:20 HICCUPS.log
-rw-r--r--  1 xtwang  staff   268K Aug 25 23:19 K562-MboI-BHFDR-loops.txt
-rw-r--r--  1 xtwang  staff    38K Aug 25 23:20 K562-MboI-HICCUPS-loops.txt
-rw-r--r--  1 xtwang  staff   704K Aug 25 23:19
-rw-r--r--  1 xtwang  staff    18B Aug 25 23:18 datasets
-rw-r--r--  1 xtwang  staff   293B Aug 25 23:18 hg38.chromsizes
-rw-r--r--  1 xtwang  staff    29K Aug 25 23:19 tocooler.log

Peak Visualization

Now you can visualize BH-FDR and HICCUPS peak annotations on a heatmap with peak-plot.

For BH-FDR peaks:

$ peak-plot -O test-BHFDR.png --dpi 250 -p -I K562-MboI-BHFDR-loops.txt -C 21 -S 40000000 -E 43000000 --correct --siglevel 0.0001

The output figure should look like this:


For HICCUPS peaks:

$ peak-plot -O test-HICCUPS.png --dpi 250 -p -I K562-MboI-HICCUPS-loops.txt -C 21 -S 40000000 -E 43000000 --correct --siglevel 0.1

And the output plot:



Although hicpeaks currently cannot perform further filtering based on local enrichment thresholds, you can do it yourself using the output annotations of pyBHFDR and pyHICCUPS.


The table below shows the performance of toCooler, pyBHFDR and pyHICCUPS on low-depth (T47D) and high-depth (K562) sequencing data, at low (40K) and high (10K) resolutions.

  • Processor: 2.6 GHz Intel Core i7, Memory: 16 GB 2400 MHz DDR4
  • Software version: hicpeaks 0.2.0-r1
  • The original Hi-C data is stored in TXT
  • Number of processes assigned: 1
  • Valid contacts: total number of non-zero pixels on intra-chromosomal matrices
  • Running time format: hr: min: sec

Datasets     Valid contacts   toCooler                pyBHFDR                 pyHICCUPS
                              Memory   Running time   Memory   Running time   Memory   Running time
T47D (40K)   25,216,875       <600M    0:07:55        <300M    0:01:16        <600M    0:14:22
K562 (40K)   49,088,465       <1.2G    0:21:37        <500M    0:01:23        <700M    0:07:20
K562 (10K)   139,884,876      <3.0G    1:00:07        <1.3G    0:05:33        <3.8G    0:36:49

For T47D (40K) and K562 (40K), the results are based on the whole dataset.

For K562 (10K), toCooler read and stored the whole dataset, but pyBHFDR and pyHICCUPS only performed loop calling on chromosome 1. If your computer has sufficient memory, both pyBHFDR and pyHICCUPS are able to complete on all chromosomes within 1 hour under multi-process mode (--nproc).

Release Notes

Version 0.2.0-r1 (08/26/2018)

  1. Sped up the program by dynamically limiting donut width
  2. Added performance table in README.rst

Version 0.2.0 (08/25/2018)

  1. Added vertical and horizontal backgrounds
  2. Added additional filtering based on dbscan clusters and more stringent q value thresholds
  3. Fixed bugs in storing interchromosomal data

Version 0.1.1 (08/24/2018)

  1. Lower memory usage and more efficient calculation

Version 0.1.0 (08/22/2018)

  1. The first release.
  2. Added toCooler and peak-plot.
  3. Added multiple process support.

Pre-Release (05/04/2015)

  1. Implemented core algorithms of BH-FDR and HICCUPS


[1] Rao SS, Huntley MH, Durand NC et al. A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping. Cell, 2014, 159(7):1665-80.
