Sparse binary format for genomic interaction matrices
Cooler is a support library for a sparse, compressed, binary persistent storage format, called cool, used to store genomic interaction data, such as Hi-C contact matrices.
The cool file format is a reference implementation of a genomic matrix data model using HDF5 as the container format.
The cooler package aims to provide the following functionality:
To get started:
Install from PyPI using pip.
$ pip install cooler
See the docs for more information.
The cooler package includes command line tools for creating, querying and manipulating cool files.
$ cooler makebins $CHROMSIZES_FILE $BINSIZE > bins.10kb.bed
$ cooler cload bins.10kb.bed $CONTACTS_FILE out.cool
$ cooler balance -p 10 out.cool
$ cooler dump -b -t pixels --header --join -r chr3:10,000,000-12,000,000 -r2 chr17 out.cool | head
chrom1  start1    end1      chrom2  start2   end2     count  balanced
chr3    10000000  10010000  chr17   0        10000    1      0.810766
chr3    10000000  10010000  chr17   520000   530000   1      1.2055
chr3    10000000  10010000  chr17   640000   650000   1      0.587372
chr3    10000000  10010000  chr17   900000   910000   1      1.02558
chr3    10000000  10010000  chr17   1030000  1040000  1      0.718195
chr3    10000000  10010000  chr17   1320000  1330000  1      0.803212
chr3    10000000  10010000  chr17   1500000  1510000  1      0.925146
chr3    10000000  10010000  chr17   1750000  1760000  1      0.950326
chr3    10000000  10010000  chr17   1800000  1810000  1      0.745982
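To make the first step of the pipeline above more concrete, here is a minimal pure-Python sketch of fixed-width binning, the kind of table that `cooler makebins` writes out as BED rows. The function name `make_bins` and the in-memory `(chrom, length)` input are hypothetical, and the sketch assumes that the last bin of each chromosome is truncated at the chromosome end; it is an illustration, not the tool's actual code.

```python
def make_bins(chromsizes, binsize):
    """Split each chromosome into consecutive fixed-width bins.

    chromsizes: iterable of (chrom_name, length) pairs (hypothetical input;
                the CLI reads these from a chromsizes file instead).
    binsize:    bin width in base pairs.
    Returns a list of (chrom, start, end) tuples with half-open intervals.
    """
    bins = []
    for chrom, length in chromsizes:
        for start in range(0, length, binsize):
            # Assumption: the final bin is clipped to the chromosome length.
            end = min(start + binsize, length)
            bins.append((chrom, start, end))
    return bins


if __name__ == "__main__":
    # Toy chromosome sizes, not real genome data.
    for chrom, start, end in make_bins([("chrA", 25000), ("chrB", 10000)], 10000):
        print(chrom, start, end, sep="\t")
```

Each printed row has the same chrom/start/end layout as the left half of the joined `cooler dump` output shown above.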
The cooler library provides a thin wrapper over the excellent h5py Python interface to HDF5. It supports creation of cooler files and the following types of range queries on the data:
>>> import cooler
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> c = cooler.Cooler('bigDataset.cool')
>>> resolution = c.info['bin-size']
>>> mat = c.matrix(balance=True).fetch('chr5:10,000,000-15,000,000')
>>> plt.matshow(np.log10(mat), cmap='YlOrRd')
>>> import multiprocessing as mp
>>> import h5py
>>> import cooler
>>> pool = mp.Pool(8)
>>> f = h5py.File('bigDataset.cool', 'r')
>>> weights, stats = cooler.ice.iterative_correction(f, map=pool.map, ignore_diags=3, min_nnz=10)
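For readers unfamiliar with matrix balancing, the idea behind the call above can be sketched on a small dense matrix in pure Python. This is a simplified illustration, not cooler's implementation: it works in memory, has no parallel `map`, and has no equivalent of `ignore_diags` or `min_nnz`. The names `balance_dense`, `tol` and `max_iters` are hypothetical.

```python
def balance_dense(matrix, tol=1e-9, max_iters=500):
    """Iteratively rescale a symmetric matrix so its nonzero rows share one sum.

    Returns (weights, balanced), where in this sketch
    balanced[i][j] == matrix[i][j] * weights[i] * weights[j].
    """
    n = len(matrix)
    work = [list(map(float, row)) for row in matrix]  # working copy
    bias = [1.0] * n
    for _ in range(max_iters):
        sums = [sum(row) for row in work]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break  # nothing to balance
        mean = sum(nonzero) / len(nonzero)
        # Relative row sums; empty rows are left untouched (factor 1).
        scale = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                work[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
        # Stop once every row sum is within tol of the mean.
        if max(abs(s - 1.0) for s in scale) < tol:
            break
    weights = [1.0 / b for b in bias]
    return weights, work


if __name__ == "__main__":
    raw = [[4, 2, 1], [2, 6, 3], [1, 3, 8]]  # toy symmetric counts
    weights, balanced = balance_dense(raw)
    print([round(sum(row), 6) for row in balanced])
```

In this sketch a balanced value is the raw count multiplied by the weights of its two bins, so only one weight per bin needs to be kept alongside the raw counts.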
The cool format implements a simple data model that stores a genomic matrix in a sparse representation, crucial for developing robust tools for use on increasingly high resolution Hi-C data sets, including streaming and out-of-core algorithms.
The data tables in a cool file are stored in a columnar representation as HDF5 groups of 1D array datasets of equal length. The contact matrix itself is stored as a single table containing only the nonzero upper triangle pixels.
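The storage idea described above can be illustrated without HDF5. The following pure-Python sketch converts a small symmetric matrix into three equal-length columns that hold only the nonzero upper-triangle entries, then rebuilds the full matrix from them. The helper names `to_pixels` and `to_dense` and the column names `bin1_id`, `bin2_id` and `count` are chosen for this illustration; plain lists stand in for the 1D array datasets of an actual file.

```python
def to_pixels(dense):
    """Keep only nonzero entries with column >= row, as parallel columns."""
    pixels = {"bin1_id": [], "bin2_id": [], "count": []}
    n = len(dense)
    for i in range(n):
        for j in range(i, n):  # upper triangle, diagonal included
            if dense[i][j] != 0:
                pixels["bin1_id"].append(i)
                pixels["bin2_id"].append(j)
                pixels["count"].append(dense[i][j])
    return pixels


def to_dense(pixels, n):
    """Rebuild the full symmetric n-by-n matrix from the pixel columns."""
    dense = [[0] * n for _ in range(n)]
    for i, j, value in zip(pixels["bin1_id"], pixels["bin2_id"], pixels["count"]):
        dense[i][j] = value
        dense[j][i] = value  # mirror into the lower triangle
    return dense


if __name__ == "__main__":
    matrix = [
        [5, 0, 2],
        [0, 3, 0],
        [2, 0, 1],
    ]
    pixels = to_pixels(matrix)
    print(pixels)
    print(to_dense(pixels, 3) == matrix)
```

Because the matrix is symmetric, storing one triangle loses no information, and skipping zeros keeps the table small when most bin pairs have no contacts.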
Pull requests are welcome. The current requirements for testing are nose and mock.
For development, clone and install in “editable” (i.e. development) mode with the -e option. This way you can also pull changes on the fly.
$ git clone https://github.com/mirnylab/cooler.git
$ cd cooler
$ pip install -e .
|File Name||Python Version||File Type||Upload Date|
|cooler-0.7.4-py2.py3-none-any.whl (78.0 kB)||3.5||Wheel||May 25, 2017|
|cooler-0.7.4.tar.gz (51.4 MB)||–||Source||May 25, 2017|