Detect loops (and other patterns) in Hi-C contact maps.
Chromosight
Python package to detect chromatin loops (and other patterns) in Hi-C contact maps.
The preprint is available at https://www.biorxiv.org/content/10.1101/2020.03.08.981910v3.full
Docs available at https://chromosight.readthedocs.io
Examples of analyses can be found at https://github.com/koszullab/chromosight_analyses_scripts
Installation
Stable version with pip:
pip3 install --user chromosight
Stable version with conda:
conda install -c bioconda -c conda-forge chromosight
or, if you want to get the latest development version:
pip3 install --user -e git+https://github.com/koszullab/chromosight.git@master#egg=chromosight
Usage
chromosight has 3 subcommands: detect, quantify and generate-config. To get the list and description of those subcommands, you can always run:
chromosight --help
Pattern detection is done using the detect subcommand. The generate-config subcommand is used to create a new type of pattern that can then be fed to detect using the --kernel-config option. The quantify subcommand is used to compute pattern matching scores for a list of 2D coordinates on a Hi-C matrix.
Get started
To get a first look at a chromosight run, you can run chromosight test, which will download a test dataset from the GitHub repository and run chromosight detect on it.
Important options
--min-dist: Minimum distance from which to detect patterns.
--max-dist: Maximum distance from which to detect patterns. Increasing it also increases runtime and memory use.
--pearson: Decrease to allow more patterns to be detected (with potentially more false positives).
--perc-zero: Proportion of zero pixels allowed in a window for detection. If you have low coverage, increasing this value may improve results.
Example
To detect all chromosome loops with sizes between 20kb and 200kb using 8 parallel threads:
chromosight detect --threads 8 --min-dist 20000 --max-dist 200000 hic_data.cool output_prefix
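When chromosight is driven from a script, the same call can be assembled programmatically. The following is a minimal sketch, not part of chromosight itself: build_detect_command is a hypothetical helper, and it only uses flags listed in this README.

```python
import shlex


def build_detect_command(contact_map, prefix, min_dist=None, max_dist=None,
                         threads=1, pearson=None, perc_zero=None):
    """Assemble the argument list for a `chromosight detect` call.

    Hypothetical helper: options left as None are omitted, so chromosight
    falls back to its own defaults for them.
    """
    cmd = ["chromosight", "detect", f"--threads={threads}"]
    optional = {
        "--min-dist": min_dist,
        "--max-dist": max_dist,
        "--pearson": pearson,
        "--perc-zero": perc_zero,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd.append(f"{flag}={value}")
    cmd += [contact_map, prefix]
    return cmd


cmd = build_detect_command("hic_data.cool", "output_prefix",
                           min_dist=20000, max_dist=200000, threads=8)
print(shlex.join(cmd))
# To actually run it (requires chromosight to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues if the contact map path contains spaces.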
Options
Pattern exploration and detection
Explore and detect patterns (loops, borders, centromeres, etc.) in Hi-C contact
maps with pattern matching.
Usage:
chromosight detect [--kernel-config=FILE] [--pattern=loops]
[--pearson=auto] [--win-size=auto] [--iterations=auto]
[--win-fmt={json,npy}] [--force-norm]
[--subsample=no] [--inter] [--tsvd] [--smooth-trend]
[--n-mads=5] [--min-dist=0] [--max-dist=auto]
[--no-plotting] [--min-separation=auto] [--dump=DIR]
[--threads=1] [--perc-zero=auto]
[--perc-undetected=auto] <contact_map> <prefix>
chromosight generate-config [--preset loops] [--click contact_map]
[--force-norm] [--win-size=auto] [--n-mads=5]
[--threads=1] <prefix>
chromosight quantify [--inter] [--pattern=loops] [--subsample=no]
[--win-fmt=json] [--kernel-config=FILE] [--force-norm]
[--threads=1] [--n-mads=5] [--win-size=auto]
[--perc-undetected=auto] [--perc-zero=auto]
[--no-plotting] [--tsvd] <bed2d> <contact_map> <prefix>
chromosight test
detect:
performs pattern detection on a Hi-C contact map via template matching
generate-config:
Generate pre-filled config files to use for detect and quantify.
A config consists of a JSON file describing parameters for the
analysis and path pointing to kernel matrices files. Those matrices
files are tsv files with numeric values as kernel to use for
convolution.
quantify:
Given a list of pairs of positions and a contact map, computes the
correlation coefficients between those positions and the kernel of the
selected pattern.
test:
Download example data and run loop detection on it.
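The score described for quantify is a correlation coefficient between the kernel and the window of the contact map around each position. As a rough illustration of that idea only, here is a plain Pearson correlation between two small matrices. This is a simplified sketch with toy values, not chromosight's actual implementation, which works on sparse matrices and handles normalisation and missing bins.

```python
import math


def pearson(window, kernel):
    """Pearson correlation between two equally sized 2D lists of numbers."""
    w = [v for row in window for v in row]
    k = [v for row in kernel for v in row]
    if len(w) != len(k):
        raise ValueError("window and kernel must have the same shape")
    mean_w = sum(w) / len(w)
    mean_k = sum(k) / len(k)
    cov = sum((a - mean_w) * (b - mean_k) for a, b in zip(w, k))
    var_w = sum((a - mean_w) ** 2 for a in w)
    var_k = sum((b - mean_k) ** 2 for b in k)
    if var_w == 0 or var_k == 0:
        return float("nan")  # correlation is undefined for a flat matrix
    return cov / math.sqrt(var_w * var_k)


# Toy 3x3 kernel with a central enrichment (hypothetical values).
kernel = [[0, 1, 0],
          [1, 4, 1],
          [0, 1, 0]]
# A window with the same signal shape, scaled and shifted.
window = [[2 * v + 1 for v in row] for row in kernel]
# A window with no signal at all.
flat = [[1, 1, 1] for _ in range(3)]

score = pearson(window, kernel)
flat_score = pearson(flat, kernel)
```

Because the correlation ignores scale and offset, the scaled window matches the kernel almost perfectly, while the flat window has no defined score.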
Input
Input Hi-C contact maps should be in cool format. The cool format is an efficient and compact format for Hi-C data based on HDF5. It is maintained by the Mirny lab and documented here: https://mirnylab.github.io/cooler/
Most other Hi-C data formats (hic, homer, hic-pro) can be converted to cool using hicexplorer's hicConvertFormat. Bedgraph2 format can be converted directly using cooler with the command cooler load -f bg2 <chrom.sizes>:<binsize> in.bg2.gz out.cool. For more information, see the cooler documentation.
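For reference, a bg2 (2D bedGraph) file is a tab-separated text file with the columns chrom1 start1 end1 chrom2 start2 end2 count. The sketch below writes such rows from a small dense matrix. It assumes a single chromosome, fixed-size bins and one row per upper-triangle pixel; dense_to_bg2 is a hypothetical helper, and the last bin is not clipped to the chromosome length.

```python
def dense_to_bg2(matrix, chrom, binsize):
    """Yield bg2 rows for the nonzero upper-triangle pixels of a square matrix.

    Each row is (chrom1, start1, end1, chrom2, start2, end2, count).
    """
    n = len(matrix)
    for i in range(n):
        for j in range(i, n):
            count = matrix[i][j]
            if count:
                yield (chrom, i * binsize, (i + 1) * binsize,
                       chrom, j * binsize, (j + 1) * binsize, count)


# Toy symmetric contact matrix with 3 bins of 1 kb (hypothetical values).
toy = [[5, 2, 0],
       [2, 7, 1],
       [0, 1, 4]]
rows = list(dense_to_bg2(toy, "chr1", 1000))
bg2_text = "\n".join("\t".join(map(str, r)) for r in rows)
print(bg2_text)
```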
For chromosight quantify, the bed2d file is a text file with at least 6 tab-separated columns containing pairs of coordinates. The first 6 columns should be chrom start end chrom start end and have no header. Alternatively, the output text file generated by chromosight detect is also accepted. Instructions to generate a bed2d file from a bed file are given in the documentation.
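To check that a file follows the headerless bed2d layout described above before passing it to quantify, something like the following sketch could be used. read_bed2d is a hypothetical helper, not a chromosight function, and it does not cover the alternative input (the output of chromosight detect).

```python
import csv
import io


def read_bed2d(handle):
    """Parse headerless bed2d rows into 6-tuples.

    Returns (chrom1, start1, end1, chrom2, start2, end2) per row.
    Columns beyond the sixth are ignored.
    """
    pairs = []
    for n, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) < 6:
            raise ValueError(
                f"line {n}: expected at least 6 columns, got {len(fields)}"
            )
        c1, s1, e1, c2, s2, e2 = fields[:6]
        pairs.append((c1, int(s1), int(e1), c2, int(s2), int(e2)))
    return pairs


# Two example rows; the second carries an extra seventh column.
example = (
    "chr1\t10000\t12000\tchr1\t50000\t52000\n"
    "chr2\t0\t2000\tchr2\t30000\t32000\t0.8\n"
)
pairs = read_bed2d(io.StringIO(example))
```

With a real file, the same function can be called on open("positions.bed2d", newline=""), where the file name is a placeholder.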
Output
Three files are generated by chromosight's detect and quantify commands. Their filenames are determined by the value of the <prefix> argument:
prefix.tsv: List of genomic coordinates, bin ids and correlation scores for the patterns identified.
prefix.json: JSON file containing the windows (of the same size as the kernel used) around the patterns listed in prefix.tsv.
prefix.pdf: Plot showing the pileup (average) window of all detected patterns. Plot generation can be disabled using the --no-plotting option.
Alternatively, one can set the --win-fmt=npy option to dump windows into a npy file instead of JSON. This format can easily be loaded into a 3D array using numpy's np.load function.
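As a sketch of that workflow, the example below saves a synthetic stack of windows to a temporary npy file, loads it back with np.load and averages it into a pileup. The file name prefix.npy and the axis order (windows along the first axis) are assumptions made for illustration; check the shape of the array from your own run.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for a windows file: 4 windows of 5x5 pixels.
rng = np.random.default_rng(0)
windows = rng.random((4, 5, 5))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "prefix.npy")  # hypothetical output name
    np.save(path, windows)
    loaded = np.load(path)  # 3D array, fully read into memory

# Assumption: axis 0 indexes the windows, so averaging over it
# gives a single pileup window.
pileup = loaded.mean(axis=0)
print(loaded.shape, pileup.shape)
```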
Contributing
All contributions are welcome. We use the numpy standard for docstrings when documenting functions.
The code formatting standard we use is black, with --line-length=79 to follow PEP8 recommendations. We use nose2 as our testing framework. Ideally, new functions should have associated unit tests, placed in the tests folder.
To test the code, you can run:
nose2 -s tests/