Detect loops (and other patterns) in Hi-C contact maps.

Chromosight

License: GPLv3

Python package to detect chromatin loops (and other patterns) in Hi-C contact maps.

Installation

Stable version with pip:

pip3 install --user chromosight

Stable version with conda:

conda install -c bioconda -c conda-forge chromosight

or, if you want to get the latest development version:

pip3 install --user -e git+https://github.com/koszullab/chromosight.git@master#egg=chromosight

Usage

The two main subcommands of chromosight are detect and quantify. For more advanced use, there are two additional subcommands: generate-config and list-kernels. To get the list and description of all subcommands, you can always run:

chromosight --help

Pattern detection is done using the detect subcommand. The quantify subcommand computes pattern matching scores for a list of 2D coordinates on a Hi-C matrix. The generate-config subcommand creates a new type of pattern that can then be fed to detect using the --kernel-config option. The list-kernels subcommand shows information about the available patterns.

Get started

To get a first look at a chromosight run, you can run chromosight test, which downloads a test dataset from the GitHub repository and runs chromosight detect on it. You can then have a look at the generated output files.

Important options

When running chromosight detect, a handful of parameters are especially important:

  • --min-dist: Minimum genomic distance from which to detect patterns. For loops, this is the smallest loop size accepted (i.e. the distance between the two anchors).
  • --max-dist: Maximum genomic distance up to which patterns are detected. Increasing it also increases runtime and memory use.
  • --pearson: Detection threshold. Decrease it to detect a greater number of patterns (with potentially more false positives). Setting a very low value may actually reduce the number of detected patterns, because the algorithm might merge neighbouring patterns.
  • --perc-zero: Proportion of zero pixels allowed in a window for detection. If you have low coverage, increasing this value may improve results.

Example

To detect all chromosome loops with sizes between 20kb and 200kb using 8 parallel threads:

chromosight detect --threads 8 --min-dist 20000 --max-dist 200000 hic_data.cool output_prefix
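
When the call is scripted from Python, keeping distances in kilobases and converting them in one place helps avoid an extra or missing zero in --min-dist and --max-dist. The following is a minimal sketch: kb_to_bp and build_detect_command are hypothetical helpers (not part of chromosight), and only flags shown in this README are used.

```python
import subprocess


def kb_to_bp(kilobases):
    """Convert a distance in kilobases to base pairs."""
    return int(round(kilobases * 1000))


def build_detect_command(contact_map, prefix, min_kb, max_kb, threads=1):
    """Assemble the argument list for `chromosight detect`.

    Hypothetical helper: it only formats flags documented in this README.
    """
    return [
        "chromosight", "detect",
        "--threads", str(threads),
        "--min-dist", str(kb_to_bp(min_kb)),
        "--max-dist", str(kb_to_bp(max_kb)),
        contact_map, prefix,
    ]


cmd = build_detect_command("hic_data.cool", "output_prefix", 20, 200, threads=8)
print(" ".join(cmd))

# To actually run the detection (requires chromosight and the input file):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting issues if the file names contain spaces.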

Options


Pattern exploration and detection

Explore and detect patterns (loops, borders, centromeres, etc.) in Hi-C contact
maps with pattern matching.

Usage:
    chromosight detect  [--kernel-config=FILE] [--pattern=loops]
                        [--pearson=auto] [--win-size=auto] [--iterations=auto]
                        [--win-fmt={json,npy}] [--norm={auto,raw,force}]
                        [--subsample=no] [--inter] [--tsvd] [--smooth-trend]
                        [--n-mads=5] [--min-dist=0] [--max-dist=auto]
                        [--no-plotting] [--min-separation=auto] [--dump=DIR]
                        [--threads=1] [--perc-zero=auto]
                        [--perc-undetected=auto] <contact_map> <prefix>
    chromosight generate-config [--preset loops] [--click contact_map]
                        [--norm={auto,raw,norm}] [--win-size=auto] [--n-mads=5]
                        [--threads=1] <prefix>
    chromosight quantify [--inter] [--pattern=loops] [--subsample=no]
                         [--win-fmt=json] [--kernel-config=FILE] [--norm={auto,raw,norm}]
                         [--threads=1] [--n-mads=5] [--win-size=auto]
                         [--perc-undetected=auto] [--perc-zero=auto]
                         [--no-plotting] [--tsvd] <bed2d> <contact_map> <prefix>
    chromosight list-kernels [--long] [--mat] [--name=kernel_name]
    chromosight test

    detect:
        performs pattern detection on a Hi-C contact map via template matching
    generate-config:
        Generate pre-filled config files to use for detect and quantify.
        A config consists of a JSON file describing parameters for the
        analysis and path pointing to kernel matrices files. Those matrices
        files are tsv files with numeric values as kernel to use for
        convolution.
    quantify:
        Given a list of pairs of positions and a contact map, computes the
        correlation coefficients between those positions and the kernel of the
        selected pattern.
    list-kernels:
        Prints information about available kernels.
    test:
        Download example data and run loop detection on it.
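
The scores reported by quantify are correlation coefficients between a window of the contact map and the kernel of the pattern. As a rough illustration of that quantity only, the Pearson coefficient between one window and one kernel of equal size can be sketched as below. This is not chromosight's implementation, which operates on the whole matrix; the function name and the toy 3x3 matrices are made up for the example.

```python
import math


def pearson_window_kernel(window, kernel):
    """Pearson correlation between two equally sized 2D lists of numbers."""
    w = [v for row in window for v in row]
    k = [v for row in kernel for v in row]
    if len(w) != len(k):
        raise ValueError("window and kernel must have the same size")
    mean_w = sum(w) / len(w)
    mean_k = sum(k) / len(k)
    cov = sum((a - mean_w) * (b - mean_k) for a, b in zip(w, k))
    var_w = sum((a - mean_w) ** 2 for a in w)
    var_k = sum((b - mean_k) ** 2 for b in k)
    if var_w == 0 or var_k == 0:
        return float("nan")  # correlation is undefined for a flat matrix
    return cov / math.sqrt(var_w * var_k)


# Toy loop-like kernel: enriched centre, weaker surroundings.
kernel = [[0, 1, 0], [1, 4, 1], [0, 1, 0]]
window_similar = [[1, 2, 1], [2, 9, 2], [1, 2, 1]]
window_flat = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]

print(pearson_window_kernel(kernel, kernel))
print(pearson_window_kernel(window_similar, kernel))
print(pearson_window_kernel(window_flat, kernel))
```

A window resembling the kernel scores close to 1, while a uniform window has no defined correlation, which is one reason windows with too little signal are hard to score.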

Input

Input Hi-C contact maps should be in cool format. The cool format is an efficient and compact format for Hi-C data based on HDF5. It is maintained by the Mirny lab and documented here: https://open2c.github.io/cooler/

Most other Hi-C data formats (hic, homer, hic-pro) can be converted to cool using hicexplorer's hicConvertFormat or hic2cool. Bedgraph2 format can be converted directly using cooler with the command cooler load -f bg2 <chrom.sizes>:<binsize> in.bg2.gz out.cool. For more information, see the cooler documentation.

For chromosight quantify, the bed2d file is a text file with at least 6 tab-separated columns containing pairs of coordinates. The first 6 columns should be chrom start end chrom start end and have no header. Alternatively, the output text file generated by chromosight detect is also accepted. Instructions to generate a bed2d file from a bed file are given in the documentation.
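
As an illustration of the expected layout, the sketch below formats pairs of intervals as headerless, tab-separated bed2d text. The helper name format_bed2d, the output file name and the coordinates are invented for the example; the documentation mentioned above remains the reference for deriving a bed2d file from a bed file.

```python
def format_bed2d(pairs):
    """Format pairs of (chrom, start, end) intervals as bed2d lines.

    Each output line has 6 tab-separated columns and there is no header:
    chrom start end chrom start end
    """
    lines = []
    for (chrom1, start1, end1), (chrom2, start2, end2) in pairs:
        fields = (chrom1, start1, end1, chrom2, start2, end2)
        lines.append("\t".join(str(f) for f in fields))
    return "".join(line + "\n" for line in lines)


# Made-up anchor pairs, for illustration only.
pairs = [
    (("chr1", 10000, 12000), ("chr1", 50000, 52000)),
    (("chr2", 30000, 32000), ("chr2", 90000, 92000)),
]
bed2d_text = format_bed2d(pairs)
print(bed2d_text, end="")

# Write it to disk to pass it as <bed2d> to chromosight quantify:
# with open("positions.bed2d", "w") as handle:
#     handle.write(bed2d_text)
```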

Output

Three files are generated by chromosight's detect and quantify commands. Their filenames are determined by the value of the <prefix> argument:

  • prefix.tsv: List of genomic coordinates, bin ids and correlation scores for the patterns identified
  • prefix.json: JSON file containing the windows (of the same size as the kernel used) around the patterns from prefix.tsv
  • prefix.pdf: Plot showing the pileup (average) window of all detected patterns. Plot generation can be disabled using the --no-plotting option.

Alternatively, one can set the --win-fmt=npy option to dump windows into an npy file instead of JSON. This format can easily be loaded into a 3D array using numpy's np.load function.
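
A minimal sketch of loading such a file and recomputing the pileup is given below. Because no real output is bundled here, the example first writes a stand-in file of random windows; the file name prefix.npy and the axis order (pattern, row, column) are assumptions to check against your own output.

```python
import os
import tempfile

import numpy as np

# Stand-in for a windows file written with --win-fmt=npy: 4 windows of
# 5x5 pixels with random values. Real files come from chromosight itself.
rng = np.random.default_rng(0)
fake_windows = rng.random((4, 5, 5))
path = os.path.join(tempfile.mkdtemp(), "prefix.npy")
np.save(path, fake_windows)

# Load the windows; the assumed layout is (n_patterns, height, width).
windows = np.load(path)
print(windows.shape)

# Pileup: average window over all patterns, ignoring missing values.
pileup = np.nanmean(windows, axis=0)
print(pileup.shape)
```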

Note: the p-values and q-values provided in prefix.tsv should not be used as a criterion for filtering and are only useful for ranking calls. Their values are obtained from a Pearson correlation test and could be biased due to the dependence between contact values in the window.
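
Since these values are only meant for ranking, a typical post-processing step is to sort calls rather than to threshold them. The sketch below does so with the standard csv module on a small in-memory table. The column names (score, qvalue and the coordinate columns) and all values are assumptions made for the example, so check the header of your own prefix.tsv.

```python
import csv
import io

# Made-up excerpt standing in for prefix.tsv; column names are assumptions.
tsv_text = (
    "chrom1\tstart1\tend1\tchrom2\tstart2\tend2\tscore\tqvalue\n"
    "chr1\t10000\t12000\tchr1\t50000\t52000\t0.62\t0.001\n"
    "chr1\t80000\t82000\tchr1\t150000\t152000\t0.81\t0.0002\n"
    "chr2\t30000\t32000\tchr2\t90000\t92000\t0.45\t0.01\n"
)


def rank_calls(handle, key="score", descending=True):
    """Read tab-separated calls and return them sorted by a numeric column."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    return sorted(rows, key=lambda row: float(row[key]), reverse=descending)


ranked = rank_calls(io.StringIO(tsv_text))
for row in ranked:
    print(row["chrom1"], row["start1"], row["start2"], row["score"])
```

To rank by q-value instead, call rank_calls(handle, key="qvalue", descending=False) so that the smallest values come first.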

Contributing

All contributions are welcome. We use the numpy standard for docstrings when documenting functions.

The code formatting standard we use is black, with --line-length=79 to follow PEP8 recommendations. We use nose2 as our testing framework. Ideally, new functions should have associated unit tests, placed in the tests folder.

To test the code, you can run:

nose2 -s tests/

FAQ

Questions from previous users are available in the GitHub issues. You can open a new issue for your question if it is not already covered.

Citation

When using Chromosight in your research, please cite the publication: https://www.nature.com/articles/s41467-020-19562-7
