Reimplementation of hicrep with added support for sparse matrices and multiple chromosomes.
hicreppy
cmdoret
This is a Python reimplementation of hicrep's algorithm with added support for sparse matrices (in .cool format).
hicrep measures similarity between Hi-C samples by computing a stratum-adjusted correlation coefficient (SCC). In this implementation, the SCC is computed separately for each chromosome, and the per-chromosome SCCs are then averaged, weighting each chromosome by its length.
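The length-weighted averaging step can be illustrated with a minimal sketch. The function name, the chromosome names and the numeric values below are hypothetical and are not taken from hicreppy's code; only the weighting idea comes from the description above.

```python
def weighted_mean_scc(scc_by_chrom, length_by_chrom):
    """Average per-chromosome SCCs, weighting each one by chromosome length.

    Both arguments are dicts keyed by chromosome name (hypothetical layout).
    """
    total_length = sum(length_by_chrom[chrom] for chrom in scc_by_chrom)
    weighted_sum = sum(
        scc_by_chrom[chrom] * length_by_chrom[chrom] for chrom in scc_by_chrom
    )
    return weighted_sum / total_length


# Made-up per-chromosome SCCs and lengths, for illustration only.
sccs = {"chr1": 0.9, "chr2": 0.8}
lengths = {"chr1": 300, "chr2": 100}
genome_scc = weighted_mean_scc(sccs, lengths)
print(genome_scc)
```

With this weighting, a long chromosome contributes more to the final value than a short one, so the result sits closer to the SCC of chr1 than to that of chr2.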
hicrep is published at:
HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Tao Yang, Feipeng Zhang, Galip Gurkan Yardimci, Ross C Hardison, William Stafford Noble, Feng Yue, Qunhua Li, 2017, Genome Research, doi: 10.1101/gr.220640.117
The original implementation, written in R, can be found at https://github.com/MonkeyLB/hicrep
Installation
You can install the package using pip:
pip install --user hicreppy
Usage
To find the optimal value for the smoothing parameter h, use the htrain subcommand:
    Usage: hicreppy htrain [OPTIONS] COOL1 COOL2

      Find the optimal value for smoothing parameter h. The optimal h-value is
      printed to stdout. Run informations are printed to stderr.

    Options:
      -r, --h-max INTEGER     Maximum value of the smoothing parameter h to
                              explore. All consecutive integer values from 0 to
                              this value will be tested.  [default: 10]
      -m, --max-dist INTEGER  Maximum distance at which to compute the SCC, in
                              basepairs.  [default: 100000]
      -b, --blacklist TEXT    Exclude those chromosomes in the analysis. List of
                              comma-separated chromosome names.
      -w, --whitelist TEXT    Only include those chromosomes in the analysis. List
                              of comma-separated chromosome names.
      --help                  Show this message and exit.
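To give an intuition for what the smoothing parameter h controls, here is a minimal sketch of a 2D mean filter on a dense matrix, following the description in the HiCRep paper: each pixel is replaced by the mean of its (2h+1) x (2h+1) neighbourhood. The function name is hypothetical, the truncation of windows at the matrix edges is an assumption, and hicreppy itself works on sparse matrices, so its implementation may differ.

```python
import numpy as np


def mean_filter(matrix, h):
    """Replace each pixel by the mean of its (2h+1) x (2h+1) neighbourhood.

    Windows are truncated at the matrix edges (an assumption of this sketch).
    With h = 0 the matrix is returned unchanged (as floats).
    """
    n_rows, n_cols = matrix.shape
    smoothed = np.zeros((n_rows, n_cols), dtype=float)
    for i in range(n_rows):
        for j in range(n_cols):
            window = matrix[max(0, i - h): i + h + 1, max(0, j - h): j + h + 1]
            smoothed[i, j] = window.mean()
    return smoothed


# Toy 3 x 3 matrix, for illustration only.
toy = np.arange(9, dtype=float).reshape(3, 3)
smoothed = mean_filter(toy, 1)
print(smoothed)
```

Larger values of h average over wider neighbourhoods, which reduces noise in sparse matrices at the cost of blurring local structure. That trade-off is why h is tuned with htrain rather than fixed.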
To compute the SCC between two matrices, use the scc subcommand. Pass the optimal h value obtained with htrain using the -v flag:
    Usage: hicreppy scc [OPTIONS] COOL1 COOL2

      Compute the stratum-adjusted correlation coefficient for input matrices

    Options:
      -v, --h-value INTEGER    Value of the smoothing parameter h to use. Should
                               be an integer value >= 0.  [default: 10]
      -m, --max-dist INTEGER   Maximum distance at which to compute the SCC, in
                               basepairs.  [default: 100000]
      -s, --subsample INTEGER  Subsample contacts from both matrices to target
                               value. Leave to 0 to disable subsampling.
                               [default: 0]
      -b, --blacklist TEXT     Exclude those chromosomes in the analysis. List of
                               comma-separated chromosome names.
      -w, --whitelist TEXT     Only include those chromosomes in the analysis.
                               List of comma-separated chromosome names.
      --help                   Show this message and exit.
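For readers unfamiliar with the statistic, the sketch below illustrates how an SCC can be computed for a single chromosome from two dense, already-smoothed matrices, following the formula in the HiCRep paper: a Pearson correlation is computed for each diagonal (stratum) up to the maximum distance, and the correlations are combined with weights N_k * r_2k, where N_k is the stratum size and r_2k is derived from the variances of the rank-transformed values. All names are hypothetical. Including the main diagonal, skipping strata with zero variance, and breaking rank ties by position are assumptions of this sketch, not documented hicreppy behaviour.

```python
import numpy as np


def normalised_ranks(values):
    """Ranks scaled to (0, 1]; ties are broken by position (a simplification)."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks / len(values)


def scc_dense(mat1, mat2, binsize, max_dist):
    """Stratum-adjusted correlation of two dense square matrices (sketch)."""
    last_offset = min(max_dist // binsize, mat1.shape[0] - 1)
    numerator, denominator = 0.0, 0.0
    for k in range(last_offset + 1):
        x = np.diagonal(mat1, offset=k).astype(float)
        y = np.diagonal(mat2, offset=k).astype(float)
        if len(x) < 2 or x.std() == 0 or y.std() == 0:
            continue  # Pearson correlation is undefined for this stratum
        rho_k = np.corrcoef(x, y)[0, 1]
        r2k = np.sqrt(normalised_ranks(x).var() * normalised_ranks(y).var())
        weight = len(x) * r2k
        numerator += weight * rho_k
        denominator += weight
    return numerator / denominator if denominator > 0 else float("nan")


# Random toy matrix (not real Hi-C data), compared with itself.
rng = np.random.default_rng(0)
toy = rng.poisson(5, size=(20, 20))
toy = toy + toy.T
self_scc = scc_dense(toy, toy, binsize=10000, max_dist=100000)
print(self_scc)
```

In this sketch, a max_dist of 100000 with 10 kb bins restricts the comparison to the first few diagonals, which is where most Hi-C signal is concentrated.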
When running multiple pairwise comparisons, compute the optimal h value once, using two highly similar samples, and reuse that h value for all scc commands.
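As a minimal sketch of that workflow, the snippet below builds one scc command per pair of samples, all sharing the same h value. The file names and the h value are hypothetical; only the subcommand and the -v and -m flags come from the help text above.

```python
from itertools import combinations


def build_scc_commands(cool_files, h_value, max_dist=100000):
    """Return one `hicreppy scc` argument list per unordered pair of files."""
    commands = []
    for cool1, cool2 in combinations(cool_files, 2):
        commands.append(
            ["hicreppy", "scc", "-v", str(h_value), "-m", str(max_dist), cool1, cool2]
        )
    return commands


# Hypothetical sample files; h_value would come from a single htrain run.
samples = ["a.cool", "b.cool", "c.cool"]
commands = build_scc_commands(samples, h_value=5)
for command in commands:
    print(" ".join(command))
```

Each argument list can be passed to `subprocess.run` to execute the comparison. Keeping h and the maximum distance identical across all pairs keeps the resulting SCC values comparable.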
Contributing
All contributions are welcome. We use the numpy standard for docstrings when documenting functions.
The code formatting standard we use is black, with --line-length=79 to follow PEP8 recommendations. We use pytest with the pytest-doctest and pytest-pylint plugins as our testing framework. Ideally, new functions should have associated unit tests, placed in the tests folder.
To test the code, you can run:
pytest --doctest-modules --pylint --pylint-error-types=EF --pylint-rcfile=.pylintrc hicreppy tests