diffDomain tests whether TADs differ significantly between two biological conditions.
Project description
diffDomain
A short description
diffDomain is a new computational method for identifying reorganized TADs using chromatin contact maps from two biological conditions.
A long description diffDomain
The workflow of diffDomain is illustrated below.
The goal is to test whether a TAD identified in one biological condition is structurally changed in another biological condition.
The core of diffDomain is to formulate the problem as a hypothesis test in which the null hypothesis is that the TAD does not undergo significant structural reorganization in the second condition. The inputs are the Hi-C contact matrices of the TAD region in the two biological conditions (A). The Hi-C contact matrices are log-transformed to adjust for the exponential decay of Hi-C contacts between chromosome bins as their genomic distance increases.
Their entry-wise difference is calculated (B).
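Steps (A) and (B) can be sketched in a few lines of NumPy. This is a minimal illustration, not diffDomain's own code: the function name `log_difference`, the pseudocount of 1, and the sign convention (first condition minus second) are assumptions made here.

```python
import numpy as np

def log_difference(hic0, hic1, pseudocount=1.0):
    """Entry-wise difference of two log-transformed contact matrices.

    The pseudocount is an assumption of this sketch; it avoids log(0)
    for bin pairs without observed contacts.
    """
    hic0 = np.asarray(hic0, dtype=float)
    hic1 = np.asarray(hic1, dtype=float)
    if hic0.shape != hic1.shape:
        raise ValueError("contact matrices must have the same shape")
    return np.log(hic0 + pseudocount) - np.log(hic1 + pseudocount)

# Toy 3x3 symmetric contact matrices for one TAD in two conditions.
a = np.array([[10.0, 4.0, 1.0], [4.0, 12.0, 5.0], [1.0, 5.0, 9.0]])
b = np.array([[10.0, 2.0, 0.0], [2.0, 12.0, 5.0], [0.0, 5.0, 9.0]])
D = log_difference(a, b)
```

Entries where the two conditions agree become 0, and the result stays symmetric because both inputs are symmetric.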
The difference matrix D (N-by-N) is normalized by iteratively standardizing its k-th off-diagonals, -N+2 <= k <= N-2, which adjusts for absolute differences in contact frequencies caused by different sequencing depths in the two biological conditions (C).
Note that standardization is TAD-specific: each TAD has its own parameters, estimated only from its contact matrices in the pair of biological conditions.
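The diagonal-wise standardization in step (C) can be sketched as follows. The function name `standardize_diagonals` and the handling of constant diagonals are assumptions of this sketch; the two corner entries (k = N-1 and k = -N+1) hold a single value each and are left unchanged, matching the stated range of k.

```python
import numpy as np

def standardize_diagonals(D):
    """Give each k-th off-diagonal of D zero mean and unit variance."""
    D = np.array(D, dtype=float)  # copy, so the caller's matrix is untouched
    n = D.shape[0]
    rows, cols = np.indices(D.shape)
    for k in range(-n + 2, n - 1):  # k = -N+2, ..., N-2
        mask = (cols - rows) == k
        values = D[mask]
        sd = values.std()
        if sd > 0:
            D[mask] = (values - values.mean()) / sd
        else:
            # Constant diagonal: only centering is possible (sketch assumption).
            D[mask] = values - values.mean()
    return D

# Toy symmetric difference matrix with a non-zero mean and non-unit variance.
rng = np.random.default_rng(0)
noise = rng.normal(loc=3.0, scale=2.0, size=(6, 6))
D = noise + noise.T
D_norm = standardize_diagonals(D)
```

Because the mean and standard deviation are computed from the TAD's own diagonals, the parameters are specific to that TAD and that pair of conditions.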
Intuitively, if a TAD is not significantly reorganized, the normalized D would resemble a random matrix with white-noise entries, which lets us borrow theoretical results from random matrix theory. Indeed, the normalized D is a generalized Wigner matrix (panel D), a well-studied class of high-dimensional random matrices.
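The random matrix intuition can be checked numerically. For a symmetric N-by-N matrix with independent, zero-mean, unit-variance entries, the largest eigenvalue divided by sqrt(N) concentrates near 2, and its fluctuations are described by the Tracy-Widom distribution. The sketch below only illustrates this general property; it is not diffDomain's test statistic, which this description does not spell out. The matrix size, the random seed, and the rank-one "signal" are arbitrary choices made here.

```python
import numpy as np

def largest_scaled_eigenvalue(M):
    """Largest eigenvalue of the symmetric matrix M, divided by sqrt(N)."""
    n = M.shape[0]
    return np.linalg.eigvalsh(M)[-1] / np.sqrt(n)  # eigvalsh sorts ascending

rng = np.random.default_rng(0)
n = 400
upper = np.triu(rng.normal(size=(n, n)))   # upper triangle, diagonal included
white = upper + np.triu(upper, k=1).T      # mirror it to get a symmetric matrix
lam_noise = largest_scaled_eigenvalue(white)

# A structured (rank-one) component pushes the largest eigenvalue far from 2.
signal = white + 0.5 * np.ones((n, n))
lam_signal = largest_scaled_eigenvalue(signal)
```

Here `lam_noise` stays close to 2, while `lam_signal` is several times larger, which is the kind of departure from white noise that a structural change would produce.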
Installation instructions
diffDomain is tested on macOS and Linux (CentOS).
Dependencies
diffDomain-py3 depends on:
- Python 3
- hic-straw 1.3.1
- TracyWidom
- pandas
- numpy
- docopt
- matplotlib
- statsmodels
Installation
Download the diffDomain source package by running the following command in a terminal:
git clone https://github.com/Tian-Dechao/diffDomain.git
or install it from PyPI:
pip install diffDomain-py3
Get started with example usage
We downloaded data GEO:GSE63525 from Rao et al. (2014) for a standalone example of diffDomain usage. The example data are saved in <data/>:
1. GM12878 TADs.
2. GM12878 combined Hi-C data on Chr1, extracted with Juicebox at 10 kb resolution with KR normalization. The resulting Hi-C data have 3 columns: columns 1 and 2 are chromosomal bins, and column 3 is the KR-normalized contact frequency between the two bins.
3. K562 combined Hi-C data on Chr1. Settings are the same as for GM12878.
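To make the 3-column format concrete, the sketch below turns such a file into a dense symmetric matrix for one region. It assumes whitespace-separated columns, bins given as genomic start positions in base pairs, and each bin pair listed once; the function name `dense_from_triplets` is hypothetical and is not part of diffDomain.

```python
import io
import numpy as np

def dense_from_triplets(source, start, end, reso):
    """Build a symmetric dense matrix for [start, end) from 3-column contacts."""
    data = np.loadtxt(source, ndmin=2)  # columns: bin1, bin2, contact frequency
    n = (end - start) // reso
    mat = np.zeros((n, n))
    i = ((data[:, 0] - start) // reso).astype(int)
    j = ((data[:, 1] - start) // reso).astype(int)
    inside = (i >= 0) & (i < n) & (j >= 0) & (j < n)  # drop out-of-region bins
    i, j, v = i[inside], j[inside], data[inside, 2]
    mat[i, j] = v
    mat[j, i] = v  # mirror, since each bin pair is listed only once
    return mat

# Toy input at 10 kb resolution; a real file path can be passed instead.
toy = io.StringIO("0\t0\t5.0\n0\t10000\t2.5\n10000\t10000\t4.0\n")
m = dense_from_triplets(toy, start=0, end=30000, reso=10000)
```

Bin pairs that are absent from the file stay at 0 in the dense matrix.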
Testing if one TAD is reorganized
In this example, we test a GM12878 TAD that is reorganized in K562 (Chr1:163500000-165000000, Ref). The data are saved in <data/single-TAD/>.
Running the command
Usage: scriptname dvsd one <chr> <start> <end> <hic0> <hic1> [options]
python diffdomain/diffdomains.py dvsd one 1 163500000 165000000 data/single-TAD/GM12878_chr1_163500000_165000000_res_10k.txt data/single-TAD/K562_chr1_163500000_165000000_res_10k.txt --reso 10000 --ofile res/chr1_163500000_165000000.txt
diffDomain also provides a visualization function that displays the two Hi-C matrices side by side.
Usage: scriptname visualization <chr> <start> <end> <hic0> <hic1> [options]
Figures are saved in <res/images/>.
python diffdomain/diffdomains.py visualization 1 163500000 165000000 data/single-TAD/GM12878_chr1_163500000_165000000_res_10k.txt data/single-TAD/K562_chr1_163500000_165000000_res_10k.txt --reso 10000 --ofile res/images/side_by_side
Note: in this example only one TAD is tested, so no multiple comparison adjustment is needed. Multiple comparison adjustment by the Benjamini-Hochberg (BH) method is demonstrated in the next example.
Identifying the reorganized TADs on a 50 Mb region (Chr1:1-50,000,000)
In this example, multiple comparison adjustment is required to adjust the P-values. The TAD list (chr1_50M_domainlist) is saved in <data/TADs_chr1/>.
Usage: scriptname dvsd multiple <hic0> <hic1> <bed> [options]
python diffdomain/diffdomains.py dvsd multiple data/TADs_chr1/chr1_50M_GM12878.h5 data/TADs_chr1/chr1_50M_K562.h5 data/TADs_chr1/GM12878_chr1_50M_domainlist.txt --reso 10000 --ofile res/temp/GM12878_vs_K562_chr1_50M_temp.txt
The function pydiff.diffdomain_multiple returns the result dataframe, result_mul.
Adjust for multiple comparisons with the BH method (fdr_bh, the default; other options include fdr_by, bonferroni, holm, and hommel) and keep only the reorganized TADs, i.e. those with a BH-adjusted P-value < 0.05.
Usage: scriptname adjustment <method> <input> <output>
python diffdomain/diffdomains.py adjustment fdr_bh res/temp/GM12878_vs_K562_chr1_50M_temp.txt res/GM12878_vs_K562_chr1_50M_adjusted_filter.tsv --filter true
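For readers who want to see what the BH adjustment and the 0.05 filter do, here is a minimal NumPy sketch. The function `bh_adjust` and the P-values are hypothetical illustrations, not diffDomain's code. The method names accepted by the command (fdr_bh, fdr_by, bonferroni, holm, hommel) match those of `statsmodels.stats.multitest.multipletests`, but whether diffDomain calls that function is an assumption.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)  # p_(i) * n / i
    # Enforce monotonicity, starting from the largest P-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.20, 0.74])  # hypothetical
adjusted = bh_adjust(pvals)
reorganized = adjusted < 0.05  # the filter applied with --filter true
```

With these six P-values, only the first two TADs pass the 0.05 threshold after adjustment, although four raw P-values are below 0.05.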
For interactive integrative analysis, we recommend using the Nucleome Browser. Example visualization outputs are shown below.
Identifying GM12878 TADs that are reorganized in K562, using all TADs.
The Hi-C data are hosted on Amazon S3 and are read directly from the URLs in the command below.
Identify reorganized TADs on multiple chromosomes simultaneously.
python diffdomain/diffdomains.py dvsd multiple https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined.hic https://hicfiles.s3.amazonaws.com/hiseq/k562/in-situ/combined.hic data/GSE63525_GM12878_primary+replicate_Arrowhead_domainlist.txt --ofile res/temp/temp.txt
Multiple comparison adjustment.
python diffdomain/diffdomains.py adjustment fdr_bh res/temp/temp.txt res/adjusted_TADs2.txt
The optional parameter [--filter] keeps only the reorganized TADs, i.e. those with a BH-adjusted P-value < 0.05.
python diffdomain/diffdomains.py adjustment fdr_bh res/temp/temp.txt res/reorganized_TADs_GM12878_K562.tsv --filter true
The final output is saved to <res/reorganized_TADs_GM12878_K562.tsv>.
Classification of TADs
Running the command:
python diffdomain/classificattion.py -d adjusted_TADs2.txt -t GSE63525_K562_Arrowhead_domainlist.txt
Contact information
For more information, please contact Dunming Hua at huadm@mail2.sysu.edu.cn, Ming Gu at guming5@mail2.sysu.edu.cn or Dechao Tian at tiandch@mail.sysu.edu.cn.
Hashes for diffDomain_py3-0.1.7-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 6eed7ed3d87d9a936c788c652b8a2b74859ca0223beb82b6082a264097904ff9 |
MD5 | 2fbae7f120eb1170c84f428df26c3a80 |
BLAKE2b-256 | 2d2514997cc892a3b2c5c91fbe699ee964a9e81aa207e5ccabdea6320739a76a |
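A downloaded wheel can be checked against the SHA256 digest listed above with Python's standard library. This is a minimal sketch: the helper name `sha256_of` is hypothetical, and the wheel is assumed to sit in the current directory.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

EXPECTED = "6eed7ed3d87d9a936c788c652b8a2b74859ca0223beb82b6082a264097904ff9"

# Uncomment once the wheel is in the current directory (assumed location):
# print(sha256_of("diffDomain_py3-0.1.7-py3-none-any.whl") == EXPECTED)
```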