DiffDomain tests whether TADs differ significantly between two biological conditions.
Project description
diffDomain-py3
A short description
DiffDomain is a new computational method for identifying reorganized TADs using chromatin contact maps from two biological conditions.
A long description diffDomain
The workflow of diffDomain is illustrated below.
The goal is to test if a TAD identified in one biological condition has structural changes in another biological condition.
The core of diffDomain is to formulate the problem as a hypothesis test in which the null hypothesis is that the TAD does not undergo significant structural reorganization in the second condition. The inputs are the Hi-C contact matrices of the TAD region in the two biological conditions (A). The Hi-C contact matrices are log-transformed to adjust for the exponential decay of Hi-C contacts between chromosome bins as the distance between them increases.
Their entry-wise difference is calculated (B).
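Steps (A) and (B) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not diffDomain's own code: the function name `log_difference` is hypothetical, and the pseudocount added before taking logarithms (so that zero contacts stay finite) is an assumption.

```python
import numpy as np

def log_difference(contacts_a, contacts_b, pseudocount=1.0):
    """Entry-wise difference of two log-transformed contact matrices."""
    a = np.asarray(contacts_a, dtype=float)
    b = np.asarray(contacts_b, dtype=float)
    if a.shape != b.shape or a.shape[0] != a.shape[1]:
        raise ValueError("expected two square matrices of the same shape")
    # Assumption: a pseudocount keeps log() finite where a contact count is 0.
    return np.log(a + pseudocount) - np.log(b + pseudocount)

# Toy 2 x 2 "TAD" in two conditions (made-up numbers, for illustration only).
toy_a = np.array([[9.0, 3.0], [3.0, 9.0]])
toy_b = np.array([[4.0, 1.0], [1.0, 4.0]])
diff = log_difference(toy_a, toy_b)
```

Because both inputs are symmetric, the difference matrix is symmetric as well.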
The difference matrix D is normalized by standardizing each of its k-th off-diagonals in turn, for -N+2 <= k <= N-2 (N is the number of bins in the TAD), which adjusts for differences in contact frequencies caused by different sequencing depths in the two biological conditions (C).
Note that standardization is TAD-specific: each TAD has its own parameters, which are estimated only from its contact matrices in the pair of biological conditions.
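The standardization step (C) can be sketched as follows. This is a minimal illustration rather than diffDomain's implementation: the function name is hypothetical, the use of the population standard deviation is an assumption, and so is the handling of a constant diagonal (set to zero here). The two corner entries (k = N-1 and k = -N+1) hold a single value each and are left untouched, matching the stated range of k.

```python
import numpy as np

def standardize_off_diagonals(diff):
    """Standardize every k-th diagonal of a square matrix, -N+2 <= k <= N-2."""
    out = np.asarray(diff, dtype=float).copy()
    n = out.shape[0]
    rows, cols = np.indices((n, n))
    for k in range(-n + 2, n - 1):
        mask = (cols - rows) == k      # entries D[i, i + k]
        values = out[mask]
        sd = values.std()              # assumption: population std (ddof=0)
        if sd > 0:
            out[mask] = (values - values.mean()) / sd
        else:
            out[mask] = 0.0            # assumption: constant diagonal -> 0
    return out

# Toy symmetric matrix standing in for a difference matrix D.
rng = np.random.default_rng(0)
toy = rng.normal(size=(6, 6))
toy = toy + toy.T
normalized = standardize_off_diagonals(toy)
```

Each diagonal is standardized with its own mean and standard deviation, so the parameters depend only on the TAD being tested.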
Intuitively, if a TAD is not significantly reorganized, the normalized D would resemble a random matrix with white-noise entries, which lets us borrow theoretical results from random matrix theory. Indeed, the normalized D is a generalized Wigner matrix (D), a well-studied class of high-dimensional random matrices.
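One textbook way to use this result is to look at the largest eigenvalue of the scaled matrix D / sqrt(N). For a Wigner matrix with unit-variance entries, the largest eigenvalue concentrates near 2, and its centered and rescaled value approximately follows a Tracy-Widom distribution (beta = 1). The sketch below computes that statistic with NumPy. It assumes a symmetric input, the function name is hypothetical, and whether diffDomain uses exactly this scaling is an assumption that should be checked against the source code.

```python
import numpy as np

def largest_eigenvalue_statistic(normalized_diff):
    """Centered, rescaled largest eigenvalue of a symmetric N x N matrix."""
    d = np.asarray(normalized_diff, dtype=float)
    n = d.shape[0]
    # eigvalsh assumes a symmetric matrix and returns eigenvalues in
    # ascending order, so the last one is the largest.
    largest = np.linalg.eigvalsh(d / np.sqrt(n))[-1]
    # Textbook Wigner scaling: center at 2 and rescale by N^(2/3).
    return n ** (2.0 / 3.0) * (largest - 2.0)

# Sanity checks on matrices whose spectrum is known in closed form.
stat_zero = largest_eigenvalue_statistic(np.zeros((8, 8)))   # eigenvalue 0
stat_ones = largest_eigenvalue_statistic(np.ones((4, 4)))    # eigenvalue 2
```

A P-value would then be the upper tail probability of the Tracy-Widom distribution at this statistic. That distribution is not part of NumPy, which is presumably why the TracyWidom package appears among the dependencies.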
Installation instructions
diffDomain is tested on macOS and Linux (CentOS).
Dependencies
diffDomain_py3 depends on:
- Python3
- hic-straw 1.3.1
- TracyWidom
- pandas
- numpy
- docopt
- matplotlib
- statsmodels
Installation
Download the diffDomain source package by running the following command in a terminal:
git clone https://github.com/Tian-Dechao/diffDomain.git
or install it from PyPI with pip:
pip install diffDomain-py3
Get started with example usage
We downloaded the data set GEO:GSE63525 from Rao et al. (2014) for a standalone example of diffDomain usage. The example data are saved in <data/>:
1. GM12878 TADs.
2. GM12878 combined Hi-C data on Chr1, extracted with Juicebox at 10 kb resolution with KR normalization. The resulting Hi-C data have three columns: columns 1 and 2 are chromosomal bins, and column 3 is the KR-normalized contact frequency between the two bins.
3. K562 combined Hi-C data on Chr1. Settings are the same as for GM12878.
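To make the three-column format concrete, the sketch below turns such records into a dense symmetric matrix for one region. It is an illustration only: the function name is hypothetical, and it assumes that the two bin columns hold bin start coordinates in base pairs and that the region is the half-open interval from start to end.

```python
import numpy as np

def three_column_to_dense(lines, start, end, reso):
    """Build a dense symmetric contact matrix from 'bin1 bin2 value' records."""
    n_bins = (end - start) // reso
    dense = np.zeros((n_bins, n_bins))
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        bin1, bin2, value = int(fields[0]), int(fields[1]), float(fields[2])
        # Assumption: bins are given as start coordinates in base pairs.
        i, j = (bin1 - start) // reso, (bin2 - start) // reso
        if 0 <= i < n_bins and 0 <= j < n_bins:
            dense[i, j] = dense[j, i] = value
    return dense

# Toy records at 10 kb resolution (made-up values, for illustration only).
records = ["0 0 5.0", "0 10000 2.0", "10000 20000 1.5"]
matrix = three_column_to_dense(records, start=0, end=30000, reso=10000)
```

Bin pairs that are absent from the file stay at zero in the dense matrix.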
Testing if one TAD is reorganized
In this example, we test a GM12878 TAD that is reorganized in K562 (Chr1:163500000-165000000, Ref). The data are saved in <data/single-TAD/>.
Running the command
Usage: scriptname dvsd one <chr> <start> <end> <hic0> <hic1> [options]
python diffdomain/diffdomains.py dvsd one 1 163500000 165000000 data/single-TAD/GM12878_chr1_163500000_165000000_res_10k.txt data/single-TAD/K562_chr1_163500000_165000000_res_10k.txt --reso 10000 --ofile res/chr1_163500000_165000000.txt
diffDomain also provides a visualization function that displays the two Hi-C matrices side by side.
Usage: scriptname visualization <chr> <start> <end> <hic0> <hic1> [options]
Figures are saved in <res/images/>.
python diffdomain/diffdomains.py visualization 1 163500000 165000000 data/single-TAD/GM12878_chr1_163500000_165000000_res_10k.txt data/single-TAD/K562_chr1_163500000_165000000_res_10k.txt --reso 10000 --ofile res/images/side_by_side
Note: in this example only one TAD is tested, so no multiple comparison adjustment is needed. Multiple comparison adjustment with the BH method is demonstrated in the next example.
Identifying the reorganized TADs on a 50 Mb region (Chr1:1-50,000,000)
In this example, multiple comparison adjustment is required to adjust the P-values. The TAD list (GM12878_chr1_50M_domainlist.txt) is saved in <data/TADs_chr1/>.
Usage: scriptname dvsd multiple <hic0> <hic1> <bed> [options]
python diffdomain/diffdomains.py dvsd multiple data/TADs_chr1/chr1_50M_GM12878.h5 data/TADs_chr1/chr1_50M_K562.h5 data/TADs_chr1/GM12878_chr1_50M_domainlist.txt --reso 10000 --ofile res/temp/GM12878_vs_K562_chr1_50M_temp.txt
The function pydiff.diffdomain_multiple returns the result as a dataframe (result_mul).
Adjust for multiple comparisons with the BH method (the default; other options include fdr_by, bonferroni, holm, hommel, etc.) and keep only the reorganized TADs, i.e. those with a BH-adjusted P-value < 0.05.
Usage: scriptname adjustment <method> <input> <output>
python diffdomain/diffdomains.py adjustment fdr_bh res/temp/GM12878_vs_K562_chr1_50M_temp.txt res/GM12878_vs_K562_chr1_50M_adjusted_filter.tsv --filter true
For interactive integrative analysis, we recommend using the Nucleome Browser.
Identifying GM12878 TADs that are reorganized in K562, using all TADs
The Hi-C data are read directly from Amazon S3 (files hosted by the Aiden Lab).
Identify reorganized TADs on multiple chromosomes simultaneously.
python diffdomain/diffdomains.py dvsd multiple https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined.hic https://hicfiles.s3.amazonaws.com/hiseq/k562/in-situ/combined.hic data/GSE63525_GM12878_primary+replicate_Arrowhead_domainlist.txt --ofile res/temp/temp.txt
Multiple comparison adjustment.
python diffdomain/diffdomains.py adjustment fdr_bh res/temp/temp.txt res/adjusted_TADs2.txt
With the optional parameter --filter, only the reorganized TADs (BH-adjusted P-value < 0.05) are kept.
python diffdomain/diffdomains.py adjustment fdr_bh res/temp/temp.txt res/reorganized_TADs_GM12878_K562.tsv --filter true
The final output is saved to <res/reorganized_TADs_GM12878_K562.tsv>.
Classification of TADs
Running the command:
python diffdomain/classificattion.py -d adjusted_TADs2.txt -t GSE63525_K562_Arrowhead_domainlist.txt
Note: you can set -l (--limit) to adjust the tolerance used to define a 'common boundary'. As described in the paper, we use 3 bins as the threshold for a common boundary. That means -l is 30000 at 10 kb resolution and 75000 at 25 kb resolution.
python diffdomain/classificattion.py -d adjusted_TADs2.txt -t GSE63525_K562_Arrowhead_domainlist.txt -l 30000
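The relation between the resolution and the -l value can be written out explicitly. The sketch below is a hypothetical illustration, not code from classificattion.py: both function names are made up, and treating a distance exactly equal to the limit as "common" is an assumption.

```python
def boundary_limit(resolution, n_bins=3):
    """Tolerance in base pairs for a 'common boundary': n_bins x resolution."""
    return n_bins * resolution

def is_common_boundary(pos_a, pos_b, limit):
    """True if two boundary positions lie within `limit` base pairs."""
    # Assumption: the comparison is inclusive of the limit itself.
    return abs(pos_a - pos_b) <= limit

limit_10k = boundary_limit(10000)   # value to pass as -l at 10 kb
limit_25k = boundary_limit(25000)   # value to pass as -l at 25 kb
near = is_common_boundary(1000000, 1020000, limit_10k)
far = is_common_boundary(1000000, 1040000, limit_10k)
```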
Contact information
For more information, please contact Dunming Hua at huadm@mail2.sysu.edu.cn, Ming Gu at guming5@mail2.sysu.edu.cn or Dechao Tian at tiandch@mail.sysu.edu.cn.