
diffDomain tests whether TADs differ significantly between chromatin contact maps.

Project description

diffDomain

A short description

diffDomain is a new computational method for identifying reorganized TADs using chromatin contact maps from two biological conditions.

A long description

The workflow of diffDomain is illustrated in the workflow figure; the panel labels (A) to (D) used below refer to it.

The goal is to test if a TAD identified in one biological condition has structural changes in another biological condition.

The core of diffDomain is to formulate the problem as a hypothesis test whose null hypothesis is that the TAD does not undergo significant structural reorganization in the second condition. The inputs are the Hi-C contact matrices of the TAD region in the two biological conditions (A). The Hi-C contact matrices are log-transformed to adjust for the exponential decay of Hi-C contacts between chromosome bins with increasing genomic distance.

Their entry-wise difference is calculated (B).

The N-by-N difference matrix D is normalized by iteratively standardizing its k-th off-diagonals, -N+2 <= k <= N-2, which adjusts for absolute differences in contact frequencies caused by different sequencing depths in the two biological conditions (C).

Note that standardization is TAD-specific: each TAD has its own parameters, which are estimated only from its contact matrices in the pair of biological conditions.
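The log-transform, difference, and diagonal-wise standardization steps above can be sketched as follows. This is a minimal illustration, not diffDomain's own code: the function name `normalized_difference`, the `pseudocount` added before the logarithm, and the single pass over the diagonals (rather than an iterative scheme) are all assumptions made for the example.

```python
import numpy as np

def normalized_difference(mat_a, mat_b, pseudocount=1.0):
    """Log-transform two contact matrices, subtract them, and
    standardize each k-th off-diagonal of the difference.

    Hypothetical helper for illustration only.
    """
    # Assumption: a pseudocount keeps the log defined for zero contacts.
    log_a = np.log(np.asarray(mat_a, dtype=float) + pseudocount)
    log_b = np.log(np.asarray(mat_b, dtype=float) + pseudocount)
    diff = log_a - log_b

    n = diff.shape[0]
    out = np.zeros_like(diff)
    # k runs over -N+2 <= k <= N-2; the two corner diagonals hold a
    # single entry each and are left at zero.
    for k in range(-(n - 2), n - 1):
        rows = np.arange(max(0, -k), min(n, n - k))
        cols = rows + k
        vals = diff[rows, cols]
        sd = vals.std()
        if sd > 0:
            # Parameters (mean, sd) are estimated per diagonal, per TAD.
            out[rows, cols] = (vals - vals.mean()) / sd
    return out

# Toy symmetric "contact matrices" for one TAD in two conditions.
rng = np.random.default_rng(1)
raw_a = rng.random((6, 6)) * 100
raw_b = rng.random((6, 6)) * 100
cond_a = (raw_a + raw_a.T) / 2
cond_b = (raw_b + raw_b.T) / 2
norm_d = normalized_difference(cond_a, cond_b)
```

After this step every standardized diagonal has mean 0 and unit standard deviation, which is what lets the matrix be compared against a white-noise reference.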

Intuitively, if a TAD is not significantly reorganized, the normalized D resembles a random matrix with white-noise entries, which lets us borrow theoretical results from random matrix theory. Indeed, the normalized D is a generalized Wigner matrix (D), a well-studied class of high-dimensional random matrices.

Under the null hypothesis, its largest singular value is proved to fluctuate around 2. Armed with this fact, diffDomain reformulates the identification of reorganized TADs as a hypothesis testing problem:
1. H0: the largest singular value equals 2;
2. H1: the largest singular value is greater than 2.
For a user-given set of TADs, P values are adjusted for multiple comparisons, using the BH method by default.
Once the subset of reorganized TADs is identified, they are classified into six subtypes to aid biological analysis and interpretation.
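To make the "fluctuates around 2" statement concrete, the sketch below simulates a symmetric white-noise matrix and compares its largest singular value with that of a matrix carrying a structured block. It is an illustration under assumptions: the 1/sqrt(N) scaling, the helper name `largest_singular_value`, and the simulated matrices are not taken from diffDomain, and the conversion of the statistic into a P value is not shown.

```python
import numpy as np

def largest_singular_value(sym_matrix):
    """Largest singular value after scaling by 1/sqrt(N).

    Hypothetical helper; the scaling is the usual Wigner-matrix
    convention and is assumed here.
    """
    n = sym_matrix.shape[0]
    return np.linalg.svd(sym_matrix / np.sqrt(n), compute_uv=False)[0]

rng = np.random.default_rng(0)
n = 200

# Null case: symmetric matrix with unit-variance off-diagonal noise.
noise = rng.standard_normal((n, n))
white = (noise + noise.T) / np.sqrt(2)
null_value = largest_singular_value(white)

# Alternative case: the same noise plus a structured 50 x 50 block,
# standing in for a reorganized region.
shifted = white.copy()
shifted[:50, :50] += 1.0
alt_value = largest_singular_value(shifted)

print(f"null: {null_value:.2f}, with structure: {alt_value:.2f}")
```

The null value stays close to 2, while the structured matrix pushes the largest singular value well above 2, which is the direction tested by H1.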

Installation instructions

diffDomain is tested on macOS and Linux (CentOS).

Dependencies

diffDomain-py3 depends on:

  • Python 3
  • hic-straw 1.3.1
  • TracyWidom
  • pandas
  • numpy
  • docopt
  • tqdm
  • matplotlib
  • statsmodels

Installation

Download the diffDomain source package by running the following command in a terminal:

git clone https://github.com/Tian-Dechao/diffDomain.git

or install it with pip:

pip install diffDomain-py3

Get started with example usage

We downloaded data GEO:GSE63525 from Rao et al. (2014) for standalone example usage of diffDomain. Example data are saved in <data/>:

1. GM12878 TADs.
2. GM12878 combined Hi-C data on Chr1, extracted with Juicebox at 10 kb resolution with KR normalization. The resulting Hi-C data have three columns: columns 1 and 2 are chromosomal bins, and column 3 is the KR-normalized contact frequency between the two bins.
3. K562 combined Hi-C data on Chr1. Settings are the same as for GM12878.
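As a rough illustration of the 3-column format described above, the following sketch turns such records into a dense symmetric matrix for one TAD. The helper `load_three_column` is hypothetical, and several details are assumptions rather than documented behavior: that bins are given as genomic start positions in base pairs, that fields are whitespace-separated, that only one triangle is listed, and that missing KR values appear as NaN.

```python
import io
import numpy as np

def load_three_column(handle, start, end, reso):
    """Build a dense symmetric contact matrix from 3-column records.

    Hypothetical helper for illustration; not part of diffDomain.
    """
    n = (end - start) // reso
    mat = np.zeros((n, n))
    for line in handle:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        bin1, bin2, value = int(fields[0]), int(fields[1]), float(fields[2])
        i = (bin1 - start) // reso
        j = (bin2 - start) // reso
        # Keep only in-range bins with a defined contact frequency.
        if 0 <= i < n and 0 <= j < n and not np.isnan(value):
            mat[i, j] = mat[j, i] = value
    return mat

# Three made-up records at 10 kb resolution (values are not real data).
example = io.StringIO(
    "163500000\t163500000\t120.5\n"
    "163500000\t163510000\t40.0\n"
    "163510000\t163510000\tNaN\n"
)
matrix = load_three_column(example, 163500000, 163530000, 10000)
```

With real data, the file handle would come from opening one of the text files in <data/single-TAD/>.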

Testing if one TAD is reorganized

In this example, we test a GM12878 TAD that is reorganized in K562 (Chr1:163500000-165000000, Ref). Data are saved in <data/single-TAD/>.

Running the command

  • Usage: scriptname dvsd one <chr> <start> <end> <hic0> <hic1> [options]

python diffdomain/diffdomains.py dvsd one 1 163500000 165000000 data/single-TAD/GM12878_chr1_163500000_165000000_res_10k.txt data/single-TAD/K562_chr1_163500000_165000000_res_10k.txt --reso 10000 --ofile res/chr1_163500000_165000000.txt

diffDomain also provides a visualization function that displays the two Hi-C matrices side by side.

  • Usage: scriptname visualization <chr> <start> <end> <hic0> <hic1> [options]

Figures are saved in <res/images/>.

python diffdomain/diffdomains.py visualization 1 163500000 165000000 data/single-TAD/GM12878_chr1_163500000_165000000_res_10k.txt data/single-TAD/K562_chr1_163500000_165000000_res_10k.txt --reso 10000 --ofile res/images/side_by_side

Note: in this example, no multiple comparison adjustment is needed because only one TAD is tested. Adjustment by the BH method is demonstrated in the next example.

Identifying the reorganized TADs on a 50 Mb region (Chr1:1-50,000,000)

In this example, multiple comparison adjustment is required to adjust the P values. The chr1_50M_domainlist file is saved in <data/TADs_chr1/>.

  • Usage: scriptname dvsd multiple <hic0> <hic1> <bed> [options]

python diffdomain/diffdomains.py dvsd multiple data/TADs_chr1/chr1_50M_GM12878.h5 data/TADs_chr1/chr1_50M_K562.h5 data/TADs_chr1/GM12878_chr1_50M_domainlist.txt --reso 10000 --ofile res/temp/GM12878_vs_K562_chr1_50M_temp.txt

The function pydiff.diffdomain_multiple will return the dataframe of result_mul.

  • Adjusting for multiple comparisons with the BH method (default; other options include fdr_by, bonferroni, holm, hommel, etc.) and selecting the reorganized TADs with BH-adjusted P value < 0.05

  • Usage: scriptname adjustment <method> <input> <output>

python diffdomain/diffdomains.py adjustment fdr_bh res/temp/GM12878_vs_K562_chr1_50M_temp.txt res/GM12878_vs_K562_chr1_50M_adjusted_filter.tsv --filter true
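For readers unfamiliar with the BH adjustment, the sketch below shows the standard Benjamini-Hochberg calculation and the 0.05 cutoff on a few made-up P values. The function `bh_adjust` is a hypothetical stand-in written for this example; the method names above (fdr_bh, fdr_by, holm, and so on) match those accepted by `multipletests` in statsmodels, which is listed among the dependencies, but whether diffDomain calls it internally is an assumption.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order.

    Hypothetical helper for illustration only.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest P value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest P value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

# Made-up P values for four TADs (not real results).
pvals = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(pvals)       # [0.02, 0.04, 0.04, 0.02]
reorganized = adjusted < 0.05     # TADs kept by the 0.05 cutoff
```

In this toy case all four adjusted values fall below 0.05, so all four TADs would be kept as reorganized.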

For interactive integrative analysis, we recommend using the Nucleome Browser. Example visualization outputs are shown below.

reorganized TADs on chr1

Identifying GM12878 TADs that are reorganized in K562, using all TADs.

The Hi-C data are read directly from the Amazon S3 URLs given in the command below.

  • Identify TADs in multiple chromosomes simultaneously.

python diffdomain/diffdomains.py dvsd multiple https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined.hic https://hicfiles.s3.amazonaws.com/hiseq/k562/in-situ/combined.hic data/GSE63525_GM12878_primary+replicate_Arrowhead_domainlist.txt --ofile res/temp/temp.txt
  • Multiple comparison adjustment.

python diffdomain/diffdomains.py adjustment fdr_bh res/temp/GM12878_vs_K562_chr1_50M_temp.txt res/adjusted_TADs2.txt
  • Optional parameter [--filter]: select the reorganized TADs with BH-adjusted P value < 0.05.

python diffdomain/diffdomains.py adjustment fdr_bh res/temp/GM12878_vs_K562_chr1_50M_temp.txt res/reorganized_TADs_GM12878_K562.tsv --filter true

The final output is saved to <res/reorganized_TADs_GM12878_K562.tsv>.

  • Classification of TADs

Running the command:

python diffdomain/classificattion.py -d adjusted_TADs2.txt -t GSE63525_K562_Arrowhead_domainlist.txt

Contact information

For more information, please contact Dunming Hua at huadm@mail2.sysu.edu.cn, Ming Gu at guming5@mail2.sysu.edu.cn, or Dechao Tian at tiandch@mail.sysu.edu.cn.
