SuperTLD: Detecting TAD-like domains from RNA-associated interactions
SuperTLD
SuperTLD is a novel method proposed to infer the hierarchical structure of TAD-like domains (TLDs) from RNA-associated interactions (RAIs).
SuperTLD comprises two components: data imputation and hierarchical domain detection.
SuperTLD accepts either an asymmetric or a symmetric RAI contact map as input.
Users can also test the TAD-inference potential of integrating Hi-C and RAIs via SuperTAD.
We also provide the scripts for the evaluations performed in the paper.
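Because an RAI map pairs RNA-end bins with DNA-end bins, the matrix is generally not symmetric. A quick transpose comparison tells you which kind of map you have before running SuperTLD. The helper below is a hypothetical sketch, not part of the supertld package, and the symmetrised toy map is for illustration only; it is not how SuperTLD processes its input.

```python
import numpy as np

def is_symmetric(contact_map, rtol=1e-8, atol=1e-8):
    """Hypothetical helper: True if a square contact map equals its transpose (within tolerance)."""
    m = np.asarray(contact_map, dtype=float)
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        return False  # non-square maps cannot be symmetric
    return bool(np.allclose(m, m.T, rtol=rtol, atol=atol))

# Toy 3x3 map: rows stand in for RNA-end bins, columns for DNA-end bins
asym = np.array([[5, 1, 0],
                 [3, 4, 2],
                 [0, 0, 6]])
sym = asym + asym.T  # symmetrised copy, illustration only

print(is_symmetric(asym))  # False
print(is_symmetric(sym))   # True
```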
Requirements
- python3
- numpy
- pandas
- scipy
- SuperTAD v1.2 (https://github.com/deepomicslab/SuperTAD)
Installation
Run the following from a terminal:
pip install supertld
Instructions
In this section, we show how to run SuperTLD with the example data.
Data preparation
The example data can be downloaded from the Zenodo repository.
Download example_data.zip and uncompress it into the ./data directory.
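If you prefer to stay in Python, the archive can be unpacked with the standard-library zipfile module. This is a minimal sketch: extract_example_data is a hypothetical helper, and it assumes example_data.zip sits in the current directory. If the archive contains a top-level folder, the files will end up in ./data/<folder>/ and the paths used in the run example need adjusting.

```python
import zipfile
from pathlib import Path

def extract_example_data(zip_path="example_data.zip", target_dir="./data"):
    """Hypothetical helper: extract the archive into target_dir and return the extracted paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create ./data if it does not exist yet
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
        names = zf.namelist()  # archive members, relative to target
    return [target / name for name in names]
```

Usage: `extract_example_data()` after downloading the archive, then check that the returned paths match the file names in the example below.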
Run SuperTLD
import supertld
import numpy as np
# load the RAI contact map and declare its chromosome and resolution
raiMap = np.loadtxt("./data/iMARGI_chr22_RAI_matrix_100kb.txt")
raiChrom="chr22"
raiResolution=100000
outputFile = "./data/iMARGI_chr22_result.txt" # declare the path of output file
# create a SuperTLD object; replace the placeholder string with the path to the SuperTAD executable
model = supertld.SupertadSparse(chrom=raiChrom, resolution=raiResolution, supertad="<executing_path_of_SuperTAD>")
# run SuperTLD on RAI contact map to infer TLDs
_, TLDresult = model.pipeline(inputMatrix=raiMap, outpath=outputFile)
# perform evaluation
# load Hi-C, TF ChIP-seq, and histone ChIP-seq data
hicMap = "./data/HEK293T_chr22_100KR_matrix.txt"
bed = ["./data/CTCF_ENCFF206AQV.bed"]
bedgraph = ["./data/H3K27ME3_hg38_GSM3907592.bedgraph", "./data/H3K36me3_hg38_ENCSR910LIE.bedgraph"]
# create the evaluate object
evaluateModel = supertld.Evaluate(chrom=raiChrom, resolution=raiResolution, hicPath=hicMap, bed=bed, bedgraph=bedgraph)
# run the evaluation
evaluateResult = evaluateModel.run(resultList=[outputFile], outPath=outputFile+".evaluateResult.txt")
# run SuperTLD on the integration of RAIs and Hi-C
# alpha=None tests every integration weight from 0 to 1 in steps of 0.05
matrixLists, TLDresults = model.pipeline(inputMatrix=raiMap, outpath=outputFile, hic=hicMap, alpha=None)
# run evaluation on all the integrated data
evaluateResult = evaluateModel.run(resultList=matrixLists, outPath=outputFile+".allAlpha_evaluateResult.txt")
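The example reads the RAI map with np.loadtxt, which expects a dense, whitespace-delimited text matrix. To run SuperTLD on your own data, one way to produce a file in that layout is sketched below. save_rai_matrix and load_rai_matrix are hypothetical helpers, and the sketch assumes SuperTLD only needs the 2D array that np.loadtxt returns.

```python
import numpy as np

def save_rai_matrix(matrix, path):
    """Hypothetical helper: write a dense 2D contact map as whitespace-delimited text."""
    m = np.asarray(matrix, dtype=float)
    if m.ndim != 2:
        raise ValueError(f"expected a 2D matrix, got shape {m.shape}")
    np.savetxt(path, m, fmt="%g", delimiter=" ")

def load_rai_matrix(path):
    """Hypothetical helper: load the map as in the example, always returning a 2D array."""
    return np.loadtxt(path, ndmin=2)  # ndmin=2 keeps single-row files two-dimensional
```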
Parameters
model = SupertadSparse(chrom="chr1", resolution=100000, supertad=None)
- chrom: string, required, default: chr1
The chromosome of the input RAI interaction map.
- resolution: int, required, default: 100000
The bin resolution of the input RAI interaction map.
- supertad: string, required
The path to the SuperTAD executable.
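The resolution is the bin size in base pairs, so each row or column of the map covers a fixed-width genomic interval. The sketch below converts between bins and coordinates, assuming 0-based, half-open bins that start at position 0. The convention SuperTLD uses when writing domain boundaries is not stated here, so treat these hypothetical helpers as illustrative.

```python
def bin_to_interval(bin_index, resolution=100000):
    """Hypothetical helper: 0-based, half-open interval [start, end) covered by a bin."""
    if bin_index < 0:
        raise ValueError("bin_index must be non-negative")
    start = bin_index * resolution
    return start, start + resolution

def position_to_bin(position, resolution=100000):
    """Hypothetical helper: index of the bin containing a 0-based genomic position."""
    return position // resolution

print(bin_to_interval(3))        # (300000, 400000)
print(position_to_bin(250000))   # 2
```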
model.pipeline(inputMatrix=None, outpath="./norm_matrix.txt", hic=None, alpha=None)
- inputMatrix: 2darray, required
The RAI interaction map.
- outpath: string, optional, default: "./norm_matrix.txt"
The output path of SuperTLD.
- hic: string, optional
The file path of a Hi-C contact map. If given, SuperTLD integrates the RAI and Hi-C maps.
- alpha: float, optional, default: None
The weight that controls the integration when hic is given; provide a value from 0 to 1. The default None produces results for every alpha from 0 to 1 in steps of 0.05.
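With alpha=None the pipeline sweeps alpha from 0 to 1 in steps of 0.05, i.e. 21 weights in total. The hypothetical helper below reproduces that grid and rounds away floating-point noise such as 0.15000000000000002; it does not reproduce how SuperTLD combines the two maps. To test a single weight instead, pass one value, e.g. alpha=0.5, to model.pipeline.

```python
import numpy as np

def alpha_grid(step=0.05):
    """Hypothetical helper: integration weights from 0 to 1 (inclusive) at the given step."""
    n = int(round(1.0 / step)) + 1  # step 0.05 -> 21 points
    return np.round(np.linspace(0.0, 1.0, n), 10).tolist()

grid = alpha_grid()
print(len(grid))   # 21
print(grid[:4])    # [0.0, 0.05, 0.1, 0.15]
```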
evaluateModel = Evaluate(chrom="chr1", resolution=100000, hicPath=None, bed=None, bedgraph=None)
- chrom: string, required, default: chr1
The chromosome of the result.
- resolution: int, required, default: 100000
The bin resolution of the RAI interaction map.
- hicPath: string, required
The path of the corresponding Hi-C contact map.
- bed: list, optional
A list of TF ChIP-seq files (.bed).
- bedgraph: list, optional
A list of H3K27me3/H3K36me3 ChIP-seq files (.bedgraph).
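bedGraph is a plain-text format with four columns (chromosome, start, end, value), optionally preceded by track or browser lines. Before passing files to Evaluate, it can help to check that they cover the chromosome being evaluated. read_bedgraph below is a hypothetical helper built on pandas, not the parser SuperTLD uses internally.

```python
import pandas as pd

BEDGRAPH_COLUMNS = ["chrom", "start", "end", "value"]

def read_bedgraph(lines, chrom=None):
    """Hypothetical helper: parse bedGraph lines, skipping header lines; optionally keep one chromosome."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # blank, comment, or UCSC header line
        c, start, end, value = line.split()[:4]
        rows.append((c, int(start), int(end), float(value)))
    df = pd.DataFrame(rows, columns=BEDGRAPH_COLUMNS)
    if chrom is not None:
        df = df[df["chrom"] == chrom].reset_index(drop=True)
    return df
```

Usage: `with open("./data/H3K27ME3_hg38_GSM3907592.bedgraph") as fh: marks = read_bedgraph(fh, chrom="chr22")`.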
evaluateModel.run(resultList=None, outPath="./")
- resultList: list, required
The list of result matrix paths (e.g. the output file of model.pipeline, or the matrixLists it returns).
- outPath: string, optional, default: "./"
The output path of the evaluation result.
Result files of inferred TLDs carry the suffix .multi2D_AllH2_sparse.tsv.
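Where the result file is written is determined by outpath. The sketch below assumes it lands in the same directory (./data in the example) and simply lists every file carrying the documented suffix. find_tld_results is hypothetical, and it does not parse the file contents, whose column layout is not described here.

```python
from pathlib import Path

TLD_SUFFIX = ".multi2D_AllH2_sparse.tsv"

def find_tld_results(directory="./data"):
    """Hypothetical helper: SuperTLD result files in a directory, sorted by name."""
    return sorted(p for p in Path(directory).iterdir() if p.name.endswith(TLD_SUFFIX))
```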
For the evaluation result, the first row is the result of TLDs, and the second row is the result of TLDs.
The columns are:
- column 1: Pearson correlation coefficient (PCC) of the contact map, compared with Hi-C
- column 2: PCC of the distance decay
- column 3: OR
- column 4: NMI (normalized mutual information)
- column 5: CTCF fold change
- column 6: p-value of the CTCF fold change
- column 7: percentage of domains enriched in H3K27me3/H3K36me3 marks
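The first two columns compare contact maps and their distance-decay curves by Pearson correlation. The sketch below shows one common way to compute such quantities with NumPy and SciPy: PCC over the upper-triangle entries of two maps, and PCC between the mean contact at each diagonal offset. It is an illustration under those assumptions, not SuperTLD's evaluation code; the handling of asymmetric RAI maps, normalization, and distance cut-offs may differ.

```python
import numpy as np
from scipy.stats import pearsonr

def upper_triangle_pcc(map_a, map_b):
    """Illustrative: PCC between the upper-triangle entries (diagonal included) of two square maps."""
    a = np.asarray(map_a, dtype=float)
    b = np.asarray(map_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("maps must have the same shape")
    iu = np.triu_indices_from(a)
    r, _ = pearsonr(a[iu], b[iu])
    return r

def distance_decay(contact_map):
    """Illustrative: mean contact at each diagonal offset d = 0..n-1 (upper triangle only)."""
    m = np.asarray(contact_map, dtype=float)
    return np.array([np.diagonal(m, offset=d).mean() for d in range(m.shape[0])])

def decay_pcc(map_a, map_b):
    """Illustrative: PCC between the distance-decay curves of two maps."""
    r, _ = pearsonr(distance_decay(map_a), distance_decay(map_b))
    return r
```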
Contact
Feel free to open an issue on GitHub or contact yuwzhang7-c@my.cityu.edu.hk
if you have any problems using SuperTLD.
License
This project is licensed under the MIT License - see the LICENSE.md file for details.
Download files
Source Distribution
File details
Details for the file supertld-0.0.2.tar.gz.
File metadata
- Download URL: supertld-0.0.2.tar.gz
- Upload date:
- Size: 34.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/33.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.28.1 importlib-metadata/4.11.2 keyring/17.0.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9aa62c58957d309b63dcd78793710dc5c2bb005b63b3dfe16e70066c98d8361c
MD5 | c61b53ac384ddea668a87b828f113c14
BLAKE2b-256 | ff6cd6d6ac976ce661766d01115535ee202abb646861c4e7511cceb40770581f