SuperTLD: Detecting TAD-like domains from RNA-associated interactions

SuperTLD

[Figure: Flow chart of SuperTLD]

SuperTLD is a novel method for inferring the hierarchical structure of TAD-like domains (TLDs) from RNA-associated interactions (RAIs). It has two components: data imputation and hierarchical domain detection. SuperTLD accepts either an asymmetric or a symmetric RAI contact map as input. Users can also test how well TADs can be inferred from the integration of Hi-C and RAIs via SuperTAD.
We also provide the evaluation scripts used in the paper.

Requirements

Installation

Run the following from a terminal:

pip install supertld

Instructions

In this section, we show how to run SuperTLD with the example data.

Data preparation

The example data can be downloaded from the Zenodo repository. Download example_data.zip and extract it into the ./data directory.
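The extraction step can also be done from Python. The following is a minimal sketch using only the standard library; the helper name extract_example_data is hypothetical, and it assumes the archive has already been downloaded to the working directory under the name example_data.zip.

```python
import zipfile
from pathlib import Path


def extract_example_data(zip_path="example_data.zip", target_dir="./data"):
    """Extract the archive into target_dir and return the archived member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create ./data if it is missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()


if __name__ == "__main__":
    # Assumption: example_data.zip sits in the current working directory.
    if Path("example_data.zip").exists():
        for name in extract_example_data():
            print(name)
```

The internal layout of the archive is not described here. If it unpacks into a subfolder, move the files or adjust the paths used in the next section accordingly.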

Run SuperTLD

import supertld
import numpy as np

# load the RAI contact map and declare its chromosome and resolution
raiMap = np.loadtxt("./data/iMARGI_chr22_RAI_matrix_100kb.txt")
raiChrom = "chr22"
raiResolution = 100000
outputFile = "./data/iMARGI_chr22_result.txt"   # declare the path of output file

# create a SuperTLD object; replace the placeholder with the executing path of SuperTAD
supertadPath = "<executing_path_of_SuperTAD>"
model = supertld.SupertadSparse(chrom=raiChrom, resolution=raiResolution, supertad=supertadPath)

# run SuperTLD on RAI contact map to infer TLDs
_, TLDresult = model.pipeline(inputMatrix=raiMap, outpath=outputFile)

# perform evaluation
# load Hi-C, TF ChIP-seq, and histone ChIP-seq data
hicMap = "./data/HEK293T_chr22_100KR_matrix.txt"    
bed = ["./data/CTCF_ENCFF206AQV.bed"]   
bedgraph = ["./data/H3K27ME3_hg38_GSM3907592.bedgraph", "./data/H3K36me3_hg38_ENCSR910LIE.bedgraph"]
# create the evaluate object
evaluateModel = supertld.Evaluate(chrom=raiChrom, resolution=raiResolution, hicPath=hicMap, bed=bed, bedgraph=bedgraph)
# run the evaluation 
evaluateResult = evaluateModel.run(resultList=[outputFile], outPath=outputFile+".evaluateResult.txt")


# run SuperTLD on integration of RAIs and Hi-C
# test integrated data where alpha ranges from 0 to 1
matrixLists, TLDresults = model.pipeline(inputMatrix=raiMap, outpath=outputFile, hic=hicMap, alpha=None)

# run evaluation on all the integrated data
evaluateResult = evaluateModel.run(resultList=matrixLists, outPath=outputFile+".allAlpha_evaluateResult.txt")
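
To make the alpha sweep concrete, the sketch below builds the grid of alpha values described for alpha=None (0 to 1 in steps of 0.05) and applies a toy weighting to two small matrices. The blend function is purely hypothetical: this README does not state how SuperTLD combines the two maps or which map alpha weights, so the convex combination here (alpha weighting the RAI map) is an assumption for illustration only.

```python
import numpy as np

# 21 alpha values from 0 to 1 in steps of 0.05; linspace avoids the
# floating-point drift that np.arange can show with a 0.05 step.
alphas = np.round(np.linspace(0.0, 1.0, 21), 2)


def blend(rai, hic, alpha):
    """Hypothetical convex combination; not necessarily what SuperTLD does internally."""
    rai = np.asarray(rai, dtype=float)
    hic = np.asarray(hic, dtype=float)
    return alpha * rai + (1.0 - alpha) * hic


if __name__ == "__main__":
    # Made-up 2x2 matrices, for illustration only.
    toy_rai = np.array([[4.0, 1.0], [0.0, 3.0]])
    toy_hic = np.array([[5.0, 2.0], [2.0, 5.0]])
    for a in alphas[::5]:
        print(a, blend(toy_rai, toy_hic, a).tolist())
```

To run the real pipeline for a single setting instead of the full sweep, pass a float between 0 and 1 as alpha, as described under Parameters.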

Parameters

model = SupertadSparse(chrom="chr1", resolution=100000, supertad=None)
  • chrom: string, required, default: chr1
    The chromosome of the input RAI interaction map.
  • resolution: int, required, default: 100000
    The bin resolution of the input RAI interaction map.
  • supertad: string, required
    The executing path of SuperTAD.
model.pipeline(inputMatrix=None, outpath="./norm_matrix.txt", hic=None, alpha=None)
  • inputMatrix: 2darray, required
    The RAI interaction map.
  • outpath: string, optional, default: "./norm_matrix.txt"
    The output path of SuperTLD.
  • hic: string, optional
    If the file path of a Hi-C contact map is given, perform the integration of RAI and Hi-C.
  • alpha: float, optional, default: None
    If the hic parameter is given, alpha controls the integration. Users can provide a value between 0 and 1. The default None produces all integration results (alpha from 0 to 1 in steps of 0.05).
evaluateModel = Evaluate(chrom="chr1", resolution=100000, hicPath=None, bed=None, bedgraph=None)
  • chrom: string, required, default: chr1
    The chromosome of the result.
  • resolution: int, required, default: 100000
    The bin resolution of the RAI interaction map.
  • hicPath: string, required
    The path of corresponding Hi-C contact map.
  • bed: list, optional
    The list of TF ChIP-seq data (.bed).
  • bedgraph: list, optional
    The list of H3K27me3/H3K36me3 ChIP-seq data (.bedgraph).
evaluateModel.run(resultList=None, outPath="./")
  • resultList: list, required
    The list of result matrices.
  • outPath: string, optional, default: "./"
    The output path of evaluation result.
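
Since inputMatrix is expected to be a 2-D array and the contact map may be either symmetric or asymmetric, a quick sanity check before calling model.pipeline can be useful. The helper below is a hypothetical sketch and not part of the supertld package; it only reports the shape, squareness, and symmetry of a matrix using NumPy.

```python
import numpy as np


def describe_contact_map(matrix, tol=1e-8):
    """Report shape, squareness, and symmetry of a contact map."""
    matrix = np.asarray(matrix, dtype=float)
    if matrix.ndim != 2:
        raise ValueError("expected a 2-D array, got %d dimension(s)" % matrix.ndim)
    is_square = matrix.shape[0] == matrix.shape[1]
    # Symmetry is only defined for square matrices.
    is_symmetric = bool(is_square and np.allclose(matrix, matrix.T, atol=tol))
    return {"shape": matrix.shape, "square": is_square, "symmetric": is_symmetric}


if __name__ == "__main__":
    # Made-up asymmetric 2x2 map, for illustration only.
    print(describe_contact_map([[2.0, 1.0], [0.0, 2.0]]))
```

For the example data, this could be applied to the array returned by np.loadtxt before it is passed as inputMatrix.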

The inferred TLD results are written to files suffixed with .multi2D_AllH2_sparse.tsv.
For the evaluation result, the first row is the result of TLDs, and the second row is the result of TLDs. The columns are:
  • Column 1: the Pearson correlation coefficient (PCC) of the contact map, compared with Hi-C
  • Column 2: the PCC of distance decay
  • Columns 3 and 4: OR and NMI, respectively
  • Columns 5 and 6: the CTCF fold change and its p-value
  • Column 7: the percentage of domains enriched in H3K27me3/H3K36me3 marks
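
To illustrate what the first two evaluation columns measure, the sketch below computes a distance-decay curve (mean contact value at each bin distance) and a Pearson correlation with NumPy. This is not the package's implementation, whose details are not given here. The function names are hypothetical, and averaging the upper and lower diagonals is an assumption made so that asymmetric maps are handled too.

```python
import numpy as np


def distance_decay(matrix):
    """Mean contact value at each bin distance d = 0 .. n-1."""
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.shape[0]
    decay = np.empty(n)
    for d in range(n):
        upper = np.diagonal(matrix, offset=d)
        lower = np.diagonal(matrix, offset=-d)
        # Average both sides so asymmetric maps contribute both directions.
        decay[d] = (upper.mean() + lower.mean()) / 2.0
    return decay


def pearson(x, y):
    """Pearson correlation coefficient between two equally sized arrays."""
    x = np.ravel(np.asarray(x, dtype=float))
    y = np.ravel(np.asarray(y, dtype=float))
    return float(np.corrcoef(x, y)[0, 1])


if __name__ == "__main__":
    # Made-up 3x3 maps, for illustration only.
    map_a = np.array([[9.0, 4.0, 1.0], [4.0, 9.0, 4.0], [1.0, 4.0, 9.0]])
    map_b = np.array([[8.0, 5.0, 2.0], [3.0, 8.0, 5.0], [0.0, 3.0, 8.0]])
    print("PCC of contact maps:", pearson(map_a, map_b))
    print("PCC of distance decay:", pearson(distance_decay(map_a), distance_decay(map_b)))
```

Passing two matrices directly to pearson gives a map-level correlation, while passing their distance_decay curves gives a decay-level correlation.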

Contact

Feel free to open an issue on GitHub or contact yuwzhang7-c@my.cityu.edu.hk if you have any problems using SuperTLD.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.
