A computational pipeline to identify differential chromatin contacts from single cell Hi-C data

SnapHiC-D

Identifying differential chromatin interactions from single cell Hi-C data

Find the preprint here.

SnapHiC-D is an extension of SnapHiC and takes the output of SnapHiC's RWR step as its input. For a faster version, use SnapHiC2, which is enabled by selecting method="sliding_window".

Install SnapHiC-D

Create a conda environment, then install SnapHiC-D through pip:

conda create --name SnapHiC_D_env python==3.6.8
conda activate SnapHiC_D_env
pip install SnapHiC-D

Requirements

SnapHiC-D was built using the following Python version and packages:

  1. Python 3.6.8
  2. numpy 1.19.0
  3. pandas 1.1.5
  4. qnorm 0.8.1 (https://github.com/Maarten-vd-Sande/qnorm)
  5. scipy 1.5.4
  6. statsmodels 0.12.2
  7. futures 3.0.5
  8. click 7.1.2

Running SnapHiC-D

Activate the Python environment in which SnapHiC-D is installed and enter the following in the terminal:

SnapHiC-D diff-loops -i group_A_dir -j group_B_dir -o out_dir -c chr -n num_CPUs\
                     -b genome_region_path -g genome_transcript_path\
                     --binsize bin_size --fdr_threshold fdr_threshold\
                     --mini_gap min_gap --maxi_gap max_gap

The input variables are listed below; those with a stated default value may be omitted:

  1. group_A_dir : The directory of files for group A
  2. group_B_dir : The directory of files for group B
  3. out_dir : The output directory
  4. chr : The chromosome to analyze (e.g. chr3)
  5. num_CPUs : The number of CPUs to use. You can check how many CPUs are available with "lscpu". If num_CPUs = 1, the program runs on a single processor. When using an HPC with a job scheduler, make sure to request 1 node.
  6. genome_region_path: the path of mm10_filter_regions.txt or hg19_filter_regions.txt, depending on the reference genome. These files are provided in the ext folder.
  7. genome_transcript_path: the path of mm10.refGene.transcript.TSS.061421.txt or hg19.refGene.transcript.TSS.061421.txt, depending on the reference genome. These files are provided in the ext folder.
  8. bin_size : The bin size (resolution)
  9. fdr_threshold : The FDR threshold; the default value is 0.1
  10. min_gap : The minimum distance gap; the default value is 2 (2kb)
  11. max_gap : The maximum distance gap; the default value is 101 (1Mb)
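
When the command has to be run for several chromosomes or from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of SnapHiC-D itself: the helper name build_diff_loops_command is hypothetical, and it uses only the flags shown in the command above. Optional values left as None are omitted so that the tool's own defaults apply.

```python
import shlex
import subprocess


def build_diff_loops_command(group_a_dir, group_b_dir, out_dir, chrom, num_cpus,
                             region_path, transcript_path,
                             bin_size=None, fdr_threshold=None,
                             min_gap=None, max_gap=None):
    """Assemble the argument list for `SnapHiC-D diff-loops`.

    Hypothetical helper: it only arranges the flags documented in this README.
    """
    cmd = [
        "SnapHiC-D", "diff-loops",
        "-i", str(group_a_dir),
        "-j", str(group_b_dir),
        "-o", str(out_dir),
        "-c", str(chrom),
        "-n", str(num_cpus),
        "-b", str(region_path),
        "-g", str(transcript_path),
    ]
    optional = [
        ("--binsize", bin_size),
        ("--fdr_threshold", fdr_threshold),
        ("--mini_gap", min_gap),
        ("--maxi_gap", max_gap),
    ]
    # Append an optional flag only when a value was supplied.
    for flag, value in optional:
        if value is not None:
            cmd.extend([flag, str(value)])
    return cmd


if __name__ == "__main__":
    cmd = build_diff_loops_command(
        "group_A_dir", "group_B_dir", "output", "chr3", 2,
        "ext/mm10_filter_regions.txt",
        "ext/mm10.refGene.transcript.TSS.061421.txt",
    )
    # Print the command in a form that can be pasted into a shell.
    print(" ".join(shlex.quote(part) for part in cmd))
    # Uncomment to run the tool (requires SnapHiC-D on the PATH):
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces. Calling the helper in a loop over chromosome names gives one run per chromosome.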

We have provided example input data from 94 mouse embryonic stem cells (mESCs) and 188 mouse neural progenitor cells (NPCs) in zipped folders to test SnapHiC-D. These are RWR results from SnapHiC, trimmed to a region of approximately 200 kb around the Sox2 locus (chr3:34,601,000–34,806,000; reference genome mm10). To run SnapHiC-D on this data, enter:

SnapHiC-D diff-loops -i group_A_dir -j group_B_dir -o output -c chr3 -n 2\
                     -b "ext/mm10_filter_regions.txt"\
                     -g "ext/mm10.refGene.transcript.TSS.061421.txt"

A directory named output will be created with the following files inside:

  1. output/combined_results_chr3.txt: t-test results for all tested bin pairs.
  2. output/DI_FDR0.1_T2_Test_chr3.txt: results filtered by FDR and by the t-statistic.
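
As a quick sanity check after a run, the two files can be located and their rows counted. The sketch below is illustrative only and rests on several assumptions: the files are taken to be tab-delimited, the file name pattern is generalized from the chr3 example above, and the "T2" label is copied from that example rather than derived from any documented option. The function names read_table and summarize_outputs are hypothetical.

```python
import csv
import os


def read_table(path, delimiter="\t"):
    """Return the non-empty rows of a delimited text file as lists of strings.

    Assumption: the output files are tab-delimited.
    """
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter=delimiter) if row]


def summarize_outputs(out_dir, chrom, fdr_threshold=0.1, t_label="T2"):
    """Count rows in the two expected output files for one chromosome."""
    # Assumption: names follow the chr3 example in this README.
    names = {
        "all_bin_pairs": "combined_results_{}.txt".format(chrom),
        "filtered": "DI_FDR{}_{}_Test_{}.txt".format(fdr_threshold, t_label, chrom),
    }
    summary = {}
    for key, name in names.items():
        path = os.path.join(out_dir, name)
        # None marks a file that is not present in out_dir.
        summary[key] = len(read_table(path)) if os.path.exists(path) else None
    return summary


if __name__ == "__main__":
    print(summarize_outputs("output", "chr3"))
```

The counts include a header row if the files have one, so treat them as a rough check rather than an exact number of bin pairs.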

Contact Us

For any questions regarding this software, contact Ming Hu (hum@ccf.org), Lindsay Lee (leeh7@ccf.org), or Hongyu Yu (hongyuyu@unc.edu).
