Compartmental Refinement for Ultraprecise Stratification in Hi-C — A/B chromatin compartment analysis tool
CRUSH — Compartmental Refinement for Ultraprecise Stratification in Hi-C
CRUSH (Compartmental Refinement for Ultraprecise Stratification in Hi-C) is a command-line tool that identifies fine-scale A/B chromatin compartments from Hi-C contact matrices. It has been used to call compartments in Hi-C, Micro-C, and single-cell Hi-C data, and it is designed to call compartments at high resolution from substantially lower read depth than other compartment-calling tools require.
Manuscript in preparation — JRowleyLab, PI: Jordan Rowley
Table of Contents
- How It Works
- Installation
- Quick Start
- Input Files
- Output Files
- Key Parameters
- Test Dataset
- Dependencies
- Citation
- Contact
How It Works
At its core, CRUSH asks a simple question for every genomic bin: does this bin interact more with A-type regions (iA) or B-type regions (iB)?
The algorithm walks from coarse resolutions down to your target resolution, using each level to refine A/B compartment assignments at the next finer level:
- Eigenvector initialization — Computes principal components of the Hi-C contact matrix (or accepts a user-supplied eigenvector) to define initial A (iA) and B (iB) states.
- CRUSH score calculation — At each resolution, calculates a Genome Interaction (GI) score per bin reflecting how much more it contacts iA regions versus iB regions.
- Compartment reclassification — After each resolution pass, A/B bin assignments are updated based on the new scores, then used to seed the next finer resolution.
- Resolution walking with midpoint shifting — A rolling-window alignment step adjusts finer-resolution scores against the coarser baseline, removing systematic biases between resolution levels.
- Statistical filtering — Applies Benjamini–Hochberg FDR correction and outputs a q-value filtered bedGraph.
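For readers unfamiliar with the last step in the list above, the Benjamini–Hochberg procedure can be sketched in a few lines of plain Python. This is a generic textbook version, not CRUSH's own code (the dependency list suggests CRUSH relies on statsmodels for this step), and the README does not say how the per-bin p-values are derived, so the input values here are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is capped
    # by the q-values of all larger p-values (enforces monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Illustrative p-values only; not taken from any CRUSH run.
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
passing = [q <= 0.05 for q in qvals]  # same default threshold as -q
```

Only bins whose q-value falls at or below the threshold would be retained by a filter of this kind.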
A compartments → positive CRUSH score (gene-rich, open chromatin, active transcription)
B compartments → negative CRUSH score (gene-poor, closed chromatin, transcriptionally silent)
Unlike eigenvector-based methods, you never need to flip CRUSH scores — A is always positive and B is always negative.
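The sign convention can be illustrated with a toy score. The exact CRUSH/GI formula is not given in this README, so the log2 ratio of mean contacts below is an assumption chosen only because it is positive when a bin contacts A seeds more and negative when it contacts B seeds more. The function name `ab_log_ratio` and the pseudocount are hypothetical.

```python
import math


def ab_log_ratio(contacts_row, states, pseudocount=1.0):
    """Toy score for one bin: log2(mean contact with A bins / mean contact with B bins).

    This is an illustrative stand-in, not the CRUSH score itself.
    """
    a_vals = [c for c, s in zip(contacts_row, states) if s == "A"]
    b_vals = [c for c, s in zip(contacts_row, states) if s == "B"]
    if not a_vals or not b_vals:
        return 0.0  # no basis for comparison
    mean_a = sum(a_vals) / len(a_vals)
    mean_b = sum(b_vals) / len(b_vals)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def reclassify(score):
    """Positive scores are called A, everything else B (ties go to B in this sketch)."""
    return "A" if score > 0 else "B"


# Initial states for four partner bins, e.g. from an eigenvector.
states = ["A", "A", "B", "B"]
row_a_like = [30, 25, 5, 4]   # contacts mostly with A bins
row_b_like = [3, 6, 28, 31]   # contacts mostly with B bins

score_a = ab_log_ratio(row_a_like, states)
score_b = ab_log_ratio(row_b_like, states)
calls = [reclassify(score_a), reclassify(score_b)]
```

Because the ratio is always A over B, the sign is fixed by construction, which is why no flipping step is needed.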
Installation
pip install CRUSH-hic
We recommend setting up a dedicated conda environment:
conda create -n crush_env python=3.10
conda activate crush_env
conda install -c bioconda bedtools
pip install CRUSH-hic hic-straw cooler numpy scipy pandas statsmodels tqdm
Dependencies
| Tool | Purpose | Install |
|---|---|---|
| Python ≥ 3.8 | Runtime | python.org |
| bedtools | Genomic intersections | conda install -c bioconda bedtools |
| mawk | Fast text processing | sudo apt install mawk / brew install mawk |
| hic-straw | Read .hic files | pip install hic-straw |
| cooler | Read .mcool files | pip install cooler |
| numpy / scipy / pandas | Numerical computing | pip install numpy scipy pandas |
| statsmodels | FDR correction | pip install statsmodels |
| tqdm | Progress bars | pip install tqdm |
Verify installation
crush --help
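If `crush --help` fails, a quick way to see which dependency is missing is to probe for each one from Python. This is a minimal sketch using only the standard library. The import name `hicstraw` for the hic-straw package is an assumption based on recent releases of that package; the other module names match their pip names.

```python
import importlib.util
import shutil

# Import names of the Python dependencies ("hicstraw" is assumed for hic-straw).
PYTHON_MODULES = ["hicstraw", "cooler", "numpy", "scipy", "pandas", "statsmodels", "tqdm"]
# Executables expected on PATH.
EXTERNAL_TOOLS = ["bedtools", "mawk", "crush"]


def check_environment():
    """Map each dependency name to True if it is found in the current environment."""
    status = {}
    for name in PYTHON_MODULES:
        status[name] = importlib.util.find_spec(name) is not None
    for name in EXTERNAL_TOOLS:
        status[name] = shutil.which(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Run it inside the activated conda environment so that the check sees the same interpreter and PATH that `crush` will use.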
Quick Start
With genome build shortcut (supported builds: hg19, hg38, mm10, mm9; res ≥ 500 bp)
crush \
-i data.hic \
-gb hg38 \
-r 10000 \
-c 8 \
-o output_prefix_
With manual reference files (any genome, any resolution)
crush \
-i data.hic \
-g hg38.sizes \
-a hg38_genes.bed \
-b hg38.fa \
-r 10000 \
-c 8 \
-o output_prefix_
Chromosome naming: CRUSH automatically detects and converts chromosome prefix mismatches between your Hi-C file and reference files (e.g., chr1 vs 1). If output is empty or unexpected, verify that your Hi-C file itself uses a consistent naming convention throughout.
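To check naming yourself before a run, the idea can be sketched as follows. CRUSH's own detection logic is not shown in this README, so the helper names (`strip_chr`, `match_naming`, `has_mixed_prefix`) are hypothetical and the logic is only one plausible way to reconcile `chr1` with `1`.

```python
def strip_chr(name):
    """Drop a leading 'chr' prefix if present."""
    return name[3:] if name.startswith("chr") else name


def match_naming(reference_names, hic_names):
    """Map each reference chromosome name to the matching name used in the Hi-C file."""
    hic_by_bare = {strip_chr(n): n for n in hic_names}
    return {
        ref: hic_by_bare[strip_chr(ref)]
        for ref in reference_names
        if strip_chr(ref) in hic_by_bare
    }


def has_mixed_prefix(names):
    """True if some names carry the 'chr' prefix and others do not."""
    prefixed = [n.startswith("chr") for n in names]
    return any(prefixed) and not all(prefixed)


mapping = match_naming(["chr1", "chr2", "chrX"], ["1", "2", "X", "MT"])
mixed = has_mixed_prefix(["chr1", "2", "chrX"])
```

A Hi-C file for which `has_mixed_prefix` returns `True` is the inconsistent case mentioned above and is worth fixing before running CRUSH.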
Input Files
Always required
| Flag | Description |
|---|---|
| -i | Hi-C file (.hic from Juicer or .mcool from cooler). Local path or HTTPS URL. |
| -r | Target resolution in base pairs (e.g., 10000 for 10 kb). Must exist in your Hi-C file. |
Reference files — choose one of two paths
PATH A — genome build shortcut (res ≥ 500 bp only)
| Flag | Description |
|---|---|
| -gb | Genome build shortcut. Supported builds: hg19, hg38, mm10, mm9. Auto-downloads chr.sizes, genes.bed, and Bbins.bed from JRowleyLab GitHub. Not available for res < 500 bp because the hosted Bbins.bed was pre-computed at 500 bp. For sub-500 bp analysis, supply -g, -a, and -b (FASTA) manually so CRUSH can recompute Bbins at your exact resolution. Explicit -g/-a/-b flags override the auto-download for that specific file. |
PATH B — manual reference files (any genome, any resolution)
| Flag | Description |
|---|---|
| -g | Chromosome sizes file: two tab-separated columns, chr_name and size (bp). No header. |
| -a | BED file (≥ 3 columns) for A-compartment initialization. Gene annotations work well. ChIP-seq peaks for an active histone mark (e.g., H3K27ac) also work. |
| -b | Genome FASTA or pre-computed Bbins BED for B-compartment initialization. With FASTA, CRUSH generates Bbins at 500 bp (res ≥ 500 bp) or at the input resolution (res < 500 bp). With BED, the file is used directly as B-compartment seeds. |
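A malformed chromosome sizes file is an easy mistake to make, so it can help to validate it before a run. The sketch below checks the format described for -g (two tab-separated columns, integer size, no header). The function name `read_chrom_sizes` is hypothetical and is not part of CRUSH.

```python
def read_chrom_sizes(lines):
    """Parse chromosome sizes from an iterable of lines, raising on format errors."""
    sizes = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated columns, got {len(fields)}"
            )
        name, size = fields
        if not size.isdigit():
            # A header row such as "chrom<TAB>size" is caught here.
            raise ValueError(f"line {lineno}: size {size!r} is not a positive integer")
        sizes[name] = int(size)
    return sizes


# Three illustrative entries in the expected format.
example = ["chr17\t81195210\n", "chr18\t78077248\n", "chr19\t59128983\n"]
sizes = read_chrom_sizes(example)
```

For a real file, pass an open file handle instead of a list, for example `with open("hg38.sizes") as fh: sizes = read_chrom_sizes(fh)`.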
Optional
| Flag | Description |
|---|---|
| -e | Pre-computed eigenvector bedGraph (4 columns: chr, start, end, value). Positive = A, Negative = B. Skips automatic eigenvector calculation. |
Output Files
CRUSH produces four output files, each prefixed with whatever you supply via -o:
| File | Description |
|---|---|
| {prefix}CRUSHparamters.txt | Record of all parameters used. Keep this for reproducibility. |
| {prefix}mergedCrush_{res}.bedgraph | Main output. CRUSH scores for every bin. Positive = A compartment, Negative = B compartment. Unlike eigenvectors, scores never need to be flipped. |
| {prefix}mergedqvalue_{res}.bedgraph | Estimated q-value (BH-corrected) for each bin's score. |
| {prefix}mergedCrush_{res}_qfiltered_reprocess.bedgraph | CRUSH scores filtered to bins passing the q-value threshold. Note: this filter can be overly stringent; excellent results are often obtained from the unfiltered mergedCrush file. |
All bedGraph files include a UCSC track header for direct loading into genome browsers (IGV, UCSC, WashU).
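Because the scores and q-values are written as separate bedGraph files, you can apply a different threshold after the fact without rerunning CRUSH. The sketch below is an assumption-laden illustration: it assumes both files share the same bin coordinates, it skips the track header line, and it performs a plain threshold filter, which may not reproduce whatever additional processing the `_qfiltered_reprocess` file involves.

```python
def parse_bedgraph(lines):
    """Return {(chrom, start, end): value}, skipping track/browser header lines."""
    values = {}
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        values[(chrom, int(start), int(end))] = float(value)
    return values


def filter_by_qvalue(scores, qvalues, threshold):
    """Keep only the bins whose q-value is at or below the threshold."""
    return {
        key: score
        for key, score in scores.items()
        if key in qvalues and qvalues[key] <= threshold
    }


# Illustrative content only; real files come from a CRUSH run.
score_lines = [
    'track type=bedGraph name="scores"',
    "chr17\t0\t10000\t0.8",
    "chr17\t10000\t20000\t-0.5",
    "chr17\t20000\t30000\t0.1",
]
qvalue_lines = [
    'track type=bedGraph name="qvalues"',
    "chr17\t0\t10000\t0.01",
    "chr17\t10000\t20000\t0.20",
    "chr17\t20000\t30000\t0.03",
]

kept = filter_by_qvalue(parse_bedgraph(score_lines), parse_bedgraph(qvalue_lines), 0.05)
```

With real output, replace the two lists with open file handles for the `mergedCrush` and `mergedqvalue` files.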
While running, CRUSH creates a temporary working directory named CRUSHtmp_[randomnumber] in your current directory. This is removed automatically when the run completes. To keep it (e.g., for debugging), use -C 0. You can also name it yourself with -f.
Key Parameters
| Flag | Default | Description |
|---|---|---|
| -c | 1 | Number of CPU threads. Set to number of chromosomes or available cores, whichever is smaller. |
| -gb | (none) | Genome build shortcut (hg19, hg38, mm10, mm9). Auto-downloads reference files. res ≥ 500 bp only. |
| -o | (none) | Output file prefix. |
| -N | NONE | Normalization: NONE, VC, VC_SQRT, KR, SCALE. |
| -m | 2500000 | Coarsest resolution to start walking from. |
| -Z | 100000 | Resolution for eigenvector calculation (100 kb recommended). |
| -w | 5 | Sliding window size (kb) for score averaging. Set to 1 to disable. Set to 0 for legacy auto-calculation from sequencing depth. |
| -q | 0.05 | Q-value threshold for filtered output. Set to 0 to disable filtering. |
| -s | 0 | Enable boundary smoothing (1 = on). |
| -A | 0 | Adjust score distribution. Do not use when comparing samples. |
| -C | 1 | Clean up temp files after run (0 = keep). |
| -v | 0 | Verbose output (1 = on). |
For the complete parameter reference, see the User Manual.
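The guidance for -c (the smaller of the chromosome count and the available cores) can be computed rather than guessed. This is a small sketch with a hypothetical helper name, `suggest_threads`, which counts non-empty lines in a chromosome sizes file and compares that with `os.cpu_count()`.

```python
import os


def suggest_threads(chrom_sizes_lines, available_cores=None):
    """Return min(number of chromosomes, available cores), never less than 1."""
    n_chroms = sum(1 for line in chrom_sizes_lines if line.strip())
    cores = available_cores or os.cpu_count() or 1
    return max(1, min(n_chroms, cores))


# Three chromosomes, as in a chr17-19 sizes file; sizes are placeholders.
example_sizes = ["chr17\t1\n", "chr18\t1\n", "chr19\t1\n"]
threads_big_machine = suggest_threads(example_sizes, available_cores=8)
threads_small_machine = suggest_threads(example_sizes, available_cores=2)
```

On a shared cluster, pass the number of cores actually allocated to your job as `available_cores`, since `os.cpu_count()` reports the whole node.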
Test Dataset
A small test dataset covering chromosomes 17–19 of hg19 is provided in examples/TestData/:
| File | Description |
|---|---|
| hg19_c17_18_19_1kb.hic.gz | Hi-C contact file |
| hg19_c17_18_19_genes.bed.gz | Gene annotations for A-state initialization |
| hg19_c17_18.fa.gz | Genome FASTA for GC-based B-state initialization |
| hg19_c17_18.fa.fai | FASTA index |
| hg19_c17_18_19.sizes.gz | Chromosome sizes |
| Eigen_100kb_c17_18_19.bedgraph.gz | Pre-computed eigenvector (optional -e input) |
| Bbins_hg19_c17_18_19.bed.gz | Pre-computed B-bins (alternative to FASTA for -b) |
Run the test
# Decompress
gunzip examples/TestData/*.gz
# Run with FASTA-based B initialization
crush \
-i examples/TestData/hg19_c17_18_19_1kb.hic \
-g examples/TestData/hg19_c17_18_19.sizes \
-a examples/TestData/hg19_c17_18_19_genes.bed \
-b examples/TestData/hg19_c17_18.fa \
-r 10000 \
-c 4 \
-o test_
Expected output: test_mergedCrush_10000.bedgraph, test_mergedqvalue_10000.bedgraph, and test_mergedCrush_10000_qfiltered_reprocess.bedgraph.
Load test_mergedCrush_10000.bedgraph into IGV or the UCSC browser to verify the A/B compartment pattern on chr17–19.
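Alongside a visual check, a quick numeric summary of the output can confirm that both compartments were called on each chromosome. The sketch below counts bins with positive (A) and negative (B) scores per chromosome. The helper name `summarize_compartments` is hypothetical and the example lines are made up, not real test output.

```python
def summarize_compartments(lines):
    """Count A (score > 0), B (score < 0), and other bins per chromosome."""
    summary = {}
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip the UCSC track header and blank lines
        chrom, _start, _end, value = line.split()[:4]
        score = float(value)
        counts = summary.setdefault(chrom, {"A": 0, "B": 0, "other": 0})
        if score > 0:
            counts["A"] += 1
        elif score < 0:
            counts["B"] += 1
        else:
            counts["other"] += 1  # zero or NaN scores
    return summary


# Made-up lines in bedGraph format for illustration.
example_lines = [
    'track type=bedGraph name="example"',
    "chr17\t0\t10000\t0.42",
    "chr17\t10000\t20000\t-0.31",
    "chr18\t0\t10000\t-0.77",
    "chr18\t10000\t20000\t0",
]
summary = summarize_compartments(example_lines)
```

For the test run, call it with `open("test_mergedCrush_10000.bedgraph")` in place of the example list.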
Citation
Manuscript in preparation. If you use CRUSH in your research, please check back for the citation or contact us directly.
Contact
JRowleyLab | PI: Jordan Rowley
For questions, bug reports, or feature requests, please open a GitHub Issue.