A package for measuring reproducibility of ChIA-PET data.
ChIA-Rep
A Python package for assessing the reproducibility of ChIA-PET datasets.
Methods Overview
Reading in Data
Reading peaks
Read each peak interval from the peak file and take the peak value to be the
maximum value within that interval in the bedgraph file.
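As a rough illustration of the step above, the sketch below assumes the bedgraph coverage has already been loaded into a per-base NumPy array and that intervals are half-open. The names peak_value and coverage are hypothetical and are not part of the package's API.

```python
import numpy as np

def peak_value(coverage, start, end):
    """Return the maximum coverage within the half-open interval [start, end)."""
    # Hypothetical helper: 'coverage' is a 1-D array of per-base bedgraph values.
    return float(np.max(coverage[start:end]))

# Toy coverage track, for illustration only
coverage = np.array([0, 2, 7, 3, 0, 1, 9, 4], dtype=float)
first_peak = peak_value(coverage, 1, 4)   # looks at values [2, 7, 3]
second_peak = peak_value(coverage, 4, 8)  # looks at values [0, 1, 9, 4]
```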
Reading loops
Since loops come with a start interval and end interval, we assign an anchor
within each interval for each loop based on the largest value within the
interval. Additionally, each loop is weighted by the anchor intensity of both
the start and end interval.
Deadzones
Remove loops that start and end on the same peak, or whose start anchor
falls past the end anchor.
example:
loop_start_interval = (0, 10)
loop_end_interval = (5, 8)
peaks are at index 5 and 9
peak value at index 9 > peak value at index 5
Therefore, the loop start anchor is at index 9 and the loop end anchor is at index 5.
We create a "deadzone" from 0 to 10 in the chromosome. When creating graphs to compare two chromosomes, we combine the "deadzones" from each chromosome and ignore loops from either chromosome in the combined deadzones. Therefore, the created graph for each chromosome can be different for each comparison.
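The anchor assignment and the deadzone check from the example can be sketched as follows. This is a minimal illustration, assuming a per-base signal array and half-open intervals; find_anchor and is_deadzone_loop are hypothetical names, and combining deadzones across two samples is not shown.

```python
import numpy as np

def find_anchor(signal, interval):
    """Index of the largest signal value inside the half-open interval."""
    start, end = interval
    return start + int(np.argmax(signal[start:end]))

def is_deadzone_loop(signal, start_interval, end_interval):
    """True if the start anchor is at or past the end anchor."""
    start_anchor = find_anchor(signal, start_interval)
    end_anchor = find_anchor(signal, end_interval)
    return start_anchor >= end_anchor

# Values mirror the example: peaks at index 5 and 9, the one at 9 being larger.
signal = np.zeros(10)
signal[5] = 4.0
signal[9] = 6.0

start_anchor = find_anchor(signal, (0, 10))
end_anchor = find_anchor(signal, (5, 8))
in_deadzone = is_deadzone_loop(signal, (0, 10), (5, 8))
```

Using >= covers both cases described above: a loop whose two anchors land on the same peak, and a loop whose start anchor ends up past its end anchor.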
Preprocessing
- Filter out all peaks that are smaller than a certain value: num_peaks
- Find the kept peak ratio from the base_chrom and use it for other chromosomes
- Filter out loops that are too long (> 1M)
- Filter out a loop if neither anchor overlaps with a kept peak
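The two loop filters in the list above could look roughly like this. It is a sketch under stated assumptions: loops are given as pairs of half-open anchor intervals, loop length is measured from the start of the first interval to the end of the second, and filter_loops is a hypothetical name. The peak filtering itself is not shown.

```python
def filter_loops(loops, kept_peaks, max_length=1_000_000):
    """Drop loops that are too long or that touch no kept peak.

    loops: list of ((start1, end1), (start2, end2)) anchor intervals.
    kept_peaks: list of (start, end) peak intervals that survived peak filtering.
    """
    def overlaps_any(interval):
        s, e = interval
        return any(s < peak_end and peak_start < e
                   for peak_start, peak_end in kept_peaks)

    kept = []
    for left, right in loops:
        # Assumed definition of loop length: full span of both intervals
        if right[1] - left[0] > max_length:
            continue
        # Keep the loop only if at least one anchor overlaps a kept peak
        if not (overlaps_any(left) or overlaps_any(right)):
            continue
        kept.append((left, right))
    return kept

loops = [
    ((100, 200), (5_000, 5_100)),          # short, left anchor overlaps a peak
    ((100, 200), (2_000_000, 2_000_100)),  # longer than 1M
    ((9_000, 9_100), (12_000, 12_100)),    # no anchor overlaps a kept peak
]
kept_peaks = [(150, 250)]
filtered = filter_loops(loops, kept_peaks)
```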
Graph representation and comparison (Not exactly correct)
- Within each non-overlapping window, bin the loops into bins of fixed size
- Create an adjacency matrix for each window, where index (bin1, bin2) contains the value from the loops going from bin1 to bin2
- Convert each adjacency matrix into a probability vector by reading row-by-row
- Compute the Jensen-Shannon divergence and the Earth Mover's Distance (EMD) between two probability vectors
- Transform each value to be between -1 (dissimilar) and 1 (similar)
- Take the weighted average of values from windows in a chromosome
- Take the average of values from chromosomes to produce a genome-wide reproducibility value
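The binning and averaging steps above can be sketched as below. This is an illustration only: window_adjacency and weighted_average are hypothetical names, loops are assumed to be (start_anchor, end_anchor, weight) tuples, loops with an anchor outside the window are simply skipped, and the window and bin sizes are arbitrary. How the package actually weights the windows is not specified here.

```python
import numpy as np

def window_adjacency(loops, window_start, window_size, bin_size):
    """Accumulate loop weights into a (bins x bins) matrix for one window."""
    n_bins = window_size // bin_size
    adj = np.zeros((n_bins, n_bins))
    window_end = window_start + window_size
    for start, end, weight in loops:
        # Assumption: ignore loops that are not fully inside this window
        if not (window_start <= start < window_end
                and window_start <= end < window_end):
            continue
        bin1 = (start - window_start) // bin_size
        bin2 = (end - window_start) // bin_size
        adj[bin1, bin2] += weight
    return adj

def weighted_average(values, weights):
    """Weighted mean of per-window values."""
    return float(np.average(values, weights=weights))

loops = [
    (1_000, 12_000, 2.0),
    (6_000, 7_000, 1.0),
    (1_500, 13_000, 3.0),
    (4_000, 3_100_000, 5.0),  # end anchor falls outside the window
]
adj = window_adjacency(loops, window_start=0,
                       window_size=3_000_000, bin_size=5_000)
chrom_value = weighted_average([1.0, -1.0], weights=[3, 1])
```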
Example
Given two ChIA-PET datasets, create adjacency matrices A1 and A2
Adjacency matrix A1
| | bin1 | bin2 | bin3 | bin4 |
|---|---|---|---|---|
| bin1 | 3 | 2 | 0 | 1 |
| bin2 | | 1 | 5 | 3 |
| bin3 | | | 10 | 9 |
| bin4 | | | | 20 |
Adjacency matrix A2
| | bin1 | bin2 | bin3 | bin4 |
|---|---|---|---|---|
| bin1 | 4 | 5 | 1 | 4 |
| bin2 | | 3 | 2 | 3 |
| bin3 | | | 7 | 9 |
| bin4 | | | | 27 |
Probability vectors p_A1 and p_A2
- p_A1 = (0.06, 0.04, 0, 0.02, 0.02, 0.09, 0.06, 0.19, 0.17, 0.37)
- p_A2 = (0.06, 0.08, 0.02, 0.06, 0.05, 0.03, 0.05, 0.11, 0.14, 0.42)
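The probability vectors can be recomputed from the two matrices by reading the upper triangle row by row and dividing by the total. The sketch below also computes the two distances with SciPy. The mapping to the range -1 to 1 and the use of vector index as the position for the EMD are assumptions made for illustration, not necessarily what the package does.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance

A1 = np.array([[3, 2, 0, 1],
               [0, 1, 5, 3],
               [0, 0, 10, 9],
               [0, 0, 0, 20]], dtype=float)
A2 = np.array([[4, 5, 1, 4],
               [0, 3, 2, 3],
               [0, 0, 7, 9],
               [0, 0, 0, 27]], dtype=float)

def to_probability_vector(adj):
    """Read the upper triangle row by row and normalize it to sum to 1."""
    upper = adj[np.triu_indices(adj.shape[0])]
    return upper / upper.sum()

p_A1 = to_probability_vector(A1)
p_A2 = to_probability_vector(A2)

# SciPy returns the Jensen-Shannon distance, the square root of the divergence.
# With base 2 the divergence lies between 0 and 1.
js_div = jensenshannon(p_A1, p_A2, base=2) ** 2

# EMD between the two vectors, treating the vector index as the position (assumption)
positions = np.arange(p_A1.size)
emd = wasserstein_distance(positions, positions, p_A1, p_A2)

# Assumed linear mapping: 0 divergence -> 1 (similar), 1 -> -1 (dissimilar)
js_score = 1 - 2 * js_div

print(np.round(p_A1, 2))
print(np.round(p_A2, 2))
```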
Results
- ChIA-Rep can clearly distinguish between replicates and non-replicates
- Generally, replicates have positive values and non-replicates have negative values
- A threshold of 0 can be used to separate similar from dissimilar datasets
Usage
Dependencies:
numpy>=1.17.0
scipy>=1.3.1
pybedgraph>=0.5.40
click>=7.0
Installation:
# Install from GitHub
git clone https://github.com/c0ver/chia_rep.git
pip3 install chia_rep/
# Install from PyPI
pip3 install chia_rep
Create Input files
With example/sample_list.txt containing the following:
LHH0048H
LHH0054H
LHH0084H
LHH0086V
...
and data/ containing bedgraph, peak, and loop files
cd example
python commands.py --help
python commands.py make-pairs --help
python commands.py make-sample-input-file --help
python commands.py make-pairs sample_list.txt pairs.txt
# Assumes (letter case doesn't matter)
# bedgraph file extension: .bedgraph
# peak files extension: .broadpeak
# loop files extension: .cis.be3
# Creates sample_input_file.txt
python commands.py make-sample-input-file sample_list.txt sample_input_file.txt data/
Run script
An example script is included in example/script.py.
cd example
python script.py --help
# Example usages
python script.py sample_input_file.txt hg38.chrom.sizes pairs.txt 3000000 5000 chr1
python script.py sample_input_file.txt hg38.chrom.sizes pairs.txt 3000000 5000 all
python script.py sample_input_file.txt hg38.chrom.sizes pairs.txt 3000000 5000 chr1 chr2
Testing
pytest # Runs the tests in test/
Documentation
Included in docs/build/html
Contact
Contact Minji (minji.kim@jax.org) with general questions, and report software issues on the Issues page.