Straw bound with pybind11
Quick Start Python
Straw is a library that allows rapid streaming of contact data from .hic files. To learn more about Hi-C data and 3D genomics, visit https://aidenlab.gitbook.io/juicebox/
Once you've installed the library with pip install hic-straw, you can import it in your code with import hicstraw.
New usage to directly get numpy matrix
The new usage for straw allows you to create objects and retain intermediate variables. This can speed up your code significantly when querying hundreds or thousands of regions for a given chromosome/resolution/normalization.
First we import numpy and hicstraw.
import numpy as np
import hicstraw
We then create a Hi-C file object. From this object, we can query genomeID, chromosomes, and resolutions.
hic = hicstraw.HiCFile("HIC001.hic")
print(hic.getChromosomes())
print(hic.getGenomeID())
print(hic.getResolutions())
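Because straw only reads resolutions that are already stored in the file, it can help to check a requested resolution against hic.getResolutions() before building a query. The following is a minimal sketch: pick_resolution is a hypothetical helper, not part of hicstraw, and it assumes getResolutions() returns a list of integers in base pairs.

```python
def pick_resolution(available, requested):
    """Return `requested` if the file stores it, otherwise raise a helpful error.

    `available` is assumed to be the list returned by hic.getResolutions().
    This helper is illustrative and not part of hicstraw.
    """
    available = [int(r) for r in available]
    if not available:
        raise ValueError("the file reports no resolutions")
    if requested in available:
        return requested
    # Name the nearest stored resolution so the caller can adjust the query.
    closest = min(available, key=lambda r: abs(r - requested))
    raise ValueError(
        f"resolution {requested} is not stored in the file; "
        f"closest stored resolution is {closest}"
    )

# Usage with a real file would look like:
#   hic = hicstraw.HiCFile("HIC001.hic")
#   resolution = pick_resolution(hic.getResolutions(), 5000)
```

Failing early this way keeps the error message close to the cause instead of surfacing later as an empty or failed query.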
We can also collect a matrix zoom data object, which is specific to a
- matrix type: observed (count) or oe (observed/expected ratio)
- chromosome-chromosome pair
- resolution
- normalization
This object retains information for fast future queries. Here's an example that picks the counts from the intrachromosomal region for chr4 with KR normalization at 5 kb resolution.
mzd = hic.getMatrixZoomData('4', '4', "observed", "KR", "BP", 5000)
We can get numpy matrices for specific genomic windows by calling:
numpy_matrix = mzd.getRecordsAsMatrix(10000000, 12000000, 10000000, 12000000)
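The benefit of retaining the matrix zoom data object shows up when many windows are queried for the same chromosome pair, resolution, and normalization. The sketch below reuses one object across a list of windows; fetch_windows is a hypothetical helper, not part of hicstraw, and it relies only on the getRecordsAsMatrix call shown above.

```python
def fetch_windows(mzd, windows):
    """Query one matrix zoom data object for many (start, end) windows.

    `mzd` is any object exposing getRecordsAsMatrix(gr1, gr2, gc1, gc2),
    such as the one returned by hic.getMatrixZoomData(...).
    Returns a dict mapping each (start, end) window to its matrix.
    """
    matrices = {}
    for start, end in windows:
        # Using the same window along rows and columns selects a square block
        # on the diagonal of the contact map.
        matrices[(start, end)] = mzd.getRecordsAsMatrix(start, end, start, end)
    return matrices

# Usage with a real file would look like:
#   mzd = hic.getMatrixZoomData('4', '4', "observed", "KR", "BP", 5000)
#   blocks = fetch_windows(mzd, [(10000000, 12000000), (12000000, 14000000)])
```

The object is created once outside the loop, so each iteration performs only the window query.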
Usage
hic = hicstraw.HiCFile(filepath)
hic.getChromosomes()
hic.getGenomeID()
hic.getResolutions()
mzd = hic.getMatrixZoomData(chrom1, chrom2, data_type, normalization, "BP", resolution)
numpy_matrix = mzd.getRecordsAsMatrix(gr1, gr2, gc1, gc2)
records_list = mzd.getRecords(gr1, gr2, gc1, gc2)
filepath: path to file (local or URL)
data_type: 'observed' (previous default / "main" data) or 'oe' (observed/expected)
normalization: NONE, VC, VC_SQRT, KR, SCALE, etc.
resolution: typically 2500000, 1000000, 500000, 100000, 50000, 25000, 10000, 5000, etc.
Note: the normalization, resolution, and chromosome/regions must already exist in the .hic file to be read (i.e. they are not calculated by straw, only read from the file if available)
gr1: start genomic position along rows
gr2: end genomic position along rows
gc1: start genomic position along columns
gc2: end genomic position along columns
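When labeling the axes of a returned matrix, it is useful to know which bins a window such as gr1 to gr2 touches at a given resolution. The sketch below is plain binning arithmetic, assuming bins are aligned to multiples of the resolution; bin_starts is a hypothetical helper, and whether getRecordsAsMatrix returns exactly this many rows is an assumption that should be checked against the shape of the matrix you get back.

```python
def bin_starts(start, end, resolution):
    """Return the start positions of all bins overlapping [start, end].

    Assumes bins are aligned to multiples of `resolution` (in base pairs).
    Illustrative helper, not part of hicstraw.
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    first_bin = start // resolution  # index of the bin containing `start`
    last_bin = end // resolution     # index of the bin containing `end`
    return [b * resolution for b in range(first_bin, last_bin + 1)]

# Example: row labels for a small window at 5 kb resolution.
row_labels = bin_starts(10000000, 10020000, 5000)
```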
Legacy usage to fetch list of contacts
For example, to fetch a list of all the raw contacts on chrX at 1 Mb resolution:
import hicstraw
result = hicstraw.straw('observed', 'NONE', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
for i in range(len(result)):
    print("{0}\t{1}\t{2}".format(result[i].binX, result[i].binY, result[i].counts))
To fetch a list of KR normalized contacts for the same region:
import hicstraw
result = hicstraw.straw('observed', 'KR', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
for i in range(len(result)):
    print("{0}\t{1}\t{2}".format(result[i].binX, result[i].binY, result[i].counts))
To query observed/expected KR normalized data:
import hicstraw
result = hicstraw.straw('oe', 'KR', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
for i in range(len(result)):
    print("{0}\t{1}\t{2}".format(result[i].binX, result[i].binY, result[i].counts))
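The three examples above share the same printing loop, so it can be factored into one function. The sketch below formats any sequence of records as tab-separated text; records_to_tsv is a hypothetical helper, and it assumes only the binX, binY, and counts attributes used in the loops above.

```python
def records_to_tsv(records):
    """Format contact records as tab-separated lines: binX, binY, counts.

    `records` is any iterable of objects with binX, binY and counts
    attributes, such as the list returned by hicstraw.straw(...).
    Illustrative helper, not part of hicstraw.
    """
    lines = []
    for rec in records:  # iterate directly instead of indexing by position
        lines.append(f"{rec.binX}\t{rec.binY}\t{rec.counts}")
    return "\n".join(lines)

# Usage with a real file would look like:
#   result = hicstraw.straw('observed', 'KR', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
#   print(records_to_tsv(result))
```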
Usage
hicstraw.straw(data_type, normalization, file, region_x, region_y, 'BP', resolution)
data_type: 'observed' (previous default / "main" data) or 'oe' (observed/expected)
normalization: NONE, VC, VC_SQRT, KR, SCALE, etc.
file: filepath (local or URL)
region_x/y: provide the chromosome, or utilize the syntax chromosome:start_position:end_position if using a smaller window within the chromosome
resolution: typically 2500000, 1000000, 500000, 100000, 50000, 25000, 10000, 5000, etc.
Note: the normalization, resolution, and chromosome/regions must already exist in the .hic file to be read (i.e. they are not calculated by straw, only read from the file if available)
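The region arguments accept either a bare chromosome name or the chromosome:start_position:end_position syntax described above. A small formatter can keep that syntax in one place; format_region is a hypothetical helper, not part of hicstraw, and the validation rules are assumptions for illustration.

```python
def format_region(chrom, start=None, end=None):
    """Build a region string for the legacy straw call.

    Returns the bare chromosome name for a whole-chromosome query, or
    'chromosome:start_position:end_position' for a smaller window.
    Illustrative helper, not part of hicstraw.
    """
    if start is None and end is None:
        return str(chrom)
    if start is None or end is None:
        raise ValueError("give both start and end, or neither")
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    return f"{chrom}:{start}:{end}"

# Usage with a real file would look like:
#   region = format_region('X', 1000000, 2000000)
#   result = hicstraw.straw('observed', 'KR', 'HIC001.hic', region, region, 'BP', 100000)
```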