
Straw bound with pybind11


Quick Start Python

Straw is a library that allows rapid streaming of contact data from .hic files. To learn more about Hi-C data and 3D genomics, visit https://aidenlab.gitbook.io/juicebox/

Once you've installed the library with pip install hic-straw, you can import it in your code with import hicstraw.

New usage to directly get numpy matrix

The new usage for straw allows you to create objects and retain intermediate variables. This can speed up your code significantly when querying hundreds or thousands of regions for a given chromosome/resolution/normalization.

First we import numpy and hicstraw.

import numpy as np
import hicstraw

We then create a Hi-C file object. From this object, we can query genomeID, chromosomes, and resolutions.

hic = hicstraw.HiCFile("HIC001.hic")
print(hic.getChromosomes())
print(hic.getGenomeID())
print(hic.getResolutions())

We can also collect a matrix zoom data object, which is specific to one combination of:

  • matrix type: observed (count) or oe (observed/expected ratio)
  • chromosome-chromosome pair
  • resolution
  • normalization

This object retains information for fast future queries. Here's an example that picks the counts from the intrachromosomal region for chr4 with KR normalization at 5 kb resolution.

mzd = hic.getMatrixZoomData('4', '4', "observed", "KR", "BP", 5000)

We can get numpy matrices for specific genomic windows by calling:

numpy_matrix = mzd.getRecordsAsMatrix(10000000, 12000000, 10000000, 12000000)
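
The speedup mentioned earlier comes from reusing one matrix zoom data object across many queries. The following is a minimal sketch of that pattern, assuming a list of (start, end) windows on the same chromosome. fetch_windows is a hypothetical helper, not part of hicstraw; it only calls getRecordsAsMatrix as shown above.

```python
def fetch_windows(mzd, windows):
    """Query many on-diagonal windows from one matrix zoom data object.

    mzd     -- object returned by hic.getMatrixZoomData(...)
    windows -- iterable of (start, end) genomic positions in bp
    """
    matrices = {}
    for start, end in windows:
        # Using the same range for rows and columns gives a square,
        # on-diagonal window.
        matrices[(start, end)] = mzd.getRecordsAsMatrix(start, end, start, end)
    return matrices
```

For instance, fetch_windows(mzd, [(10000000, 12000000), (12000000, 14000000)]) would return a dictionary keyed by window, with the file header and index work done only once when mzd was created.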

Usage

hic = hicstraw.HiCFile(filepath)
hic.getChromosomes()
hic.getGenomeID()
hic.getResolutions()

mzd = hic.getMatrixZoomData(chrom1, chrom2, data_type, normalization, "BP", resolution)

numpy_matrix = mzd.getRecordsAsMatrix(gr1, gr2, gc1, gc2)
records_list = mzd.getRecords(gr1, gr2, gc1, gc2)

filepath: path to file (local or URL)
data_type: 'observed' (previous default / "main" data) or 'oe' (observed/expected)
normalization: NONE, VC, VC_SQRT, KR, SCALE, etc.
resolution: typically 2500000, 1000000, 500000, 100000, 50000, 25000, 10000, 5000, etc.
gr1: start genomic position along rows
gr2: end genomic position along rows
gc1: start genomic position along columns
gc2: end genomic position along columns

Note: the normalization, resolution, and chromosome/regions must already exist in the .hic to be read (i.e. they are not calculated by straw, only read from the file if available)
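
Because straw only reads what is stored in the file, it can help to check a resolution before building a matrix zoom data object. The sketch below assumes getResolutions() returns a sequence of integers, as the printed output earlier suggests. check_resolution is a hypothetical helper, not part of hicstraw.

```python
def check_resolution(hic, resolution):
    """Return `resolution` if the .hic file stores it, else raise ValueError.

    hic -- object created with hicstraw.HiCFile(filepath)
    """
    # Assumption: getResolutions() yields plain integers (bin sizes in bp).
    available = list(hic.getResolutions())
    if resolution not in available:
        raise ValueError(
            f"resolution {resolution} not in file; available: {available}"
        )
    return resolution
```

A call such as check_resolution(hic, 5000) can then be placed before hic.getMatrixZoomData(...) so that a missing resolution fails with a readable message.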

Legacy usage to fetch list of contacts

For example, to fetch a list of all the raw contacts on chrX at 1 Mb resolution (1000000 bp):

import hicstraw
result = hicstraw.straw('observed', 'NONE', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
for i in range(len(result)):
    print("{0}\t{1}\t{2}".format(result[i].binX, result[i].binY, result[i].counts))

To fetch a list of KR normalized contacts for the same region:

import hicstraw
result = hicstraw.straw('observed', 'KR', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
for i in range(len(result)):
    print("{0}\t{1}\t{2}".format(result[i].binX, result[i].binY, result[i].counts))

To query observed/expected KR normalized data:

import hicstraw
result = hicstraw.straw('oe', 'KR', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
for i in range(len(result)):
    print("{0}\t{1}\t{2}".format(result[i].binX, result[i].binY, result[i].counts))
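
The contact lists returned above can be assembled into a dense matrix when one is needed. This is a rough sketch under two assumptions that are not stated in this document: that binX and binY hold genomic positions in bp (not bin indices), and that each intrachromosomal contact is listed once, so the matrix is mirrored. Record is a stand-in for the objects returned by hicstraw.straw, and records_to_dense is a hypothetical helper.

```python
from collections import namedtuple

import numpy as np

# Stand-in for the records returned by hicstraw.straw (same field names).
Record = namedtuple("Record", ["binX", "binY", "counts"])


def records_to_dense(records, start, end, resolution, symmetric=True):
    """Build a square dense matrix covering the window [start, end] in bp.

    Assumes binX/binY are genomic positions in bp, so each one is turned
    into a bin index by integer division with the resolution.
    """
    first_bin = start // resolution
    n_bins = end // resolution - first_bin + 1
    dense = np.zeros((n_bins, n_bins), dtype=float)
    for rec in records:
        i = rec.binX // resolution - first_bin
        j = rec.binY // resolution - first_bin
        # Skip records that fall outside the requested window.
        if 0 <= i < n_bins and 0 <= j < n_bins:
            dense[i, j] = rec.counts
            if symmetric:
                # Assumes each contact appears once in the list.
                dense[j, i] = rec.counts
    return dense


# Made-up records for illustration only, not real Hi-C data.
demo = [Record(0, 0, 5.0), Record(0, 1000000, 2.0), Record(1000000, 2000000, 1.0)]
matrix = records_to_dense(demo, 0, 2000000, 1000000)
```

With real data, the list returned by hicstraw.straw would be passed in place of demo. For most uses the getRecordsAsMatrix method described earlier is the more direct route.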

Usage

hicstraw.straw(data_type, normalization, file, region_x, region_y, 'BP', resolution)

data_type: 'observed' (previous default / "main" data) or 'oe' (observed/expected)
normalization: NONE, VC, VC_SQRT, KR, SCALE, etc.
file: filepath (local or URL)
region_x/y: provide the chromosome or utilize the syntax chromosome:start_position:end_position if using a smaller window within the chromosome
resolution: typically 2500000, 1000000, 500000, 100000, 50000, 25000, 10000, 5000, etc.

Note: the normalization, resolution, and chromosome/regions must already exist in the .hic to be read (i.e. they are not calculated by straw, only read from the file if available)
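
The region_x/y syntax described above can be built with a small helper. This is a sketch only; make_region is a hypothetical name, not part of hicstraw, and it formats the string exactly as chromosome:start_position:end_position.

```python
def make_region(chrom, start=None, end=None):
    """Return 'chrom' or 'chrom:start:end' for use as region_x / region_y."""
    if start is None and end is None:
        # Whole chromosome.
        return str(chrom)
    if start is None or end is None:
        raise ValueError("provide both start and end, or neither")
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return f"{chrom}:{start}:{end}"
```

The result can be passed directly as region_x and region_y, for example make_region('X', 1000000, 5000000) for a window on chrX, or make_region('X') for the whole chromosome.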


