Straw bound with pybind11

Quick Start Python

Straw is a library that allows rapid streaming of contact data from .hic files. To learn more about Hi-C data and 3D genomics, visit https://aidenlab.gitbook.io/juicebox/

Once you've installed the library with pip install hic-straw, you can import it with import hicstraw.

New usage to directly get numpy matrix

The new usage for straw allows you to create objects and retain intermediate variables. This can speed up your code significantly when querying hundreds or thousands of regions for a given chromosome/resolution/normalization.

First we import numpy and hicstraw.

import numpy as np
import hicstraw

We then create a Hi-C file object. From this object, we can query genomeID, chromosomes, and resolutions.

hic = hicstraw.HiCFile("HIC001.hic")
print(hic.getChromosomes())
print(hic.getGenomeID())
print(hic.getResolutions())

We can also collect a matrix zoom data object, which is specific to a:

  • matrix type: observed (count) or oe (observed/expected ratio)
  • chromosome-chromosome pair
  • resolution
  • normalization

This object retains information for fast future queries. Here's an example that picks the counts from the intrachromosomal region for chr4 with KR normalization at 5 kb resolution.

mzd = hic.getMatrixZoomData('4', '4', "observed", "KR", "BP", 5000)

We can get numpy matrices for specific genomic windows by calling:

numpy_matrix = mzd.getRecordsAsMatrix(10000000, 12000000, 10000000, 12000000)
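The speed-up described above comes from building the matrix zoom data object once and reusing it for every window. A minimal sketch of that pattern follows. Note that fetch_windows is a hypothetical helper, not part of hicstraw; it relies only on the getRecordsAsMatrix call shown above and assumes each window is queried against itself (same range on rows and columns).

```python
def fetch_windows(mzd, windows):
    """Query many square windows from a single matrix zoom data object.

    mzd: any object exposing getRecordsAsMatrix(gr1, gr2, gc1, gc2),
         e.g. the result of hic.getMatrixZoomData(...).
    windows: iterable of (start, end) genomic positions.
    Returns a dict mapping each (start, end) pair to its matrix.
    """
    matrices = {}
    for start, end in windows:
        # Rows and columns cover the same genomic range.
        matrices[(start, end)] = mzd.getRecordsAsMatrix(start, end, start, end)
    return matrices


# Hypothetical usage with a real file (not run here):
#   hic = hicstraw.HiCFile("HIC001.hic")
#   mzd = hic.getMatrixZoomData('4', '4', "observed", "KR", "BP", 5000)
#   tiles = fetch_windows(mzd, [(10000000, 12000000), (12000000, 14000000)])
```

The file and the chromosome/resolution/normalization lookup are handled once, outside the loop, so only the window query is repeated.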

Usage

hic = hicstraw.HiCFile(filepath)
hic.getChromosomes()
hic.getGenomeID()
hic.getResolutions()

mzd = hic.getMatrixZoomData(chrom1, chrom2, data_type, normalization, "BP", resolution)

numpy_matrix = mzd.getRecordsAsMatrix(gr1, gr2, gc1, gc2)
records_list = mzd.getRecords(gr1, gr2, gc1, gc2)

filepath: path to file (local or URL)
data_type: 'observed' (previous default / "main" data) or 'oe' (observed/expected)
normalization: NONE, VC, VC_SQRT, KR, SCALE, etc.
resolution: typically 2500000, 1000000, 500000, 100000, 50000, 25000, 10000, 5000, etc.
gr1: start genomic position along rows
gr2: end genomic position along rows
gc1: start genomic position along columns
gc2: end genomic position along columns

Note: the normalization, resolution, and chromosome/regions must already exist in the .hic file to be read (i.e. they are not calculated by straw, only read from the file if available)
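Because straw only reads resolutions that are already stored in the file, it can help to check a requested resolution before building a matrix zoom data object. The sketch below is one way to do that. require_resolution is a hypothetical helper, and it assumes that hic.getResolutions() returns a list of integers.

```python
def require_resolution(available, requested):
    """Return `requested` if it is among `available`, otherwise raise.

    available: resolutions stored in the file (assumed to be a list of ints,
               as returned by hic.getResolutions()).
    requested: the resolution you intend to query, in base pairs.
    """
    if requested not in available:
        options = ", ".join(str(r) for r in sorted(available, reverse=True))
        raise ValueError(
            f"resolution {requested} is not stored in this file; "
            f"available: {options}"
        )
    return requested


# Hypothetical usage with a real file (not run here):
#   hic = hicstraw.HiCFile("HIC001.hic")
#   res = require_resolution(hic.getResolutions(), 5000)
#   mzd = hic.getMatrixZoomData('4', '4', "observed", "KR", "BP", res)
```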

Legacy usage to fetch list of contacts

For example, to fetch a list of all the raw contacts on chrX at 1 Mb resolution:

import hicstraw
# 'NONE' requests raw (unnormalized) counts; 1000000 is the bin size in base pairs
result = hicstraw.straw('observed', 'NONE', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
for record in result:
    print("{0}\t{1}\t{2}".format(record.binX, record.binY, record.counts))

To fetch a list of KR normalized contacts for the same region:

import hicstraw
result = hicstraw.straw('observed', 'KR', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
for record in result:
    print("{0}\t{1}\t{2}".format(record.binX, record.binY, record.counts))

To query observed/expected KR normalized data:

import hicstraw
result = hicstraw.straw('oe', 'KR', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
for record in result:
    print("{0}\t{1}\t{2}".format(record.binX, record.binY, record.counts))
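The three examples above print the same tab-separated columns, so the loop can be wrapped in a small helper that writes to any file-like object. This is only a sketch: write_records_tsv and the Contact stand-in type are hypothetical, and the helper uses nothing beyond the binX, binY, and counts attributes shown above.

```python
import io
from collections import namedtuple


def write_records_tsv(records, handle):
    """Write one 'binX<TAB>binY<TAB>counts' line per record.

    records: iterable of objects with binX, binY and counts attributes.
    handle:  any writable text stream (open file, sys.stdout, StringIO).
    Returns the number of lines written.
    """
    written = 0
    for record in records:
        handle.write(f"{record.binX}\t{record.binY}\t{record.counts}\n")
        written += 1
    return written


# Stand-in record type so the sketch runs without a .hic file.
Contact = namedtuple("Contact", ["binX", "binY", "counts"])

buffer = io.StringIO()
written = write_records_tsv(
    [Contact(0, 0, 12.0), Contact(0, 1000000, 3.5)], buffer
)
```

With a real file, the list returned by hicstraw.straw(...) would take the place of the stand-in records, and an open file would take the place of the buffer.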

Usage

hicstraw.straw(data_type, normalization, file, region_x, region_y, 'BP', resolution)

data_type: 'observed' (previous default / "main" data) or 'oe' (observed/expected)
normalization: NONE, VC, VC_SQRT, KR, SCALE, etc.
file: filepath (local or URL)
region_x/y: provide the chromosome or utilize the syntax chromosome:start_position:end_position if using a smaller window within the chromosome
resolution: typically 2500000, 1000000, 500000, 100000, 50000, 25000, 10000, 5000, etc.

Note: the normalization, resolution, and chromosome/regions must already exist in the .hic file to be read (i.e. they are not calculated by straw, only read from the file if available)
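The region_x/y syntax described above (a bare chromosome, or chromosome:start_position:end_position) can be assembled with a small helper rather than by hand. The sketch below assumes only that syntax. The region function is hypothetical and not part of hicstraw, and its validation rules are illustrative choices rather than checks the library is known to perform.

```python
def region(chrom, start=None, end=None):
    """Build a region string for hicstraw.straw.

    Returns 'chrom' for a whole chromosome, or 'chrom:start:end'
    when a window is given. start and end must be given together.
    """
    if start is None and end is None:
        return str(chrom)
    if start is None or end is None:
        raise ValueError("start and end must be given together")
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return f"{chrom}:{start}:{end}"


# Hypothetical usage with a real file (not run here):
#   window = region('X', 1000000, 5000000)
#   result = hicstraw.straw('observed', 'KR', 'HIC001.hic',
#                           window, window, 'BP', 100000)
```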
