Process BBI (bigWig/bigBed) and HiC files
Installation
pip install gwseq-io
Requires numpy and pybind11.
Usage
Open bigWig, bigBed and HiC files
import gwseq_io
reader = gwseq_io.open(path)
reader = gwseq_io.open(path, parallel=24, zoom_correction=1/3)
Parameters:
- parallel: Number of parallel file handles and processing threads. 24 by default.
- zoom_correction: Scaling factor for automatic zoom level selection based on bin size. Only for bigWig files. 1/3 by default.
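To illustrate how zoom_correction could drive automatic zoom selection (the zoom=0 mode of read_signal), here is a minimal standalone sketch. It assumes each zoom level is described by its reduction level in bp and that the coarsest level whose reduction stays strictly below bin_size × zoom_correction is chosen. The function pick_zoom_level is hypothetical and not part of gwseq_io.

```python
def pick_zoom_level(reduction_levels, bin_size, zoom_correction=1 / 3):
    """Return the reduction level (bp) of the chosen zoom level, or None for full data.

    Hypothetical helper: keeps the zoom levels whose reduction level is strictly
    lower than bin_size * zoom_correction and picks the coarsest of them.
    """
    threshold = bin_size * zoom_correction
    candidates = [level for level in reduction_levels if level < threshold]
    return max(candidates) if candidates else None
```

For example, with reduction levels of 10, 40, 160, 640 and 2560 bp and a 1000 bp bin size, the threshold is about 333 bp, so the 160 bp level would be used; with a 1 bp bin size no level qualifies and full data would be read.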
Attributes for bigWig and bigBed files:
- main_header: General file formatting info.
- zoom_headers: Zoom levels info (reduction level and location).
- auto_sql: BED entries declaration (only in bigBed).
- total_summary: Statistical summary of entire file values (coverage, sums and extremes).
- chr_sizes: Chromosome IDs and sizes.
- type: Either "bigwig" or "bigbed".
Attributes for HiC files:
- header, footer: General file info.
- chr_sizes: Chromosome IDs and sizes.
- normalizations: Available normalizations.
- units: Available units.
- bin_sizes: Available bin sizes.
Read bigWig and bigBed signal
values = reader.read_signal(chr_ids, starts, ends)
values = reader.read_signal(chr_ids, starts=starts, span=span)
values = reader.read_signal(chr_ids, ends=ends, span=span)
values = reader.read_signal(chr_ids, centers=centers, span=span)
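The four call forms above differ only in how each location window is defined. The sketch below shows one plausible way to turn them into (start, end) windows. It assumes starts + span gives [start, start + span), ends + span gives [end - span, end), and centers + span gives a window centered on the center. The helper resolve_windows is hypothetical and not part of gwseq_io.

```python
def resolve_windows(starts=None, ends=None, centers=None, span=None):
    """Turn read_signal-style location arguments into (start, end) windows.

    Hypothetical helper: either both starts and ends, or exactly one
    reference (starts, ends or centers) together with span.
    """
    if span is None:
        if starts is None or ends is None or centers is not None:
            raise ValueError("without span, give both starts and ends")
        return list(zip(starts, ends))
    given = [ref for ref in (starts, ends, centers) if ref is not None]
    if len(given) != 1:
        raise ValueError("with span, give exactly one of starts, ends or centers")
    if starts is not None:
        return [(s, s + span) for s in starts]
    if ends is not None:
        return [(e - span, e) for e in ends]
    # Assumption: centered window, with the extra bp of an odd span placed after the center.
    return [(c - span // 2, c - span // 2 + span) for c in centers]
```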
Parameters:
- chr_ids, starts, ends, centers: Chromosome IDs, starts, ends and centers of locations. Either both starts and ends, or one of starts, ends or centers (with span), may be specified.
- span: Reading window in bp relative to location starts, ends or centers. When specified, only one of these references may be given. Not specified by default.
- bin_size: Reading bin size in bp. May vary in output if locations have variable spans or if bin_count is specified. 1 by default.
- bin_count: Output bin count. Inferred as max location span / bin size by default.
- bin_mode: Method to aggregate bin values. Either "mean", "sum" or "count". "mean" by default.
- full_bin: Extend location ends to overlapping bins if true. False by default.
- def_value: Default value to use when no data overlap a bin. 0 by default.
- zoom: BigWig zoom level to use. Use full data if -1. If 0, auto-detect the best level by selecting the largest level whose bin size is lower than a third of bin_size (this may be the full data). Full data by default.
- progress: Function called during data extraction. Takes the extracted coverage and the total coverage in bp as parameters. Use the default callback function if true. None by default.
Returns a numpy float32 array of shape (locations, bin count).
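As a rough model of bin_mode and def_value, the sketch below aggregates one location's per-bp values into fixed-size bins. It assumes None marks bp without data, that "count" means the number of bp with data, and that a trailing partial bin is kept. The function bin_values is hypothetical; the actual library works on float32 arrays.

```python
import math


def bin_values(values, bin_size=1, bin_mode="mean", def_value=0.0):
    """Aggregate per-bp values (None = no data) into fixed-size bins.

    Hypothetical sketch of the bin_mode semantics; bins without any data
    get def_value.
    """
    if bin_mode not in ("mean", "sum", "count"):
        raise ValueError(f"unknown bin_mode: {bin_mode!r}")
    output = []
    for i in range(math.ceil(len(values) / bin_size)):
        chunk = values[i * bin_size:(i + 1) * bin_size]
        covered = [v for v in chunk if v is not None]
        if not covered:
            output.append(def_value)
        elif bin_mode == "mean":
            output.append(sum(covered) / len(covered))
        elif bin_mode == "sum":
            output.append(sum(covered))
        else:  # "count": number of bp carrying data
            output.append(len(covered))
    return output
```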
Quantify bigWig and bigBed signal
values = reader.quantify(chr_ids, starts, ends)
Parameters:
- chr_ids, starts, ends, centers, span, bin_size, full_bin, def_value, zoom, progress: Identical to the read_signal method.
- reduce: Method to aggregate values over each location span. Either "mean", "sd", "sem", "sum", "count", "min" or "max". "mean" by default.
Returns a numpy float32 array of shape (locations).
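The reduce options can be pictured as functions that collapse one location's values into a single number. This is a minimal sketch, not the library's implementation; in particular it assumes "sd" is the population standard deviation and "sem" is sd / sqrt(n). The function reduce_values is hypothetical.

```python
import math
import statistics


def reduce_values(values, reduce="mean"):
    """Collapse one location's values into a single number.

    Hypothetical sketch of the quantify reduce options; "sd" is assumed to
    be the population standard deviation.
    """
    if reduce == "mean":
        return statistics.fmean(values)
    if reduce == "sd":
        return statistics.pstdev(values)
    if reduce == "sem":
        return statistics.pstdev(values) / math.sqrt(len(values))
    if reduce == "sum":
        return math.fsum(values)
    if reduce == "count":
        return len(values)
    if reduce == "min":
        return min(values)
    if reduce == "max":
        return max(values)
    raise ValueError(f"unknown reduce: {reduce!r}")
```

Applied to a list of per-location value lists, quantify then corresponds to something like [reduce_values(row, "max") for row in signal], giving one value per location.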
Profile bigWig and bigBed signal
values = reader.profile(chr_ids, starts, ends)
Parameters:
- chr_ids, starts, ends, centers, span, bin_size, bin_count, bin_mode, full_bin, def_value, zoom, progress: Identical to the read_signal method.
- reduce: Method to aggregate values over locations. Either "mean", "sd", "sem", "sum", "count", "min" or "max". "mean" by default.
Returns a numpy float32 array of shape (bin count).
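The difference between profile and quantify is mainly the axis being reduced. Starting from a (locations, bins) matrix such as the one read_signal returns, the sketch below reduces over locations (one value per bin) and, for contrast, over bins (one value per location). Both functions are hypothetical illustrations using plain lists instead of numpy arrays.

```python
import statistics


def profile_matrix(matrix, reducer=statistics.fmean):
    """Reduce a (locations, bins) matrix over locations: one value per bin."""
    return [reducer(column) for column in zip(*matrix)]


def quantify_matrix(matrix, reducer=statistics.fmean):
    """Reduce the same matrix over bins instead: one value per location."""
    return [reducer(row) for row in matrix]
```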
Read bigBed entries
values = reader.read_entries(chr_ids, starts, ends)
Parameters:
- chr_ids, starts, ends, centers, span, progress: Identical to the read_signal method.
Returns a list (one item per location) of lists of entries (dicts with at least "chr", "start" and "end" keys).
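The nested return shape can be modelled with a small overlap filter. This sketch assumes entries are dicts with "chr", "start" and "end" keys and that locations use BED-style half-open [start, end) coordinates, so touching intervals do not overlap. The function entries_in_locations is hypothetical.

```python
def entries_in_locations(entries, chr_ids, starts, ends):
    """Group BED-like entry dicts by the locations they overlap.

    Hypothetical sketch of the read_entries return shape: one list per
    location, using half-open [start, end) coordinates.
    """
    result = []
    for chr_id, start, end in zip(chr_ids, starts, ends):
        result.append([
            entry for entry in entries
            if entry["chr"] == chr_id and entry["start"] < end and entry["end"] > start
        ])
    return result
```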
Convert bigWig to bedGraph or WIG
reader.to_bedgraph(output_path)
reader.to_wig(output_path)
Parameters:
- output_path: Path to output file.
- chr_ids: Only extract data from these chromosomes. All by default.
- zoom: Zoom level to use. Full data by default.
- progress: Function called during data extraction. Takes the extracted coverage and the total coverage in bp as parameters. None by default.
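bedGraph is a plain-text format with one tab-separated "chrom start end value" record per line, using 0-based, half-open coordinates. The sketch below shows one way per-bp values could be run-length encoded into such lines. It is not how to_bedgraph is implemented; merging equal consecutive values and skipping None (no data) are assumptions, and to_bedgraph_lines is a hypothetical name.

```python
def to_bedgraph_lines(chr_id, start, values):
    """Run-length encode per-bp values into bedGraph lines.

    Hypothetical sketch: consecutive bp with equal values are merged into one
    0-based, half-open record; None (no data) is skipped.
    """
    lines = []
    run_start, run_value = None, None
    # A trailing None sentinel flushes the last run.
    for offset, value in enumerate(list(values) + [None]):
        position = start + offset
        if run_value is not None and value != run_value:
            lines.append(f"{chr_id}\t{run_start}\t{position}\t{run_value:g}")
            run_value = None
        if value is not None and run_value is None:
            run_start, run_value = position, value
    return lines
```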
Convert bigBed to BED
reader.to_bed(output_path)
Parameters:
- output_path, chr_ids, progress: Identical to the to_bedgraph and to_wig methods.
- col_count: Only write this number of columns (e.g., 3 for chr, start and end). All by default.
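The effect of col_count on a single output line can be sketched as keeping the first N tab-separated BED columns. This is an illustration only; truncate_bed_line is a hypothetical helper, not part of gwseq_io.

```python
def truncate_bed_line(line, col_count=None):
    """Keep only the first col_count tab-separated BED columns (all if None)."""
    columns = line.rstrip("\n").split("\t")
    return "\t".join(columns if col_count is None else columns[:col_count])
```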
Write bigWig file
writer = gwseq_io.open(path, "w")
writer = gwseq_io.open(path, "w", def_value=0)
writer = gwseq_io.open(path, "w", chr_sizes={"chr1": 1234, "chr2": 1234})
writer.add_entry("chr1", start=1000, end=1010, value=0.1)
writer.add_value("chr1", start=1000, span=10, value=0.1)
writer.add_values("chr1", start=1000, span=10, values=[0.1, 0.1, 0.1, 0.1])
Entries must be grouped by chromosome and sorted by start, then by end. Entries must not overlap.
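Before writing, entries can be checked against these ordering rules. The sketch below assumes add_values assigns value i to [start + i × span, start + (i + 1) × span), so the add_values call above is equivalent to four consecutive add_entry calls, and then validates grouping, sort order and overlaps. Both expand_values and check_bigwig_order are hypothetical helpers.

```python
def expand_values(chr_id, start, span, values):
    """Expand an add_values-style call into (chr, start, end, value) entries.

    Assumption: value i covers [start + i * span, start + (i + 1) * span).
    """
    return [
        (chr_id, start + i * span, start + (i + 1) * span, value)
        for i, value in enumerate(values)
    ]


def check_bigwig_order(entries):
    """Raise ValueError unless entries are pooled by chr, sorted and non-overlapping."""
    seen_chrs = set()
    previous = None
    for chr_id, start, end, *_ in entries:
        if previous is None or chr_id != previous[0]:
            if chr_id in seen_chrs:
                raise ValueError(f"{chr_id} entries are not pooled together")
            seen_chrs.add(chr_id)
        else:
            if (start, end) < (previous[1], previous[2]):
                raise ValueError(f"entry {chr_id}:{start}-{end} is out of order")
            if start < previous[2]:
                raise ValueError(f"entry {chr_id}:{start}-{end} overlaps the previous one")
        previous = (chr_id, start, end)
```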
Write bigBed file
writer = gwseq_io.open(path, "w", type="bigbed")
writer = gwseq_io.open(path, "w", type="bigbed", chr_sizes={"chr1": 1234, "chr2": 1234})
writer = gwseq_io.open(path, "w", type="bigbed", fields=["chr", "start", "end", "name"])
writer = gwseq_io.open(path, "w", type="bigbed", fields={"chr": "string", "start": "uint", "end": "uint", "name": "string"})
writer.add_entry("chr1", start=1000, end=1010)
writer.add_entry("chr1", start=1000, end=1010, fields={"name": "read#1"})
Entries must be grouped by chromosome and sorted by start, then by end. Entries may overlap.
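Unsorted bigBed entries can be put into the required order before calling add_entry. In this sketch, chromosomes follow an explicit order when given (for example the keys of a chr_sizes dict) and otherwise their first appearance; overlapping entries are kept. The function sort_bigbed_entries and the (chr, start, end, fields) tuple layout are hypothetical.

```python
def sort_bigbed_entries(entries, chr_order=None):
    """Sort (chr, start, end, fields) tuples into pooled, (start, end) order.

    Hypothetical helper: chromosomes follow chr_order when given, otherwise
    their first appearance. Overlapping entries are kept.
    """
    if chr_order is None:
        chr_order = list(dict.fromkeys(entry[0] for entry in entries))
    rank = {chr_id: i for i, chr_id in enumerate(chr_order)}
    # The key never compares the fields dicts, and sorted() is stable for ties.
    return sorted(entries, key=lambda entry: (rank[entry[0]], entry[1], entry[2]))
```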
Read HiC signal
values = reader.read_signal(chr_ids, starts, ends)
Parameters:
- chr_ids, starts, ends: Chromosome IDs, starts and ends of the 2 locations.
- bin_size: Input bin size, or -1 to use the smallest. Must be available in the file. Smallest by default.
- bin_count: Approximate output bin count. If specified, takes precedence over bin_size by selecting the available bin size that yields the closest bin count. Not specified by default.
- exact_bin_count: Resize output to match bin_count (if specified). False by default.
- full_bin: Extend location ends to overlapping bins if true. False by default.
- def_value: Default value to use when no data overlap a bin. 0 by default.
- triangle: Skip symmetrical data if true. False by default.
- min_distance, max_distance: Min and max distance in bp from the diagonal for contacts to be reported. All contacts by default.
- normalization: Either "none" or any normalization available in the file, such as "kr", "vc" or "vc_sqrt". "none" by default.
- mode: Either "observed" or "oe" (observed/expected). "observed" by default.
- unit: Either "bp" or "frag". "bp" by default.
- save_to: Save output to this .npz path (under the "values" key) and return nothing. Not specified by default.
Returns a numpy float32 array of shape (loc 1 bins, loc 2 bins).
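The bin_count rule can be sketched as a search over the file's available bin sizes. This assumes the bin count for a given size is ceil(span / size) and that ties go to the finer (smaller) bin size; both the tie rule and the helper name pick_hic_bin_size are hypothetical.

```python
import math


def pick_hic_bin_size(bin_sizes, span, bin_count):
    """Choose the available bin size whose bin count over span is closest to bin_count.

    Hypothetical sketch: the bin count for a size is ceil(span / size), and
    min() returns the first (smallest) size among ties.
    """
    return min(
        sorted(bin_sizes),
        key=lambda size: abs(math.ceil(span / size) - bin_count),
    )
```

For a 10 Mb location and typical Hi-C resolutions, asking for about 100 bins would select 100 kb bins.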
Read HiC sparse signal
values = reader.read_sparse_signal(chr_ids, starts, ends)
Parameters:
- chr_ids, starts, ends, bin_size, bin_count, exact_bin_count, full_bin, def_value, triangle, min_distance, max_distance, normalization, mode, unit, save_to: Identical to the read_signal method.
Returns a COO sparse matrix as a dict with keys:
- values: Values as a numpy float32 array.
- row: Row indices of the values as a numpy uint32 array.
- col: Column indices of the values as a numpy uint32 array.
- shape: Shape of the dense array as a tuple.
Convert it to a SciPy sparse array in Python with scipy.sparse.csr_array((x["values"], (x["row"], x["col"])), shape=x["shape"]).
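Without SciPy, the COO dict described above can also be densified by hand. This dependency-free sketch assumes at most one value per (row, col) cell and uses plain lists in place of numpy arrays; coo_to_dense is a hypothetical helper and fill_value plays the role of def_value.

```python
def coo_to_dense(sparse, fill_value=0.0):
    """Rebuild a dense list-of-lists matrix from a read_sparse_signal-style dict.

    Assumes keys "values", "row", "col" and "shape", with at most one value
    per (row, col) cell.
    """
    n_rows, n_cols = sparse["shape"]
    dense = [[fill_value] * n_cols for _ in range(n_rows)]
    for value, row, col in zip(sparse["values"], sparse["row"], sparse["col"]):
        dense[int(row)][int(col)] = float(value)
    return dense
```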