Package for loading data from bgen files
Another bgen reader
This is a package for reading bgen files.
This package uses Cython to wrap C++ code for parsing bgen files. It is fairly quick: it can parse genotypes from 500,000 individuals at ~300 variants per second within a single Python process (~450 million probabilities per second with a 3 GHz CPU). Decompressing the genotype probabilities is the slow step. zlib decompression takes 80% of the total time, so zstd-compressed genotypes should be much faster, perhaps 2-3X.
This has been optimized for UKBiobank bgen files (i.e. bgen version 1.2 with zlib-compressed 8-bit genotype probabilities), but the other bgen versions and zstd compression have also been tested using example bgen files.
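The figures above can be checked with some quick arithmetic. This is only a sketch: it assumes three genotype probabilities per sample per variant (a biallelic variant in unphased diploid samples), and it applies Amdahl's law to the 80% decompression share. The function name `overall_speedup` is made up for illustration.

```python
# Back-of-the-envelope check of the quoted throughput.
# Assumption: 3 genotype probabilities per sample per variant
# (biallelic variant, unphased diploid samples).
n_samples = 500_000
variants_per_second = 300
probs_per_sample = 3

probs_per_second = n_samples * variants_per_second * probs_per_sample
print(f"{probs_per_second / 1e6:.0f} million probabilities per second")


def overall_speedup(decompress_fraction, decompress_speedup):
    """Amdahl's law: total speedup when only decompression gets faster."""
    remaining = 1.0 - decompress_fraction
    return 1.0 / (remaining + decompress_fraction / decompress_speedup)


# If decompression is 80% of the runtime, a decompressor that is
# 6x faster gives roughly a 3x overall speedup.
print(f"{overall_speedup(0.8, 6):.2f}x overall")
```

Under these assumptions the overall gain can never exceed 5X, however fast the decompressor, because the remaining 20% of the runtime is unchanged.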
Install
pip install bgen
Usage
from bgen.reader import BgenFile
bfile = BgenFile(BGEN_PATH)
rsids = bfile.rsids()
# select a variant by indexing
var = bfile[1000]
# pull out genotype probabilities
probs = var.probabilities # returns 2D numpy array
dosage = var.minor_allele_dosage # returns 1D numpy array for biallelic variant
# iterate through every variant in the file
with BgenFile(BGEN_PATH, delay_parsing=True) as bfile:
    for var in bfile:
        dosage = var.minor_allele_dosage
# get all variants in a genomic region
variants = bfile.fetch('21', 10000, 5000000)
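As a slightly fuller illustration of the iteration pattern, the sketch below computes the mean minor allele dosage for each variant. The helper `mean_minor_allele_dosages` is hypothetical and not part of the package; it relies only on the `rsid` and `minor_allele_dosage` attributes documented below. Stand-in objects are used so the example runs without a bgen file, and missing genotypes are not handled.

```python
from types import SimpleNamespace


def mean_minor_allele_dosages(variants):
    """Yield (rsid, mean minor allele dosage) for each variant.

    `variants` is any iterable of objects exposing `rsid` and
    `minor_allele_dosage`, such as an open BgenFile.
    """
    for var in variants:
        dosage = var.minor_allele_dosage
        yield var.rsid, float(sum(dosage)) / len(dosage)


# Stand-in variants so the sketch runs without a bgen file. With the real
# package, pass an open BgenFile instead, e.g.
#   with BgenFile(BGEN_PATH, delay_parsing=True) as bfile:
#       for rsid, mean in mean_minor_allele_dosages(bfile): ...
fake_variants = [
    SimpleNamespace(rsid="rs1", minor_allele_dosage=[0.0, 1.0, 2.0]),
    SimpleNamespace(rsid="rs2", minor_allele_dosage=[0.0, 0.5, 1.0]),
]
results = list(mean_minor_allele_dosages(fake_variants))
print(results)
```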
API documentation
class BgenFile(path, sample_path='', delay_parsing=False)
# opens a bgen file. If a bgenix index exists for the file, the index file
# will be opened automatically for quicker access of specific variants.
Arguments:
path: path to bgen file
sample_path: optional path to sample file. Samples will be given integer IDs
if sample file is not given and sample IDs not found in the bgen file
delay_parsing: True/False option to allow for not loading all variants into
memory when the BgenFile is opened. This can save time when iterating
across variants in the file
Attributes:
samples: list of sample IDs
header: BgenHeader with info about the bgen version and compression.
Methods:
slicing: BgenVars can be accessed by slicing the BgenFile e.g. bfile[1000]
iteration: variants in a BgenFile can be looped over e.g. for x in bfile: print(x)
fetch(chrom, start=None, stop=None): get all variants within a genomic region
drop_variants(list[int]): drops variants by index from being used in analyses
with_rsid(rsid): returns BgenVar with given rsid
at_position(pos): returns BgenVar with given position
varids(): returns list of varids for variants in the bgen file.
rsids(): returns list of rsids for variants in the bgen file.
chroms(): returns list of chromosomes for variants in the bgen file.
positions(): returns list of positions for variants in the bgen file.
class BgenVar(handle, offset, layout, compression, n_samples):
# Note: this isn't called directly, but instead returned from BgenFile methods
Attributes:
varid: ID for variant
rsid: reference SNP ID for variant
chrom: chromosome variant is on
pos: nucleotide position variant is at
alleles: list of alleles for variant
is_phased: True/False for whether variant has phased genotype data
ploidy: list of ploidy for each sample. Samples are ordered as per BgenFile.samples
minor_allele: the least common allele (for biallelic variants)
minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
alt_dosage: 1D numpy array of alt allele dosages for each sample
probabilities: 2D numpy array of genotype probabilities, one sample per row
BgenVars can be pickled e.g. pickle.dumps(var)
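To show how a dosage relates to the genotype probabilities, here is a minimal sketch. It assumes a biallelic variant in unphased diploid samples, with each row ordered as P(ref/ref), P(ref/alt), P(alt/alt). Whether the package computes alt_dosage this way is an assumption, and `alt_dosage_from_probabilities` is a hypothetical helper, not part of the package. Plain lists stand in for the numpy array.

```python
def alt_dosage_from_probabilities(rows):
    """Expected alt allele count per sample.

    Each row is assumed to be [P(ref/ref), P(ref/alt), P(alt/alt)].
    """
    return [p_het + 2.0 * p_hom_alt for _, p_het, p_hom_alt in rows]


# One row per sample, as in the `probabilities` attribute.
probs = [
    [1.0, 0.0, 0.0],    # confident ref/ref
    [0.0, 1.0, 0.0],    # confident ref/alt
    [0.0, 0.0, 1.0],    # confident alt/alt
    [0.25, 0.5, 0.25],  # uncertain call
]
dosages = alt_dosage_from_probabilities(probs)
print(dosages)  # [0.0, 1.0, 2.0, 1.0]
```

With a 2D numpy array the same calculation would be `probs[:, 1] + 2 * probs[:, 2]`.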