Package for loading data from bgen files
Another bgen reader
This is a package for reading bgen files.
This package uses Cython to wrap C++ code for parsing bgen files. It's fairly quick: it can parse genotypes from 500,000 individuals at ~300 variants per second within a single Python process (~450 million probabilities per second with a 3 GHz CPU). Decompressing the genotype probabilities is the slow step; zlib decompression takes 80% of the total time. Using zstd-compressed genotypes would be much faster, maybe 2-3X faster?
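As a rough check on the figures above, the arithmetic can be sketched in a few lines. This assumes three probabilities per sample (biallelic, diploid, unphased genotypes), and frames the zstd guess with Amdahl's law under the assumption that everything other than decompression stays the same speed. The function name is illustrative only and is not part of the package.

```python
# Back-of-envelope check of the quoted throughput figures.
n_samples = 500_000          # individuals per variant
variants_per_second = 300    # quoted parsing rate
probs_per_sample = 3         # assumption: biallelic, diploid, unphased

probs_per_second = n_samples * variants_per_second * probs_per_sample
print(f"{probs_per_second:,} probabilities per second")  # 450,000,000

# Fraction of runtime spent in zlib decompression, as quoted above.
decompress_fraction = 0.8


def overall_speedup(decompress_speedup):
    """Amdahl's law: total speedup if only decompression gets faster."""
    remaining = 1.0 - decompress_fraction
    return 1.0 / (remaining + decompress_fraction / decompress_speedup)


# Upper bound on the total speedup, even with free decompression.
max_speedup = 1.0 / (1.0 - decompress_fraction)

for factor in (2, 4, 6):
    print(f"decompression {factor}X faster -> "
          f"overall {overall_speedup(factor):.2f}X")
```

Under these assumptions a 2-3X overall gain would need decompression itself to be roughly 2.7-6X faster, and the overall gain is capped at about 5X.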
This has been optimized for UKBiobank bgen files (i.e. bgen version 1.2 with zlib-compressed 8-bit genotype probabilities), but the other bgen versions and zstd compression have also been tested using example bgen files.
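To illustrate what "zlib-compressed 8-bit genotype probabilities" means, here is a simplified sketch using only the Python standard library. It assumes the bgen v1.2 convention that each stored value x in B bits represents the probability x / (2^B - 1), and that the last probability per sample is implied rather than stored. It is not the package's C++ parser: it skips the header, ploidy and missingness bytes, and uses naive rounding instead of the rounding rule in the bgen specification. The function names are hypothetical.

```python
import zlib

BITS = 8
SCALE = (1 << BITS) - 1  # 255 for 8-bit probabilities


def encode(probs):
    """Pack (P(AA), P(AB)) per sample into bytes; P(BB) is not stored."""
    out = bytearray()
    for p_aa, p_ab, _p_bb in probs:
        # Naive rounding: fine for this demo, but the real format uses a
        # rounding scheme that keeps each sample's total exact.
        out.append(round(p_aa * SCALE))
        out.append(round(p_ab * SCALE))
    return bytes(out)


def decode(raw):
    """Unpack bytes into (P(AA), P(AB), P(BB)) tuples per sample."""
    probs = []
    for i in range(0, len(raw), 2):
        p_aa = raw[i] / SCALE
        p_ab = raw[i + 1] / SCALE
        probs.append((p_aa, p_ab, 1.0 - p_aa - p_ab))  # last one is implied
    return probs


probs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
packed = encode(probs)                 # 2 bytes per sample
compressed = zlib.compress(packed)     # what would sit on disk
decoded = decode(zlib.decompress(compressed))
print(decoded)
```

With only three samples the compressed block is larger than the packed bytes; compression only pays off across many samples.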
Install
pip install bgen
Usage
from bgen import BgenReader
bfile = BgenReader(BGEN_PATH)
rsids = bfile.rsids()
# select a variant by indexing
var = bfile[1000]
# pull out genotype probabilities
probs = var.probabilities # returns 2D numpy array
dosage = var.minor_allele_dosage # returns 1D numpy array for biallelic variant
# iterate through every variant in the file
with BgenReader(BGEN_PATH, delay_parsing=True) as bfile:
    for var in bfile:
        dosage = var.minor_allele_dosage
# get all variants in a genomic region
variants = bfile.fetch('21', 10000, 5000000)
# or for writing bgen files
import numpy as np
from bgen import BgenWriter
geno = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]).astype(np.float64)
with BgenWriter(BGEN_PATH, n_samples=3) as bfile:
    bfile.add_variant(varid='var1', rsid='rs1', chrom='chr1', pos=1,
                      alleles=['A', 'G'], genotypes=geno)
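The geno array in the writer example has one row per sample and one column per genotype. A minimal sketch of building such rows from hard genotype calls follows, assuming the columns are ordered (hom ref, het, hom alt) for a biallelic diploid variant. That ordering is an assumption here, and calls_to_probabilities is a hypothetical helper, not part of the bgen package.

```python
def calls_to_probabilities(calls, n_genotypes=3):
    """Turn alt-allele counts (0, 1, 2) into one-hot probability rows."""
    rows = []
    for call in calls:
        if not 0 <= call < n_genotypes:
            raise ValueError(f"unexpected genotype call: {call!r}")
        row = [0.0] * n_genotypes
        row[call] = 1.0  # all probability mass on the called genotype
        rows.append(row)
    return rows


# Three samples: hom ref, het, hom alt (assumed column order).
rows = calls_to_probabilities([0, 1, 2])
print(rows)  # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

Wrapping the result as np.array(rows, dtype=np.float64) gives the same values as geno in the example above.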
API documentation
class BgenReader(path, sample_path='', delay_parsing=False)
# opens a bgen file. If a bgenix index exists for the file, the index file
# will be opened automatically for quicker access of specific variants.
Arguments:
path: path to bgen file
sample_path: optional path to sample file. Samples are given integer IDs
if no sample file is given and no sample IDs are found in the bgen file
delay_parsing: True/False option to skip loading all variants into
memory when the BgenReader is opened. This can save time when iterating
across variants in the file
Attributes:
samples: list of sample IDs
header: BgenHeader with info about the bgen version and compression.
Methods:
indexing: a BgenVar can be accessed by indexing the BgenReader, e.g. bfile[1000]
iteration: variants in a BgenReader can be looped over, e.g. for x in bfile: print(x)
fetch(chrom, start=None, stop=None): get all variants within a genomic region
drop_variants(list[int]): drops variants by index from being used in analyses
with_rsid(rsid): returns BgenVar with given rsid
at_position(pos): returns BgenVar with given position
varids(): returns list of varids for variants in the bgen file.
rsids(): returns list of rsids for variants in the bgen file.
chroms(): returns list of chromosomes for variants in the bgen file.
positions(): returns list of positions for variants in the bgen file.
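The methods above can be combined into small helpers. The sketch below collects minor allele dosages for a region using only fetch(), rsid and minor_allele_dosage as documented here. The helper name dosages_in_region is hypothetical, and FakeVar and FakeReader are stand-ins so the example runs without a bgen file; they are not part of the package, and their inclusive region bounds are an assumption rather than a statement about how the real fetch() treats its endpoints.

```python
from collections import namedtuple


def dosages_in_region(reader, chrom, start=None, stop=None):
    """Map rsid -> minor allele dosages for variants in a region."""
    result = {}
    for var in reader.fetch(chrom, start, stop):
        result[var.rsid] = var.minor_allele_dosage
    return result


# Stand-ins for demonstration only; NOT part of the bgen package.
FakeVar = namedtuple("FakeVar", ["rsid", "chrom", "pos", "minor_allele_dosage"])


class FakeReader:
    def __init__(self, variants):
        self._variants = variants

    def fetch(self, chrom, start=None, stop=None):
        # Assumption: inclusive bounds, purely for this stand-in.
        for var in self._variants:
            if var.chrom != chrom:
                continue
            if start is not None and var.pos < start:
                continue
            if stop is not None and var.pos > stop:
                continue
            yield var


reader = FakeReader([
    FakeVar("rs1", "21", 15000, [0.0, 1.0, 2.0]),
    FakeVar("rs2", "21", 6000000, [1.0, 1.0, 0.0]),
    FakeVar("rs3", "22", 20000, [0.0, 0.0, 0.0]),
])
region = dosages_in_region(reader, "21", 10000, 5000000)
print(region)  # {'rs1': [0.0, 1.0, 2.0]}
```

With the real package, passing BgenReader(BGEN_PATH) in place of the stand-in should work the same way, given the fetch signature documented above.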
class BgenVar(handle, offset, layout, compression, n_samples):
# Note: this isn't called directly, but is instead returned from BgenReader methods
Attributes:
varid: ID for variant
rsid: reference SNP ID for variant
chrom: chromosome variant is on
pos: nucleotide position variant is at
alleles: list of alleles for variant
is_phased: True/False for whether variant has phased genotype data
ploidy: list of ploidy for each sample. Samples are ordered as per BgenReader.samples
minor_allele: the least common allele (for biallelic variants)
minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
alt_dosage: 1D numpy array of alt allele dosages for each sample
probabilities: 2D numpy array of genotype probabilities, one sample per row
BgenVars can be pickled e.g. pickle.dumps(var)
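To show how the dosage attributes listed above relate to the probabilities array, here is a pure-Python sketch for a biallelic, diploid, unphased variant. It assumes each row is (P(hom ref), P(het), P(hom alt)), ignores missing data, and breaks ties in favour of the alt allele; the package's own handling of these cases may differ. Both function names are hypothetical.

```python
def alt_dosage_from_probs(probs):
    """Expected alt allele count per sample: P(het) + 2 * P(hom alt)."""
    return [p_het + 2.0 * p_hom_alt for _p_hom_ref, p_het, p_hom_alt in probs]


def minor_dosage_from_probs(probs):
    """Dosage of whichever allele has the smaller total across samples."""
    alt = alt_dosage_from_probs(probs)
    ref = [2.0 - d for d in alt]  # diploid: the two dosages sum to 2
    # Assumption: ties go to the alt allele.
    return alt if sum(alt) <= sum(ref) else ref


# Alt allele is rarer here, so minor dosage equals alt dosage.
mostly_ref = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.0, 1.0, 0.0)]
print(minor_dosage_from_probs(mostly_ref))  # [0.0, 0.5, 1.0]

# Ref allele is rarer here, so minor dosage is 2 minus the alt dosage.
mostly_alt = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
print(minor_dosage_from_probs(mostly_alt))  # [0.0, 0.0, 1.0]
```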