Package for loading data from bgen files
Another bgen reader
This is a package for reading bgen files.
This package uses Cython to wrap C++ code for parsing bgen files. It's fairly quick: it can parse genotypes from 500,000 individuals at ~300 variants per second within a single Python process, which is ~450 million probabilities per second on a 3 GHz CPU (500,000 samples × 300 variants × 3 probabilities per biallelic diploid genotype). Decompressing the genotype probabilities is the slow step. zlib decompression takes about 80% of the total time, so zstd-compressed genotypes would likely be much faster, perhaps 2-3X.
This has been optimized for UK Biobank bgen files, i.e. bgen version 1.2 with zlib-compressed 8-bit genotype probabilities. Other bgen versions and zstd compression have also been tested using example bgen files.
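To show what the decompression step feeds into, here is a minimal sketch of decoding a zlib-compressed run of 8-bit probabilities for unphased, diploid, biallelic samples. It assumes the BGEN v1.2 layout-2 convention that each sample stores two of its three genotype probabilities as unsigned integers scaled by 2^8 - 1, with the third implied as one minus their sum. It skips the per-variant header (sample count, ploidy and missingness bytes, phasing flag, bit depth) that a real parser reads first, ignores missing samples, and `encode_probs`/`decode_probs` are hypothetical helpers, not part of this package.

```python
import zlib

import numpy as np

BITS = 8
SCALE = (1 << BITS) - 1  # 255 for 8-bit probabilities


def encode_probs(probs: np.ndarray) -> bytes:
    """Hypothetical helper: store P(AA) and P(AB) per sample as 8-bit ints, then zlib-compress.

    Rounding each value independently is a simplification; a real encoder
    must make sure the stored values for a sample never sum past 255.
    """
    stored = np.rint(probs[:, :2] * SCALE).astype(np.uint8)
    return zlib.compress(stored.tobytes())


def decode_probs(blob: bytes, n_samples: int) -> np.ndarray:
    """Hypothetical helper: decompress and rebuild an (n_samples, 3) probability array."""
    raw = np.frombuffer(zlib.decompress(blob), dtype=np.uint8)
    stored = raw.reshape(n_samples, 2).astype(np.float64) / SCALE
    implied = 1.0 - stored.sum(axis=1, keepdims=True)  # P(BB) is not stored
    return np.hstack([stored, implied])


probs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.2, 0.5, 0.3]])
decoded = decode_probs(encode_probs(probs), n_samples=3)
print(np.round(decoded, 2))
```

The round trip is only accurate to about 1/255 per value, which is the precision 8-bit storage gives.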
Install
pip install bgen
Usage
from bgen import BgenReader
bfile = BgenReader(BGEN_PATH)
rsids = bfile.rsids()
# select a variant by indexing
var = bfile[1000]
# pull out genotype probabilities
probs = var.probabilities # returns 2D numpy array
dosage = var.minor_allele_dosage # returns 1D numpy array for biallelic variant
# iterate through every variant in the file
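with BgenReader(BGEN_PATH, delay_parsing=True) as bfile:
    for var in bfile:
        dosage = var.minor_allele_dosage
# get all variants in a genomic region (re-open the file, since the reader
# above is closed once the with block exits)
bfile = BgenReader(BGEN_PATH)
variants = bfile.fetch('21', 10000, 5000000)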
# or for writing bgen files
import numpy as np
from bgen import BgenWriter
geno = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]).astype(np.float64)
with BgenWriter(BGEN_PATH, n_samples=3) as bfile:
    bfile.add_variant(varid='var1', rsid='rs1', chrom='chr1', pos=1,
                      alleles=['A', 'G'], genotypes=geno)
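The `geno` array in the writing example holds one row of genotype probabilities per sample. If you start from hard-called genotypes instead, one way to build that array is to one-hot encode alt-allele counts. This sketch assumes unphased diploid biallelic calls coded 0, 1, 2 and columns in BGEN's genotype order (hom first allele, het, hom second allele); `calls_to_probs` is a hypothetical helper, and missing calls are deliberately not handled because how `BgenWriter` treats them is not documented here.

```python
import numpy as np


def calls_to_probs(calls) -> np.ndarray:
    """Hypothetical helper: map alt-allele counts (0, 1, 2) to one-hot probability rows."""
    calls = np.asarray(calls, dtype=np.intp)
    if calls.size and (calls.min() < 0 or calls.max() > 2):
        raise ValueError("expected alt-allele counts of 0, 1 or 2")
    # Row k of the 3x3 identity matrix is the probability row for count k
    return np.eye(3, dtype=np.float64)[calls]


geno = calls_to_probs([0, 1, 2])
print(geno)
```

For the three samples above this reproduces the hand-written `geno` array, so it could be passed as `genotypes=calls_to_probs(...)` in the `add_variant` call.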
API documentation
class BgenReader(path, sample_path='', delay_parsing=False)
# Opens a bgen file. If a bgenix index exists for the file, the index file
# is opened automatically for quicker access to specific variants.
Arguments:
path: path to bgen file
sample_path: optional path to a sample file. Samples are given integer IDs
if no sample file is given and no sample IDs are found in the bgen file
delay_parsing: True/False option to avoid loading all variants into
memory when the BgenReader is opened. This can save time when iterating
across variants in the file
Attributes:
samples: list of sample IDs
header: BgenHeader with info about the bgen version and compression.
Methods:
indexing: BgenVars can be accessed by indexing the BgenReader, e.g. bfile[1000]
iteration: variants in a BgenReader can be looped over, e.g. for x in bfile: print(x)
fetch(chrom, start=None, stop=None): get all variants within a genomic region
drop_variants(list[int]): excludes the variants at the given indices from subsequent analyses
with_rsid(rsid): returns BgenVar with given rsid
at_position(pos): returns BgenVar with given position
varids(): returns list of varids for variants in the bgen file.
rsids(): returns list of rsids for variants in the bgen file.
chroms(): returns list of chromosomes for variants in the bgen file.
positions(): returns list of positions for variants in the bgen file.
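A bgenix index (typically the bgen path with a `.bgi` suffix) is an SQLite database, which is what makes region lookups fast. As a sketch of that kind of lookup, the snippet below asks a `Variant` table for the byte offsets of variants in a region. It assumes the standard bgenix schema (columns such as `chromosome`, `position`, `rsid`, `file_start_position`, `size_in_bytes`) and inclusive region bounds; it is not how this package reads the index, `variant_offsets` is hypothetical, and the in-memory table exists only so the example runs without a real file.

```python
import sqlite3


def variant_offsets(conn, chrom, start=None, stop=None):
    """Hypothetical helper: (rsid, position, file_start_position) in a region, sorted by position."""
    query = "SELECT rsid, position, file_start_position FROM Variant WHERE chromosome = ?"
    params = [chrom]
    if start is not None:
        query += " AND position >= ?"  # assumed inclusive lower bound
        params.append(start)
    if stop is not None:
        query += " AND position <= ?"  # assumed inclusive upper bound
        params.append(stop)
    query += " ORDER BY position"
    return conn.execute(query, params).fetchall()


# Toy stand-in for a real index, e.g. sqlite3.connect(BGEN_PATH + ".bgi")
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Variant (chromosome TEXT, position INT, rsid TEXT, "
    "number_of_alleles INT, allele1 TEXT, allele2 TEXT, "
    "file_start_position INT, size_in_bytes INT)"
)
conn.executemany(
    "INSERT INTO Variant VALUES (?, ?, ?, 2, 'A', 'G', ?, 100)",
    [("21", 9000, "rs1", 1000), ("21", 15000, "rs2", 1100), ("22", 20000, "rs3", 1200)],
)
hits = variant_offsets(conn, "21", 10000, 5000000)
print(hits)
```

Each `file_start_position` is where a reader would seek to parse that variant, so only the variants in the region are touched.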
class BgenVar(handle, offset, layout, compression, n_samples)
# Note: this isn't called directly; BgenVars are instead returned by BgenReader methods
Attributes:
varid: ID for variant
rsid: reference SNP ID for variant
chrom: chromosome variant is on
pos: nucleotide position variant is at
alleles: list of alleles for variant
is_phased: True/False for whether variant has phased genotype data
ploidy: list of ploidy values, one per sample. Samples are ordered as in BgenReader.samples
minor_allele: the least common allele (for biallelic variants)
minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
alt_dosage: 1D numpy array of alt allele dosages for each sample
probabilities: 2D numpy array of genotype probabilities, one sample per row
BgenVars can be pickled e.g. pickle.dumps(var)
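The dosage attributes are expected allele counts derived from the probability matrix. A minimal sketch of that calculation, assuming an unphased diploid biallelic variant whose `probabilities` columns follow BGEN's genotype order (first-allele homozygote, heterozygote, second-allele homozygote) and treating the second allele as the alt allele. `dosages_from_probs` is hypothetical and not the package's own code; samples with missing (NaN) probabilities are skipped when estimating the allele frequency.

```python
import numpy as np


def dosages_from_probs(probs: np.ndarray):
    """Hypothetical helper: return (alt_dosage, minor_dosage, minor_index) for a biallelic diploid variant."""
    alt_dosage = probs[:, 1] + 2.0 * probs[:, 2]  # expected count of the second allele
    alt_freq = np.nanmean(alt_dosage) / 2.0
    if alt_freq <= 0.5:  # ties are assigned to the second allele here
        return alt_dosage, alt_dosage, 1  # second allele is the minor allele
    return alt_dosage, 2.0 - alt_dosage, 0  # first allele is the minor allele


probs = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.1, 0.9]])
alt, minor, minor_index = dosages_from_probs(probs)
print(alt, minor, minor_index)
```

With a real file you could compare this against `var.alt_dosage` and `var.minor_allele_dosage` to check whether the package uses the same allele convention.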