Package for loading data from bgen files
Another bgen reader
This is a package for reading bgen files.
This package uses Cython to wrap C++ code for parsing bgen files. It's fairly quick: it can parse genotypes from 500,000 individuals at ~300 variants per second within a single Python process on a 3GHz CPU. That is ~450 million probabilities per second (500,000 samples × 300 variants × 3 genotype probabilities per sample). Decompressing the genotype probabilities is the slow step, since zlib decompression takes 80% of the total time. Using zstd-compressed genotypes would likely be much faster, perhaps 2-3X, though that is an estimate rather than a measurement.
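To make the decompress-then-parse steps concrete, here is a minimal sketch of decoding the probability bytes of one variant, written from the BGEN v1.2 (layout 2) specification rather than from this package's C++ code. It assumes unphased, diploid, biallelic samples stored at 8 bits, skips the fields that precede the probabilities in the decompressed block (sample count, allele count, ploidy and missingness, phasing flag, bit depth), and uses `zlib.compress` only to fake a compressed block. `decode_8bit_probs` is a hypothetical helper, not part of the package.

```python
import zlib

import numpy as np


def decode_8bit_probs(raw: bytes, n_samples: int) -> np.ndarray:
    """Decode packed 8-bit probabilities for unphased, diploid, biallelic
    samples into an (n_samples, 3) array ordered [AA, AB, BB].

    Each sample stores two 8-bit integers. The BGEN spec divides each by
    2**8 - 1, and the last probability is implied as 1 minus the others.
    """
    stored = np.frombuffer(raw, dtype=np.uint8).reshape(n_samples, 2)
    probs = np.empty((n_samples, 3), dtype=np.float64)
    probs[:, :2] = stored / 255.0
    probs[:, 2] = 1.0 - probs[:, :2].sum(axis=1)
    return probs


# three samples: homozygous AA, heterozygous AB, homozygous BB
packed = bytes([255, 0, 0, 255, 0, 0])
compressed = zlib.compress(packed)  # stands in for a zlib-compressed genotype block
probs = decode_8bit_probs(zlib.decompress(compressed), n_samples=3)
```

The decoding itself is a cheap vectorised division, which is consistent with decompression dominating the run time.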
This has been optimized for UK Biobank bgen files (i.e. bgen version 1.2 with zlib-compressed 8-bit genotype probabilities), but the other bgen versions and zstd compression have also been tested using example bgen files.
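The bgen version and compression type live in the flags field at the end of the file's header block. The sketch below reads them with the standard library, following the BGEN specification: bits 0-1 give the compression (0 none, 1 zlib, 2 zstd), bits 2-5 the layout (layout 2 is the one introduced in bgen v1.2), and bit 31 whether sample IDs are stored. `parse_bgen_header` is a hypothetical helper; in this package the equivalent information is reported through the `header` attribute of a `BgenReader`.

```python
import struct


def parse_bgen_header(data: bytes) -> dict:
    """Read counts and flags from the first bytes of a BGEN file.
    All integers are little-endian, unsigned 32-bit."""
    offset, header_len, n_variants, n_samples = struct.unpack_from('<4I', data, 0)
    magic = data[16:20]
    # the header block starts at byte 4, and its last 4 bytes hold the flags
    (flags,) = struct.unpack_from('<I', data, 4 + header_len - 4)
    compression = {0: None, 1: 'zlib', 2: 'zstd'}.get(flags & 0b11, 'invalid')
    return {
        'first_variant_offset': offset,
        'n_variants': n_variants,
        'n_samples': n_samples,
        'magic': magic,
        'compression': compression,
        'layout': (flags >> 2) & 0b1111,
        'has_sample_ids': bool((flags >> 31) & 1),
    }


# synthetic header with no free data: zlib, layout 2, sample IDs present
flags = 1 | (2 << 2) | (1 << 31)
header = struct.pack('<4I4sI', 20, 20, 1000, 500_000, b'bgen', flags)
info = parse_bgen_header(header)
```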
Install
pip install bgen
Usage
from bgen import BgenReader
BGEN_PATH = 'example.bgen'  # path to an existing bgen file
bfile = BgenReader(BGEN_PATH)
rsids = bfile.rsids()
# select a variant by indexing
var = bfile[1000]
# pull out genotype probabilities
probs = var.probabilities  # returns 2D numpy array
dosage = var.minor_allele_dosage  # returns 1D numpy array for biallelic variant
# get all variants in a genomic region
variants = bfile.fetch('21', 10000, 5000000)
# iterate through every variant in the file
with BgenReader(BGEN_PATH, delay_parsing=True) as bfile:
    for var in bfile:
        dosage = var.minor_allele_dosage
# or for writing bgen files
import numpy as np
from bgen import BgenWriter
OUT_PATH = 'output.bgen'  # path for the new file, so BGEN_PATH isn't overwritten
geno = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]).astype(np.float64)
with BgenWriter(OUT_PATH, n_samples=3) as bfile:
    bfile.add_variant(varid='var1', rsid='rs1', chrom='chr1', pos=1,
                      alleles=['A', 'G'], genotypes=geno)
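The writer example passes one row of genotype probabilities per sample. If you start from hard-called genotypes, a small helper can build that array. This is a hypothetical sketch (`calls_to_probs` is not part of the package) that assumes unphased, biallelic, diploid samples and the same column order as the example's `geno` array: homozygous for the first allele, heterozygous, homozygous for the second allele. Check that order against the package before writing real data.

```python
import numpy as np


def calls_to_probs(alt_counts) -> np.ndarray:
    """Turn hard calls (0, 1 or 2 copies of the second allele) into one-hot
    probability rows ordered [hom first allele, het, hom second allele]."""
    alt_counts = np.asarray(alt_counts, dtype=np.int64)
    if np.any((alt_counts < 0) | (alt_counts > 2)):
        raise ValueError('genotype calls must be 0, 1 or 2')
    # row i of the identity matrix is the one-hot row for i copies
    return np.eye(3, dtype=np.float64)[alt_counts]


geno = calls_to_probs([0, 1, 2])  # same values as `geno` in the usage example
```

One-hot rows encode certainty; imputed data would instead carry fractional probabilities in each row.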
API documentation
class BgenReader(path, sample_path='', delay_parsing=False)
# opens a bgen file. If a bgenix index exists for the file, the index file
# will be opened automatically for quicker access to specific variants.
Arguments:
path: path to bgen file
sample_path: optional path to a sample file. If no sample file is given and
    the bgen file does not contain sample IDs, samples are given integer IDs
delay_parsing: if True, variants are not all loaded into memory when the
    BgenReader is opened. This can save time when iterating across the
    variants in the file
Attributes:
samples: list of sample IDs
header: BgenHeader with info about the bgen version and compression.
Methods:
indexing: BgenVars can be accessed by indexing the BgenReader, e.g. bfile[1000]
iteration: variants in a BgenReader can be looped over, e.g. for x in bfile: print(x)
fetch(chrom, start=None, stop=None): get all variants within a genomic region
drop_variants(list[int]): drops the variants at the given indices so they are excluded from analyses
with_rsid(rsid): returns BgenVar with given rsid
at_position(pos): returns BgenVar with given position
varids(): returns list of varids for variants in the bgen file.
rsids(): returns list of rsids for variants in the bgen file.
chroms(): returns list of chromosomes for variants in the bgen file.
positions(): returns list of positions for variants in the bgen file.
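To illustrate what `fetch(chrom, start, stop)` selects, here is a hypothetical pure-Python helper that works on the lists returned by `chroms()` and `positions()`. It assumes the region is inclusive at both ends and that an omitted `start` or `stop` leaves that end open; the package's own boundary handling may differ, and `fetch` can use a bgenix index instead of scanning every variant as this sketch does.

```python
from typing import List, Optional, Sequence


def indices_in_region(chroms: Sequence[str], positions: Sequence[int],
                      chrom: str, start: Optional[int] = None,
                      stop: Optional[int] = None) -> List[int]:
    """Return indices of variants on `chrom` with start <= pos <= stop."""
    hits = []
    for i, (c, pos) in enumerate(zip(chroms, positions)):
        if c != chrom:
            continue
        if start is not None and pos < start:
            continue
        if stop is not None and pos > stop:
            continue
        hits.append(i)
    return hits


chroms = ['21', '21', '21', '22']
positions = [9_000, 10_000, 6_000_000, 20_000]
hits = indices_in_region(chroms, positions, '21', 10000, 5000000)  # -> [1]
```

With a real file you would pass `bfile.chroms()` and `bfile.positions()`, then index `bfile[i]` for each hit. A linear scan is fine for one-off queries, but an index is the better choice for many region lookups on large files.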
class BgenVar(handle, offset, layout, compression, n_samples)
# Note: this isn't created directly; BgenVars are returned by BgenReader methods
Attributes:
varid: ID for variant
rsid: reference SNP ID for variant
chrom: chromosome variant is on
pos: nucleotide position variant is at
alleles: list of alleles for variant
is_phased: True/False for whether variant has phased genotype data
ploidy: list of ploidies, one per sample. Samples are ordered as in BgenReader.samples
minor_allele: the least common allele (for biallelic variants)
minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
alt_dosage: 1D numpy array of alt allele dosages for each sample
probabilities: 2D numpy array of genotype probabilities, one sample per row
BgenVars can be pickled, e.g. pickle.dumps(var)
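The dosage attributes summarise the probability matrix. As a hypothetical sketch (not the package's code), assuming a biallelic, unphased diploid variant with no missing samples, columns ordered [homozygous first allele, het, homozygous second allele], and the second allele in `alleles` treated as alt, the alt dosage is P(het) + 2 × P(hom alt). The minor allele dosage flips to the first allele's count when the alt allele is the more common one; sending ties to the alt allele is an assumption.

```python
import numpy as np


def dosages_from_probs(probs: np.ndarray):
    """From (n_samples, 3) probabilities, return
    (alt_dosage, minor_allele_dosage, minor_is_alt)."""
    probs = np.asarray(probs, dtype=np.float64)
    alt_dosage = probs[:, 1] + 2.0 * probs[:, 2]
    # 2 * n_samples alleles in total; alt is minor if it makes up at most half
    minor_is_alt = bool(alt_dosage.sum() <= probs.shape[0])
    minor_dosage = alt_dosage if minor_is_alt else 2.0 - alt_dosage
    return alt_dosage, minor_dosage, minor_is_alt


probs = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
alt, minor, minor_is_alt = dosages_from_probs(probs)
```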