Package for loading data from bgen files
Another bgen reader
This is a package for reading bgen files.
This package uses Cython to wrap C++ code for parsing bgen files. It's fairly quick: it can parse genotypes from 500,000 individuals at ~300 variants per second within a single Python process (~450 million probabilities per second with a 3 GHz CPU). Decompressing the genotype probabilities is the slow step; zlib decompression takes about 80% of the total time, so zstd-compressed genotypes would likely be much faster, perhaps 2-3X.
This has been optimized for UK Biobank bgen files (i.e. bgen version 1.2 with zlib-compressed 8-bit genotype probabilities), but the other bgen versions and zstd compression have also been tested using example bgen files.
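The throughput figures above can be checked with a little arithmetic. The sketch below assumes three genotype probabilities per sample (a biallelic, diploid, unphased variant), which the text does not state explicitly, and applies Amdahl's law to the quoted 80% decompression share. The function name is illustrative only.

```python
# Rough check of the quoted throughput, plus Amdahl's law for the zstd estimate.
N_SAMPLES = 500_000
VARIANTS_PER_SECOND = 300
PROBS_PER_SAMPLE = 3  # assumption: biallelic, diploid, unphased genotypes

probs_per_second = N_SAMPLES * VARIANTS_PER_SECOND * PROBS_PER_SAMPLE


def overall_speedup(fraction, step_speedup):
    """Amdahl's law: total speedup when `fraction` of the runtime becomes
    `step_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / step_speedup)


if __name__ == "__main__":
    print(f"{probs_per_second:,} probabilities per second")
    for step in (2, 3, 6):
        print(f"decompression {step}x faster -> "
              f"{overall_speedup(0.8, step):.2f}x overall")
```

Under these assumptions, decompression would need to run roughly 2.7 to 6 times faster for the whole pipeline to gain 2-3X, and the overall gain cannot exceed 5X while the remaining 20% of the runtime is unchanged.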
Install
pip install bgen
Usage
from bgen.reader import BgenFile

# BGEN_PATH is a placeholder for the path to your bgen file
bfile = BgenFile(BGEN_PATH)
rsids = bfile.rsids()

# select a variant by indexing
var = bfile[1000]

# pull out genotype probabilities
probs = var.probabilities  # returns 2D numpy array
dosage = var.minor_allele_dosage  # returns 1D numpy array for biallelic variant

# iterate through every variant in the file
# (a separate name avoids rebinding the bfile handle opened above)
with BgenFile(BGEN_PATH, delay_parsing=True) as lazy_bfile:
    for var in lazy_bfile:
        dosage = var.minor_allele_dosage

# get all variants in a genomic region
variants = bfile.fetch('21', 10000, 5000000)
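As a sketch of how the calls above combine, the hypothetical helper below collects the rsid, position, and mean minor allele dosage for each variant returned by fetch. It relies only on the fetch method and the rsid, pos, and minor_allele_dosage attributes shown on this page. The helper name and the choice to skip NaN values (assumed here to mark missing genotypes) are illustrative and not part of the package.

```python
import math


def region_dosage_summary(bfile, chrom, start, stop):
    """Return a list of (rsid, pos, mean minor allele dosage) tuples."""
    rows = []
    for var in bfile.fetch(chrom, start, stop):
        # Skip NaN values (assumed to mark samples with missing genotypes)
        observed = [float(d) for d in var.minor_allele_dosage
                    if not math.isnan(d)]
        mean = sum(observed) / len(observed) if observed else float("nan")
        rows.append((var.rsid, var.pos, mean))
    return rows
```

With a real file this could be called as region_dosage_summary(BgenFile(BGEN_PATH), '21', 10000, 5000000). For large cohorts, numpy.nanmean on the dosage array would be far quicker than the pure-Python loop used here.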
API documentation
class BgenFile(path, sample_path='', delay_parsing=False)
    # Opens a bgen file. If a bgenix index exists for the file, the index file
    # will be opened automatically for quicker access to specific variants.
    Arguments:
        path: path to bgen file
        sample_path: optional path to sample file. Samples will be given integer
            IDs if the sample file is not given and sample IDs are not found in
            the bgen file
        delay_parsing: True/False option to skip loading all variants into
            memory when the BgenFile is opened. This can save time when
            iterating across variants in the file
    Attributes:
        samples: list of sample IDs
        header: BgenHeader with info about the bgen version and compression
    Methods:
        slicing: BgenVars can be accessed by slicing the BgenFile e.g. bfile[1000]
        iteration: variants in a BgenFile can be looped over e.g. for x in bfile: print(x)
        fetch(chrom, start=None, stop=None): get all variants within a genomic region
        drop_variants(list[int]): drops variants by index from being used in analyses
        with_rsid(rsid): returns BgenVar with given rsid
        at_position(pos): returns BgenVar with given position
        varids(): returns list of varids for variants in the bgen file
        rsids(): returns list of rsids for variants in the bgen file
        chroms(): returns list of chromosomes for variants in the bgen file
        positions(): returns list of positions for variants in the bgen file
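One possible use of rsids() together with drop_variants() is removing repeated rsids before an analysis. The helper below is a hypothetical sketch, not part of the package: it takes the list returned by rsids() and returns the index of every repeat after the first occurrence.

```python
def duplicate_rsid_indices(rsids):
    """Return indices of rsids that repeat an earlier entry in the list."""
    seen = set()
    duplicates = []
    for idx, rsid in enumerate(rsids):
        if rsid in seen:
            duplicates.append(idx)
        else:
            seen.add(rsid)
    return duplicates
```

For example, bfile.drop_variants(duplicate_rsid_indices(bfile.rsids())) would keep one variant per rsid, assuming drop_variants accepts a plain list of integer indices as its signature suggests.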
class BgenVar(handle, offset, layout, compression, n_samples):
    # Note: this isn't called directly, but is instead returned from BgenFile methods
    Attributes:
        varid: ID for variant
        rsid: reference SNP ID for variant
        chrom: chromosome variant is on
        pos: nucleotide position variant is at
        alleles: list of alleles for variant
        is_phased: True/False for whether variant has phased genotype data
        ploidy: list of ploidy for each sample. Samples are ordered as per BgenFile.samples
        minor_allele: the least common allele (for biallelic variants)
        minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
        probabilities: 2D numpy array of genotype probabilities, one sample per row
BgenVars can be pickled e.g. pickle.dumps(var)
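To make the relationship between the probabilities and minor_allele_dosage attributes concrete, the sketch below derives a minor allele dosage from genotype probabilities in pure Python. It assumes a biallelic, diploid, unphased variant whose rows are ordered as [P(AA), P(AB), P(BB)], with A the first allele, and that NaN rows mark missing samples. It illustrates the idea only and is not the package's own implementation; the function names are hypothetical.

```python
import math


def second_allele_dosage(row):
    """Expected count of the second allele from [P(AA), P(AB), P(BB)]."""
    return row[1] + 2.0 * row[2]


def minor_allele_dosages(probabilities):
    """Per-sample dosage of whichever allele is rarer across all samples."""
    dosages = [second_allele_dosage(row) for row in probabilities]
    observed = [d for d in dosages if not math.isnan(d)]
    # A mean dosage above 1 means the second allele has frequency above 0.5,
    # so the first allele is the minor one and the dosages are flipped.
    if observed and sum(observed) / len(observed) > 1.0:
        dosages = [2.0 - d for d in dosages]
    return dosages
```

A sample with probabilities [0.5, 0.5, 0.0] gets a dosage of 0.5 when the second allele is the minor one; samples with NaN probabilities stay NaN in the output.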