Package for loading data from bgen files
Another bgen reader
This is a package for reading bgen files.
This package uses Cython to wrap C++ code for parsing bgen files. It's fairly quick: it can parse genotypes from 500,000 individuals at ~300 variants per second within a single Python process (~450 million probabilities per second with a 3GHz CPU). Decompressing the genotype probabilities is the slow step, with zlib decompression taking 80% of the total time. Using zstd compressed genotypes would be much faster, perhaps 2-3X faster.
This has been optimized for UKBiobank bgen files (i.e. bgen version 1.2 with zlib compressed 8-bit genotype probabilities), but the other bgen versions and zstd compression have also been tested using example bgen files.
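The throughput figures above can be cross-checked with a little arithmetic. The sketch below assumes three genotype probabilities per sample (an unphased, biallelic, diploid variant), which is an assumption rather than something stated above. It then applies Amdahl's law to estimate how much faster decompression alone would need to be to reach a 2-3X overall speedup. The function name is hypothetical.

```python
# Cross-check of the quoted throughput, assuming 3 probabilities per sample
# (unphased, biallelic, diploid genotypes). This is an assumption.
n_samples = 500_000
variants_per_second = 300
probs_per_sample = 3

probs_per_second = n_samples * variants_per_second * probs_per_sample
print(f"{probs_per_second:,} probabilities per second")  # 450,000,000


def required_decompression_speedup(overall_speedup, decompress_fraction=0.8):
    """Amdahl's law: speedup needed in the decompression step alone
    to make the whole parse `overall_speedup` times faster."""
    remaining = 1 / overall_speedup - (1 - decompress_fraction)
    if remaining <= 0:
        raise ValueError("not reachable by faster decompression alone")
    return decompress_fraction / remaining


for target in (2, 3):
    print(target, round(required_decompression_speedup(target), 2))
```

Under these assumptions, decompression would need to be roughly 2.7X to 6X faster for the whole parse to be 2-3X faster.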
Install
pip install bgen
Usage
from bgen import BgenReader
bfile = BgenReader(BGEN_PATH)
rsids = bfile.rsids()
# select a variant by indexing
var = bfile[1000]
# pull out genotype probabilities
probs = var.probabilities # returns 2D numpy array
dosage = var.minor_allele_dosage # returns 1D numpy array for biallelic variant
# iterate through every variant in the file
with BgenReader(BGEN_PATH, delay_parsing=True) as bfile:
    for var in bfile:
        dosage = var.minor_allele_dosage
# get all variants in a genomic region
variants = bfile.fetch('21', 10000, 5000000)
# or for writing bgen files
import numpy as np
from bgen import BgenWriter
geno = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]).astype(np.float64)
with BgenWriter(BGEN_PATH, n_samples=3) as bfile:
    bfile.add_variant(varid='var1', rsid='rs1', chrom='chr1', pos=1,
                      alleles=['A', 'G'], genotypes=geno)
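Iterating over variants is the usual starting point for per-variant summaries. The sketch below computes a minor allele frequency per variant from `minor_allele_dosage`. It assumes diploid samples and that missing dosages, if any, appear as NaN; both are assumptions. `minor_allele_frequencies` is a hypothetical helper, not part of the package, and stand-in objects are used so the example runs without a bgen file.

```python
import math
from types import SimpleNamespace


def minor_allele_frequencies(variants):
    """Yield (rsid, frequency) pairs for an iterable of variant-like objects."""
    for var in variants:
        # keep only non-missing dosages (assumes missing values are NaN)
        observed = [float(d) for d in var.minor_allele_dosage if not math.isnan(d)]
        if not observed:
            yield var.rsid, float("nan")
            continue
        # each diploid sample carries two alleles (assumption)
        yield var.rsid, sum(observed) / (2 * len(observed))


# stand-ins using the same attribute names as BgenVar
demo = [
    SimpleNamespace(rsid="rs1", minor_allele_dosage=[0.0, 1.0, 0.0, 0.0]),
    SimpleNamespace(rsid="rs2", minor_allele_dosage=[0.0, float("nan"), 1.0, 1.0]),
]
freqs = dict(minor_allele_frequencies(demo))
print(freqs)
```

With a real file, the same function should accept the reader directly, e.g. `minor_allele_frequencies(bfile)` inside the `with BgenReader(...)` block.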
API documentation
class BgenReader(path, sample_path='', delay_parsing=False)
# opens a bgen file. If a bgenix index exists for the file, the index file
# will be opened automatically for quicker access of specific variants.
Arguments:
path: path to bgen file
sample_path: optional path to sample file. Samples are given integer IDs
if no sample file is given and no sample IDs are found in the bgen file
delay_parsing: True/False option to skip loading all variants into
memory when the BgenReader is opened. This can save time when iterating
across variants in the file
Attributes:
samples: list of sample IDs
header: BgenHeader with info about the bgen version and compression.
Methods:
indexing: BgenVars can be accessed by indexing the BgenReader e.g. bfile[1000]
iteration: variants in a BgenReader can be looped over e.g. for x in bfile: print(x)
fetch(chrom, start=None, stop=None): get all variants within a genomic region
drop_variants(list[int]): drops variants by index from being used in analyses
with_rsid(rsid): returns BgenVar with given rsid
at_position(pos): returns BgenVar with given position
varids(): returns list of varids for variants in the bgen file.
rsids(): returns list of rsids for variants in the bgen file.
chroms(): returns list of chromosomes for variants in the bgen file.
positions(): returns list of positions for variants in the bgen file.
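For custom filtering, the lists returned by `chroms()` and `positions()` can be combined to find the indices of variants in a region, and those indices can then be used to index the reader. The sketch below is a hypothetical helper (`indices_in_region` is not part of the package) and treats both bounds as inclusive, which may differ from what `fetch` does.

```python
def indices_in_region(chroms, positions, chrom, start=None, stop=None):
    """Return indices of variants on `chrom` with start <= pos <= stop."""
    hits = []
    for i, (c, pos) in enumerate(zip(chroms, positions)):
        if c != chrom:
            continue
        if start is not None and pos < start:
            continue
        if stop is not None and pos > stop:
            continue
        hits.append(i)
    return hits


# made-up values standing in for bfile.chroms() and bfile.positions()
chroms = ["21", "21", "21", "22"]
positions = [5000, 10000, 4999999, 20000]
idx = indices_in_region(chroms, positions, "21", 10000, 5000000)
print(idx)  # [1, 2]
```

With an open reader this could be called as `indices_in_region(bfile.chroms(), bfile.positions(), '21', 10000, 5000000)`, with each resulting index passed to `bfile[i]`.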
class BgenVar(handle, offset, layout, compression, n_samples):
# Note: this isn't called directly, but instead returned from BgenReader methods
Attributes:
varid: ID for variant
rsid: reference SNP ID for variant
chrom: chromosome variant is on
pos: nucleotide position variant is at
alleles: list of alleles for variant
is_phased: True/False for whether variant has phased genotype data
ploidy: list of ploidy for each sample. Samples are ordered as per BgenReader.samples
minor_allele: the least common allele (for biallelic variants)
minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
alt_dosage: 1D numpy array of alt allele dosages for each sample
probabilities: 2D numpy array of genotype probabilities, one sample per row
BgenVars can be pickled e.g. pickle.dumps(var)
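To make the relationship between `probabilities` and the dosage attributes concrete, the sketch below derives dosages by hand for an unphased, biallelic, diploid variant, where each row holds the probabilities of the three genotypes (homozygous first allele, heterozygous, homozygous second allele). It uses plain lists so it runs without the package. Treating the second allele as the alt allele, and breaking ties toward it, are assumptions; the package may compute these values differently, and both function names are hypothetical.

```python
def alt_dosage_from_probs(probs):
    """Expected count of the second allele per sample: P(het) + 2 * P(hom alt)."""
    return [p_het + 2 * p_hom_alt for _, p_het, p_hom_alt in probs]


def minor_dosage_from_probs(probs):
    """Dosage of whichever allele has the smaller total dosage."""
    alt = alt_dosage_from_probs(probs)
    # assumes each row sums to 1, so the two allele dosages sum to 2
    ref = [2 - d for d in alt]
    return alt if sum(alt) <= sum(ref) else ref


probs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
print(alt_dosage_from_probs(probs))    # [0.0, 1.0, 2.0, 0.0]
print(minor_dosage_from_probs(probs))  # [0.0, 1.0, 2.0, 0.0]
```

Here the second allele has a total dosage of 3 against 5 for the first, so it is the minor allele and the two results coincide.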