Package for loading data from bgen files
Another bgen reader
This is a package for reading bgen files.
This package uses Cython to wrap C++ code for parsing bgen files. It's fairly quick: it can parse genotypes from 500,000 individuals at ~300 variants per second within a single Python process (~450 million probabilities per second with a 3GHz CPU). Decompressing the genotype probabilities is the slow step; zlib decompression takes 80% of the total time. Using zstd-compressed genotypes would be much faster, perhaps 2-3X.
This has been optimized for UKBiobank bgen files (i.e. bgen version 1.2 with zlib-compressed 8-bit genotype probabilities), but the other bgen versions and zstd compression have also been tested using example bgen files.
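The throughput figures quoted above can be checked with a little arithmetic. The sketch below assumes three genotype probabilities per sample (a biallelic, unphased, diploid variant), which is one way to arrive at 450 million. It then uses Amdahl's law to estimate how much faster decompression alone would need to be for a 2-3X overall gain. The function name `overall_speedup` is illustrative only and is not part of this package.

```python
# Back-of-envelope check of the throughput figures quoted above.
# Assumption: each sample contributes 3 genotype probabilities per variant
# (biallelic, unphased, diploid).
n_samples = 500_000
variants_per_sec = 300
probs_per_sample = 3

probs_per_sec = n_samples * variants_per_sec * probs_per_sample
print(f"{probs_per_sec:,} probabilities per second")


def overall_speedup(decompress_fraction, decompress_speedup):
    """Amdahl's law: whole-run speedup when only decompression gets faster."""
    remaining = 1.0 - decompress_fraction
    return 1.0 / (remaining + decompress_fraction / decompress_speedup)


# With decompression at 80% of run time, the overall gain can never exceed 5X.
for k in (2, 3, 6):
    print(f"decompression {k}X faster -> overall {overall_speedup(0.8, k):.2f}X")
```

Under these assumptions, a 2-3X overall gain would need decompression itself to be roughly 2.7-6X faster, since the remaining 20% of the run time is unaffected.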
Install
pip install bgen
Usage
from bgen.reader import BgenFile
bfile = BgenFile(BGEN_PATH)
rsids = bfile.rsids()
# select a variant by indexing
var = bfile[1000]
# pull out genotype probabilities
probs = var.probabilities # returns 2D numpy array
dosage = var.minor_allele_dosage # returns 1D numpy array for biallelic variant
# iterate through every variant in the file
with BgenFile(BGEN_PATH, delay_parsing=True) as bfile:
    for var in bfile:
        dosage = var.minor_allele_dosage
# get all variants in a genomic region
variants = bfile.fetch('21', 10000, 5000000)
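As a slightly fuller illustration of the iteration pattern above, the following sketch summarises each variant by its mean minor allele dosage. The helper `mean_dosage_by_rsid` is hypothetical (it is not part of this package) and relies only on the `rsid` and `minor_allele_dosage` attributes described in this document. Stand-in objects are used so the sketch runs without a bgen file. Missing genotypes are not handled here; if they are reported as NaN (an assumption), they would propagate into the mean.

```python
from types import SimpleNamespace


def mean_dosage_by_rsid(variants):
    """Map each variant's rsid to its mean minor allele dosage."""
    means = {}
    for var in variants:
        dosage = var.minor_allele_dosage  # 1D array, one value per sample
        means[var.rsid] = float(sum(dosage)) / len(dosage)
    return means


# Stand-in objects so the sketch runs without a bgen file; with real data,
# pass an open BgenFile or the result of bfile.fetch(...) instead.
demo = [
    SimpleNamespace(rsid="rs1", minor_allele_dosage=[0.0, 1.0, 2.0, 1.0]),
    SimpleNamespace(rsid="rs2", minor_allele_dosage=[0.0, 0.0, 0.5, 0.5]),
]
print(mean_dosage_by_rsid(demo))
```

With a real file, the same function could be called inside the `with BgenFile(BGEN_PATH, delay_parsing=True) as bfile:` block, passing `bfile` itself as the iterable.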
API documentation
class BgenFile(path, sample_path='', delay_parsing=False)
# opens a bgen file. If a bgenix index exists for the file, the index file
# will be opened automatically for quicker access of specific variants.
Arguments:
path: path to bgen file
sample_path: optional path to sample file. Samples will be given integer IDs
if sample file is not given and sample IDs not found in the bgen file
delay_parsing: True/False option to allow for not loading all variants into
memory when the BgenFile is opened. This can save time when iterating
across variants in the file
Attributes:
samples: list of sample IDs
header: BgenHeader with info about the bgen version and compression.
Methods:
indexing: BgenVars can be accessed by indexing the BgenFile e.g. bfile[1000]
iteration: variants in a BgenFile can be looped over e.g. for x in bfile: print(x)
fetch(chrom, start=None, stop=None): get all variants within a genomic region
drop_variants(list[int]): drops variants by index from being used in analyses
with_rsid(rsid): returns BgenVar with given rsid
at_position(pos): returns BgenVar at given position
varids(): returns list of varids for variants in the bgen file.
rsids(): returns list of rsids for variants in the bgen file.
chroms(): returns list of chromosomes for variants in the bgen file.
positions(): returns list of positions for variants in the bgen file.
class BgenVar(handle, offset, layout, compression, n_samples):
# Note: this isn't called directly, but instead returned from BgenFile methods
Attributes:
varid: ID for variant
rsid: reference SNP ID for variant
chrom: chromosome variant is on
pos: nucleotide position variant is at
alleles: list of alleles for variant
is_phased: True/False for whether variant has phased genotype data
ploidy: list of ploidy for each sample. Samples are ordered as per BgenFile.samples
minor_allele: the least common allele (for biallelic variants)
minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
alt_dosage: 1D numpy array of alt allele dosages for each sample
probabilities: 2D numpy array of genotype probabilities, one sample per row
BgenVars can be pickled e.g. pickle.dumps(var)
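To make the relationship between the probabilities and dosage attributes concrete, here is a minimal sketch for the common case of a biallelic, unphased, diploid variant. It assumes each row holds the probabilities of carrying 0, 1 and 2 copies of the second allele, in that order. The helper names are hypothetical and are not part of this package, which computes these arrays for you; plain lists are used so the sketch has no dependencies.

```python
def second_allele_dosage(probabilities):
    """Expected count of the second allele per sample.

    Each row is (P(0 copies), P(1 copy), P(2 copies)) of the second allele.
    """
    return [p_het + 2.0 * p_hom for _, p_het, p_hom in probabilities]


def minor_dosage(probabilities):
    """Dosage of whichever allele is rarer across the samples."""
    second = second_allele_dosage(probabilities)
    # A mean dosage above 1 (out of 2 copies) means the second allele is the
    # more common one, so flip to count the first allele instead.
    if sum(second) / len(second) > 1.0:
        return [2.0 - d for d in second]
    return second


probs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.3, 0.6],
]
samples = ["s1", "s2", "s3"]  # stand-in for BgenFile.samples
for sample, dose in zip(samples, minor_dosage(probs)):
    print(sample, round(dose, 3))
```

Because samples are ordered as per BgenFile.samples, zipping the sample list with a per-sample array is enough to label each dosage.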