Package for loading data from bgen files
Another bgen reader
This is a package for reading bgen files.
This package uses Cython to wrap C++ code for parsing bgen files. It is fairly quick: within a single Python process it can parse genotypes from 500,000 individuals at ~300 variants per second (~450 million probabilities per second on a 3 GHz CPU). Decompressing the genotype probabilities is the slow step. zlib decompression takes about 80% of the total time, so zstd-compressed genotypes would likely be much faster to read, perhaps 2-3x.
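As a rough check on these figures, the sketch below assumes three probabilities per sample (a biallelic, diploid, unphased variant) and applies Amdahl's law to estimate the overall gain if only decompression, taken as 80% of the run time, became faster. The speedup factors are hypothetical inputs, not benchmarks of zstd.

```python
# Worked arithmetic for the throughput claim, assuming 3 genotype
# probabilities per sample (biallelic, diploid, unphased variant).
N_SAMPLES = 500_000
VARIANTS_PER_SECOND = 300
PROBS_PER_SAMPLE = 3

probs_per_second = N_SAMPLES * VARIANTS_PER_SECOND * PROBS_PER_SAMPLE


def overall_speedup(decompression_fraction: float, decompression_speedup: float) -> float:
    """Amdahl's law: only the decompression share of the run time gets faster."""
    remaining = 1.0 - decompression_fraction
    return 1.0 / (remaining + decompression_fraction / decompression_speedup)


if __name__ == "__main__":
    print(f"{probs_per_second:,} probabilities per second")
    # hypothetical speedups of the decompression step alone
    for factor in (2, 4, 6):
        print(f"decompression {factor}x faster -> {overall_speedup(0.8, factor):.2f}x overall")
    print(f"upper bound: {overall_speedup(0.8, float('inf')):.2f}x overall")
```

Under these assumptions, even instantaneous decompression caps the overall gain at 5x, and a 2-3x overall gain would need decompression itself to run roughly 2.7-6x faster.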
This has been optimized for UK Biobank bgen files (i.e. bgen version 1.2 with zlib-compressed 8-bit genotype probabilities). Other bgen versions and zstd compression have also been tested using example bgen files.
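What "zlib-compressed 8-bit genotype probabilities" means can be sketched with the standard library alone. This simplified example assumes the bgen v1.2 convention in which each sample stores all but its last probability as an unsigned B-bit integer scaled by 2^B - 1 (B = 8 here), with the last probability implied by the remainder, and the probability block compressed with zlib. It deliberately ignores the per-variant header, the ploidy and missingness bytes, and the specification's exact rounding rule; `encode_sample` and `decode_block` are hypothetical helpers, not functions from this package.

```python
import zlib

BITS = 8
MAX_VALUE = (1 << BITS) - 1  # 255 for 8-bit probabilities


def encode_sample(probs):
    """Store all but the last probability as 8-bit integers (simplified rounding)."""
    return bytes(round(p * MAX_VALUE) for p in probs[:-1])


def decode_block(block, n_samples, n_stored=2):
    """Decompress, rescale to [0, 1], and infer each sample's last probability."""
    raw = zlib.decompress(block)
    rows = []
    for i in range(n_samples):
        chunk = raw[i * n_stored:(i + 1) * n_stored]
        stored = [value / MAX_VALUE for value in chunk]
        rows.append(stored + [1.0 - sum(stored)])
    return rows


# three samples of a biallelic, diploid, unphased variant: [P(AA), P(AB), P(BB)]
samples = [[1.0, 0.0, 0.0], [0.1, 0.8, 0.1], [0.0, 0.0, 1.0]]
block = zlib.compress(b"".join(encode_sample(p) for p in samples))
decoded = decode_block(block, n_samples=len(samples))
```

Each decoded probability lands within a couple of 1/255 steps of the original, which is the precision trade-off that 8-bit storage makes.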
Install
pip install bgen
Usage
from bgen.reader import BgenFile

BGEN_PATH = 'example.bgen'  # placeholder: path to your bgen file

bfile = BgenFile(BGEN_PATH)
rsids = bfile.rsids()

# select a variant by indexing
var = bfile[1000]

# pull out genotype probabilities
probs = var.probabilities  # returns 2D numpy array
dosage = var.minor_allele_dosage  # returns 1D numpy array for biallelic variant

# iterate through every variant in the file
with BgenFile(BGEN_PATH, delay_parsing=True) as bfile:
    for var in bfile:
        dosage = var.minor_allele_dosage

# get all variants in a genomic region
with BgenFile(BGEN_PATH) as bfile:
    variants = bfile.fetch('21', 10000, 5000000)
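Because `delay_parsing=True` avoids loading every variant into memory up front, per-variant statistics can be computed in a single streaming pass. The sketch below is illustrative only: it assumes each variant exposes `rsid` and a diploid `minor_allele_dosage` (values 0 to 2), as listed in the API documentation, and uses a hypothetical `FakeVar` stand-in so it runs without a bgen file.

```python
from typing import Iterable, Iterator, NamedTuple, Sequence, Tuple


def minor_allele_frequencies(variants: Iterable) -> Iterator[Tuple[str, float]]:
    """Yield (rsid, frequency) pairs one variant at a time.

    Assumes each variant has `rsid` and a diploid `minor_allele_dosage`
    sequence (values 0-2), so frequency = total dosage / (2 * n_samples).
    """
    for var in variants:
        dosage = var.minor_allele_dosage
        yield var.rsid, sum(dosage) / (2 * len(dosage))


class FakeVar(NamedTuple):
    """Hypothetical stand-in for BgenVar, for trying this without a bgen file."""
    rsid: str
    minor_allele_dosage: Sequence[float]


if __name__ == "__main__":
    fake = [FakeVar("rs1", [0, 1, 2, 1]), FakeVar("rs2", [0.0, 0.0, 0.5, 0.1])]
    for rsid, freq in minor_allele_frequencies(fake):
        print(rsid, round(freq, 4))
```

With a real file, the same generator could be fed a BgenFile opened as in the loop above, e.g. `for rsid, freq in minor_allele_frequencies(bfile): ...`. For numpy arrays, `dosage.mean() / 2` is a faster equivalent of the pure-Python sum. Samples with a ploidy other than 2 would need different handling.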
API documentation
class BgenFile(path, sample_path='', delay_parsing=False)
    # Opens a bgen file. If a bgenix index exists for the file, the index file
    # will be opened automatically for quicker access to specific variants.
    Arguments:
      path: path to bgen file
      sample_path: optional path to sample file. Samples will be given integer IDs
          if a sample file is not given and sample IDs are not found in the bgen file
      delay_parsing: True/False option to avoid loading all variants into
          memory when the BgenFile is opened. This can save time when iterating
          across variants in the file
    Attributes:
      samples: list of sample IDs
      header: BgenHeader with info about the bgen version and compression
    Methods:
      indexing: BgenVars can be accessed by index, e.g. bfile[1000]
      iteration: variants in a BgenFile can be looped over, e.g. for x in bfile: print(x)
      fetch(chrom, start=None, stop=None): get all variants within a genomic region
      drop_variants(list[int]): drops variants by index from being used in analyses
      with_rsid(rsid): returns BgenVar with given rsid
      at_position(pos): returns BgenVar at given position
      varids(): returns list of varids for variants in the bgen file
      rsids(): returns list of rsids for variants in the bgen file
      chroms(): returns list of chromosomes for variants in the bgen file
      positions(): returns list of positions for variants in the bgen file
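Conceptually, a region query selects variants by chromosome and position. Without a bgenix index, that amounts to a linear scan over the lists returned by `chroms()` and `positions()`, which the hypothetical helper below sketches. Whether `fetch()` treats `start` and `stop` as inclusive is not stated here, so the inclusive bounds are an assumption.

```python
from typing import List, Optional, Sequence


def indices_in_region(chroms: Sequence[str], positions: Sequence[int], chrom: str,
                      start: Optional[int] = None, stop: Optional[int] = None) -> List[int]:
    """Return indices of variants on `chrom` with start <= pos <= stop.

    Linear scan over parallel lists such as those from chroms() and positions().
    Inclusive bounds are an assumption, not confirmed for fetch().
    """
    hits = []
    for i, (c, pos) in enumerate(zip(chroms, positions)):
        if c != chrom:
            continue
        if start is not None and pos < start:
            continue
        if stop is not None and pos > stop:
            continue
        hits.append(i)
    return hits


if __name__ == "__main__":
    chroms = ["21", "21", "21", "22"]
    positions = [5000, 10000, 6000000, 20000]
    print(indices_in_region(chroms, positions, "21", 10000, 5000000))
```

Selected variants could then be pulled out by index, e.g. `[bfile[i] for i in indices_in_region(bfile.chroms(), bfile.positions(), '21', 10000, 5000000)]`. With a bgenix index, `fetch()` can presumably go straight to the region instead of scanning every variant.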
class BgenVar(handle, offset, layout, compression, n_samples)
    # Note: this isn't called directly, but instead returned from BgenFile methods
    Attributes:
      varid: ID for variant
      rsid: reference SNP ID for variant
      chrom: chromosome variant is on
      pos: nucleotide position variant is at
      alleles: list of alleles for variant
      is_phased: True/False for whether variant has phased genotype data
      ploidy: list of ploidy for each sample. Samples are ordered as per BgenFile.samples
      minor_allele: the least common allele (for biallelic variants)
      minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
      probabilities: 2D numpy array of genotype probabilities, one sample per row
    BgenVars can be pickled, e.g. pickle.dumps(var)
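The relationship between `probabilities`, `minor_allele` and `minor_allele_dosage` can be sketched as follows. This is not the package's implementation: it assumes an unphased, diploid, biallelic variant whose probability columns are ordered as homozygous first allele, heterozygous, homozygous second allele, and on a tie it arbitrarily picks the second allele. `allele_dosages` and `minor_allele_from_probs` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def allele_dosages(probs: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-sample dosages of the first and second allele.

    Assumes each row is [P(AA), P(AB), P(BB)] for an unphased, diploid,
    biallelic variant (a list of rows or a 2D array with three columns).
    """
    first = [2 * p_aa + p_ab for p_aa, p_ab, _ in probs]
    second = [p_ab + 2 * p_bb for _, p_ab, p_bb in probs]
    return first, second


def minor_allele_from_probs(probs, alleles):
    """Return (minor_allele, dosages), taking the allele with the lower total dosage.

    Ties go to the second allele, an arbitrary choice for this sketch.
    """
    first, second = allele_dosages(probs)
    if sum(second) <= sum(first):
        return alleles[1], second
    return alleles[0], first


if __name__ == "__main__":
    probs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
    print(minor_allele_from_probs(probs, ["A", "G"]))
```

Summing dosages per allele is equivalent to comparing allele frequencies, since both alleles share the same denominator of 2 x n_samples.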