Package for loading data from bgen files
Another bgen reader
This is a package for reading bgen files.
This package uses Cython to wrap C++ code for parsing bgen files. It's fairly quick: it can parse genotypes from 500,000 individuals at ~300 variants per second within a single Python process (~450 million probabilities per second with a 3GHz CPU). Decompressing the genotype probabilities is the slow step. zlib decompression takes about 80% of the total time, so zstd compressed genotypes should be much faster, perhaps 2-3X.
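The throughput figure can be checked with simple arithmetic. The sketch below assumes three genotype probabilities per sample (an unphased, diploid, biallelic variant); that assumption is not stated above, but it is the one that reproduces the quoted number.

```python
# Back-of-envelope check of the quoted throughput.
# Assumption: each sample contributes 3 genotype probabilities
# (unphased, diploid, biallelic variant).
n_samples = 500_000
variants_per_second = 300
probs_per_sample = 3

probs_per_second = n_samples * variants_per_second * probs_per_sample
print(f"{probs_per_second / 1e6:.0f} million probabilities per second")
```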
This has been optimized for UKBiobank bgen files (i.e. bgen version 1.2 with zlib compressed 8-bit genotype probabilities), but the other bgen versions and zstd compression have also been tested using example bgen files.
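To illustrate what "8-bit genotype probabilities" means, here is a minimal pure-Python sketch of how such values could be decoded. It is based on my reading of the bgen v1.2 specification (layout 2) for an unphased, diploid, biallelic variant: two probabilities are stored per sample as unsigned 8-bit integers scaled by 255, and the third is implied. The function name `decode_8bit_probs` is hypothetical, and this is not the package's C++ implementation.

```python
def decode_8bit_probs(stored):
    """Decode (P(AA), P(AB)) byte pairs into three probabilities per sample.

    Assumption: values are unsigned 8-bit integers where 255 represents 1.0,
    and the third probability is whatever remains.
    """
    scale = 2 ** 8 - 1  # 255, the largest 8-bit value
    decoded = []
    for aa, ab in stored:
        bb = scale - aa - ab  # implied third value, still in integer units
        decoded.append((aa / scale, ab / scale, bb / scale))
    return decoded


example = decode_8bit_probs([(255, 0), (0, 255), (51, 102)])
for row in example:
    print(row)
```

Each decoded row sums to 1 because the third value is derived from the other two rather than stored.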
Install
pip install bgen
Usage
from bgen import BgenReader
bfile = BgenReader(BGEN_PATH)
rsids = bfile.rsids()
# select a variant by indexing
var = bfile[1000]
# pull out genotype probabilities
probs = var.probabilities # returns 2D numpy array
dosage = var.minor_allele_dosage # returns 1D numpy array for biallelic variant
# iterate through every variant in the file
with BgenReader(BGEN_PATH, delay_parsing=True) as bfile:
    for var in bfile:
        dosage = var.minor_allele_dosage
# get all variants in a genomic region
variants = bfile.fetch('21', 10000, 5000000)
# or for writing bgen files
import numpy as np
from bgen import BgenWriter
geno = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]).astype(np.float64)
with BgenWriter(BGEN_PATH, n_samples=3) as bfile:
    bfile.add_variant(varid='var1', rsid='rs1', chrom='chr1', pos=1,
                      alleles=['A', 'G'], genotypes=geno)
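To show how a dosage relates to the genotype probabilities, here is a small sketch in plain Python. It assumes an unphased, diploid, biallelic variant with columns ordered as P(AA), P(AB), P(BB). The helper names `expected_alt_dosage` and `expected_minor_dosage` are hypothetical, and the package's own calculation may differ (for example in how it handles ties or missing data).

```python
def expected_alt_dosage(probs):
    """Expected count of the second allele (B) for each sample."""
    return [p_ab + 2 * p_bb for _, p_ab, p_bb in probs]


def expected_minor_dosage(probs):
    """Dosage of whichever allele has the smaller total across samples."""
    alt = expected_alt_dosage(probs)
    ref = [2 - d for d in alt]  # diploid: the two allele dosages sum to 2
    return alt if sum(alt) <= sum(ref) else ref


probs = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0.5, 0.5, 0]]
print(expected_alt_dosage(probs))
print(expected_minor_dosage(probs))
```

In this toy input the second allele is the rarer one, so both functions return the same values.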
API documentation
class BgenReader(path, sample_path='', delay_parsing=False)
    # opens a bgen file. If a bgenix index exists for the file, the index file
    # will be opened automatically for quicker access of specific variants.
    Arguments:
      path: path to bgen file
      sample_path: optional path to sample file. Samples will be given integer IDs
          if sample file is not given and sample IDs not found in the bgen file
      delay_parsing: True/False option to allow for not loading all variants into
          memory when the BgenReader is opened. This can save time when iterating
          across variants in the file
    Attributes:
      samples: list of sample IDs
      header: BgenHeader with info about the bgen version and compression.
    Methods:
      indexing: BgenVars can be accessed by indexing the BgenReader e.g. bfile[1000]
      iteration: variants in a BgenReader can be looped over e.g. for x in bfile: print(x)
      fetch(chrom, start=None, stop=None): get all variants within a genomic region
      drop_variants(list[int]): drops variants by index from being used in analyses
      with_rsid(rsid): returns BgenVar with given rsid
      at_position(pos): returns BgenVar with given position
      varids(): returns list of varids for variants in the bgen file.
      rsids(): returns list of rsids for variants in the bgen file.
      chroms(): returns list of chromosomes for variants in the bgen file.
      positions(): returns list of positions for variants in the bgen file.
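As a sketch of how fetch() might be combined with the variant attributes, the helper below summarizes every variant in a region. It only relies on fetch(chrom, start, stop) and on the rsid, pos and minor_allele_dosage attributes listed in this documentation. `region_summary` and `FakeReader` are hypothetical names; the stand-in reader exists only so the example runs without a bgen file, and its inclusive start/stop handling is an assumption rather than documented behavior.

```python
from types import SimpleNamespace


def region_summary(reader, chrom, start, stop):
    """Return (rsid, pos, mean minor allele dosage) for variants in a region."""
    rows = []
    for var in reader.fetch(chrom, start, stop):
        dosage = var.minor_allele_dosage
        rows.append((var.rsid, var.pos, sum(dosage) / len(dosage)))
    return rows


class FakeReader:
    """Hypothetical stand-in for a bgen reader, not part of the package."""

    def __init__(self, variants):
        self._variants = variants

    def fetch(self, chrom, start=None, stop=None):
        # Assumption: both region bounds are inclusive.
        for var in self._variants:
            if var.chrom != chrom:
                continue
            if start is not None and var.pos < start:
                continue
            if stop is not None and var.pos > stop:
                continue
            yield var


fake = FakeReader([
    SimpleNamespace(rsid='rs1', chrom='21', pos=500, minor_allele_dosage=[0.0, 1.0]),
    SimpleNamespace(rsid='rs2', chrom='21', pos=20000, minor_allele_dosage=[0.0, 1.0, 2.0]),
    SimpleNamespace(rsid='rs3', chrom='22', pos=20000, minor_allele_dosage=[2.0, 2.0]),
])
summary = region_summary(fake, '21', 10000, 5000000)
print(summary)
```

With a real file, the same function should accept an opened reader in place of `fake`, provided the dosage arrays contain no missing values.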
class BgenVar(handle, offset, layout, compression, n_samples):
    # Note: this isn't called directly, but instead returned from BgenReader methods
    Attributes:
      varid: ID for variant
      rsid: reference SNP ID for variant
      chrom: chromosome variant is on
      pos: nucleotide position variant is at
      alleles: list of alleles for variant
      is_phased: True/False for whether variant has phased genotype data
      ploidy: list of ploidy for each sample. Samples are ordered as per BgenReader.samples
      minor_allele: the least common allele (for biallelic variants)
      minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
      alt_dosage: 1D numpy array of alt allele dosages for each sample
      probabilities: 2D numpy array of genotype probabilities, one sample per row
    BgenVars can be pickled e.g. pickle.dumps(var)
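The minor_allele_dosage and ploidy attributes are enough to estimate a minor allele frequency. The sketch below shows one way to do that in plain Python. `minor_allele_frequency` is a hypothetical helper, not part of the package, and it assumes that samples with missing genotypes appear as NaN in the dosage array, which this documentation does not confirm.

```python
import math


def minor_allele_frequency(dosages, ploidy):
    """Sum of dosages divided by the number of allele copies observed.

    Assumption: missing genotypes are NaN and are skipped entirely.
    """
    total = 0.0
    copies = 0
    for dosage, n_copies in zip(dosages, ploidy):
        if math.isnan(dosage):
            continue
        total += dosage
        copies += n_copies
    return total / copies if copies else float('nan')


maf = minor_allele_frequency([0.0, 1.0, 2.0, float('nan')], [2, 2, 2, 2])
print(maf)
```

With a real variant, the inputs would be var.minor_allele_dosage and var.ploidy.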