Package for loading data from bgen files
Another bgen reader
This is a package for reading bgen files.
This package uses Cython to wrap C++ code for parsing bgen files. It is fairly quick: it can parse genotypes from 500,000 individuals at ~300 variants per second within a single Python process (~450 million probabilities per second with a 3 GHz CPU).
This has been optimized for UK Biobank bgen files (i.e. bgen version 1.2 with zlib-compressed 8-bit genotype probabilities), but the other bgen versions and zstd compression have also been tested using example bgen files.
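The throughput figure quoted above can be checked with a little arithmetic. This is only a back-of-envelope sketch; it assumes three genotype probabilities per sample (a biallelic, diploid, unphased variant), which the text does not state explicitly.

```python
# Back-of-envelope check of the quoted throughput.
# Assumption: each sample contributes 3 genotype probabilities
# (AA, AB, BB) for a biallelic, diploid, unphased variant.
n_samples = 500_000
variants_per_second = 300
probs_per_sample = 3

probs_per_second = n_samples * variants_per_second * probs_per_sample
print(f"{probs_per_second:,} probabilities per second")
```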
Install
pip install bgen
Usage
from bgen import BgenFile
bfile = BgenFile(BGEN_PATH)
rsids = bfile.rsids()
# select a variant by indexing
var = bfile[1000]
# pull out genotype probabilities
probs = var.probabilities # returns 2D numpy array
dosage = var.minor_allele_dosage # returns 1D numpy array for biallelic variant
# iterate through every variant in the file
with BgenFile(BGEN_PATH, delay_parsing=True) as bfile:
    for var in bfile:
        dosage = var.minor_allele_dosage
# get all variants in a genomic region
variants = bfile.fetch('21', 10000, 5000000)
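As an illustration of what can be done with the dosages returned above, the following sketch estimates a minor allele frequency from a dosage array. The helper name `minor_allele_frequency` is hypothetical and not part of the package. It assumes diploid samples and that missing samples, if any, appear as NaN.

```python
import math


def minor_allele_frequency(dosages, ploidy=2):
    """Estimate allele frequency from per-sample dosages.

    Hypothetical helper, not part of the bgen package.
    Assumes every sample has the same ploidy and that
    missing values are encoded as NaN.
    """
    observed = [d for d in dosages if not math.isnan(d)]
    if not observed:
        return float("nan")
    # total allele copies observed / total allele copies possible
    return sum(observed) / (ploidy * len(observed))


# With a real file this might be called as:
#   maf = minor_allele_frequency(var.minor_allele_dosage)
```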
API documentation
class BgenFile(path, sample_path='', delay_parsing=False)
    # opens a bgen file. If a bgenix index exists for the file, the index file
    # will be opened automatically for quicker access of specific variants.
    Arguments:
      path: path to bgen file
      sample_path: optional path to sample file. Samples will be given integer IDs
          if no sample file is given and no sample IDs are found in the bgen file
      delay_parsing: True/False option to avoid loading all variants into
          memory when the BgenFile is opened. This can save time when iterating
          across variants in the file
    Attributes:
      samples: list of sample IDs
      header: BgenHeader with info about the bgen version and compression.
    Methods:
      indexing: BgenVars can be accessed by indexing the BgenFile e.g. bfile[1000]
      iteration: variants in a BgenFile can be looped over e.g. for x in bfile: print(x)
      fetch(chrom, start=None, stop=None): get all variants within a genomic region
      drop_variants(list[int]): drops variants by index from being used in analyses
      with_rsid(rsid): returns BgenVar with given rsid
      at_position(pos): returns BgenVar with given position
      varids(): returns list of varids for variants in the bgen file.
      rsids(): returns list of rsids for variants in the bgen file.
      chroms(): returns list of chromosomes for variants in the bgen file.
      positions(): returns list of positions for variants in the bgen file.
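The methods listed above can be combined in small helpers. The sketch below uses only `fetch` and the documented variant attributes; `summarise_region` is a hypothetical name, and the function works with any object that offers a compatible `fetch` method.

```python
def summarise_region(bfile, chrom, start=None, stop=None):
    """Return (rsid, chrom, pos, alleles) for each variant in a region.

    Hypothetical helper: `bfile` is expected to behave like a BgenFile,
    i.e. provide fetch(chrom, start, stop) yielding objects with
    rsid, chrom, pos and alleles attributes.
    """
    return [
        (var.rsid, var.chrom, var.pos, var.alleles)
        for var in bfile.fetch(chrom, start, stop)
    ]


# With a real file this might be called as:
#   rows = summarise_region(bfile, '21', 10000, 5000000)
```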
class BgenVar(handle, offset, layout, compression, n_samples):
    # Note: this isn't called directly, but instead returned from BgenFile methods
    Attributes:
      varid: ID for variant
      rsid: reference SNP ID for variant
      chrom: chromosome variant is on
      pos: nucleotide position variant is at
      alleles: list of alleles for variant
      is_phased: True/False for whether variant has phased genotype data
      ploidy: list of ploidy for each sample. Samples are ordered as per BgenFile.samples
      minor_allele: the least common allele (for biallelic variants)
      minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
      probabilities: 2D numpy array of genotype probabilities, one sample per row
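To make the relationship between genotype probabilities and dosage concrete, here is a minimal sketch. It is not necessarily how the package computes `minor_allele_dosage`. It assumes a biallelic, diploid, unphased variant whose rows hold P(AA), P(AB), P(BB) in that order, and it does not handle missing values. Both function names are hypothetical.

```python
def allele_dosages(probs):
    """Expected copies of the first and second allele for each sample.

    Assumes each row is [P(AA), P(AB), P(BB)].
    """
    first = [2 * row[0] + row[1] for row in probs]
    second = [row[1] + 2 * row[2] for row in probs]
    return first, second


def minor_dosage(probs):
    """Dosage of whichever allele has the smaller total across samples."""
    first, second = allele_dosages(probs)
    return first if sum(first) < sum(second) else second


probs = [
    [1.0, 0.0, 0.0],    # confidently homozygous for the first allele
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],    # confidently heterozygous
    [0.25, 0.5, 0.25],  # uncertain call
]
print(minor_dosage(probs))
```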
BgenVars can be pickled e.g. pickle.dumps(var)
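Pickling matters mostly when variants are handed to other processes, for example through `multiprocessing`. The sketch below shows a generic round trip with the standard `pickle` module; `roundtrip` is a hypothetical helper, and a plain dict stands in for a BgenVar so that the example runs without a bgen file.

```python
import pickle


def roundtrip(obj):
    """Serialise obj to bytes and rebuild it, as a worker process would."""
    return pickle.loads(pickle.dumps(obj))


# Stand-in object; with a real file this would be e.g. roundtrip(bfile[1000]).
stand_in = {"rsid": "rs1", "pos": 100}
copy = roundtrip(stand_in)
```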
Download files
Download the file for your platform.
Source Distribution
bgen-1.2.7.tar.gz (663.4 kB)
Built Distributions
Hashes for bgen-1.2.7-cp38-cp38-manylinux2010_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 8f1c4b8e4bec196fe3cdc3ba23d39ff9b8849ecdd067131a517e6dedae307fa8
MD5 | 0edc6f218afdcd86112d7d3488f97f10
BLAKE2b-256 | 52c8016aea9a88d6894c6a2b9ae3482e79147ec98b25bf2faa202b88b84da54b
Hashes for bgen-1.2.7-cp38-cp38-manylinux2010_i686.whl
Algorithm | Hash digest
---|---
SHA256 | bf213dccdd5403f1881957502dc2c97f3929aa687fdb2970505d5cd7f19253ba
MD5 | 370dbeb11558a4bcf37c24a7da3eaec3
BLAKE2b-256 | 955949da614ed6a49d0c204603daa97309d42540b182b80dd48ad81fddbc581b
Hashes for bgen-1.2.7-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 0342ae26f49093eadc95abf79fc3df4997e5227866007365f54927167da1a884
MD5 | 6dc2e87fb710c9944b0deb40e0099da6
BLAKE2b-256 | 159aadc2d6be7fc380e0cc6ed05fae7980f9af9e0e2023fa910ef1f780e0e4d8
Hashes for bgen-1.2.7-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 53967ad0049f4ca224debb5cc9bf219616d0ee91b80577d057854e624b814782
MD5 | cb06b9ed9f4f48391ca0da999f57b6bb
BLAKE2b-256 | 39b419007b4ed8eb8702a7d7d126ddad12829e4d765813daf0ac8fbcf582c9dd
Hashes for bgen-1.2.7-cp37-cp37m-manylinux2010_i686.whl
Algorithm | Hash digest
---|---
SHA256 | d3ce7da6740584a0d482f158c81a26114465742ab355cf4f60f8c6541630576b
MD5 | 1fb043e87b306923b13ed52214e497f2
BLAKE2b-256 | b34eb830d411c6b64dfd9a9293002677154ae9c764cae773223cb7ea276e791b
Hashes for bgen-1.2.7-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | b24ae10f8c7b243e091631b03402d2e97b9caed4d2d30e9f56d3214049e87e37
MD5 | 8b738f4f0dbbb732af1dee85b6d9d722
BLAKE2b-256 | 8ae0491ba6bed05448a33c244c4901f0a640900dccb173b810d31b600e4e541b
Hashes for bgen-1.2.7-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 924bca29d52089e5274d03589c75b9a3e7710fa3f3ae097542a0b61e79f88238
MD5 | 63fa8805a0a8cc7ca1714fed3a32f78a
BLAKE2b-256 | 88980892d5978d5965e8ecb0580724ba92e7bf3261a8fc87010ee2ab1068c42a
Hashes for bgen-1.2.7-cp36-cp36m-manylinux2010_i686.whl
Algorithm | Hash digest
---|---
SHA256 | e79e0599633fc2baec673b40ee71bb9a1c636f1b36929be95a7a840c2c3a497f
MD5 | 4f21c7477077e9d02fa4887e43ea8144
BLAKE2b-256 | 32c38f90145b801dabb061b8207c67865f49c8017d35cac4b8cc58016ed645bb
Hashes for bgen-1.2.7-cp36-cp36m-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 1c2fcc4ef0d298292e0becd576c63675605911a37cc593d431c33bc1379c8411
MD5 | 7200a0ecf948efb37396d4f1ab819e6c
BLAKE2b-256 | 0f52de794c19b552516e8106a7774e98de6debb2bb47be5d69f59197e5bccbf5