Package for loading data from bgen files
Another bgen reader
This is a package for reading bgen files.
This package uses Cython to wrap C++ code for parsing bgen files. It is reasonably fast: it can parse genotypes from 500,000 individuals at more than 100 variants per second within Python.
This has been designed primarily around UK Biobank bgen files (i.e. bgen version 1.2 with zlib-compressed genotype probabilities), but the other versions and compression schemes have also been tested using example bgen files.
Install
pip install bgen
Usage
from bgen import BgenFile
bfile = BgenFile(BGEN_PATH, sample_path=None)
rsids = bfile.rsids()
# select a variant by indexing
var = bfile[1000]
# pull out genotype probabilities
probs = var.probabilities # returns 2D numpy array
dosage = var.minor_allele_dosage # requires biallelic variant, returns numpy array
# exclude variants from analyses by passing in indices
to_drop = [1, 3, 500]
bfile.drop_variants(to_drop)
# pickle variants for easy message passing
import pickle
dumped = pickle.dumps(var)
var = pickle.loads(dumped)
# iterate through every variant in the file, without preloading every variant
with BgenFile(BGEN_PATH, sample_path=None, delay_parsing=True) as bfile:
    for var in bfile:
        probs = var.probabilities
        dosage = var.minor_allele_dosage
        ploidy = var.ploidy
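To make the relationship between the probabilities and the minor allele dosage concrete, here is a minimal pure-Python sketch. It is not the package's implementation. It assumes a biallelic, unphased, diploid variant whose probability rows are ordered [P(AA), P(AB), P(BB)], and it ignores missing values. The function name and the example data are hypothetical.

```python
def minor_allele_dosage(probs):
    """Illustrative only: probs is a list of [P(AA), P(AB), P(BB)] rows, one per sample."""
    # Expected count of the second allele (B) per sample: P(AB) + 2 * P(BB).
    dose_b = [p_ab + 2.0 * p_bb for _, p_ab, p_bb in probs]
    # Expected count of the first allele (A) per sample: 2 * P(AA) + P(AB).
    dose_a = [2.0 * p_aa + p_ab for p_aa, p_ab, _ in probs]
    # Assumption: the minor allele is the one with the smaller total dosage.
    return dose_b if sum(dose_b) <= sum(dose_a) else dose_a


# Hypothetical genotype probabilities for three samples.
probs = [
    [1.0, 0.0, 0.0],  # confidently homozygous for A
    [0.0, 1.0, 0.0],  # confidently heterozygous
    [0.5, 0.5, 0.0],  # uncertain between AA and AB
]
dosage = minor_allele_dosage(probs)
print(dosage)  # [0.0, 1.0, 0.5]
```

With real data, var.probabilities is a 2D numpy array, so the same sums would normally be done with array operations rather than list comprehensions.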