Package for loading data from bgen files

Another bgen reader

This is a package for reading bgen files.

This package uses Cython to wrap C++ code for parsing bgen files. It is reasonably fast: within Python, it can parse genotypes for 500,000 individuals at more than 100 variants per second.

This has been designed primarily around UK Biobank bgen files (i.e. bgen version 1.2 with zlib-compressed genotype probabilities), but the other versions and compression schemes have also been tested using example bgen files.

Install

pip install bgen

Usage

from bgen import BgenFile
bfile = BgenFile(BGEN_PATH, SAMPLE_PATH=None)
rsids = bfile.rsids()

# select a variant by indexing
var = bfile[1000]

# pull out genotype probabilities
probs = var.probabilities()  # returns 2D numpy array
dosage = var.alt_dosage()  # requires biallelic variant, returns numpy array

# exclude variants from analyses by passing in indices
to_drop = [1, 3, 500]
bfile.drop_variants(to_drop)

# pickle variants for easy message passing
import pickle
dumped = pickle.dumps(var)
var = pickle.loads(dumped)

# iterate through every variant in the file, without preloading every variant
with BgenFile(BGEN_PATH, SAMPLE_PATH=None, delay_parsing=True) as bfile:
    for var in bfile:
        probs = var.probabilities()
        dosage = var.alt_dosage()
        ploidy = var.ploidy
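
To illustrate how the probabilities and dosages above relate, here is a minimal sketch, not taken from the package itself. It assumes a diploid, biallelic, unphased variant whose probability rows are ordered (hom-ref, het, hom-alt), and that missing genotypes appear as NaN. The helper names alt_dosage_from_probs and alt_allele_frequency are hypothetical, and the package's own alt_dosage() may differ in detail (for example, in which allele it counts).

```python
import math


def alt_dosage_from_probs(probs):
    """Expected alternate-allele count per sample.

    Assumes each row is (P(hom-ref), P(het), P(hom-alt)) for a diploid,
    biallelic variant. Rows containing NaN (assumed to mark missing
    genotypes) yield NaN.
    """
    dosages = []
    for p_hom_ref, p_het, p_hom_alt in probs:
        if any(math.isnan(p) for p in (p_hom_ref, p_het, p_hom_alt)):
            dosages.append(float("nan"))
        else:
            # A het carries one alt allele, a hom-alt carries two.
            dosages.append(p_het + 2.0 * p_hom_alt)
    return dosages


def alt_allele_frequency(dosages):
    """Alt-allele frequency over diploid samples, skipping NaN dosages."""
    called = [d for d in dosages if not math.isnan(d)]
    if not called:
        return float("nan")
    # Each called sample contributes two alleles.
    return sum(called) / (2.0 * len(called))


# Toy input standing in for var.probabilities(); the last sample is missing.
probs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [float("nan")] * 3,
]
dosages = alt_dosage_from_probs(probs)  # [0.0, 1.0, 2.0, nan]
freq = alt_allele_frequency(dosages)    # 0.5
```

The helpers iterate row by row, so they also accept a 2D numpy array, although a vectorised numpy expression would be much faster at the scale of 500,000 samples.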

Source Distribution

bgen-1.1.3.tar.gz (646.1 kB)
