Package for loading data from bgen files
Another bgen reader
This is a package for reading bgen files.
This package uses Cython to wrap C++ code for parsing bgen files. It's fairly quick: it can parse genotypes from 500,000 individuals at ~300 variants per second within a single Python process (~450 million probabilities per second with a 3GHz CPU).
This has been optimized for UK Biobank bgen files (i.e. bgen version 1.2 with zlib-compressed 8-bit genotype probabilities), but the other bgen versions and zstd compression have also been tested using example bgen files.
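The throughput figure can be checked with simple arithmetic. This sketch assumes each sample contributes three genotype probabilities per variant (a biallelic, unphased, diploid variant); that assumption is not stated alongside the benchmark.

```python
# Assumption: biallelic, unphased, diploid variants, so each sample has
# three genotype probabilities per variant.
n_samples = 500_000
variants_per_second = 300
probs_per_sample = 3

probs_per_second = n_samples * variants_per_second * probs_per_sample
print(f"{probs_per_second:,} probabilities per second")  # 450,000,000
```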
Install
pip install bgen
Usage
from bgen import BgenFile
bfile = BgenFile(BGEN_PATH)
rsids = bfile.rsids()
# select a variant by indexing
var = bfile[1000]
# pull out genotype probabilities
probs = var.probabilities # returns 2D numpy array
dosage = var.minor_allele_dosage # returns 1D numpy array for biallelic variant
# iterate through every variant in the file
# (a separate name is used here so the bfile opened above stays usable)
with BgenFile(BGEN_PATH, delay_parsing=True) as lazy_bfile:
    for var in lazy_bfile:
        dosage = var.minor_allele_dosage
# get all variants in a genomic region
variants = bfile.fetch('21', 10000, 5000000)
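As a slightly larger illustration of fetch, the sketch below averages the minor allele dosage of each variant in a region. The helper name mean_dosage_by_rsid is hypothetical and not part of the package. It assumes missing dosages appear as NaN, and it only relies on the fetch method and the rsid and minor_allele_dosage attributes described in this README.

```python
import math

def mean_dosage_by_rsid(bfile, chrom, start, stop):
    """Map rsid -> mean minor allele dosage for variants in a region.

    `bfile` only needs a fetch(chrom, start, stop) method that yields
    objects with `rsid` and `minor_allele_dosage` attributes.
    """
    means = {}
    for var in bfile.fetch(chrom, start, stop):
        # Assumption: missing dosages are NaN, so skip them before averaging.
        observed = [float(d) for d in var.minor_allele_dosage
                    if not math.isnan(d)]
        if observed:
            means[var.rsid] = sum(observed) / len(observed)
        else:
            means[var.rsid] = float('nan')
    return means
```

With an open BgenFile this might be called as mean_dosage_by_rsid(bfile, '21', 10000, 5000000).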
API documentation
class BgenFile(path, sample_path='', delay_parsing=False)
    # opens a bgen file. If a bgenix index exists for the file, the index file
    # will be opened automatically for quicker access of specific variants.
    Arguments:
        path: path to bgen file
        sample_path: optional path to sample file. Samples will be given integer IDs
            if no sample file is given and no sample IDs are found in the bgen file
        delay_parsing: True/False option to avoid loading all variants into
            memory when the BgenFile is opened. This can save time when iterating
            across variants in the file
    Attributes:
        samples: list of sample IDs
        header: BgenHeader with info about the bgen version and compression.
    Methods:
        indexing: a BgenVar can be accessed by indexing the BgenFile e.g. bfile[1000]
        iteration: variants in a BgenFile can be looped over e.g. for x in bfile: print(x)
        fetch(chrom, start=None, stop=None): get all variants within a genomic region
        drop_variants(list[int]): excludes variants, given by index, from use in analyses
        with_rsid(rsid): returns BgenVar with given rsid
        at_position(pos): returns BgenVar at given position
        varids(): returns list of varids for variants in the bgen file.
        rsids(): returns list of rsids for variants in the bgen file.
        chroms(): returns list of chromosomes for variants in the bgen file.
        positions(): returns list of positions for variants in the bgen file.
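The listing methods and drop_variants can be combined to restrict a file to a set of variants of interest. The following is a minimal sketch rather than part of the package: drop_unlisted_variants is a hypothetical helper, and it assumes the indices accepted by drop_variants match positions in the list returned by rsids().

```python
def drop_unlisted_variants(bfile, keep_rsids):
    """Drop every variant whose rsid is not in `keep_rsids`.

    Returns the list of dropped indices.
    Assumption: drop_variants() indices line up with the rsids() list.
    """
    keep = set(keep_rsids)
    to_drop = [i for i, rsid in enumerate(bfile.rsids()) if rsid not in keep]
    if to_drop:
        bfile.drop_variants(to_drop)
    return to_drop
```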
class BgenVar(handle, offset, layout, compression, n_samples):
    # Note: this isn't called directly, but instead returned from BgenFile methods
    Attributes:
        varid: ID for variant
        rsid: reference SNP ID for variant
        chrom: chromosome variant is on
        pos: nucleotide position variant is at
        alleles: list of alleles for variant
        is_phased: True/False for whether variant has phased genotype data
        ploidy: list of ploidy for each sample. Samples are ordered as per BgenFile.samples
        minor_allele: the least common allele (for biallelic variants)
        minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
        probabilities: 2D numpy array of genotype probabilities, one sample per row
    BgenVars can be pickled e.g. pickle.dumps(var)
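To illustrate how a dosage relates to genotype probabilities, here is a plain-Python sketch. It is not the package's own code. It assumes an unphased, biallelic, diploid variant with no missing data, where each row holds the probabilities of the three genotypes in the order (AA, AB, BB). Both function names are hypothetical.

```python
def second_allele_dosage(prob_row):
    """Expected count of allele B from one row (P(AA), P(AB), P(BB))."""
    p_aa, p_ab, p_bb = prob_row
    return p_ab + 2 * p_bb

def minor_allele_dosages(prob_rows):
    """Dosages of whichever allele is less common across all samples."""
    dosages = [second_allele_dosage(row) for row in prob_rows]
    # A mean dosage above 1 means allele B has frequency above 0.5,
    # so allele A is the minor allele and the dosages are flipped.
    if sum(dosages) > len(dosages):
        return [2 - d for d in dosages]
    return dosages
```

For example, a sample with probabilities (0.0, 1.0, 0.0) is a certain heterozygote and has a dosage of 1 whichever allele is counted.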