Package for loading data from bgen files
Another bgen reader
This is a package for reading bgen files.
This package uses Cython to wrap C++ code for parsing bgen files. It's fairly quick: it can parse genotypes from 500,000 individuals at ~300 variants per second within a single Python process (~450 million probabilities per second with a 3 GHz CPU). Decompressing the genotype probabilities is the slow step; zlib decompression takes 80% of the total time. Using zstd-compressed genotypes would be much faster, maybe 2-3X faster.
This has been optimized for UKBiobank bgen files (i.e. bgen version 1.2 with zlib-compressed 8-bit genotype probabilities), but the other bgen versions and zstd compression have also been tested using example bgen files.
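As a rough check on the figures above, the quoted probability rate follows from the sample count and variant rate if each sample contributes three genotype probabilities per variant (an assumption that holds for unphased, biallelic, diploid data). The second function applies Amdahl's law to the 80% decompression share. The function names are hypothetical, and the 3x and 5x decompression speedups are purely illustrative, not measurements.

```python
def probabilities_per_second(n_samples, variants_per_second, probs_per_sample=3):
    """Genotype probabilities parsed per second.

    probs_per_sample=3 assumes unphased, biallelic, diploid genotypes
    (AA, AB, BB); this is an assumption, not something the package reports.
    """
    return n_samples * variants_per_second * probs_per_sample


def overall_speedup(decompress_fraction, decompress_speedup):
    """Amdahl's law: whole-run speedup when only decompression gets faster."""
    remaining = (1.0 - decompress_fraction) + decompress_fraction / decompress_speedup
    return 1.0 / remaining


rate = probabilities_per_second(500_000, 300)
print(f"{rate:,} probabilities per second")

# Hypothetical decompression speedups, for illustration only.
for factor in (3, 5):
    print(f"{factor}x faster decompression -> {overall_speedup(0.8, factor):.2f}x overall")
```

With 80% of the time spent in decompression, the overall gain can never exceed 5x, however fast the decompressor is.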
Install
pip install bgen
Usage
from bgen.reader import BgenFile
bfile = BgenFile(BGEN_PATH)
rsids = bfile.rsids()
# select a variant by indexing
var = bfile[1000]
# pull out genotype probabilities
probs = var.probabilities # returns 2D numpy array
dosage = var.minor_allele_dosage # returns 1D numpy array for biallelic variant
# iterate through every variant in the file
with BgenFile(BGEN_PATH, delay_parsing=True) as bfile:
    for var in bfile:
        dosage = var.minor_allele_dosage
# get all variants in a genomic region
variants = bfile.fetch('21', 10000, 5000000)
API documentation
class BgenFile(path, sample_path='', delay_parsing=False)
# opens a bgen file. If a bgenix index exists for the file, the index file
# will be opened automatically for quicker access of specific variants.
Arguments:
path: path to bgen file
sample_path: optional path to sample file. Samples will be given integer IDs
if sample file is not given and sample IDs not found in the bgen file
delay_parsing: True/False option to allow for not loading all variants into
memory when the BgenFile is opened. This can save time when iterating
across variants in the file
Attributes:
samples: list of sample IDs
header: BgenHeader with info about the bgen version and compression.
Methods:
indexing: BgenVars can be accessed by indexing the BgenFile e.g. bfile[1000]
iteration: variants in a BgenFile can be looped over e.g. for x in bfile: print(x)
fetch(chrom, start=None, stop=None): get all variants within a genomic region
drop_variants(list[int]): drops variants by index from being used in analyses
with_rsid(rsid): returns BgenVar with given rsid
at_position(pos): returns BgenVar with given position
varids(): returns list of varids for variants in the bgen file.
rsids(): returns list of rsids for variants in the bgen file.
chroms(): returns list of chromosomes for variants in the bgen file.
positions(): returns list of positions for variants in the bgen file.
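A minimal sketch of how the methods listed above might be combined to summarise a region. It relies only on names documented here (fetch, and the rsid, pos and minor_allele_dosage attributes of each returned variant). The helper name region_mean_dosages is hypothetical, and the sketch assumes every sample has a non-missing dosage.

```python
def region_mean_dosages(bfile, chrom, start=None, stop=None):
    """Return (rsid, pos, mean minor allele dosage) for each variant in a region.

    bfile is expected to behave like an open BgenFile: fetch() yields
    variants that expose rsid, pos and minor_allele_dosage.
    Hypothetical helper, not part of the bgen package.
    """
    summary = []
    for var in bfile.fetch(chrom, start, stop):
        dosage = var.minor_allele_dosage
        n = len(dosage)
        # Assumes no missing values; a missing dosage would need filtering first.
        mean = float(sum(dosage)) / n if n else float("nan")
        summary.append((var.rsid, var.pos, mean))
    return summary


# Example use (requires the bgen package and a real file path):
#
#     from bgen.reader import BgenFile
#     with BgenFile(BGEN_PATH) as bfile:
#         for rsid, pos, mean in region_mean_dosages(bfile, '21', 10000, 5000000):
#             print(rsid, pos, mean)
```

Taking the file object as an argument keeps the helper independent of how the file was opened, for example with or without delay_parsing.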
class BgenVar(handle, offset, layout, compression, n_samples):
# Note: this isn't called directly, but instead returned from BgenFile methods
Attributes:
varid: ID for variant
rsid: reference SNP ID for variant
chrom: chromosome variant is on
pos: nucleotide position variant is at
alleles: list of alleles for variant
is_phased: True/False for whether variant has phased genotype data
ploidy: list of ploidy for each sample. Samples are ordered as per BgenFile.samples
minor_allele: the least common allele (for biallelic variants)
minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
alt_dosage: 1D numpy array of alt allele dosages for each sample
probabilities: 2D numpy array of genotype probabilities, one sample per row
BgenVars can be pickled e.g. pickle.dumps(var)
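To illustrate how the dosage attributes relate to the probabilities array, here is a minimal sketch for an unphased, biallelic, diploid variant. It assumes each row holds P(AA), P(AB), P(BB) in that order, with A the first allele and B the alt allele, and that each row sums to 1. Plain lists stand in for the numpy arrays the package returns, and the function names are hypothetical. The package computes these values itself, so this only describes the relationship and is not its implementation.

```python
def alt_dosage_from_probs(probs):
    """Expected count of the second (alt) allele per sample.

    probs: rows of [P(AA), P(AB), P(BB)] for an unphased, biallelic,
    diploid variant (assumed column order).
    """
    return [p_ab + 2.0 * p_bb for _, p_ab, p_bb in probs]


def minor_dosage_from_probs(probs):
    """Dosage of whichever allele is less common across the samples."""
    alt = alt_dosage_from_probs(probs)
    ref = [2.0 - d for d in alt]  # diploid: the two allele dosages sum to 2
    return alt if sum(alt) <= sum(ref) else ref


probs = [
    [1.0, 0.0, 0.0],    # confidently homozygous for the first allele
    [0.1, 0.8, 0.1],    # most likely heterozygous
    [0.0, 0.25, 0.75],  # most likely homozygous for the alt allele
]
print(alt_dosage_from_probs(probs))
print(minor_dosage_from_probs(probs))
```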