bgen·PyPI

Package for loading data from bgen files

Another bgen parser

bgen

This is a package for reading and writing bgen files.

This package uses Cython to wrap C++ code for parsing bgen files. It can parse genotypes from 500,000 individuals at ~800 variants per second within a single Python process (~1.2 billion probabilities per second with a 3GHz CPU).
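The quoted throughput follows from simple arithmetic, assuming each sample contributes three genotype probabilities per variant (the unphased, biallelic, diploid case described in the API documentation). A quick check under that assumption:

```python
# Back-of-envelope check of the quoted throughput.
# Assumption: 3 probabilities per sample per variant
# (unphased, biallelic, diploid).
n_samples = 500_000
variants_per_second = 800
probs_per_sample = 3

probs_per_second = n_samples * variants_per_second * probs_per_sample
print(f"{probs_per_second:,} probabilities per second")  # 1,200,000,000
```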

This has been optimized for UKBiobank bgen files (i.e. bgen version 1.2 with zlib-compressed 8-bit genotype probabilities). Other bgen versions and zstd compression have also been tested using example bgen files.
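To give a feel for what 8-bit storage means, the sketch below maps a probability to an integer between 0 and `2**bits - 1` and decodes it again. It is a simplification for illustration only: the function names are hypothetical and not part of this package, and the real bgen encoding handles rounding across a sample's probabilities jointly rather than one value at a time.

```python
def quantize(p, bits=8):
    """Map a probability in [0, 1] to an integer in [0, 2**bits - 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    scale = (1 << bits) - 1  # 255 for 8 bits
    return round(p * scale)


def dequantize(x, bits=8):
    """Recover the approximate probability from its stored integer."""
    scale = (1 << bits) - 1
    return x / scale


p = 0.25
stored = quantize(p)             # 64, since 0.25 * 255 = 63.75
recovered = dequantize(stored)   # close to, but not exactly, 0.25
print(stored, abs(recovered - p) <= 0.5 / 255)
```

With simple per-value rounding, the error at 8 bits is at most half a step, i.e. 0.5 / 255, which is roughly 0.002.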

Install

pip install bgen

Usage

from bgen import BgenReader

bfile = BgenReader(BGEN_PATH)
rsids = bfile.rsids()

# select a variant by indexing
var = bfile[1000]

# pull out genotype probabilities
probs = var.probabilities  # returns 2D numpy array
dosage = var.minor_allele_dosage  # returns 1D numpy array for biallelic variant

# get all variants in a genomic region
variants = bfile.fetch('21', 10000, 5000000)

# iterate through every variant in the file
with BgenReader(BGEN_PATH, delay_parsing=True) as bfile:
  for var in bfile:
    dosage = var.minor_allele_dosage

# or for writing bgen files
import numpy as np
from bgen import BgenWriter

geno = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]).astype(np.float64)
with BgenWriter(BGEN_PATH, n_samples=3) as bfile:
  bfile.add_variant(varid='var1', rsid='rs1', chrom='chr1', pos=1,
                    alleles=['A', 'G'], genotypes=geno)

You can also read bgen files from stdin (to avoid local storage), e.g.

cat $BGEN_PATH | python -c '
import sys
from bgen import BgenReader
with BgenReader(sys.stdin) as bfile:
  for v in bfile:
    print(v)
'
# NOTE: if using a separate sample file, you cannot also read that from stdin,
#       you would need: with BgenReader(sys.stdin, SAMPLE_PATH) as bfile:

API documentation

class BgenReader(path, sample_path='', delay_parsing=False)
  # opens a bgen file. If a bgenix index exists for the file, the index file
  # will be opened automatically for quicker access of specific variants.
  Arguments:
    path: path to bgen file, or sys.stdin (stdin is also used when path is '-' or '/dev/stdin')
    sample_path: optional path to sample file. Samples will be given integer IDs
        if the sample file is not given and sample IDs are not found in the bgen file
    delay_parsing: True/False option to avoid loading all variants into
        memory when the BgenReader is opened. This can save time when iterating
        across variants in the file

  Attributes:
    samples: list of sample IDs
    header: BgenHeader with info about the bgen version and compression.

  Methods:
    slicing: BgenVars can be accessed by slicing the BgenReader e.g. bfile[1000]
    iteration: variants in a BgenReader can be looped over e.g. for x in bfile: print(x)
    fetch(chrom, start=None, stop=None): get all variants within a genomic region
    drop_variants(list[int]): drops variants by index from being used in analyses
    with_rsid(rsid): returns list of BgenVars with given rsid
    at_position(pos): returns list of BgenVars at a given position
    varids(): returns list of varids for variants in the bgen file.
    rsids(): returns list of rsids for variants in the bgen file.
    chroms(): returns list of chromosomes for variants in the bgen file.
    positions(): returns list of positions for variants in the bgen file.
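
As an illustration of how these methods combine, here is a minimal sketch that collects minor allele dosages for one region. The helper name `region_dosages` is hypothetical; it relies only on `fetch()` and the `rsid` and `minor_allele_dosage` attributes documented here, and it assumes the variants in the region are biallelic.

```python
def region_dosages(bfile, chrom, start=None, stop=None):
    """Return {rsid: minor allele dosages} for variants in a region.

    `bfile` is expected to behave like an open BgenReader.
    """
    dosages = {}
    for var in bfile.fetch(chrom, start, stop):
        dosages[var.rsid] = var.minor_allele_dosage
    return dosages


# Example usage (BGEN_PATH is a placeholder):
# from bgen import BgenReader
# with BgenReader(BGEN_PATH) as bfile:
#     dosages = region_dosages(bfile, '21', 10000, 5000000)
```

Keying by rsid means that variants sharing an rsid overwrite one another; keying by `varid` or by position may suit some files better.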

class BgenVar(handle, offset, layout, compression, n_samples)
  # Note: this isn't called directly, but instead returned from BgenReader methods
  Attributes:
    varid: ID for variant
    rsid: reference SNP ID for variant
    chrom: chromosome variant is on
    pos: nucleotide position variant is at
    alleles: list of alleles for variant
    is_phased: True/False for whether variant has phased genotype data
    ploidy: list of ploidy for each sample. Samples are ordered as per BgenReader.samples
    minor_allele: the least common allele (for biallelic variants)
    minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
    alt_dosage: 1D numpy array of alt allele dosages for each sample
    probabilities: 2D numpy array of genotype probabilities, one sample per row
      These are most likely for biallelic diploid variants. In that scenario
      unphased probabilities have three columns, for homozygous first allele 
      (AA), heterozygous (Aa), homozygous second allele (aa).
      In contrast, phased probabilities (for a biallelic diploid variant) would
      have four columns, first two for haplotype 1 (hap1-allele1, hap1-allele2), 
      last two for haplotype 2 (hap2-allele1, hap2-allele2).
  
  BgenVars can be pickled e.g. pickle.dumps(var)
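
The column layouts described above determine how a dosage can be derived from one row of probabilities. The following is a minimal sketch in plain Python, not the package's own implementation: the function name `second_allele_dosage` is hypothetical, it assumes a biallelic diploid variant, and plain lists stand in for rows of the numpy array.

```python
def second_allele_dosage(row, phased=False):
    """Expected count of the second allele for one sample.

    Unphased rows are [P(AA), P(Aa), P(aa)].
    Phased rows are [hap1-allele1, hap1-allele2, hap2-allele1, hap2-allele2].
    """
    if phased:
        if len(row) != 4:
            raise ValueError("phased biallelic diploid rows need 4 columns")
        # one copy of the second allele per haplotype that carries it
        return row[1] + row[3]
    if len(row) != 3:
        raise ValueError("unphased biallelic diploid rows need 3 columns")
    # heterozygotes carry one copy, second-allele homozygotes carry two
    return row[1] + 2 * row[2]


print(second_allele_dosage([0.0, 0.5, 0.5]))                    # 1.5
print(second_allele_dosage([1.0, 0.0, 0.0, 1.0], phased=True))  # 1.0
```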


class BgenWriter(path, n_samples, samples=[], compression='zstd', layout=2, metadata=None)
    # opens a bgen file to write variants to. Automatically makes a bgenix index file
    Arguments:
      path: path to write data to
      n_samples: number of samples that you have data for
      samples: list of sample IDs (same length as n_samples)
      compression: compression type: None, 'zstd', or 'zlib' (default='zstd')
      layout: bgen layout format (default=2)
      metadata: any additional metadata you want to include in the file (as str)
    
    Methods:
      add_variant_direct(variant)
        Arguments:
            variant: BgenVar, to be directly copied from one bgen file to
                another. This can be done when the new bgen file is for the same
                set of samples as the one being read from. This is much faster
                due to not having to decode and re-encode the genotype data.
      add_variant(varid, rsid, chrom, pos, alleles, genotypes, ploidy=2, 
                  phased=False, bit_depth=8)
        Arguments:
            varid: variant ID e.g. 'var1'
            rsid: reference SNP ID e.g. 'rs1'
            chrom: chromosome the variant is on e.g. 'chr1'
            pos: nucleotide position of the variant e.g. 100
            alleles: list of allele strings e.g. ['A', 'C']
            genotypes: numpy array of genotype probabilities, ordered as per the
                bgen samples e.g. np.array([[0, 0, 1], [0.5, 0.5, 0]])
            ploidy: ploidy state, either as integer to indicate constant ploidy
                (e.g. 2), or numpy array of ploidy values per sample, e.g. np.array([1, 2, 2])
            phased: whether the genotypes are for phased data or not (default=False)
            bit_depth: how many bits to store each genotype as (1-32, default=8)
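
To illustrate `add_variant_direct`, the sketch below copies a subset of variants from a reader to a writer without decoding the genotypes. The helper name `copy_variants` and the `keep` predicate are hypothetical; the reader and writer are assumed to be an open BgenReader and a BgenWriter created for the same set of samples.

```python
def copy_variants(reader, writer, keep=lambda var: True):
    """Copy variants that satisfy `keep` from reader to writer.

    Returns the number of variants copied.
    """
    n_copied = 0
    for var in reader:
        if keep(var):
            # passes the variant through without re-encoding genotypes
            writer.add_variant_direct(var)
            n_copied += 1
    return n_copied


# Example usage (IN_PATH and OUT_PATH are placeholders):
# from bgen import BgenReader, BgenWriter
# with BgenReader(IN_PATH, delay_parsing=True) as reader:
#     with BgenWriter(OUT_PATH, n_samples=len(reader.samples)) as writer:
#         copy_variants(reader, writer, keep=lambda v: v.chrom == '21')
```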

Release history

Version	Release date
1.9.0	Jun 27, 2025
1.8.1	May 12, 2025
1.8.0	May 9, 2025
1.7.3	Dec 3, 2024
1.7.2	Aug 20, 2024
1.7.1	Jun 25, 2024
1.7.0	Jun 25, 2024
1.6.5	May 1, 2024
1.6.4	Mar 11, 2024
1.6.3	Jan 24, 2024
1.6.2	Jan 24, 2024
1.6.1	Jan 11, 2024
1.6.0	Jan 9, 2024
1.5.9	Nov 7, 2023
1.5.8	Nov 2, 2023
1.5.7	Oct 28, 2023
1.5.6	Oct 24, 2023
1.5.5	Oct 16, 2023
1.5.4	Apr 3, 2023
1.5.3	Mar 25, 2023
1.4.1	Mar 10, 2023
1.4.0	Jan 31, 2023
1.3.1	Sep 22, 2022
1.3.0	Sep 16, 2022
1.2.17	Jun 8, 2022
1.2.16	Jun 4, 2022
1.2.15	Feb 14, 2022
1.2.14	Sep 2, 2020
1.2.13	Jul 15, 2020
1.2.12	Jul 13, 2020
1.2.10	Jul 11, 2020
1.2.9	Jul 10, 2020
1.2.8	Jul 10, 2020
1.2.7	Jul 2, 2020
1.2.6	Jun 23, 2020
1.2.4	Jun 18, 2020
1.2.3	Jun 18, 2020
1.2.2	Jun 18, 2020
1.2.1	Jun 17, 2020
1.2.0	Jun 11, 2020
1.1.10	Jun 4, 2020
1.1.9	Jun 3, 2020
1.1.8	Jun 1, 2020
1.1.7	May 28, 2020
1.1.6	May 27, 2020
1.1.5	May 21, 2020
1.1.4	May 14, 2020
1.1.3	May 14, 2020
1.1.2	May 5, 2020
1.1.1	May 5, 2020
1.1.0	May 4, 2020
1.0.0	Apr 27, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bgen-1.9.0.tar.gz (916.4 kB)

Uploaded Jun 27, 2025 Source

Built Distributions

bgen-1.9.0-cp313-cp313-win_amd64.whl (514.3 kB)

Uploaded Jun 27, 2025 CPython 3.13, Windows x86-64

bgen-1.9.0-cp313-cp313-musllinux_1_2_x86_64.whl (3.8 MB)

Uploaded Jun 27, 2025 CPython 3.13, musllinux: musl 1.2+ x86-64

bgen-1.9.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.9 MB)

Uploaded Jun 27, 2025 CPython 3.13, manylinux: glibc 2.17+ x86-64

bgen-1.9.0-cp313-cp313-macosx_11_0_arm64.whl (937.9 kB)

Uploaded Jun 27, 2025 CPython 3.13, macOS 11.0+ ARM64

bgen-1.9.0-cp312-cp312-win_amd64.whl (514.6 kB)

Uploaded Jun 27, 2025 CPython 3.12, Windows x86-64

bgen-1.9.0-cp312-cp312-musllinux_1_2_x86_64.whl (3.8 MB)

Uploaded Jun 27, 2025 CPython 3.12, musllinux: musl 1.2+ x86-64

bgen-1.9.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.9 MB)

Uploaded Jun 27, 2025 CPython 3.12, manylinux: glibc 2.17+ x86-64

bgen-1.9.0-cp312-cp312-macosx_11_0_arm64.whl (938.7 kB)

Uploaded Jun 27, 2025 CPython 3.12, macOS 11.0+ ARM64

bgen-1.9.0-cp311-cp311-win_amd64.whl (513.2 kB)

Uploaded Jun 27, 2025 CPython 3.11, Windows x86-64

bgen-1.9.0-cp311-cp311-musllinux_1_2_x86_64.whl (3.8 MB)

Uploaded Jun 27, 2025 CPython 3.11, musllinux: musl 1.2+ x86-64

bgen-1.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.8 MB)

Uploaded Jun 27, 2025 CPython 3.11, manylinux: glibc 2.17+ x86-64

bgen-1.9.0-cp311-cp311-macosx_11_0_arm64.whl (937.6 kB)

Uploaded Jun 27, 2025 CPython 3.11, macOS 11.0+ ARM64

bgen-1.9.0-cp310-cp310-win_amd64.whl (513.7 kB)

Uploaded Jun 27, 2025 CPython 3.10, Windows x86-64

bgen-1.9.0-cp310-cp310-musllinux_1_2_x86_64.whl (3.7 MB)

Uploaded Jun 27, 2025 CPython 3.10, musllinux: musl 1.2+ x86-64

bgen-1.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.7 MB)

Uploaded Jun 27, 2025 CPython 3.10, manylinux: glibc 2.17+ x86-64

bgen-1.9.0-cp310-cp310-macosx_11_0_arm64.whl (936.3 kB)

Uploaded Jun 27, 2025 CPython 3.10, macOS 11.0+ ARM64

bgen-1.9.0-cp39-cp39-win_amd64.whl (514.8 kB)

Uploaded Jun 27, 2025 CPython 3.9, Windows x86-64

bgen-1.9.0-cp39-cp39-musllinux_1_2_x86_64.whl (3.7 MB)

Uploaded Jun 27, 2025 CPython 3.9, musllinux: musl 1.2+ x86-64

bgen-1.9.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.7 MB)

Uploaded Jun 27, 2025 CPython 3.9, manylinux: glibc 2.17+ x86-64

bgen-1.9.0-cp39-cp39-macosx_11_0_arm64.whl (937.8 kB)

Uploaded Jun 27, 2025 CPython 3.9, macOS 11.0+ ARM64

bgen-1.9.0-cp38-cp38-win_amd64.whl (515.3 kB)

Uploaded Jun 27, 2025 CPython 3.8, Windows x86-64

bgen-1.9.0-cp38-cp38-musllinux_1_2_x86_64.whl (3.8 MB)

Uploaded Jun 27, 2025 CPython 3.8, musllinux: musl 1.2+ x86-64

bgen-1.9.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.8 MB)

Uploaded Jun 27, 2025 CPython 3.8, manylinux: glibc 2.17+ x86-64

bgen-1.9.0-cp38-cp38-macosx_11_0_arm64.whl (865.2 kB)

Uploaded Jun 27, 2025 CPython 3.8, macOS 11.0+ ARM64

File details

Details for the file bgen-1.9.0.tar.gz.

File metadata

Download URL: bgen-1.9.0.tar.gz
Upload date: Jun 27, 2025
Size: 916.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for bgen-1.9.0.tar.gz
Algorithm	Hash digest
SHA256	`49554a76b3cf98d9e4cdae31a81d2c8373475ae5b916a2a4a763a5bdfb7731f3`
MD5	`8e60e4c0f3b5aec1781544552bc0e4df`
BLAKE2b-256	`ac0215118e58ae9a65660ec168ec83b77bad2949e99b7b425b734cf5920226be`

Algorithm	Hash digest
SHA256	`9e474c6b93a3bdc91e77c55d079322bfa53f62b87e3a319417734aac93be4cab`
MD5	`6af16ab2d9f47750a8fa2aa8a3853069`
BLAKE2b-256	`97d1d5c8524cc3283dd46fa815f88df769c66c14416ff24abff4a8c531b87ab4`

Algorithm	Hash digest
SHA256	`9f02408779d92325efbd7efe18829a947d318fc04d05b98e9db5a63fd603ff0a`
MD5	`b82ac69d5c7f5e157ede9756f79781f9`
BLAKE2b-256	`b9ac0674a2fd0104224853886290570d655e109113760c0b2c8daa719a5d4684`

Algorithm	Hash digest
SHA256	`52969b61d5e914d89a2c8dc20c3c090c8cb9525bb0cc58071c7e18dcb1641357`
MD5	`eab6d1a909ae980db6230786953255be`
BLAKE2b-256	`ca635b8b448ce9843b994ad8a3730ffdea2df8f09bf0be0fce910fbb80d04c7c`

Algorithm	Hash digest
SHA256	`fb182a8c06603a0777982f44df53673a465a9d29ad092dcb3e5801d3629b072c`
MD5	`a68aa0466f2afadf1d9097c835065968`
BLAKE2b-256	`357408a361df52a28b852d346f0a155226878fa9970aef1f40e588b2bd3730d9`

Algorithm	Hash digest
SHA256	`92fd341bc33e64b85fecd6a7ad1b8b55b76fc7d9277101934711f85230cd4ffa`
MD5	`f09728f5b3e465d7fe0b1b43c992917a`
BLAKE2b-256	`24857e87aea962b5c0738189dc58d5c0f478a55a75a7e120e17b992e307ef0b9`

Algorithm	Hash digest
SHA256	`8c3aaa1c0bf762dedbd8cd08f35b2851eab67a5759e0265c2ae85dac7e0c3621`
MD5	`8ede3434bf6c8a052467703685cbedc0`
BLAKE2b-256	`32d5c5f6cc653c17be9325bd81cba19bff34db890c779e0969a768e428a58686`

Algorithm	Hash digest
SHA256	`d716b8c6503a334d9a0e97b7edc24209e1a63dd3f628b11f0eeea8d5fe8753d7`
MD5	`777b792663713b25a0da7d84235f4820`
BLAKE2b-256	`cc6500621985ce3c0a37a97d889af234dbad144f318880bc95f508f0d5415724`

Algorithm	Hash digest
SHA256	`0b4049ff0983837521d0214da20ba19f7908b261412726f1402feca5b57e2143`
MD5	`cf3784d55491692586962267cf7d7222`
BLAKE2b-256	`e00624225019151499e9d4580b08910fac1e5b50f562c893453aa1233dfc9656`

Algorithm	Hash digest
SHA256	`18a805d3fa0189bef4729c296609f254640a97527128bc88b6a3d807a4a829f1`
MD5	`ae7be9aa7ce58a11c03bddf08d43f0f2`
BLAKE2b-256	`3a5ca4cbf196461b39eef9c388ead4e93bf90d3ff73135e349abf55edd2f0d3b`

Algorithm	Hash digest
SHA256	`5b049bed578e98833b6308acc7387ccf68a3616660f07e231a8de1a0eb3b430c`
MD5	`243f43a474e70b4f8f24a39f47a70e5f`
BLAKE2b-256	`da2754b864351d2a57f0bbeb802c7990318380ff4fcb83b927ca89b4ff5bbec5`

Algorithm	Hash digest
SHA256	`97d0378e3a44aacd5d20ed93592e6babc2b6bf036552fad49b98a003c566c929`
MD5	`60941a8fb022259cd2a691ca7567b9c9`
BLAKE2b-256	`be8de7b10307aa8a41e51cfe37819ed1a5805a1abec0d61b43f45891dff79051`

Algorithm	Hash digest
SHA256	`bc5f14412fd4f0adf5efe9e0a6f8142e9e31ab49ce8bc95e7bc589499bfb852d`
MD5	`8f830396f1000eb9b60862d4fb0ee538`
BLAKE2b-256	`ea41f7400c1efe5dc713bbfa13f7a7b8ca1ed18621c5e58b29b65946050c71c2`

Algorithm	Hash digest
SHA256	`681993fe33b6fd07a862db3eb864543452fb589cdf4b75541ee3c176d197c5a5`
MD5	`bd05a7835b2410a927953145026833a8`
BLAKE2b-256	`95d2a72de9087895a31f360fda8f3063deb202c067078326b53c298d5374c371`

Algorithm	Hash digest
SHA256	`8a8ecc16c1fcf293478736c5b26bdda5061acbb6819809ab9ed51cd97f2e05d1`
MD5	`523ffa002114fdfcf644d5f7b5107d37`
BLAKE2b-256	`798152d0817582d61ac927036d97b240d6aca7910417c50590b9008ab8a9243d`

Algorithm	Hash digest
SHA256	`0646a4e2c905ffc8fc0173edfa20bd7baa9fc4e8ace56bb7758b14f247972e03`
MD5	`752ec1df4142271280d5d73b99426a4a`
BLAKE2b-256	`9b4878a33197359ef04b73e3a988ef9eb0eb7c505f2d0dcff9ea1dcc7e6616b5`

Algorithm	Hash digest
SHA256	`34ff763cbbd525d4065ecd625a686fb27ff1cdef2ca961c880fc2fe0878e5a6d`
MD5	`4cebdd1be80184d6d588a96382743f80`
BLAKE2b-256	`0337388a37c21f25948c6b6463d5358086eec8097b8eae1f5fb162b6a2ec65c3`

Algorithm	Hash digest
SHA256	`2c8fcd27199549fd6543d7c836f5a7dec4470680470bbc87b0fc1a9e0ed9e8ad`
MD5	`abadd6661e45284968aee684bf566813`
BLAKE2b-256	`2e9fe313af8d1dc2b6d061db5ea84a2a12163291f3a001d13ce0e158d18d68e1`

Algorithm	Hash digest
SHA256	`76af857147d936c8e5c02ae77e74c9dbbac7767c512484edd38abb341fcaad97`
MD5	`3042ead0de0b1869f10921951eb90cae`
BLAKE2b-256	`0fed898c82d293a87443f63470c5eed5c2a94cb5e572502effcf43531c116b29`

Algorithm	Hash digest
SHA256	`5903cd533d061b43a0b10e616bd760fd8c5bbccb2cae27276695d0f95296beca`
MD5	`c186bede23bbcb98cdb44fd4d42e07fa`
BLAKE2b-256	`0c2fb7a4508f8aaa4b9e754464aed4e3f04eff00673d3a6234bcd7b8a657e8b5`

Algorithm	Hash digest
SHA256	`f2116aa3f971bc7aab96de2d966ff96060d6a1d7316f613cc79c27b87fb43d3f`
MD5	`bad99d6298b748a1fe361a650ea21851`
BLAKE2b-256	`a7c256ce9d35e4972c857c14234fa9fbc93ee6db2c64be5e7473c81f895e0603`

Algorithm	Hash digest
SHA256	`6097906b220b0cf455f5f3ab047abfb6ce4da2f2025241b0e199000a7669f0cf`
MD5	`5fa1cd6eab2dbad37058af6c88f1674d`
BLAKE2b-256	`8a6f45259d46f03fdb91ca8508e5e102d0466d5a876351d64bade2c97b525562`

Algorithm	Hash digest
SHA256	`2541182ed0173342ae5670dbcb73aecd05f8278a03a639723b6762456692601f`
MD5	`752cce40eda7536c998df20ca1dc16b1`
BLAKE2b-256	`ae8c87823408e6d17c51c57770edeb128057e4cb28945e6211a2bf40169d2dbd`

Algorithm	Hash digest
SHA256	`12998dcd912b4e26d510bbea2eae3d820163106f4ea2c47b3e9e9864f878eb94`
MD5	`4091a44fb9f7a4a3307a265cf7acbe96`
BLAKE2b-256	`c10b24ae44275ef31c24ad459be1e225f5ed89e7510ea0525f1e18087aac776e`

Algorithm	Hash digest
SHA256	`d23924e7faf449b961ac31f09a4ddc8edabd64d3e455d54db0217a4977e17d7c`
MD5	`b562044efa9c224975f336e011d58c19`
BLAKE2b-256	`173866742a0e33904eac73c16c21b45ba5126142c5b5e1051a387b46b1bac115`