Read-only FUSE filesystem views over VCF Zarr data

These details have been verified by PyPI

Project links

repository

GitHub Statistics

Maintainers

jerome.kelleher

These details have not been verified by PyPI

Development Status
- 4 - Beta
Intended Audience
- Science/Research
Operating System
- POSIX :: Linux
Programming Language
Topic
- Scientific/Engineering

Project description

biofuse

Read-only views of VCF Zarr (VCZ) data in standard bioinformatics file formats via a FUSE filesystem. Currently supported views:

- PLINK 1.9 binary (.bed / .bim / .fam), via mount-plink.
- Oxford BGEN (.bgen / .sample / .bgen.bgi), via mount-bgen.

The streaming file (.bed / .bgen) is generated on demand using the matching vcztools encoder; the static sidecars are computed once at mount time.

Stability and correctness

A core design principle of biofuse is that the mount must never become unresponsive. All the work of decoding VCF Zarr and encoding it into PLINK or BGEN bytes is delegated to vcztools; biofuse itself does one thing: present that data as a correct, dependable read-only filesystem. Separating the two responsibilities keeps small the surface that biofuse has to get exactly right.

- The filesystem stays responsive under load. Encoding runs off the filesystem's request-handling path, and every read and open is bounded by a timeout: a slow or stuck encode returns a normal I/O error (EIO / EAGAIN) rather than blocking. One wedged file handle cannot freeze the others, and unmount never hangs.
- Failures are contained. An error inside the encoder surfaces to the caller as an I/O error, not a crash; the mount keeps serving every other file.
- The view is read-only and immutable. Writes, truncation and appends are rejected with EROFS; the sidecars are computed once when the mount starts and served unchanged for its lifetime.
- POSIX behaviour is tested. A dedicated filesystem test harness (fs_tests/) exercises syscall semantics (read / pread / lseek, stat, mmap, directory listing, write rejection), cross-checks the served bytes against a reference, and runs read-stress and liveness probes that confirm the mount stays responsive while the streaming file is saturated.
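
From a consumer's point of view, the timeout behaviour described above means a read can fail with EIO or EAGAIN instead of hanging. The following is a minimal Python sketch of one way a caller might cope, assuming a retry is acceptable for the workload. The function name, retry count and delay are illustrative choices and are not part of biofuse.

```python
import errno
import os
import time

# errno values that the text says a slow or stuck encode may surface as.
TRANSIENT_ERRNOS = {errno.EIO, errno.EAGAIN}


def pread_with_retry(path, offset, size, retries=3, delay=0.5):
    """Read up to `size` bytes at `offset`, retrying on EIO / EAGAIN.

    `retries` and `delay` are illustrative defaults, not biofuse settings.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        for attempt in range(retries + 1):
            try:
                return os.pread(fd, size, offset)
            except OSError as exc:
                # Re-raise anything that is not a timeout-style error,
                # and give up once the retry budget is spent.
                if exc.errno not in TRANSIENT_ERRNOS or attempt == retries:
                    raise
                time.sleep(delay)
    finally:
        os.close(fd)
```

Retrying only helps when the error is transient. An EIO with a deterministic cause, such as an input the encoder rejects, will fail on every attempt, so the retry budget should stay small.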

Performance and access patterns

biofuse is optimised for linear, sequential reads — the access pattern used by the majority of downstream tools, which stream variants start-to-end. The streaming .bed / .bgen file is encoded on demand as the consumer reads forward, and bytes already produced are buffered, so reading straight through the file does no redundant work. The mounts are verified against plink1.9 and plink2 (--bfile, --freq, --missing, --hardy, …) for PLINK, and bgenix, qctool, REGENIE, SAIGE, BOLT-LMM and plink2 --bgen for BGEN.

Random and backward access still work, but are slower: seeking backwards or skipping far ahead can make biofuse re-encode from an earlier point in the file. The kernel page cache holds bytes that have already been served, so re-reading a region — and multi-pass tools that scan the file more than once (e.g. flashpca) — stays cheap once the data is warm.
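
The forward-only access pattern can be illustrated with a small Python reader that walks a .bed file from start to end without seeking. This is a sketch under the standard PLINK 1 variant-major layout (three header bytes, then ceil(n_samples / 4) bytes per variant, with 01 as the missing-genotype code). The function name is hypothetical, and the sample and variant counts are assumed to have been obtained by the caller, for example by counting lines in the .fam and .bim sidecars.

```python
BED_MAGIC = b"\x6c\x1b\x01"  # PLINK 1 magic number plus variant-major mode byte


def count_missing_per_variant(bed_path, n_samples, n_variants):
    """Stream a .bed file start-to-end and count missing calls per variant."""
    bytes_per_variant = (n_samples + 3) // 4  # four 2-bit genotypes per byte
    counts = []
    with open(bed_path, "rb") as f:
        if f.read(3) != BED_MAGIC:
            raise ValueError("not a variant-major PLINK 1 .bed file")
        for _ in range(n_variants):
            # Each read continues where the previous one stopped: no seeks.
            block = f.read(bytes_per_variant)
            if len(block) != bytes_per_variant:
                raise ValueError("truncated .bed file")
            missing = 0
            for i in range(n_samples):
                # Sample i sits in byte i // 4, lowest-order bit pair first.
                code = (block[i // 4] >> (2 * (i % 4))) & 0b11
                if code == 0b01:  # 01 is the missing-genotype code
                    missing += 1
            counts.append(missing)
    return counts
```

Because every read picks up where the last one ended, a consumer written this way stays on the sequential path that the text describes as the optimised case.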

For BGEN, the .bgen payload uses zlib level 0 (stored, fixed-size variant blocks) together with the .bgen.bgi index, so a tool can fetch an individual variant by byte range without decompressing or re-encoding the rest of the file. Variant-targeted access (e.g. bgenix -v) is therefore efficient, as are whole-file scans.
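
To make the byte-range lookup concrete, here is a hedged Python sketch that finds one variant in the index and reads only its block from the .bgen file. It assumes the .bgen.bgi sidecar follows the bgenix index layout, that is, a SQLite table named Variant with rsid, file_start_position and size_in_bytes columns; that schema and the function name are assumptions, not something this page specifies.

```python
import sqlite3
from pathlib import Path


def read_variant_block(bgen_path, bgi_path, rsid):
    """Return the raw bytes of one variant block, located via the index.

    Assumes the sidecar follows the bgenix index layout: a `Variant` table
    with `rsid`, `file_start_position` and `size_in_bytes` columns.
    """
    # mode=ro opens the index read-only; immutable=1 tells SQLite the file
    # cannot change, so it skips locking on the read-only mount.
    uri = Path(bgi_path).resolve().as_uri() + "?mode=ro&immutable=1"
    con = sqlite3.connect(uri, uri=True)
    try:
        row = con.execute(
            "SELECT file_start_position, size_in_bytes "
            "FROM Variant WHERE rsid = ? LIMIT 1",
            (rsid,),
        ).fetchone()
    finally:
        con.close()
    if row is None:
        raise KeyError(rsid)
    start, size = row
    # Seek straight to the block and read exactly its length.
    with open(bgen_path, "rb") as f:
        f.seek(start)
        return f.read(size)
```

The returned bytes are the undecoded variant block; parsing them is left to a BGEN library.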

The sidecar files (.bim / .fam / .sample / .bgen.bgi) are computed once when the mount starts, so reads of them are always fast regardless of access order. These can be suppressed individually where not needed (e.g., the .bgen.bgi can be large and is not needed for many workloads).

Because the streaming file is produced on demand, a read that stalls beyond an internal timeout surfaces as EIO rather than blocking indefinitely; in practice this only appears under pathological random-access load.

Install

biofuse depends on libfuse 3 system headers (pyfuse3 builds from source):

sudo apt-get install -y fuse3 libfuse3-dev pkg-config

Then:

python -m pip install biofuse      # or: uv pip install biofuse

Remote and zipped stores

The vcz_url argument and the inherited --backend-storage / --storage-option options accept cloud, fsspec, and HTTP stores, plus .vcz.zip files. biofuse depends on bare vcztools; to mount cloud-backed stores install the matching vcztools extra, e.g. pip install 'vcztools[obstore]' or pip install 'vcztools[icechunk]'. See the vcztools documentation for the available storage backends.

Usage

`mount-plink`

biofuse mount-plink path/to/sample.vcz /mount/dir

Mounts a read-only directory at /mount/dir containing sample.bed, sample.bim, sample.fam. The mount runs in the foreground; press Ctrl-C to unmount.

Options:

- --basename NAME: basename for the plink fileset (defaults to the VCZ stem).
- --access-log PATH: record every read as a JSONL row to PATH (useful for characterising consumer access patterns).
- The bcftools-view-style filter / backend / log options (-r/-R/-s/-S/-t/-T/-i/-e/-v/-V/-m/-M, --backend-storage, --storage-option, --log-level, --log-file) are inherited from vcztools view-plink. Run biofuse mount-plink --help or see vcztools view-plink --help for the full reference.
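
As an illustration of how an --access-log file might be used to characterise a consumer, the Python sketch below measures how many reads start exactly where the previous one ended. The row schema is not documented here, so the "offset" and "size" field names are hypothetical placeholders; check a real log and pass the actual key names. The sketch also assumes all rows refer to one file, so a log covering several files would need grouping first.

```python
import json


def sequential_fraction(log_path, offset_key="offset", size_key="size"):
    """Fraction of reads that start exactly where the previous read ended.

    `offset_key` and `size_key` are hypothetical field names; adjust them
    to match the rows in the real access log.
    """
    total = 0
    sequential = 0
    expected = None  # offset a purely sequential consumer would read next
    with open(log_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            row = json.loads(line)
            offset, size = row[offset_key], row[size_key]
            if expected is not None:
                total += 1
                if offset == expected:
                    sequential += 1
            expected = offset + size
    # With fewer than two reads there is nothing to compare.
    return sequential / total if total else 1.0
```

A value close to 1.0 suggests the tool streams the file forward, which is the access pattern the mount is optimised for.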

Example:

mkdir -p /tmp/plink-mnt
biofuse mount-plink ./sample.vcz /tmp/plink-mnt &
# The mount runs in the foreground, so it is backgrounded with `&`. It is
# not ready the instant the process starts — it first opens the VCZ and
# builds the sidecars — so wait for the mounted file to appear before
# running the consumer tool.
until [ -e /tmp/plink-mnt/sample.bed ]; do sleep 0.1; done
plink1.9 --bfile /tmp/plink-mnt/sample --freq --out ./out
fusermount3 -u /tmp/plink-mnt
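
The shell loop above waits indefinitely if the mount never comes up. When the mount is driven from Python, the same wait can be given a deadline. The helper below is a minimal sketch using only the standard library; its name and default timings are illustrative and not part of biofuse.

```python
import os
import time


def wait_for_path(path, timeout=30.0, poll=0.1):
    """Return True once `path` exists, or False if `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A caller could start biofuse mount-plink as a child process, call wait_for_path on the expected .bed file, and stop with an error if it returns False instead of launching the consumer tool against an empty directory.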

`mount-bgen`

biofuse mount-bgen path/to/sample.vcz /mount/dir

Mounts a read-only directory at /mount/dir containing sample.bgen, sample.sample, sample.bgen.bgi. The .bgen payload uses zlib level 0 (stored, fixed-size variant blocks) so byte-range random access is O(1); downstream tools (bgenix, qctool, REGENIE, SAIGE, BOLT-LMM, plink2 --bgen) consume the mount unchanged. The .bgen.bgi SQLite sidecar and .sample are generated once at mount time.

Options mirror mount-plink: --basename, --access-log, and the shared bcftools-style filter / backend / log set inherited from vcztools view-bgen. Run biofuse mount-bgen --help or see vcztools view-bgen --help for the full reference.

Example:

mkdir -p /tmp/bgen-mnt
biofuse mount-bgen ./sample.vcz /tmp/bgen-mnt &
# Wait for the mount to come up before reading from it (see mount-plink above).
until [ -e /tmp/bgen-mnt/sample.bgen ]; do sleep 0.1; done
bgenix -g /tmp/bgen-mnt/sample.bgen -list
fusermount3 -u /tmp/bgen-mnt

Limitations: ploidy

- Mixed ploidy is not supported by mount-bgen. The fixed-size BGEN encoder used for random-access serving requires uniform ploidy across every sample and variant in the view. Mounts whose region includes mixed-ploidy chromosomes (typically X, Y, MT) open successfully and serve .sample and .bgen.bgi, but the first .bgen read will fail with EIO. Workaround: restrict the view to autosomes at mount time (e.g. via the inherited -r / -R / -t / -T region filters), or use the one-shot vcztools view-bgen CLI for full-file conversions that include X / Y / MT; view-bgen uses the streaming variable-size encoder, which handles mixed ploidy correctly.
- Pure haploid VCZ is supported by mount-bgen (the encoder emits a uniform-haploid BGEN payload).
- mount-plink is diploid-only. Pure haploid VCZ inputs (e.g. mitochondrial-only stores) are rejected by the underlying encoder with EIO on the first .bed read. Mixed-ploidy VCZ inputs serve successfully, but haploid samples are encoded as homozygous for the called allele; this matches the PLINK 1 BED format, which has no haploid representation.

Development

uv sync --group dev
uv run pytest                          # full suite
uv run pytest tests/test_encoder_ops.py  # one module
uv run prek install                    # install git pre-commit hook (one-off)
uv run --only-group=lint prek -c prek.toml run --all-files

Release history

- 0.1.0 (this version): Jun 8, 2026
- 0.1.0a1 (pre-release): Jun 8, 2026

Download files

Source Distribution

biofuse-0.1.0.tar.gz (181.4 kB)

Uploaded Jun 8, 2026 Source

Built Distribution

biofuse-0.1.0-py3-none-any.whl (25.6 kB)

Uploaded Jun 8, 2026 Python 3

File details

Details for the file biofuse-0.1.0.tar.gz.

File metadata

Download URL: biofuse-0.1.0.tar.gz
Upload date: Jun 8, 2026
Size: 181.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for biofuse-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`d79a69c886c25acb90c5afc29a0a2211295e237091c7c89b879a7aecaee1d6fa`
MD5	`eda29b81fc4e6eba81f8476116ba58e0`
BLAKE2b-256	`bcc72bed2661f0007cf34ae749c42f624d9eb696d71ab2687d335839f45bb4e8`

Provenance

The following attestation bundles were made for biofuse-0.1.0.tar.gz:

Publisher: cd.yml on sgkit-dev/biofuse

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: biofuse-0.1.0.tar.gz
- Subject digest: d79a69c886c25acb90c5afc29a0a2211295e237091c7c89b879a7aecaee1d6fa
- Sigstore transparency entry: 1756147360
- Sigstore integration time: Jun 8, 2026
Source repository:
- Permalink: sgkit-dev/biofuse@27b00ddcc5ad470a1b2d5ae7b508d6cbe21e6094
- Branch / Tag: refs/tags/0.1.0
- Owner: https://github.com/sgkit-dev
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: cd.yml@27b00ddcc5ad470a1b2d5ae7b508d6cbe21e6094
- Trigger Event: release

File details

Details for the file biofuse-0.1.0-py3-none-any.whl.

File metadata

Download URL: biofuse-0.1.0-py3-none-any.whl
Upload date: Jun 8, 2026
Size: 25.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for biofuse-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8443755e814b81a8a398c512008826f0dd4e4f6ec24b2db75a3b3d511fc5b708`
MD5	`5d984d0738d6763f0ab1357a9e0ff431`
BLAKE2b-256	`a1ada3b64fd190ca0ea545f185b72a58a6840719a1d6dffa1b737086059c126f`

Provenance

The following attestation bundles were made for biofuse-0.1.0-py3-none-any.whl:

Publisher: cd.yml on sgkit-dev/biofuse

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: biofuse-0.1.0-py3-none-any.whl
- Subject digest: 8443755e814b81a8a398c512008826f0dd4e4f6ec24b2db75a3b3d511fc5b708
- Sigstore transparency entry: 1756147371
- Sigstore integration time: Jun 8, 2026
Source repository:
- Permalink: sgkit-dev/biofuse@27b00ddcc5ad470a1b2d5ae7b508d6cbe21e6094
- Branch / Tag: refs/tags/0.1.0
- Owner: https://github.com/sgkit-dev
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: cd.yml@27b00ddcc5ad470a1b2d5ae7b508d6cbe21e6094
- Trigger Event: release
