Pipeline for efficient genomic data processing.

Project description

Features

GenVarLoader provides a fast, memory-efficient data structure for training sequence models on genetic variation. For example, it can be used to train a DNA language model on human genetic variation (e.g. Dalla-Torre et al.) or to train sequence-to-function models with genetic variation (e.g. Celaj et al., Drusinsky et al., He et al., and Rastogi et al.).

  • Avoid writing any sequences to disk (can save >2,000x storage vs. writing personalized genomes with bcftools consensus)
  • Generate haplotypes up to 1,000 times faster than reading a FASTA file
  • Generate tracks up to 450 times faster than reading a BigWig
  • Support indels and re-align tracks to haplotypes that contain them
  • Extend to new file formats: VCF, PGEN, and BigWig are currently supported, so drop a feature request for others!

Documentation is available here. See our preprint for benchmarking and implementation details.

Installation

pip install genvarloader

PyTorch is not installed as a dependency because its installation may require special, platform-specific instructions.
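Since PyTorch is left out of the install, code that relies on it can check for it up front. The following is a minimal sketch that uses only the standard library; the helper name has_module is hypothetical and is not part of GenVarLoader.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be located.

    The module is only looked up, not imported, so the check is cheap.
    Hypothetical helper for illustration; pass top-level names only.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("torch"):
        print("PyTorch found.")
    else:
        print("PyTorch not found; install it following the instructions for your platform.")
```

Looking the module up with find_spec avoids paying the import cost of PyTorch just to learn whether it is present.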

Contributing

  1. Clone the repo.
  2. Assuming you have Pixi, install the pre-commit hooks by running pixi run -e dev pre-commit. If you skip this step, your PR will likely fail CI checks.
  3. Activate the Pixi environment that fits your needs. dev is a decent catch-all, but you might need a different environment if you are using a GPU.
  4. If you are developing on osx-arm64, you will need to install plink2 manually from here to run the tests (plink2 is needed to convert VCF -> PGEN).

All tests use pytest (the Rust extension code excepted) and live under tests/. They ensure the code works as intended, so all of them must pass before any feature is merged into main and subsequently released. The tests run automatically on every PR, and failing tests block the PR from being merged.

If your PR has merge conflicts, it is usually because the main branch received updates while you were working on your branch. In this case, please rebase your branch with git rebase main to resolve the conflicts, rather than creating a merge commit with git merge main.

Note: Do not edit the version number in pyproject.toml. This is handled automatically by GitHub Actions.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

genvarloader-0.22.1.tar.gz (930.0 kB)

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • genvarloader-0.22.1-cp313-cp313t-musllinux_1_2_x86_64.whl (711.8 kB): CPython 3.13t, musllinux (musl 1.2+), x86-64
  • genvarloader-0.22.1-cp313-cp313t-musllinux_1_2_i686.whl (747.6 kB): CPython 3.13t, musllinux (musl 1.2+), i686
  • genvarloader-0.22.1-cp313-cp313t-musllinux_1_2_armv7l.whl (773.9 kB): CPython 3.13t, musllinux (musl 1.2+), ARMv7l
  • genvarloader-0.22.1-cp313-cp313t-musllinux_1_2_aarch64.whl (674.4 kB): CPython 3.13t, musllinux (musl 1.2+), ARM64
  • genvarloader-0.22.1-cp313-cp313t-manylinux_2_28_aarch64.whl (499.1 kB): CPython 3.13t, manylinux (glibc 2.28+), ARM64
  • genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (623.1 kB): CPython 3.13t, manylinux (glibc 2.17+), x86-64
  • genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_s390x.manylinux2014_s390x.whl (519.8 kB): CPython 3.13t, manylinux (glibc 2.17+), s390x
  • genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (622.8 kB): CPython 3.13t, manylinux (glibc 2.17+), ppc64le
  • genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_i686.manylinux2014_i686.whl (532.3 kB): CPython 3.13t, manylinux (glibc 2.17+), i686
  • genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (496.6 kB): CPython 3.13t, manylinux (glibc 2.17+), ARMv7l
  • genvarloader-0.22.1-cp310-abi3-musllinux_1_2_x86_64.whl (715.5 kB): CPython 3.10+, musllinux (musl 1.2+), x86-64
  • genvarloader-0.22.1-cp310-abi3-musllinux_1_2_i686.whl (753.0 kB): CPython 3.10+, musllinux (musl 1.2+), i686
  • genvarloader-0.22.1-cp310-abi3-musllinux_1_2_armv7l.whl (777.8 kB): CPython 3.10+, musllinux (musl 1.2+), ARMv7l
  • genvarloader-0.22.1-cp310-abi3-musllinux_1_2_aarch64.whl (678.8 kB): CPython 3.10+, musllinux (musl 1.2+), ARM64
  • genvarloader-0.22.1-cp310-abi3-manylinux_2_28_aarch64.whl (502.8 kB): CPython 3.10+, manylinux (glibc 2.28+), ARM64
  • genvarloader-0.22.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (625.8 kB): CPython 3.10+, manylinux (glibc 2.17+), x86-64
  • genvarloader-0.22.1-cp310-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl (524.8 kB): CPython 3.10+, manylinux (glibc 2.17+), s390x
  • genvarloader-0.22.1-cp310-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (626.2 kB): CPython 3.10+, manylinux (glibc 2.17+), ppc64le
  • genvarloader-0.22.1-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl (537.4 kB): CPython 3.10+, manylinux (glibc 2.17+), i686
  • genvarloader-0.22.1-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (500.0 kB): CPython 3.10+, manylinux (glibc 2.17+), ARMv7l
  • genvarloader-0.22.1-cp310-abi3-macosx_11_0_arm64.whl (472.9 kB): CPython 3.10+, macOS 11.0+, ARM64
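The wheel file names listed above follow the standard name-version-python tag-ABI tag-platform tag layout. As an illustration of how to read them, here is a minimal sketch that splits a name into those parts with plain string handling; the WheelTags type and parse_wheel_filename function are hypothetical names introduced for this example, not part of GenVarLoader or pip.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    """Components of a wheel file name (hypothetical helper type)."""

    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tags: list[str]


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # An optional build tag may sit between version and python tag; drop it.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name layout: {filename!r}")
    name, version, python_tag, abi_tag, platform = parts
    # Several platform tags can be joined with "." in a single file name.
    return WheelTags(name, version, python_tag, abi_tag, platform.split("."))


if __name__ == "__main__":
    tags = parse_wheel_filename(
        "genvarloader-0.22.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
    )
    print(tags)
```

In the listing, the abi3 tag marks wheels built against CPython's stable ABI, which is why a single cp310-abi3 wheel covers CPython 3.10 and later, while cp313t marks builds for the free-threaded CPython 3.13.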

File details

Details for the file genvarloader-0.22.1.tar.gz.

File metadata

  • Download URL: genvarloader-0.22.1.tar.gz
  • Upload date:
  • Size: 930.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.13.1

File hashes

Hashes for genvarloader-0.22.1.tar.gz
Algorithm Hash digest
SHA256 961f5e11a7b592a4742d714efabfc921b28b45ba1ed490351e98c82c7dbad6ad
MD5 a16c063a2f93dde2ecd2070b3f34fc2d
BLAKE2b-256 afa109992ef3542cfc5eae1b0bf2ff992dea67e302d3c3aeb5728527b9b8c33e

See more details on using hashes here.
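To check a downloaded file against a digest listed on this page, the SHA256 value can be recomputed locally. The sketch below uses only the standard library; the function names sha256_of and matches_sha256 are hypothetical, and the file path in the usage section assumes the source distribution was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed local path; the digest is the SHA256 value listed for the source distribution.
    sdist = Path("genvarloader-0.22.1.tar.gz")
    expected = "961f5e11a7b592a4742d714efabfc921b28b45ba1ed490351e98c82c7dbad6ad"
    if sdist.exists():
        print("digest matches" if matches_sha256(sdist, expected) else "digest MISMATCH")
    else:
        print(f"{sdist} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of file size; the same pattern works for any of the wheels by swapping in the matching file name and digest.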

File details

Details for the file genvarloader-0.22.1-cp313-cp313t-musllinux_1_2_x86_64.whl.

File hashes

Hashes for genvarloader-0.22.1-cp313-cp313t-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 0708d038512d251aee6b3cea88c452421f278fa16ca9bbcf533eb61c34496164
MD5 fd6a57fd84749556b908259124237f28
BLAKE2b-256 623566e48af2aa22b07af9dc9c1b0d986c0d96a3948e74fed46e04272b14c11b

File details

Details for the file genvarloader-0.22.1-cp313-cp313t-musllinux_1_2_i686.whl.

File hashes

Hashes for genvarloader-0.22.1-cp313-cp313t-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 3c113c317fd6bcd21130ab15d614bb459590e83314edb7de4e766ad5a80a8adf
MD5 80d24ac2a6c592224e0b918d4b901d2b
BLAKE2b-256 594f93f8a91b943465036f5bc515b922721bdfca078fd478f80358b310140c25

File details

Details for the file genvarloader-0.22.1-cp313-cp313t-musllinux_1_2_armv7l.whl.

File hashes

Hashes for genvarloader-0.22.1-cp313-cp313t-musllinux_1_2_armv7l.whl
Algorithm Hash digest
SHA256 2e928ce3bb94952b11341e6f0f6807284be7fbab2a5f99870032700145dac9cb
MD5 32c969635dd010504afdd27dad952005
BLAKE2b-256 f91c9f881f4455fe82648ecadfe9161f193f1bbe7379515c1fe854f01b5ca55b

File details

Details for the file genvarloader-0.22.1-cp313-cp313t-musllinux_1_2_aarch64.whl.

File hashes

Hashes for genvarloader-0.22.1-cp313-cp313t-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 1bb5ac723ea6aa154b2c6f356030b888bf3b8fb0c319088cfd1b69cdd6a835c0
MD5 f00ba105026c88a3250c74ed6362dfee
BLAKE2b-256 7f993db64475da080b62f0aa2ae0a4cd3403cb6c433f94949a8e99cc6f689a6f

File details

Details for the file genvarloader-0.22.1-cp313-cp313t-manylinux_2_28_aarch64.whl.

File hashes

Hashes for genvarloader-0.22.1-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 b559b026f973daf92433f6d48cebbe0e528e3ec0852f44740118e521c8e35cb1
MD5 c8c6da5cb3d95cd1f677a5a75b1ba161
BLAKE2b-256 e4a494c8efd4c31b9de8afd634c8171a8c6b26fc8c4018c991aa85aff7fb5a04

File details

Details for the file genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e0b14519bf73b51d027d92f157b8a64b54614df2f009f1a9bc3a3648d675244a
MD5 a3266f654811967361f58f23bb3d5215
BLAKE2b-256 92b67bc19d08619aee33ae1abb8bf6bfe13dc2fa012dc1019a000c2e7d0bbad5

File details

Details for the file genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File hashes

Hashes for genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 c0dc11264c503ad057200d3da5748be68fd6b274127181973a1bbd0e2d26e76c
MD5 7488da02ae9297bcad9e41f32429e515
BLAKE2b-256 be655385cce67d5c01eb3b72f300d835919547131f414131e5a036bd4a1b5793

File details

Details for the file genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File hashes

Hashes for genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 6c96b857bcbc3f77ef39b5746195c6100e886d04d114d9c36c1e05837a1f80ef
MD5 019058d137f49d38d85d3e707d55d002
BLAKE2b-256 240242b7bc87c8ec7eb40b0d4d4ad42e1578fe421e222bf96fade23c3f6e027f

File details

Details for the file genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 9fbcc9ccebb34b378a48a76afa3586906a79e36298cc9bea43c83586061213dc
MD5 a464ea586fb8ebb5a6099fe51f46d2c8
BLAKE2b-256 89a927876a9edf9ebd36363bb56c4302e2be867dc45c5cc506efb3367d66b0b7

File details

Details for the file genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for genvarloader-0.22.1-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 7afe673af4bc9ff131200f7d8e26c0ca2629294cf5dcb1a1c5e0a900fe1439d0
MD5 c4f21546258ccb6c6f22e2fa4c4ea45c
BLAKE2b-256 ca546406269bc275e6b71e0e8b504629219df992cd30d3082375c67b67d49b52

File details

Details for the file genvarloader-0.22.1-cp310-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for genvarloader-0.22.1-cp310-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 00221389584fe0e07eceba03104663e8101cb390766803477893b893af41ab90
MD5 2907f908d452fa6dff3886f8c4a303b2
BLAKE2b-256 9750f677daf24c2ccbdbe97e6dfce12baee382d3421c16e91cc800f4343b7595

File details

Details for the file genvarloader-0.22.1-cp310-abi3-musllinux_1_2_i686.whl.

File hashes

Hashes for genvarloader-0.22.1-cp310-abi3-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 7125e21398d6581fb2032da284a0c0c0f36198cfd89ea1a0634c19494a1e0ef3
MD5 2a184f69edf6c4fc1fa6ccf19575a58b
BLAKE2b-256 648e121ffb223d49c92653be560bf7a438ac8649380f8c5b8c797f5793be0098

File details

Details for the file genvarloader-0.22.1-cp310-abi3-musllinux_1_2_armv7l.whl.

File hashes

Hashes for genvarloader-0.22.1-cp310-abi3-musllinux_1_2_armv7l.whl
Algorithm Hash digest
SHA256 bc74792866d1b60118a73c53d065ca429c17aa12267fb54b02a82455a9b65e69
MD5 fe59335fe2d04dc73ccff69c83afc531
BLAKE2b-256 63b6753cbe6c8d12d6198564a745c6219a9378c06b11ffc8b3208b9c1a2f5913

File details

Details for the file genvarloader-0.22.1-cp310-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for genvarloader-0.22.1-cp310-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 6321d59ecf4a2f1e06f657847b632a3931b4f54c10db033eece95ac8d822c2d7
MD5 a5b7e8be0113001ba4aa9c21737e95de
BLAKE2b-256 d2f212a293b96884df39781185cf8020d9a30ac3d5c576aa542acedee7207d81

File details

Details for the file genvarloader-0.22.1-cp310-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for genvarloader-0.22.1-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 77bcbe2dc9afebac8ce6fd04a6815d51330c55d6d34bcea68445b0f7fb8b00f8
MD5 6afa702dca10dff6128f6af7ec17108b
BLAKE2b-256 2f7801447f78d07a76b06d7c7ef3fb96b75933a8a0827cfcafc742d8e3a15555

File details

Details for the file genvarloader-0.22.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for genvarloader-0.22.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 9944f4e395ea2d01f15860bdd46a876c240c062654047a7e046bed3b17018269
MD5 9688a0b944a84b03034154bdf9836d5d
BLAKE2b-256 5946b04ebca5990c31f6e4e2c03de9aac9b7f60fcae352772ff2458e4a32f4ee

File details

Details for the file genvarloader-0.22.1-cp310-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File hashes

Hashes for genvarloader-0.22.1-cp310-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 3991de5da4fce65a196b5d244b885e5f59f637082d0a258e39640295796fbd50
MD5 32d6f0c3b10b0c1db16737a833f006f2
BLAKE2b-256 17b9743834a7721563c652154be0ca233e23d5a35d934df4746110d3edddeeb2

File details

Details for the file genvarloader-0.22.1-cp310-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File hashes

Hashes for genvarloader-0.22.1-cp310-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 4c2c08934d98e23c8830352f69a7a86e11f664a727164e4f9706de89b6cf0267
MD5 737c2c154b1f76c5770b9f54d17ee2a1
BLAKE2b-256 29f5dfe3b33a95fc61e57be86e823f4b83c1ce070374bfbebacdbf0b37fd700b

File details

Details for the file genvarloader-0.22.1-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for genvarloader-0.22.1-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 6623a6db76df7a2b620cf9c328cd5326b076b234b2207fe846febd980ee9d7dd
MD5 c792794bac7bb33b4da14a487e897f9b
BLAKE2b-256 a7ce21b0a6a2cbdaf7faf49be445d30cdf638973a46d256d8095f025b110c0c3

File details

Details for the file genvarloader-0.22.1-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for genvarloader-0.22.1-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 93913a98250b99fdac33de200d3c7ad25b9e0f36aea7c974493a1e6b32919a80
MD5 0a5c848175b3dd4618a60005f5d46108
BLAKE2b-256 e0de8a9a1a8c59ae6da091be7492f8553ac77797badffca90c9c332ecfefeae3

File details

Details for the file genvarloader-0.22.1-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for genvarloader-0.22.1-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 de935c3807862d066abffaa92d8eb500609f4875483c9a2d7d9b560f44529652
MD5 ecaf8d80746adf9451c387e25d310e5a
BLAKE2b-256 7c457567d30568d97c51f357fc96a14ea77a762eddeaeecb44cd6644ea490fcf
