
Pipeline for efficient genomic data processing.

Project description


Features

GenVarLoader provides a fast, memory-efficient data structure for training sequence models on genetic variation. For example, it can be used to train a DNA language model on human genetic variation (e.g. Dalla-Torre et al.) or to train sequence-to-function models with genetic variation (e.g. Celaj et al., Drusinsky et al., He et al., and Rastogi et al.).

  • Avoid writing any sequences to disk (can save >2,000x storage vs. writing personalized genomes with bcftools consensus)
  • Generate haplotypes up to 1,000 times faster than reading a FASTA file
  • Generate tracks up to 450 times faster than reading a BigWig
  • Supports indels and re-aligns tracks to haplotypes that have them
  • Extensible to new file formats: drop a feature request! Currently supports VCF, PGEN, and BigWig
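To illustrate the idea behind avoiding on-disk personalized genomes and re-aligning tracks around indels, here is a toy sketch. It is not GenVarLoader's implementation or API. It builds one haplotype in memory by applying variants to a reference window and carries a per-base track through the same edits. The `(position, REF, ALT)` tuples, 0-based positions, and the rule that inserted bases reuse the track value of the last REF base are all assumptions made for this example. A diploid sample would use two such haplotypes, one per phased allele.

```python
from typing import List, Sequence, Tuple

# Hypothetical variant representation for this sketch:
# (0-based reference position, REF allele, ALT allele).
Variant = Tuple[int, str, str]


def apply_variants(
    ref: str, track: Sequence[float], variants: List[Variant]
) -> Tuple[str, List[float]]:
    """Build one haplotype and a track re-aligned to it, entirely in memory.

    Assumes variants are sorted by position, non-overlapping, have non-empty
    alleles, and that each REF allele matches the reference.
    """
    if len(ref) != len(track):
        raise ValueError("reference and track must have the same length")
    hap: List[str] = []
    hap_track: List[float] = []
    cursor = 0  # first reference position not yet copied to the haplotype
    for pos, ref_allele, alt_allele in variants:
        if not ref_allele or not alt_allele:
            raise ValueError("alleles must be non-empty")
        if pos < cursor:
            raise ValueError("variants must be sorted and non-overlapping")
        if ref[pos : pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"REF allele does not match reference at {pos}")
        # Copy the unchanged reference segment and its track values.
        hap.append(ref[cursor:pos])
        hap_track.extend(track[cursor:pos])
        # Emit the ALT allele. ALT bases that line up with REF bases keep those
        # bases' track values; extra inserted bases reuse the value of the last
        # REF base (an assumption of this sketch, one choice among several).
        hap.append(alt_allele)
        for i in range(len(alt_allele)):
            hap_track.append(track[pos + min(i, len(ref_allele) - 1)])
        # REF bases beyond the ALT length (deletions) are dropped with their values.
        cursor = pos + len(ref_allele)
    hap.append(ref[cursor:])
    hap_track.extend(track[cursor:])
    return "".join(hap), hap_track


ref = "ACGTACGT"
track = [float(i) for i in range(len(ref))]
variants = [(1, "C", "T"), (3, "TA", "T"), (6, "G", "GAA")]  # SNP, deletion, insertion
hap, hap_track = apply_variants(ref, track, variants)
print(hap)        # ATGTCGAAT
print(hap_track)  # [0.0, 1.0, 2.0, 3.0, 5.0, 6.0, 6.0, 6.0, 7.0]
```

Nothing is written to disk. The haplotype and its track exist only as in-memory values derived from the reference and the variant list. This is the property the storage comparison with bcftools consensus refers to.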

Documentation is available here. See our preprint for benchmarking and implementation details.

Installation

pip install genvarloader

PyTorch is not included as a dependency because its installation may require special instructions. tbb and/or pyomp are optional but highly recommended, as they can improve throughput for parallelized numba code.
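To check which of these optional packages are present in an environment, a small standard-library helper can query installed distribution versions. This is a sketch, not part of GenVarLoader. It assumes the distribution names are `torch`, `tbb`, and `pyomp`, matching the names used in the installation notes.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Sequence

# Distribution names are an assumption based on the installation notes:
# PyTorch is published as "torch"; "tbb" and "pyomp" are the optional
# threading backends mentioned for parallelized numba code.
OPTIONAL: Sequence[str] = ("torch", "tbb", "pyomp")


def optional_dependency_report(
    names: Sequence[str] = OPTIONAL,
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    report: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in optional_dependency_report().items():
        print(f"{name}: {ver if ver else 'not installed'}")
```

Having a package installed does not guarantee numba uses it. Numba's `numba.threading_layer()` reports the active layer after a parallel function has run, and the `NUMBA_THREADING_LAYER` environment variable can request a specific one.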

Contributing

  1. Clone the repo.
  2. Assuming you have Pixi installed, install the pre-commit hooks with pixi run -e dev pre-commit. If you skip this step, your PR will likely fail CI checks.
  3. Activate and use the Pixi environment that suits your needs. dev is a decent catch-all, but you may need a different environment if you are using a GPU.
  4. If you are developing on osx-arm64, you will need to install plink2 manually to run the tests (the tests use plink2 to convert VCF to PGEN).
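The VCF to PGEN conversion that the tests depend on can be sketched as follows. These helpers are hypothetical and are not taken from the repository's test suite. `--vcf`, `--make-pgen`, and `--out` are standard plink2 options, but the repository's fixtures may pass additional flags. The sketch only checks whether `plink2` is on `PATH` before running.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def plink2_vcf_to_pgen_cmd(vcf: Path, out_prefix: Path) -> List[str]:
    """Build a plink2 command that converts a VCF into a PGEN fileset.

    plink2 writes <out_prefix>.pgen, <out_prefix>.pvar and <out_prefix>.psam.
    """
    return ["plink2", "--vcf", str(vcf), "--make-pgen", "--out", str(out_prefix)]


def convert_if_available(vcf: Path, out_prefix: Path) -> Optional[Path]:
    """Run the conversion when plink2 is on PATH; return the .pgen path or None."""
    if shutil.which("plink2") is None:
        # e.g. osx-arm64 without a manually installed plink2
        return None
    subprocess.run(plink2_vcf_to_pgen_cmd(vcf, out_prefix), check=True)
    return Path(f"{out_prefix}.pgen")
```

In a pytest suite, the same check can skip a test instead of failing it: `@pytest.mark.skipif(shutil.which("plink2") is None, reason="plink2 not installed")`.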

All tests except those for the Rust extension code use pytest and live under tests/. They verify that the code works as intended, so they must all pass before any feature is merged into main and subsequently released. The tests run automatically on every PR, and failing tests block the PR from being merged.

If your PR has merge conflicts, this is usually because the main branch received updates while you've been working on it. In this case, please rebase your branch via git rebase main to resolve merge conflicts, rather than using a merge commit via git merge main.

Note: Do not edit the version number in pyproject.toml. It is handled automatically by GitHub Actions.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

genvarloader-0.22.2.tar.gz (925.4 kB)

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • genvarloader-0.22.2-cp313-cp313t-musllinux_1_2_x86_64.whl (711.9 kB): CPython 3.13t, musllinux (musl 1.2+), x86-64
  • genvarloader-0.22.2-cp313-cp313t-musllinux_1_2_i686.whl (747.6 kB): CPython 3.13t, musllinux (musl 1.2+), i686
  • genvarloader-0.22.2-cp313-cp313t-musllinux_1_2_armv7l.whl (773.9 kB): CPython 3.13t, musllinux (musl 1.2+), ARMv7l
  • genvarloader-0.22.2-cp313-cp313t-musllinux_1_2_aarch64.whl (674.5 kB): CPython 3.13t, musllinux (musl 1.2+), ARM64
  • genvarloader-0.22.2-cp313-cp313t-manylinux_2_28_aarch64.whl (499.2 kB): CPython 3.13t, manylinux (glibc 2.28+), ARM64
  • genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (623.2 kB): CPython 3.13t, manylinux (glibc 2.17+), x86-64
  • genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_s390x.manylinux2014_s390x.whl (519.9 kB): CPython 3.13t, manylinux (glibc 2.17+), s390x
  • genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (622.8 kB): CPython 3.13t, manylinux (glibc 2.17+), ppc64le
  • genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_i686.manylinux2014_i686.whl (532.4 kB): CPython 3.13t, manylinux (glibc 2.17+), i686
  • genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (496.7 kB): CPython 3.13t, manylinux (glibc 2.17+), ARMv7l
  • genvarloader-0.22.2-cp310-abi3-win_amd64.whl (563.5 kB): CPython 3.10+, Windows x86-64
  • genvarloader-0.22.2-cp310-abi3-musllinux_1_2_x86_64.whl (715.5 kB): CPython 3.10+, musllinux (musl 1.2+), x86-64
  • genvarloader-0.22.2-cp310-abi3-musllinux_1_2_i686.whl (753.1 kB): CPython 3.10+, musllinux (musl 1.2+), i686
  • genvarloader-0.22.2-cp310-abi3-musllinux_1_2_armv7l.whl (777.9 kB): CPython 3.10+, musllinux (musl 1.2+), ARMv7l
  • genvarloader-0.22.2-cp310-abi3-musllinux_1_2_aarch64.whl (678.9 kB): CPython 3.10+, musllinux (musl 1.2+), ARM64
  • genvarloader-0.22.2-cp310-abi3-manylinux_2_28_aarch64.whl (502.8 kB): CPython 3.10+, manylinux (glibc 2.28+), ARM64
  • genvarloader-0.22.2-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (625.9 kB): CPython 3.10+, manylinux (glibc 2.17+), x86-64
  • genvarloader-0.22.2-cp310-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl (524.8 kB): CPython 3.10+, manylinux (glibc 2.17+), s390x
  • genvarloader-0.22.2-cp310-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (626.3 kB): CPython 3.10+, manylinux (glibc 2.17+), ppc64le
  • genvarloader-0.22.2-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl (537.4 kB): CPython 3.10+, manylinux (glibc 2.17+), i686
  • genvarloader-0.22.2-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (500.1 kB): CPython 3.10+, manylinux (glibc 2.17+), ARMv7l
  • genvarloader-0.22.2-cp310-abi3-macosx_11_0_arm64.whl (469.5 kB): CPython 3.10+, macOS 11.0+ ARM64
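The wheel file names listed above encode their compatibility as `{distribution}-{version}[-{build}]-{python}-{abi}-{platform}.whl`. Here, `cp310-abi3` marks wheels built against CPython's stable ABI, so a single wheel serves CPython 3.10 and later. `cp313t` marks the free-threaded CPython 3.13 build. A dotted platform tag such as `manylinux_2_17_x86_64.manylinux2014_x86_64` lists several equivalent platforms. The parser below is a minimal standard-library sketch for reading these names. It does not validate tags the way installers do.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its components (no tag validation)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of components: {filename!r}")
    build = parts[2] if len(parts) == 6 else None  # optional build tag
    python, abi, plat = parts[-3:]
    # Each tag may be a dot-separated set of alternatives.
    return WheelName(
        distribution=parts[0],
        version=parts[1],
        build=build,
        python_tags=tuple(python.split(".")),
        abi_tags=tuple(abi.split(".")),
        platform_tags=tuple(plat.split(".")),
    )


name = parse_wheel_filename(
    "genvarloader-0.22.2-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(name.abi_tags)       # ('abi3',)
print(name.platform_tags)  # ('manylinux_2_17_x86_64', 'manylinux2014_x86_64')
```

For real tooling, `packaging.utils.parse_wheel_filename` from the `packaging` library performs the same split with proper validation and normalization.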

File details

Details for the file genvarloader-0.22.2.tar.gz.

File metadata

  • Download URL: genvarloader-0.22.2.tar.gz
  • Size: 925.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.13.1

File hashes

Hashes for genvarloader-0.22.2.tar.gz
Algorithm Hash digest
SHA256 a80045b966a801bdb711494dca45399e7c74724f8b2cb5675c66412a4991de22
MD5 13d3148fbec4e569b7b58272f8b5537f
BLAKE2b-256 952e5b0e1962254c12d18114eb1a43904865a4f45bc4ccdc7ecc39abb7889f4d

See more details on using hashes here.
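To check a downloaded file against the SHA256 digests listed in this section, hash it locally and compare. This is a minimal standard-library sketch. It reads the file in chunks so large downloads do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with a published one (case-insensitive)."""
    return file_digest(path, "sha256") == expected_sha256.strip().lower()
```

For an intact source distribution, `verify_download(Path("genvarloader-0.22.2.tar.gz"), "a80045b966a801bdb711494dca45399e7c74724f8b2cb5675c66412a4991de22")` should return `True`. pip can enforce the same check at install time in hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file). That mode then requires hashes for every requirement, not just this package.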

File details

Details for the file genvarloader-0.22.2-cp313-cp313t-musllinux_1_2_x86_64.whl.

File hashes

Hashes for genvarloader-0.22.2-cp313-cp313t-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 ca98fea2cce69f034bcbe0cc5b5cde42ed691fa249423e397410e760c77176f6
MD5 370145865969edc3632d6f56144d9a5b
BLAKE2b-256 dfe075b9ae43bf3513939af835ef5a378720a9fbaf74ee4476a2935fb4d72fea

File details

Details for the file genvarloader-0.22.2-cp313-cp313t-musllinux_1_2_i686.whl.

File hashes

Hashes for genvarloader-0.22.2-cp313-cp313t-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 396560b1edd24d40c9f2a78953b1b17fc543eb2ceef9d84dc0f1f098ab0049e1
MD5 6e1b33da11d57a77d9da6f55a6b657aa
BLAKE2b-256 6fa498ab6ad9882c8baa11d51f35c1b43eb4d5315bb40bdddbc7d48187fe8675

File details

Details for the file genvarloader-0.22.2-cp313-cp313t-musllinux_1_2_armv7l.whl.

File hashes

Hashes for genvarloader-0.22.2-cp313-cp313t-musllinux_1_2_armv7l.whl
Algorithm Hash digest
SHA256 89153888c77e8aaea4dde97ac4e120a3b1a9d06fafc81fb5b534d8c3a46c5393
MD5 dbb99447f77f70ed3ae4dc87bc1d2282
BLAKE2b-256 bd22d273d50280eb48365c575a4dbce16ca395a4069235dfb070766b3209bc99

File details

Details for the file genvarloader-0.22.2-cp313-cp313t-musllinux_1_2_aarch64.whl.

File hashes

Hashes for genvarloader-0.22.2-cp313-cp313t-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 03ee8ef0e0892a30bf59b8bb01bea29431ea17db093c611f3e483cddd41ebad0
MD5 1bc788e8a0a76d8d378ac48865d25800
BLAKE2b-256 662e1bc71531c938da1b6ff811bb22a10e32546e968fa17820b3f7534dcc63a0

File details

Details for the file genvarloader-0.22.2-cp313-cp313t-manylinux_2_28_aarch64.whl.

File hashes

Hashes for genvarloader-0.22.2-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 cf15cf1ba1f0a5de98f6defee1873f522d97ff1763dd7badbbc6038f8880d796
MD5 a7806e0a6b60796741cc2918e3266747
BLAKE2b-256 7403328cb3698f1ede90b6acfb92967e187c664368e3fd7b72987e1ba7680a45

File details

Details for the file genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 2d9324909450e25e1f92732e7703a596233be9d27f20732cd98bc03b149fb685
MD5 a36c08b960458f0c743c49e55e371e49
BLAKE2b-256 2705f680835ce14a5bb1e319179112e50abae3e95c0b5457ac5df0a087063af1

File details

Details for the file genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File hashes

Hashes for genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 4ffdbbde86cb4461b0a300f058bcf2f05c57f54057cf4705e06907344ac9634e
MD5 3663a49025d6f62ef4ffa267e50bb94e
BLAKE2b-256 5758279ad9f8cea32e395f6f89f8a41a217e86a0a17e43c6d1ea2f3212374363

File details

Details for the file genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File hashes

Hashes for genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 ab15c6dea5ca03ac72ebb8b9ae5854783181b9e229a5e90f1498291f438a1139
MD5 b2a61ba0eb3a902613b657531ef804ca
BLAKE2b-256 b6b0138ecd791b170d95303d2bdc9f4d49290b4f5a54e7f9333180b24e2b2ac0

File details

Details for the file genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 ade433a3e8459bca83df4794cc6e45c5744ce2d049a762e6ec85c1b2ce8fa23a
MD5 618c4bf4ad8298d09c633890d191094a
BLAKE2b-256 17674c5635ab87076722cc087a7ebb0f673b8a341382f464355c11382eedef09

File details

Details for the file genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for genvarloader-0.22.2-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 6076ef76e0d6be9f9b490b23d4a53f2e210a6cea452a9b767b11a2faa0d11b28
MD5 3cd28627c2027f99a97c51784717f4a8
BLAKE2b-256 951f16335e6ad34009cec021d130e4ce89ee41a5d4ad038803bae66fe6f6b468

File details

Details for the file genvarloader-0.22.2-cp310-abi3-win_amd64.whl.

File hashes

Hashes for genvarloader-0.22.2-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 5a57d5ccdec1f7ead2b3d62883adf2d48cc86b96a127e4c60f849182ebe7d938
MD5 053561db09cfbb68fd3a1120788b31db
BLAKE2b-256 e84b6b59dd21b88f237b32bfd326cee54f67e6274be63b238ccdeef23ebfa2fa

File details

Details for the file genvarloader-0.22.2-cp310-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for genvarloader-0.22.2-cp310-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 c09c37d2d08eabdeb362e6274e2242e6124dd1650c0e5fe27881255ebd694d72
MD5 e35e016bf6eab5bcc8fc37d649bb295a
BLAKE2b-256 5d06b5c204b1cb572977cce6bb4ab00ed9474e1dd3ff6d6683dc99ae3dc311fe

File details

Details for the file genvarloader-0.22.2-cp310-abi3-musllinux_1_2_i686.whl.

File hashes

Hashes for genvarloader-0.22.2-cp310-abi3-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 6952f5627d5eca567ec6d1cd6b4c715b01b798804006af9f33a7ac723039f406
MD5 5eb0f88e574f4d6b1d83a6950f4c3641
BLAKE2b-256 146c1cb981f57238c0d120162d733d6d7f52c0f331f8b0c88cfe73d74c391b51

File details

Details for the file genvarloader-0.22.2-cp310-abi3-musllinux_1_2_armv7l.whl.

File hashes

Hashes for genvarloader-0.22.2-cp310-abi3-musllinux_1_2_armv7l.whl
Algorithm Hash digest
SHA256 67f369704a5c2e755e752462734adbf0d2a3d074f4fa2aa12fbb11c410e18681
MD5 e187ce9c5917280a543c0821b2f57095
BLAKE2b-256 3895ecbd0bc17e4d4d7d8278b13899740bb46939c29a903e6795e2eb59aa7dea

File details

Details for the file genvarloader-0.22.2-cp310-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for genvarloader-0.22.2-cp310-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 bfb3bf430abf6e7f4ace3f2f2763d84979e61b3a5f3dab86c73a312297bcea53
MD5 eade224efc1371c7f29d09ee1035e8ef
BLAKE2b-256 f2d4161051640a91042f0330b416e0d25e2ebcd3f0cc0ca4a7c5b76157b90096

File details

Details for the file genvarloader-0.22.2-cp310-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for genvarloader-0.22.2-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 f1f82eb674fb7fdaf7fa8a089567bd693e39c5d28821d55c1eb773493eee558a
MD5 35054098c47cbb553d3291bb40a1f5f9
BLAKE2b-256 260db38bcb585c0e902ecb726e6c3ba7a842c43a2f74b266a7b0d3c9af11ebce

File details

Details for the file genvarloader-0.22.2-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for genvarloader-0.22.2-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 68e664140e481a7c9bc8e9420037d1920b10d9503a4265984b64e91b8d743047
MD5 39c58257278573d387b4380ef102670a
BLAKE2b-256 581c50b10c14da7f908686ced6975d7801e7dc1de11188af6c2cafdfb63f84e4

File details

Details for the file genvarloader-0.22.2-cp310-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File hashes

Hashes for genvarloader-0.22.2-cp310-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 097988f167aad7033d7fd8f38169da356182c51b60b1a10868a3c83443de7f10
MD5 58c04bbbe50a899872ff0baa80faa675
BLAKE2b-256 6963008a44100bdaeef2ffa81c1f2384195550cbb49784ace9a32ef71dcb34f2

File details

Details for the file genvarloader-0.22.2-cp310-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File hashes

Hashes for genvarloader-0.22.2-cp310-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 ab4951010214823fdf91042cac5ded8ab282adb9cc09194d2d60d0afa04134b2
MD5 8c5c9fd2e7abe399aee70d79fd406c9e
BLAKE2b-256 4c00ee718c78a0fefe1024b6d1690af2f937ed5f3eb052c9e14b2c607ff81f9e

File details

Details for the file genvarloader-0.22.2-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for genvarloader-0.22.2-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 11291892812dba3299b842dfcbb474cf1063436b1e681831f131dedd0c3fc40b
MD5 7998e4971337df62e1d205f8bb33e391
BLAKE2b-256 57e9366f8aa561043d70aaaa77d1e4e392ec12146680491059ca0f80eb835407

File details

Details for the file genvarloader-0.22.2-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for genvarloader-0.22.2-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 e9da681fcd605422910fa3c689719d4497f3c6c0118ce6c9cf690091b5b581f6
MD5 f0a128f5ec36d314c7030cf59db32058
BLAKE2b-256 b16af16cc976850c81f09e77e8f36f0570bc2bf33ddfcf4d47f9a3b4543d258d

File details

Details for the file genvarloader-0.22.2-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for genvarloader-0.22.2-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 90e9f62e51d5f4cb783e5ff384e87168bd2729c09016f72940e883751cb4f945
MD5 94657e8c1b9a334c952ae8284001547a
BLAKE2b-256 3dced43d882672757ccf505d8268b28f590715c16845812c459c17c87b7a1dc6
