Pipeline for efficient genomic data processing.

Project description

Features

GenVarLoader provides a fast, memory-efficient data structure for training sequence models on genetic variation. For example, it can be used to train a DNA language model on human genetic variation (e.g. Dalla-Torre et al.) or to train sequence-to-function models with genetic variation (e.g. Celaj et al., Drusinsky et al., He et al., and Rastogi et al.).

  • Avoid writing any sequences to disk (can save >2,000x storage vs. writing personalized genomes with bcftools consensus)
  • Generate haplotypes up to 1,000 times faster than reading a FASTA file
  • Generate tracks up to 450 times faster than reading a BigWig
  • Supports indels and re-aligns tracks to haplotypes that contain them
  • Extensible to new file formats: drop a feature request! Currently supports VCF, PGEN, and BigWig
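
To make the first and fourth bullets concrete, here is a minimal pure-Python sketch of the general idea: a haplotype is rebuilt on the fly from a reference plus a list of variants, and a per-base track is re-aligned to the haplotype's coordinates when indels change its length. This is not GenVarLoader's API or implementation. The names Variant, apply_variants, and realign_track are hypothetical, the variants are assumed to be non-overlapping, and the rule of repeating the last reference value across inserted bases is only one possible policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record, used only for this illustration."""
    pos: int  # 0-based start of the REF allele on the reference
    ref: str  # reference allele (non-empty, as in VCF)
    alt: str  # alternate allele


def apply_variants(reference: str, variants: list[Variant]) -> str:
    """Build one haplotype from a reference and non-overlapping variants."""
    pieces = []
    cursor = 0
    for v in sorted(variants, key=lambda v: v.pos):
        if reference[v.pos : v.pos + len(v.ref)] != v.ref:
            raise ValueError(f"REF allele does not match reference at {v.pos}")
        pieces.append(reference[cursor : v.pos])  # unchanged reference bases
        pieces.append(v.alt)                      # bases carried by this haplotype
        cursor = v.pos + len(v.ref)               # skip the replaced REF bases
    pieces.append(reference[cursor:])
    return "".join(pieces)


def realign_track(track: list[float], variants: list[Variant]) -> list[float]:
    """Re-align a per-base track so it has one value per haplotype base.

    Assumed policy: deleted bases lose their values, and inserted bases
    repeat the value of the last REF base of the variant.
    """
    out: list[float] = []
    cursor = 0
    for v in sorted(variants, key=lambda v: v.pos):
        out.extend(track[cursor : v.pos])
        ref_vals = track[v.pos : v.pos + len(v.ref)]
        keep = ref_vals[: len(v.alt)]                    # deletion: drop the tail
        pad = [ref_vals[-1]] * (len(v.alt) - len(keep))  # insertion: repeat last value
        out.extend(keep + pad)
        cursor = v.pos + len(v.ref)
    out.extend(track[cursor:])
    return out


if __name__ == "__main__":
    reference = "ACGTACGT"
    track = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    variants = [Variant(1, "C", "CTT"), Variant(4, "AC", "A")]
    haplotype = apply_variants(reference, variants)
    aligned = realign_track(track, variants)
    print(haplotype, aligned)
    assert len(haplotype) == len(aligned)
```

Because only the reference and the variant records need to be stored, no personalized sequence has to be written to disk; each haplotype exists only while it is being used.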

Documentation is available here. See our preprint for benchmarking and implementation details.

Installation

pip install genvarloader

PyTorch is not included as a dependency because installing it may require platform-specific instructions.
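
Since PyTorch has to be installed separately, a quick check of the current environment can be useful before training. The sketch below uses only the standard library; the helper name has_module is hypothetical and is not part of GenVarLoader.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Report which of the two packages are visible to this interpreter.
    for name in ("genvarloader", "torch"):
        status = "found" if has_module(name) else "missing"
        print(f"{name}: {status}")
```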

Contributing

  1. Clone the repo.
  2. Assuming you have Pixi, install the pre-commit hooks with the command: pixi run -e dev pre-commit. If you skip this step, your PR will likely fail CI checks.
  3. Activate and use the appropriate Pixi environment for your needs. A decent catch-all is dev, but you might need a different environment if you are using a GPU.
  4. If you are developing on osx-arm64, you will need to install plink2 manually from here in order to run the tests (plink2 is needed to convert VCF to PGEN).
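
Before running the tests, it may help to confirm that the external tools mentioned in the steps above are on your PATH. This is a small standard-library sketch and not part of the repository; the function name missing_tools is hypothetical.

```python
import shutil


def missing_tools(names: list[str]) -> list[str]:
    """Return the tools from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # pixi manages the environments; plink2 is needed by the tests.
    missing = missing_tools(["pixi", "plink2"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```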

All tests (excluding the Rust extension code) are written for pytest and live under tests/. They ensure the code works as intended, so they must all pass before any feature is merged into main and subsequently released. The tests run automatically on every PR, and failing tests block the PR from being merged.

If your PR has merge conflicts, this is usually because the main branch received updates while you've been working on it. In this case, please rebase your branch via git rebase main to resolve merge conflicts, rather than using a merge commit via git merge main.

Note: Do not edit the version number in pyproject.toml. This is handled automatically by GitHub Actions.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • genvarloader-0.22.0.tar.gz (928.7 kB): Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • genvarloader-0.22.0-cp313-cp313t-musllinux_1_2_x86_64.whl (711.8 kB): CPython 3.13t, musllinux: musl 1.2+ x86-64
  • genvarloader-0.22.0-cp313-cp313t-musllinux_1_2_i686.whl (747.5 kB): CPython 3.13t, musllinux: musl 1.2+ i686
  • genvarloader-0.22.0-cp313-cp313t-musllinux_1_2_armv7l.whl (773.8 kB): CPython 3.13t, musllinux: musl 1.2+ ARMv7l
  • genvarloader-0.22.0-cp313-cp313t-musllinux_1_2_aarch64.whl (674.4 kB): CPython 3.13t, musllinux: musl 1.2+ ARM64
  • genvarloader-0.22.0-cp313-cp313t-manylinux_2_28_aarch64.whl (499.1 kB): CPython 3.13t, manylinux: glibc 2.28+ ARM64
  • genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (623.0 kB): CPython 3.13t, manylinux: glibc 2.17+ x86-64
  • genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_s390x.manylinux2014_s390x.whl (519.8 kB): CPython 3.13t, manylinux: glibc 2.17+ s390x
  • genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (622.7 kB): CPython 3.13t, manylinux: glibc 2.17+ ppc64le
  • genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_i686.manylinux2014_i686.whl (532.3 kB): CPython 3.13t, manylinux: glibc 2.17+ i686
  • genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (496.6 kB): CPython 3.13t, manylinux: glibc 2.17+ ARMv7l
  • genvarloader-0.22.0-cp310-abi3-musllinux_1_2_x86_64.whl (715.4 kB): CPython 3.10+, musllinux: musl 1.2+ x86-64
  • genvarloader-0.22.0-cp310-abi3-musllinux_1_2_i686.whl (752.9 kB): CPython 3.10+, musllinux: musl 1.2+ i686
  • genvarloader-0.22.0-cp310-abi3-musllinux_1_2_armv7l.whl (777.8 kB): CPython 3.10+, musllinux: musl 1.2+ ARMv7l
  • genvarloader-0.22.0-cp310-abi3-musllinux_1_2_aarch64.whl (678.7 kB): CPython 3.10+, musllinux: musl 1.2+ ARM64
  • genvarloader-0.22.0-cp310-abi3-manylinux_2_28_aarch64.whl (502.7 kB): CPython 3.10+, manylinux: glibc 2.28+ ARM64
  • genvarloader-0.22.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (625.8 kB): CPython 3.10+, manylinux: glibc 2.17+ x86-64
  • genvarloader-0.22.0-cp310-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl (524.7 kB): CPython 3.10+, manylinux: glibc 2.17+ s390x
  • genvarloader-0.22.0-cp310-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (626.2 kB): CPython 3.10+, manylinux: glibc 2.17+ ppc64le
  • genvarloader-0.22.0-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl (537.3 kB): CPython 3.10+, manylinux: glibc 2.17+ i686
  • genvarloader-0.22.0-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (499.9 kB): CPython 3.10+, manylinux: glibc 2.17+ ARMv7l
  • genvarloader-0.22.0-cp310-abi3-macosx_11_0_arm64.whl (472.9 kB): CPython 3.10+, macOS 11.0+ ARM64
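
The wheel file names above follow the standard layout of distribution, version, Python tag, ABI tag, and platform tag, separated by hyphens, with an optional build tag after the version. The sketch below splits a file name into those parts using plain string handling. The names WheelTags and parse_wheel_name are hypothetical, and the function assumes a well-formed name rather than validating it fully.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python: str                 # e.g. "cp310" or "cp313"
    abi: str                    # e.g. "abi3" or "cp313t"
    platforms: tuple[str, ...]  # a compressed tag set is split on "."


def parse_wheel_name(filename: str) -> WheelTags:
    """Split a wheel file name into its tags (minimal, no full validation)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 parts when a build tag is present
        raise ValueError(f"unexpected number of fields in: {filename}")
    distribution, version = parts[0], parts[1]
    python, abi, platform = parts[-3:]
    return WheelTags(distribution, version, python, abi, tuple(platform.split(".")))


if __name__ == "__main__":
    name = "genvarloader-0.22.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
    print(parse_wheel_name(name))
```

In the list above, the abi3 wheels target CPython's stable ABI, which is why they are labeled CPython 3.10+, while the cp313t wheels target the free-threaded build of CPython 3.13.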

File details

Details for the file genvarloader-0.22.0.tar.gz.

File metadata

  • Download URL: genvarloader-0.22.0.tar.gz
  • Upload date:
  • Size: 928.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.13.1

File hashes

Hashes for genvarloader-0.22.0.tar.gz
Algorithm Hash digest
SHA256 773d0ba3c8d53afcb155466b5d9980276493383fe7568a08f4fbf0e7a4cf0ae5
MD5 8053e7c9666dd1cdbf66b47500198e55
BLAKE2b-256 2dbae6cc10f8f8edf2294eccae8aa8da14dd0160228b257c32a74981bfd7ad4c

See more details on using hashes here.
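
As a minimal sketch of how the digests listed on this page can be used, the following standard-library script computes the SHA256 of a downloaded file and compares it with the published value for the source distribution. The helper names sha256_of and matches are hypothetical, and the script assumes the archive was saved in the current directory under its original name.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for genvarloader-0.22.0.tar.gz
EXPECTED_SHA256 = "773d0ba3c8d53afcb155466b5d9980276493383fe7568a08f4fbf0e7a4cf0ae5"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    sdist = Path("genvarloader-0.22.0.tar.gz")
    if sdist.exists():
        print("OK" if matches(sdist, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{sdist} not found; download it first")
```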

File details

Details for the file genvarloader-0.22.0-cp313-cp313t-musllinux_1_2_x86_64.whl.

File hashes

Hashes for genvarloader-0.22.0-cp313-cp313t-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 331e2a60867fdea8938517c524b7e4b14dd6198a9b794074ca98c84f73ed84f1
MD5 a200ef4f5539f514fcde7e59522b1278
BLAKE2b-256 4cfb52bdb02ead683a2b2782b767861dcf62157d29a808ff6f5176108be89faa

File details

Details for the file genvarloader-0.22.0-cp313-cp313t-musllinux_1_2_i686.whl.

File hashes

Hashes for genvarloader-0.22.0-cp313-cp313t-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 7181788d2c35cc2a5eb36f3f4a3402e498fab26b7768943d630199d346ac0fd1
MD5 d05689c8ef9f5ec6929ad830cca22252
BLAKE2b-256 70d6c236cf29e1a1caa460e000c2823898319ec1dd53add6427b7885c557bfde

File details

Details for the file genvarloader-0.22.0-cp313-cp313t-musllinux_1_2_armv7l.whl.

File hashes

Hashes for genvarloader-0.22.0-cp313-cp313t-musllinux_1_2_armv7l.whl
Algorithm Hash digest
SHA256 13682a5abb9dd86f79a6e8d43b744059671df2c205f0382c2d0bfc3ba6280129
MD5 d78b46944c3accc6d6e7d8f330e1cc46
BLAKE2b-256 d8d61d85fb0fd3f796a1f17266e105c8c7be12d379faaa100de06ff75e9ff417

File details

Details for the file genvarloader-0.22.0-cp313-cp313t-musllinux_1_2_aarch64.whl.

File hashes

Hashes for genvarloader-0.22.0-cp313-cp313t-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 08d90854e1f06358f3d7bd2fd72a69465e34ff0c7f12d528526097879fcb7225
MD5 c74a52e3f2ce7a83dae5c900d434fcf8
BLAKE2b-256 96e97dff2f6d93b9f05299772986f6f974c42097254ca1f55985b539df507c3d

File details

Details for the file genvarloader-0.22.0-cp313-cp313t-manylinux_2_28_aarch64.whl.

File hashes

Hashes for genvarloader-0.22.0-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 b4d758bff31956b4b394eb578d9d9d5c646c6f332f51f438b46fb8f34266b32b
MD5 e260bcd8cb008e06083ef86622934630
BLAKE2b-256 a3f8581ee5395398854367c9e277f604b459437da9d67f26325dd3d355c84794

File details

Details for the file genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d9f1ba12b8aa4bc38255a0573d3ded5bf61c34f1eedf3776ed30b3aa08a9995d
MD5 5e2f857ab1c88f33e9d32bed9209c49d
BLAKE2b-256 e83a771ca862826419b4f22c70905a6f249e6b19f26d3f9cf6fb744babe6f95b

File details

Details for the file genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File hashes

Hashes for genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 b87ad27f8796ca1d4a7e1a19c8e16c368f437091a7cc3f02b8e52c807bec633e
MD5 d2828b7dd45ba42a89185b28f8ab77d5
BLAKE2b-256 846132902e898a485be2bbad2eadefd9d1795a0350a3e481db094b4ce41d4832

File details

Details for the file genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File hashes

Hashes for genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 d11a05a107621c62be8d427d5de356b767ff1f469846f8c1f2fce912a19c6e88
MD5 0c84d60189b5400e06c3cf196068aed8
BLAKE2b-256 335a24988412aa914dc3f779b457090b444584ff66930a47170c164de06c070d

File details

Details for the file genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 9b54e3973b35a9e298ecab3e8e05ee17bf2412a3294c3d7c9358afc4f00060af
MD5 18122e7f9cde74c876644a562356a0cf
BLAKE2b-256 7963fbd0b50b62035204913b6b8aaf965665b0cff0e6a7688dbf22c8663347c4

File details

Details for the file genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for genvarloader-0.22.0-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 2fb1389f90bdf50fe8778ec655a4aac8b20552c6c7efba5678aadf5179d8037c
MD5 d8cc449c74dba011ac17fa829de5c49d
BLAKE2b-256 4a0f55c38b7387d337d544deb54dac8b01f486a914a07a7385e960ff747965fa

File details

Details for the file genvarloader-0.22.0-cp310-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for genvarloader-0.22.0-cp310-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 2a12ea34ff7ca9d15340f1d938da769c6d5853f80ef6425c459da60eba5e062c
MD5 a414f71f0db27c78ee3757218be09d74
BLAKE2b-256 175c529ea01b24bbc835f4a096e0c2a21a27273f8d48a2e04448180952a15405

File details

Details for the file genvarloader-0.22.0-cp310-abi3-musllinux_1_2_i686.whl.

File hashes

Hashes for genvarloader-0.22.0-cp310-abi3-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 153e97bb5cefebdee8709a1f0e163732780e7358a0958b6fd8878883a23855eb
MD5 a093253995a9d9e1629cb68612549ba2
BLAKE2b-256 653bbf2afa37a00debb714244fa6ab250dea7d02bc4146c454683facad423a78

File details

Details for the file genvarloader-0.22.0-cp310-abi3-musllinux_1_2_armv7l.whl.

File hashes

Hashes for genvarloader-0.22.0-cp310-abi3-musllinux_1_2_armv7l.whl
Algorithm Hash digest
SHA256 7b0d304a050c37c1b1770ce4128c6ccfc70531c68ac6f33c001425d720790943
MD5 3b7aa4cb2f1dc3eaf6c0d1a4300e0057
BLAKE2b-256 0a2e40acd193e289eed5feee48c3f8137a4c6a2b8ef4bddc130db63fa67c7be7

File details

Details for the file genvarloader-0.22.0-cp310-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for genvarloader-0.22.0-cp310-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 21de705b6fd961b4c7ead029361bcd092914aa3e1379b783cce69031c74f10c9
MD5 13baffdd12a1fd0e1f58945a1ace2fec
BLAKE2b-256 aa8b3147d06136a19e6407f4be75d5e6bab7983ed5e16d016f528b19d2fcb78e

File details

Details for the file genvarloader-0.22.0-cp310-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for genvarloader-0.22.0-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 e57fbe6f124fbe6c90679a65a1a8211259ac6632204fe876849a38b8acb4ca27
MD5 f223d4dafc62f8f77f20c660046b1603
BLAKE2b-256 105ef69dd349eac625106050921b21d5a9648288d3239c341a2613b500882332

File details

Details for the file genvarloader-0.22.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for genvarloader-0.22.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e2200ed1dcbac7c2fdc9bef6da2bbfbe482079510f8f6d44ccc9fbe7df3a59d4
MD5 38bf25691c2c6d3e2f72816d375e0685
BLAKE2b-256 1cdd72618f2b9cbb1bac212641b43e9afaceeed203d56a8899242958e6f2503a

File details

Details for the file genvarloader-0.22.0-cp310-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File hashes

Hashes for genvarloader-0.22.0-cp310-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 df45e2b3e6940d15e72f2d3c186663680f76aee4d30f8f119b335d354ff8ee57
MD5 de073c467f373744c71c5d6016f9fde0
BLAKE2b-256 ed0cf194de11e8945979c4e75fe891f16c5fd4e05707a68e1919d049de52d87d

File details

Details for the file genvarloader-0.22.0-cp310-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File hashes

Hashes for genvarloader-0.22.0-cp310-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 5794432f34a6a17cdfa8212277301f19827faf0cb2289b0ee8234742a2d63af1
MD5 09806269fdd866c49e70c6e7857a7b93
BLAKE2b-256 71d566d84d6807d65305ca64f0faf2ea88a256272030d4181233a2aaa5dfc43b

File details

Details for the file genvarloader-0.22.0-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for genvarloader-0.22.0-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 9158de9a0e74c6c3221799dbee8a51f159517902c8b0caa6ba53474be00c57ea
MD5 d8f72325f279ef92f88fa4f587f275d9
BLAKE2b-256 c4e47e52ef86979bafb44d96ece35293fe2fe7cd871b6c2e9c6dffdd6fd93d2b

File details

Details for the file genvarloader-0.22.0-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for genvarloader-0.22.0-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 87f38f01113be3ea34ee4f919be734ac4a6de703415fcf8a92d3c21e0d6b2c9d
MD5 e6a92c1aa3e8c60cb8dbd6e0dfa630b9
BLAKE2b-256 a6f68367be1cc3348f4ee4e004f6316ee9fa45fc44bace553760b31782c7eb6a

File details

Details for the file genvarloader-0.22.0-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for genvarloader-0.22.0-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d2725384bd3ca4eef6732ccf9eb071633ab2d5118d03c81e893a238996e94594
MD5 8c218309c85246f85b824c225133e147
BLAKE2b-256 2cbaca7932c95eba5f7517debe369c6c1a67a8a34da08a29f2bf49ae0de3826d
