Pipeline for efficient genomic data processing.
GenVarLoader
GenVarLoader aims to enable training sequence models on the personalized genomes of tens to hundreds of thousands of individuals.
Installation
pip install genvarloader
PyTorch is not included as a dependency because installing it requires special instructions; install it separately.
Quick Start
import genvarloader as gvl
reference = 'reference.fasta'
variants = 'variants.pgen' # highly recommended to convert VCFs to PGEN
regions_of_interest = 'regions.bed'
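The comment above recommends converting VCFs to PGEN. One way to do that, assuming PLINK 2 is installed and available on your PATH as plink2, is sketched below. The file names and both helper functions are hypothetical and are not part of GenVarLoader.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def vcf_to_pgen_command(vcf: str, out_prefix: str) -> List[str]:
    """Build a plink2 command that converts a VCF into PGEN/PVAR/PSAM files."""
    return ["plink2", "--vcf", vcf, "--make-pgen", "--out", out_prefix]


def convert_vcf_to_pgen(vcf: str, out_prefix: str) -> Path:
    """Run plink2 (assumed to be on PATH) and return the path of the .pgen file."""
    if shutil.which("plink2") is None:
        raise RuntimeError("plink2 was not found on PATH")
    subprocess.run(vcf_to_pgen_command(vcf, out_prefix), check=True)
    return Path(out_prefix + ".pgen")


# Hypothetical usage: writes variants.pgen, variants.pvar and variants.psam
# pgen_path = convert_vcf_to_pgen("variants.vcf.gz", "variants")
```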
Create readers for each file providing sequence data:
ref = gvl.Fasta(name='ref', path=reference, pad='N')
var = gvl.Pgen(variants)
varseq = gvl.FastaVariants(name='varseq', fasta=ref, variants=var)
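To make the idea of a personalized sequence concrete, here is a minimal conceptual sketch of substituting single-nucleotide variants into a reference to obtain one sequence per sample and haplotype. It is not how GenVarLoader is implemented: the function name, the 0-based positions, and the 0/1 genotype encoding are assumptions made for illustration, and indels are ignored.

```python
from typing import List, Sequence


def apply_snps(
    reference: bytes,
    positions: Sequence[int],
    alts: Sequence[bytes],
    genotypes: Sequence[Sequence[Sequence[int]]],
) -> List[List[bytes]]:
    """Return haplotype sequences indexed as [sample][ploid].

    positions: 0-based position of each variant in the reference
    alts:      alternate allele (a single base) for each variant
    genotypes: genotypes[sample][ploid][variant], 0 = reference, 1 = alternate
    """
    haplotypes = []
    for sample_genotypes in genotypes:
        sample_haplotypes = []
        for haplotype_genotypes in sample_genotypes:
            sequence = bytearray(reference)  # fresh copy for every haplotype
            for pos, alt, allele in zip(positions, alts, haplotype_genotypes):
                if allele == 1:
                    sequence[pos:pos + 1] = alt
            sample_haplotypes.append(bytes(sequence))
        haplotypes.append(sample_haplotypes)
    return haplotypes


# Two diploid samples, two SNPs (position 1: C>T, position 5: C>A)
example = apply_snps(
    b"ACGTACGT",
    positions=[1, 5],
    alts=[b"T", b"A"],
    genotypes=[[[1, 0], [0, 0]], [[0, 1], [1, 1]]],
)
```

The nested [sample][ploid] layout mirrors the 'sample' and 'ploid' names that appear as batch_dims in the next step.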
Put them together and get a torch.utils.data.DataLoader:
gvloader = gvl.GVL(
    readers=varseq,
    bed=regions_of_interest,
    fixed_length=1000,
    batch_size=16,
    max_memory_gb=8,
    batch_dims=['sample', 'ploid'],
    shuffle=True,
    num_workers=2
)
dataloader = gvloader.torch_dataloader()
Now you're ready to use the dataloader however you need to:
# implement your training loop
for batch in dataloader:
    ...
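As a rough sketch of what the body of such a loop might look like, the example below one-hot encodes each sequence in a batch before a model step. The batch layout is an assumption, not something stated above: each batch is taken to be a dict keyed by the reader name ('varseq') holding nucleotide byte strings. A plain list stands in for the real dataloader so the example runs without PyTorch or any data files.

```python
from typing import Dict, Iterable, List

ALPHABET = b"ACGT"


def one_hot(sequence: bytes) -> List[List[int]]:
    """One-hot encode a sequence; bases outside ACGT (such as the 'N' pad) become all zeros."""
    return [[int(base == letter) for letter in ALPHABET] for base in sequence]


def train(dataloader: Iterable[Dict[str, List[bytes]]], n_epochs: int = 1) -> int:
    """Iterate over the dataloader for n_epochs and return the number of batches seen."""
    n_batches = 0
    for _ in range(n_epochs):
        for batch in dataloader:
            sequences = batch["varseq"]  # assumed key: the reader's name
            encoded = [one_hot(seq) for seq in sequences]
            # a forward pass, loss, and optimizer step on `encoded` would go here
            n_batches += 1
    return n_batches


# Stand-in for the real dataloader: two batches of two short sequences each
fake_dataloader = [
    {"varseq": [b"ACGT", b"NNAC"]},
    {"varseq": [b"TTTT", b"GGCA"]},
]
batches_seen = train(fake_dataloader, n_epochs=2)
```

Inspect one real batch (its type, keys, and shapes) before adapting this, since the actual structure depends on the readers and batch_dims you configured.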