Pipeline for efficient genomic data processing.

GenVarLoader

GenVarLoader aims to enable training sequence models on the personalized genomes of tens to hundreds of thousands of individuals.

Installation

pip install genvarloader

PyTorch is not included as a dependency because installing it requires platform-specific instructions. Install it separately before using the PyTorch dataloader.
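
Because PyTorch has to be installed separately, it can help to confirm that it is importable before building a dataloader. The following is a minimal sketch using only the standard library. The helper name has_module is hypothetical and not part of GenVarLoader.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be imported."""
    # find_spec returns None when no installed module matches the name.
    return importlib.util.find_spec(name) is not None


if not has_module("torch"):
    print("PyTorch not found; install it by following the official PyTorch instructions.")
```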

Quick Start

import genvarloader as gvl

reference = 'reference.fasta'
variants = 'variants.pgen' # highly recommended to convert VCFs to PGEN
regions_of_interest = 'regions.bed'
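
The comment above recommends converting VCFs to PGEN. One common way to do that is with PLINK 2. The sketch below builds such a command and runs it only if a plink2 executable and the input file are both present. The helper name, the file names, and the choice of PLINK 2 are assumptions for illustration, not something GenVarLoader prescribes.

```python
import os
import shutil
import subprocess


def plink2_vcf_to_pgen_cmd(vcf_path: str, out_prefix: str) -> list[str]:
    """Build a PLINK 2 command that converts a VCF into .pgen/.pvar/.psam files."""
    return ["plink2", "--vcf", vcf_path, "--make-pgen", "--out", out_prefix]


# Hypothetical input and output names.
cmd = plink2_vcf_to_pgen_cmd("variants.vcf.gz", "variants")
print(" ".join(cmd))

# Only attempt the conversion when PLINK 2 is installed and the VCF exists.
if shutil.which("plink2") and os.path.exists("variants.vcf.gz"):
    subprocess.run(cmd, check=True)
```

With these names, a successful run would write variants.pgen alongside its .pvar and .psam companion files.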

Create readers for each file providing sequence data:

ref = gvl.Fasta(name='ref', path=reference, pad='N')
var = gvl.Pgen(variants)
varseq = gvl.FastaVariants(name='varseq', fasta=ref, variants=var)

Put them together and get a PyTorch DataLoader:

gvloader = gvl.GVL(
    readers=varseq,
    bed=regions_of_interest,
    fixed_length=1000,
    batch_size=16,
    max_memory_gb=8,
    batch_dims=['sample', 'ploid'],
    shuffle=True,
    num_workers=2
)

dataloader = gvloader.torch_dataloader()
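
The bed argument above points to a BED file of regions. As a minimal sketch, the code below writes a three-column BED file whose regions are all 1000 bp wide, matching fixed_length=1000 in the example. The helper name, chromosome names, and start positions are hypothetical. How GenVarLoader treats regions of other widths is not covered here.

```python
import tempfile
from pathlib import Path


def write_fixed_width_bed(path, starts_by_chrom, width=1000):
    """Write a 3-column BED file with one fixed-width region per start.

    BED coordinates are 0-based and half-open: [start, start + width).
    """
    lines = []
    for chrom, starts in starts_by_chrom.items():
        for start in starts:
            lines.append(f"{chrom}\t{start}\t{start + width}")
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


# Hypothetical regions; chromosome names should match those in the FASTA.
bed_path = Path(tempfile.mkdtemp()) / "regions.bed"
regions = write_fixed_width_bed(bed_path, {"chr1": [10_000, 50_000], "chr2": [0]})
```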

And now you're ready to use the dataloader however you need to:

# implement your training loop
for batch in dataloader:
    ...

Source Distribution

genvarloader-0.1.2.tar.gz (29.2 kB)

Uploaded Source

Built Distribution

genvarloader-0.1.2-py3-none-any.whl (34.5 kB)

Uploaded Python 3

File details

Details for the file genvarloader-0.1.2.tar.gz.

File metadata

  • Download URL: genvarloader-0.1.2.tar.gz
  • Upload date:
  • Size: 29.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.10.8 Linux/4.18.0-477.21.1.el8_8.x86_64

File hashes

Hashes for genvarloader-0.1.2.tar.gz

  • SHA256: 2dc4955bd1e57bda36dc066cbce6975a778071bf000b988ebfe005daf5d7b793
  • MD5: 2903dd2865277ee1d3e18ff1bfbff5bf
  • BLAKE2b-256: 8f4e375a8672867f9f17a5133e28008cf55dce219fbaccda94f570f8ab919283

File details

Details for the file genvarloader-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: genvarloader-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 34.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.10.8 Linux/4.18.0-477.21.1.el8_8.x86_64

File hashes

Hashes for genvarloader-0.1.2-py3-none-any.whl

  • SHA256: 28a2e8c55b7d174bf28e6d7409e667069f15a03961c66d09b6531a5d972f802f
  • MD5: fdd522c6a38fdbf0c3c97b8ea0ca8026
  • BLAKE2b-256: 0ca4368331768d337ef54ef6997f90d2a068c0441d1b1d30e38fd0c0c689798d
