Pipeline for efficient genomic data processing.

GenVarLoader

GenVarLoader aims to enable training sequence models on the personalized genomes of tens to hundreds of thousands of individuals.

Installation

pip install genvarloader

PyTorch is not included as a dependency because its installation requires platform-specific instructions; install it separately.
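
Because PyTorch has to be installed separately, it can be useful to confirm it is available before building a dataloader. The following is a minimal sketch using only the standard library; the helper name `torch_is_installed` is hypothetical and not part of GenVarLoader.

```python
import importlib.util


def torch_is_installed() -> bool:
    """Return True if the `torch` package can be located, without importing it."""
    return importlib.util.find_spec("torch") is not None


if not torch_is_installed():
    # GenVarLoader does not pull in PyTorch for you, so it must be installed separately.
    print("PyTorch not found; install it following the official PyTorch instructions.")
```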

Quick Start

import genvarloader as gvl

reference = 'reference.fasta'
variants = 'variants.pgen' # highly recommended to convert VCFs to PGEN
regions_of_interest = 'regions.bed'
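
The comment above recommends converting VCFs to PGEN. One common way to do this is with PLINK 2, which is a separate tool and not part of GenVarLoader. The sketch below only builds the command line; the helper name `plink2_vcf_to_pgen_cmd` and the file names are assumptions for illustration.

```python
import shlex


def plink2_vcf_to_pgen_cmd(vcf_path: str, out_prefix: str) -> list:
    """Build a PLINK 2 command that converts a VCF into a PGEN fileset.

    PLINK 2 writes <out_prefix>.pgen, <out_prefix>.pvar and <out_prefix>.psam.
    """
    return ["plink2", "--vcf", vcf_path, "--make-pgen", "--out", out_prefix]


# Hypothetical input and output names, chosen to match 'variants.pgen' above.
cmd = plink2_vcf_to_pgen_cmd("variants.vcf.gz", "variants")
print(shlex.join(cmd))  # plink2 --vcf variants.vcf.gz --make-pgen --out variants

# To actually run the conversion (requires plink2 on your PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```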

Create readers for each file providing sequence data:

ref = gvl.Fasta(name='ref', path=reference, pad='N')
var = gvl.Pgen(variants)
varseq = gvl.FastaVariants(name='varseq', fasta=ref, variants=var)

Put them together and get a PyTorch DataLoader (torch.utils.data.DataLoader):

gvloader = gvl.GVL(
    readers=varseq,
    bed=regions_of_interest,
    fixed_length=1000,
    batch_size=16,
    max_memory_gb=8,
    batch_dims=['sample', 'ploid'],
    shuffle=True,
    num_workers=2
)

dataloader = gvloader.torch_dataloader()

And now you're ready to use the dataloader however you need to:

# implement your training loop
for batch in dataloader:
    ...
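
Before writing a full training loop, it can help to look at the first few batches. The sketch below assumes only that the dataloader is iterable; the structure of each batch is not described here, so the code just reports its type. The helper `peek_batches` and the stand-in list are hypothetical.

```python
from itertools import islice


def peek_batches(dataloader, n=2):
    """Return the first n batches from any iterable dataloader."""
    return list(islice(dataloader, n))


# Stand-in iterable so this sketch runs without data files;
# replace it with the dataloader created above.
stand_in_dataloader = ["batch-0", "batch-1", "batch-2"]

for i, batch in enumerate(peek_batches(stand_in_dataloader)):
    print(i, type(batch).__name__)
```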

Download files

Source Distribution

genvarloader-0.1.3.tar.gz (1.8 MB)

Uploaded Source

Built Distribution

genvarloader-0.1.3-py3-none-any.whl (2.7 MB)

Uploaded Python 3

File details

Details for the file genvarloader-0.1.3.tar.gz.

File metadata

  • Download URL: genvarloader-0.1.3.tar.gz
  • Upload date:
  • Size: 1.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.10.8 Linux/4.18.0-477.21.1.el8_8.x86_64

File hashes

Hashes for genvarloader-0.1.3.tar.gz
Algorithm Hash digest
SHA256 ec14ae22f9c74eedfeec9fdbba6f006f39609133d105164ec401bad8483baa9f
MD5 4bdaee89f2d7048978af3dd1333b210d
BLAKE2b-256 46e39d8a52a4f3867c2293b7bda2800f2d1db34c09d55f35abb9d0ad63334b9f

File details

Details for the file genvarloader-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: genvarloader-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 2.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.10.8 Linux/4.18.0-477.21.1.el8_8.x86_64

File hashes

Hashes for genvarloader-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 576af1ac0d3c7478699ad76adfc4e6b1a19430536ab0ba2c62dcc71f63ce1cc5
MD5 420c28b29a1210747263a890c3ba4dd2
BLAKE2b-256 96861717d547448145490d55bb85180c642b2be54a9237fc4868b9d78df0f2b5
