
Pipeline for efficient genomic data processing.


GenVarLoader

GenVarLoader aims to enable training sequence models on tens to hundreds of thousands of individuals' personalized genomes.

Installation

pip install genvarloader

PyTorch is not included as a dependency because installing it requires special, platform-specific instructions; install it separately.
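Since PyTorch has to be installed separately, it can help to confirm it is available before building a dataloader. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of GenVarLoader.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("torch"):
    # Hypothetical message; follow the official PyTorch install instructions.
    print("PyTorch not found; install it separately before requesting a torch dataloader.")
```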

Quick Start

import genvarloader as gvl

reference = 'reference.fasta'
variants = 'variants.pgen' # highly recommended to convert VCFs to PGEN
regions_of_interest = 'regions.bed'
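The comment above recommends converting VCFs to PGEN but does not name a tool. One common option is PLINK 2, whose `--vcf`, `--make-pgen`, and `--out` flags write `.pgen`, `.pvar`, and `.psam` files. The sketch below assumes `plink2` is on your PATH; the input name `variants.vcf.gz` and both helper functions are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def plink2_vcf_to_pgen_cmd(vcf: str, out_prefix: str) -> list[str]:
    """Build the PLINK 2 command that converts a VCF into PGEN/PVAR/PSAM files."""
    return ["plink2", "--vcf", vcf, "--make-pgen", "--out", out_prefix]


def convert_vcf_to_pgen(vcf: str, out_prefix: str) -> Path:
    """Run the conversion and return the path of the resulting .pgen file."""
    if shutil.which("plink2") is None:
        raise FileNotFoundError("plink2 executable not found on PATH")
    subprocess.run(plink2_vcf_to_pgen_cmd(vcf, out_prefix), check=True)
    return Path(out_prefix + ".pgen")


# Example (assumed file names): convert_vcf_to_pgen("variants.vcf.gz", "variants")
```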

Create readers for each file providing sequence data:

ref = gvl.Fasta(name='ref', path=reference, pad='N')
var = gvl.Pgen(variants)
varseq = gvl.FastaVariants(name='varseq', fasta=ref, variants=var)

Put them together and get a PyTorch DataLoader (torch.utils.data.DataLoader):

gvloader = gvl.GVL(
    readers=varseq,
    bed=regions_of_interest,
    fixed_length=1000,
    batch_size=16,
    max_memory_gb=8,
    batch_dims=['sample', 'ploid'],
    shuffle=True,
    num_workers=2
)

dataloader = gvloader.torch_dataloader()

And now you're ready to use the dataloader however you need to:

# implement your training loop
for batch in dataloader:
    ...
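The layout of each batch is not described here, so before writing a training loop it may be worth inspecting the first batch. The sketch below is a generic helper, not part of GenVarLoader: it assumes only that the dataloader is iterable and that array-like values expose a `.shape` attribute, as NumPy arrays and torch tensors do. The names `describe_batch` and `peek` are hypothetical.

```python
from collections.abc import Mapping, Sequence


def describe_batch(batch):
    """Summarize a batch: shapes for array-likes, type names for anything else."""
    if isinstance(batch, Mapping):
        return {key: describe_batch(value) for key, value in batch.items()}
    if hasattr(batch, "shape"):
        # NumPy arrays and torch tensors both expose .shape
        return tuple(batch.shape)
    if isinstance(batch, Sequence) and not isinstance(batch, (str, bytes)):
        return [describe_batch(item) for item in batch]
    return type(batch).__name__


def peek(dataloader):
    """Fetch the first batch from any iterable and summarize it."""
    return describe_batch(next(iter(dataloader)))
```

Calling `print(peek(dataloader))` once shows which keys and dimensions your model code needs to handle.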


