
Pipeline for efficient genomic data processing.

Project description

GenVarLoader

GenVarLoader aims to enable training sequence models on the personalized genomes of tens to hundreds of thousands of individuals.

Installation

pip install genvarloader

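PyTorch is not included as a dependency because its installation requires special instructions that depend on your platform. Install it separately before requesting a dataloader.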
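
Since PyTorch has to be installed separately, it can help to confirm that it is importable before building a dataloader. The following is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and not part of GenVarLoader.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


# Warn early instead of failing later when the dataloader is requested.
if not has_module("torch"):
    print("PyTorch not found; install it by following the official PyTorch instructions.")
```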

Quick Start

import genvarloader as gvl

reference = 'reference.fasta'
variants = 'variants.pgen' # highly recommended to convert VCFs to PGEN
regions_of_interest = 'regions.bed'
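
The comment above recommends converting VCFs to PGEN. One common way to do that is with PLINK 2, so here is a sketch that builds the corresponding `plink2` command from Python. It assumes `plink2` is installed and on your PATH; the input name `variants.vcf.gz` and the helper `vcf_to_pgen` are hypothetical and not part of GenVarLoader.

```python
import subprocess


def vcf_to_pgen(vcf: str, out_prefix: str, dry_run: bool = True) -> list:
    """Build (and optionally run) a plink2 command that converts a VCF to PGEN.

    plink2 writes <out_prefix>.pgen, <out_prefix>.pvar and <out_prefix>.psam.
    """
    cmd = ["plink2", "--vcf", vcf, "--make-pgen", "--out", out_prefix]
    if not dry_run:
        # Requires plink2 on PATH; raises CalledProcessError on failure.
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: only show the command that would be executed.
print(" ".join(vcf_to_pgen("variants.vcf.gz", "variants")))
```

Calling `vcf_to_pgen(..., dry_run=False)` would run the conversion and produce `variants.pgen`, the file name used in the quick start.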

Create readers for each file providing sequence data:

ref = gvl.Fasta(name='ref', path=reference, pad='N')
var = gvl.Pgen(variants)
varseq = gvl.FastaVariants(name='varseq', fasta=ref, variants=var)

Put them together and get a torch.utils.data.DataLoader:

gvloader = gvl.GVL(
    readers=varseq,
    bed=regions_of_interest,
    fixed_length=1000,
    batch_size=16,
    max_memory_gb=8,
    batch_dims=['sample', 'ploid'],
    shuffle=True,
    num_workers=2
)

dataloader = gvloader.torch_dataloader()
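
The `bed` argument above points to a BED file of regions. If you do not have one yet, the sketch below writes a small file, assuming the standard three-column BED convention (tab-separated chromosome, 0-based start, exclusive end). The helper `write_bed`, the output file name, and the coordinates are hypothetical. Whether regions must already match `fixed_length` is not stated here, so the example simply uses 1000 bp intervals.

```python
import csv


def write_bed(path, regions):
    """Write (chrom, start, end) tuples as tab-separated BED3 lines."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for chrom, start, end in regions:
            # BED intervals are 0-based and half-open: [start, end).
            if not 0 <= start < end:
                raise ValueError(f"invalid interval: {chrom}:{start}-{end}")
            writer.writerow([chrom, start, end])


# Hypothetical regions, each 1000 bp long (end - start == 1000).
write_bed("example_regions.bed", [("chr1", 10_000, 11_000), ("chr2", 50_000, 51_000)])
```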

And now you're ready to use the dataloader however you need to:

# implement your training loop
for batch in dataloader:
    ...
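
Before writing a full training loop, it can be useful to look at what a single batch contains. The exact batch structure is not described here, so the sketch below makes no assumption about it: the hypothetical helper `describe` walks any nesting of dicts, lists, and tuples and reports the type and, where available, the `shape` of each leaf.

```python
def describe(batch, prefix="batch"):
    """Return 'name: type, shape=...' lines for a nested batch object."""
    if isinstance(batch, dict):
        lines = []
        for key, value in batch.items():
            lines.extend(describe(value, f"{prefix}[{key!r}]"))
        return lines
    if isinstance(batch, (list, tuple)):
        lines = []
        for i, value in enumerate(batch):
            lines.extend(describe(value, f"{prefix}[{i}]"))
        return lines
    # Leaf: report its type, plus its shape if it is array-like.
    shape = getattr(batch, "shape", None)
    line = f"{prefix}: {type(batch).__name__}"
    if shape is not None:
        line += f", shape={tuple(shape)}"
    return [line]


# Usage with the dataloader from the quick start (left commented out
# because it needs the input files):
# for batch in dataloader:
#     print("\n".join(describe(batch)))
#     break
```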


