
Pipeline for efficient genomic data processing.


GenVarLoader

GenVarLoader aims to enable training sequence models on the personalized genomes of tens to hundreds of thousands of individuals.

Installation

pip install genvarloader

PyTorch is not included as a dependency because installing it requires special instructions; install it separately.
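Because PyTorch has to be installed separately, it can help to confirm it is importable before building a dataloader. The following is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and not part of GenVarLoader.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


if not has_module("torch"):
    # Only a reminder; this sketch does not install anything.
    print("PyTorch not found: install it separately before requesting a torch dataloader.")
```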

Quick Start

import genvarloader as gvl

reference = 'reference.fasta'
variants = 'variants.pgen' # highly recommended to convert VCFs to PGEN
regions_of_interest = 'regions.bed'
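The comment above recommends converting VCFs to PGEN but does not say how. One common route, assumed here and not taken from GenVarLoader's documentation, is the PLINK 2 command-line tool with its `--vcf`, `--make-pgen`, and `--out` options. The sketch below builds that command from Python; the function name `vcf_to_pgen` and the file paths are hypothetical, and by default it only prints the command.

```python
import shutil
import subprocess


def vcf_to_pgen(vcf_path: str, out_prefix: str, dry_run: bool = True) -> list[str]:
    """Build (and optionally run) a PLINK 2 command that converts a VCF to PGEN.

    PLINK 2 writes <out_prefix>.pgen, <out_prefix>.pvar and <out_prefix>.psam.
    """
    cmd = ["plink2", "--vcf", vcf_path, "--make-pgen", "--out", out_prefix]
    if dry_run:
        print("would run:", " ".join(cmd))
    elif shutil.which("plink2") is None:
        raise FileNotFoundError("plink2 was not found on PATH")
    else:
        subprocess.run(cmd, check=True)
    return cmd


# Hypothetical paths; pass dry_run=False to actually run the conversion.
vcf_to_pgen("variants.vcf.gz", "variants")
```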

Create readers for each file providing sequence data:

ref = gvl.Fasta(name='ref', path=reference, pad='N')
var = gvl.Pgen(variants)
varseq = gvl.FastaVariants(name='varseq', fasta=ref, variants=var)

Put them together and get a PyTorch DataLoader (torch.utils.data.DataLoader):

gvloader = gvl.GVL(
    readers=varseq,
    bed=regions_of_interest,
    fixed_length=1000,
    batch_size=16,
    max_memory_gb=8,
    batch_dims=['sample', 'ploid'],
    shuffle=True,
    num_workers=2
)

dataloader = gvloader.torch_dataloader()

And now you're ready to use the dataloader however you need to:

# implement your training loop
for batch in dataloader:
    ...
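The loop above leaves the batch contents unspecified, so a cautious first step is to inspect one batch before writing model code. This sketch makes no assumption about the batch type: it walks dicts, lists, and tuples and reports `shape` and `dtype` for any object that has them. The helper `describe` is hypothetical and not part of GenVarLoader.

```python
def describe(obj, name="batch", indent=0):
    """Return a list of lines summarising a possibly nested batch object."""
    pad = "  " * indent
    if isinstance(obj, dict):
        lines = [f"{pad}{name}: dict"]
        for key, value in obj.items():
            lines += describe(value, str(key), indent + 1)
        return lines
    if isinstance(obj, (list, tuple)):
        lines = [f"{pad}{name}: {type(obj).__name__} of length {len(obj)}"]
        for i, value in enumerate(obj):
            lines += describe(value, f"[{i}]", indent + 1)
        return lines
    shape = getattr(obj, "shape", None)
    if shape is not None:
        # Tensors and arrays expose shape and dtype; report both.
        dtype = getattr(obj, "dtype", None)
        return [f"{pad}{name}: {type(obj).__name__} shape={tuple(shape)} dtype={dtype}"]
    return [f"{pad}{name}: {type(obj).__name__}"]


# Usage with the dataloader built above:
#     first = next(iter(dataloader))
#     print("\n".join(describe(first)))
```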


