Pipeline for efficient genomic data processing.

Project description

GenVarLoader

GenVarLoader aims to enable training sequence models on tens to hundreds of thousands of individuals' personalized genomes.

Installation

pip install genvarloader

PyTorch is not included as a dependency because installing it requires special instructions. Install it separately before requesting a dataloader.
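Because PyTorch has to be installed separately, it can help to confirm that it is importable before building a dataloader. The following is a minimal sketch using only the standard library; the helper name `module_available` is hypothetical and not part of GenVarLoader.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found in this environment."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("torch"):
        print("PyTorch found.")
    else:
        print("PyTorch not found; install it separately before using the dataloader.")
```

The check only looks the module up and does not import it, so it stays fast even in environments where importing PyTorch is slow.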

Quick Start

import genvarloader as gvl

reference = 'reference.fasta'
variants = 'variants.pgen' # highly recommended to convert VCFs to PGEN
regions_of_interest = 'regions.bed'
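
The comment above recommends converting VCFs to PGEN but does not name a tool. One common option is PLINK 2, whose `--vcf`, `--make-pgen`, and `--out` flags perform this conversion. The sketch below assumes `plink2` is installed and on your `PATH`; the function names and file paths are hypothetical and are not part of GenVarLoader.

```python
import shutil
import subprocess
from typing import List


def plink2_vcf_to_pgen_cmd(vcf_path: str, out_prefix: str) -> List[str]:
    """Build the PLINK 2 command that converts a VCF to a PGEN fileset."""
    return ["plink2", "--vcf", vcf_path, "--make-pgen", "--out", out_prefix]


def convert_vcf_to_pgen(vcf_path: str, out_prefix: str) -> None:
    """Run the conversion, failing early if plink2 is not available."""
    if shutil.which("plink2") is None:
        raise RuntimeError("plink2 was not found on PATH")
    # check=True raises CalledProcessError if plink2 exits with an error.
    subprocess.run(plink2_vcf_to_pgen_cmd(vcf_path, out_prefix), check=True)


# Example (hypothetical file names):
# convert_vcf_to_pgen("variants.vcf.gz", "variants")
```

With an output prefix of `variants`, PLINK 2 writes `variants.pgen` along with its companion `variants.pvar` and `variants.psam` files.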

Create readers for each file providing sequence data:

ref = gvl.Fasta(name='ref', path=reference, pad='N')
var = gvl.Pgen(variants)
varseq = gvl.FastaVariants(name='varseq', fasta=ref, variants=var)

Put them together and get a PyTorch DataLoader (torch.utils.data.DataLoader):

gvloader = gvl.GVL(
    readers=varseq,
    bed=regions_of_interest,
    fixed_length=1000,
    batch_size=16,
    max_memory_gb=8,
    batch_dims=['sample', 'ploid'],
    shuffle=True,
    num_workers=2
)

dataloader = gvloader.torch_dataloader()

And now you're ready to use the dataloader however you need to:

# implement your training loop
for batch in dataloader:
    ...
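
The structure of each batch is not described here, so before writing a training loop it may be useful to inspect the first batch. The helper below is a sketch that makes no assumption about the batch type: it walks dicts, lists, and tuples and reports the shape of anything array-like. The name `describe_batch` is hypothetical and not part of GenVarLoader.

```python
def describe_batch(batch):
    """Summarize a batch without assuming its exact type."""
    if isinstance(batch, dict):
        # Describe each entry under its own key.
        return {key: describe_batch(value) for key, value in batch.items()}
    if isinstance(batch, (list, tuple)):
        return [describe_batch(item) for item in batch]
    shape = getattr(batch, "shape", None)
    if shape is not None:
        # Array-like objects (NumPy arrays, torch tensors) expose shape and dtype.
        dtype = getattr(batch, "dtype", None)
        return f"{type(batch).__name__}(shape={tuple(shape)}, dtype={dtype})"
    return type(batch).__name__


# Example, assuming `dataloader` was created as shown above:
# first_batch = next(iter(dataloader))
# print(describe_batch(first_batch))
```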
