A test Python toolkit for variant site analysis

Project description

pyVMO

pyVMO (Python Variant Memmap Object) is a Python toolkit for working with huge variant matrices.

Installation

pip install pyvmo

Usage

For GWAS

  1. Compress and index your raw VCF file
bgzip your_raw.vcf
tabix -p vcf your_raw.vcf.gz
  2. Convert the VCF file to a vmo file. A vmo is a NumPy-based memmap (memory-mapped) file, which lets you read oversized matrices piece by piece and get the job done in limited memory.

shell

PyVMO converter -vcf2vmo your_raw.vcf.gz test.vmo

python

from pyvmo import VMO

vmo_path = "give_me_your_vmo_path"
raw_vcf_file = "your_raw.vcf.gz"

vmo = VMO(vmo_path)
vmo.store_vcf_to_vmo(raw_vcf_file)
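
To illustrate why a memory-mapped file keeps memory use bounded, here is a minimal sketch that uses plain `numpy.memmap`. It does not call pyVMO and does not reproduce its on-disk layout; the file name, the matrix shape, and the `int8` dtype are assumptions chosen only for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical matrix: 1,000 variant sites x 50 samples, one int8 per genotype.
n_sites, n_samples = 1000, 50
path = os.path.join(tempfile.mkdtemp(), "demo_matrix.dat")

# Create the file on disk and fill it 100 sites at a time.
writer = np.memmap(path, dtype=np.int8, mode="w+", shape=(n_sites, n_samples))
for start in range(0, n_sites, 100):
    writer[start:start + 100] = (start // 100) % 3
writer.flush()
del writer

# Reopen read-only: only the slices that are indexed get paged into memory.
reader = np.memmap(path, dtype=np.int8, mode="r", shape=(n_sites, n_samples))
chunk = np.asarray(reader[200:300])  # one 100-site slice of the mapped file
chunk_shape = chunk.shape
chunk_value = int(chunk[0, 0])
```

Processing the matrix in slices like `reader[200:300]` is what allows a file larger than RAM to be scanned from start to end.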
  3. Extract useful submatrices by sample list or by variant site quality control

shell

PyVMO extractor sample.id.list test.vmo filter.vmo

python

# extract by sample list
# sample_id_list: the sample IDs to keep, e.g. the IDs listed in sample.id.list
spl_idx_list = vmo.get_samples_index(sample_id_list)
spl_vmo_path = "give_me_your_sample_list_filtered_vmo_path"
spl_vmo = vmo.extract_sub_vmo(spl_vmo_path, spl_idx_list=spl_idx_list)

# extract by variant site quality control
var_idx_list = spl_vmo.site_filter(maf_thr=0.05, mis_thr=0.5, chunk_size=1000, n_jobs=20)
var_vmo_path = "give_me_your_variant_site_filtered_vmo_path"
var_vmo = spl_vmo.extract_sub_vmo(var_vmo_path, var_idx_list=var_idx_list)
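
The quality-control step above can be pictured with a small NumPy sketch. This is not pyVMO's implementation: the function name `site_filter_sketch` is hypothetical, and the sketch assumes rows are sites, columns are diploid samples, genotypes are alternate-allele counts (0, 1, 2), `-1` marks a missing call, and both thresholds are inclusive. How pyVMO encodes genotypes and applies `maf_thr` and `mis_thr` may differ.

```python
import numpy as np


def site_filter_sketch(matrix, maf_thr=0.05, mis_thr=0.5, chunk_size=1000):
    """Return row indices of sites passing MAF and missing-rate thresholds.

    Assumes rows are sites, columns are diploid samples, genotypes are
    alternate-allele counts (0, 1, 2) and -1 marks a missing call.
    """
    keep = []
    for start in range(0, matrix.shape[0], chunk_size):
        chunk = np.asarray(matrix[start:start + chunk_size])
        missing = chunk < 0
        called = (~missing).sum(axis=1)
        mis_rate = missing.mean(axis=1)
        alt_count = np.where(missing, 0, chunk).sum(axis=1)
        # Sites with no called genotypes get frequency 0 instead of dividing by zero.
        alt_freq = np.divide(alt_count, 2 * called,
                             out=np.zeros(len(chunk)), where=called > 0)
        maf = np.minimum(alt_freq, 1 - alt_freq)
        passed = (maf >= maf_thr) & (mis_rate <= mis_thr)
        keep.extend((np.nonzero(passed)[0] + start).tolist())
    return keep


demo = np.array([
    [0, 1, 2, 1],     # alt frequency 0.5, nothing missing
    [0, 0, 0, 0],     # monomorphic, MAF 0
    [-1, -1, -1, 1],  # 75% missing
    [0, 0, 1, -1],    # alt frequency 1/6, 25% missing
])
kept = site_filter_sketch(demo, maf_thr=0.05, mis_thr=0.5, chunk_size=2)
```

Working chunk by chunk means only `chunk_size` rows are held in memory at once, which fits the memmap-backed design described earlier.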
  4. Convert the vmo file to BIMBAM format

shell

PyVMO converter -vmo2bimbam filter.vmo filter.bimbam

python

bimbam_file = "give_me_your_bimbam_file_path"
var_vmo.to_bimbam(bimbam_file)
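
For orientation, the sketch below writes the layout commonly described for BIMBAM mean genotype files: one line per site with a site ID, two alleles, then one value per sample, separated by commas. It is an illustration only and not the output of `to_bimbam`; the helper name, the site IDs, the allele order, and the use of `NA` for missing calls are assumptions, so check the file pyVMO actually writes before relying on any of them.

```python
import io

import numpy as np


def write_mean_genotype_sketch(handle, site_ids, alleles, matrix):
    """Write one line per site: id, allele 1, allele 2, then one value per sample.

    Assumes `matrix` holds dosages in [0, 2] of the first listed allele and
    that negative values mark missing calls, written here as "NA".
    """
    for site_id, (a1, a2), row in zip(site_ids, alleles, matrix):
        values = ["NA" if g < 0 else str(int(g)) for g in row]
        handle.write(", ".join([site_id, a1, a2] + values) + "\n")


buffer = io.StringIO()
write_mean_genotype_sketch(
    buffer,
    site_ids=["chr1_100", "chr1_250"],  # hypothetical site identifiers
    alleles=[("A", "T"), ("G", "C")],
    matrix=np.array([[0, 1, 2], [2, -1, 0]]),
)
bimbam_text = buffer.getvalue()
```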
  5. Compute the genetic distance matrix

shell

PyVMO distance filter.vmo ibs.matrix
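
The output name `ibs.matrix` suggests an identity-by-state (IBS) distance. As a rough sketch of what such a matrix contains, the code below computes a pairwise IBS distance with NumPy. It is not pyVMO's code: the function name is hypothetical, and the genotype encoding (alternate-allele counts 0, 1, 2 with `-1` for missing) and the handling of missing calls are assumptions.

```python
import numpy as np


def ibs_distance_sketch(matrix):
    """Pairwise IBS distance between samples (columns).

    Assumes genotypes are alternate-allele counts (0, 1, 2) with -1 for
    missing; sites missing in either sample are skipped for that pair.
    """
    g = np.asarray(matrix, dtype=float)
    n_samples = g.shape[1]
    dist = np.zeros((n_samples, n_samples))
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            both = (g[:, i] >= 0) & (g[:, j] >= 0)
            if not both.any():
                dist[i, j] = dist[j, i] = np.nan
                continue
            # Two samples differ by |gi - gj| alleles out of 2 at each site.
            diff = np.abs(g[both, i] - g[both, j])
            dist[i, j] = dist[j, i] = diff.mean() / 2
    return dist


demo = np.array([
    [0, 0, 2],
    [1, 1, 1],
    [2, -1, 0],
])
dist = ibs_distance_sketch(demo)
```

A distance of 0 means two samples carry the same genotype at every site compared, and 1 means they are opposite homozygotes everywhere.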

For other practices

  1. Get a NumPy array from the vmo
m = var_vmo.get_matrix()
  2. Get the sample list
sample_list = var_vmo.get_sample_info()
  3. Get the variant information as a pandas DataFrame
from yxsql import pickle_load_obj

var_info_df = var_vmo.get_variant_info()

# pick one variant site by its row index
var_index = 1
row = var_info_df.iloc[var_index]

chr_id = row['CHROM']
pos = int(row['POS'])
ref = row['REF']
# ALT, QUAL, FILTER and INFO are stored pickled and are loaded before use
alt = pickle_load_obj(row['ALT']) # alt is a list
qual = pickle_load_obj(row['QUAL'])
filter_ = pickle_load_obj(row['FILTER']) # trailing underscore avoids shadowing the built-in filter()
info = pickle_load_obj(row['INFO']) # info is a dict

Download files

Source Distribution

pyvmo-0.0.7.tar.gz (11.4 kB)

Uploaded Source

Built Distribution

pyVMO-0.0.7-py3-none-any.whl (14.1 kB)

Uploaded Python 3

File details

Details for the file pyvmo-0.0.7.tar.gz.

File metadata

  • Download URL: pyvmo-0.0.7.tar.gz
  • Upload date:
  • Size: 11.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.7

File hashes

Hashes for pyvmo-0.0.7.tar.gz

Algorithm    Hash digest
SHA256       37db57734629e4edf9118f51438475d864b74dbb289921b766b516f316303194
MD5          3ad4cdda9db42669cfc95216f2f61113
BLAKE2b-256  81ba37af47980e33dc9cbc7610d552e8c0ed7360f94bcf43f5366b4bca13cb82

File details

Details for the file pyVMO-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: pyVMO-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 14.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.7

File hashes

Hashes for pyVMO-0.0.7-py3-none-any.whl

Algorithm    Hash digest
SHA256       e79c81879ba91b937a01129bdb7fe7e065aa0be02a3ccc643671a593f8512a57
MD5          614e2ea7c0d1ca4fdda560b3d5856f61
BLAKE2b-256  66fdac1b33a321100e38235174ce1959bfca25fe3b256ca91770e7e88df68d0b
