A test Python toolkit for variant site analysis

pyVMO

pyVMO (Python Variant Memmap Object) is a Python toolkit for working with very large variant matrices.

Installation

pip install pyvmo

Usage

For GWAS

  1. Compress and index your raw vcf files
bgzip your_raw.vcf
tabix -p vcf your_raw.vcf.gz
  1. Convert the vcf file to vmo, a NumPy-based memmap (memory-mapped) file that lets you read oversized matrices and get the job done within limited memory

shell

PyVMO converter -vcf2vmo your_raw.vcf.gz test.vmo

python

from pyvmo import VMO

vmo_path = "give_me_your_vmo_path"
raw_vcf_file = "your_raw.vcf.gz"

vmo = VMO(vmo_path)
vmo.store_vcf_to_vmo(raw_vcf_file)
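
To illustrate why a memory-mapped file helps with oversized matrices, here is a minimal sketch using plain `numpy.memmap`. It is not pyVMO's internal layout: the file name, the matrix dimensions, and the `int8` encoding are all assumptions made for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical dimensions: 1,000 variant sites x 50 samples, one byte per genotype.
n_sites, n_samples = 1000, 50
path = os.path.join(tempfile.mkdtemp(), "genotypes.dat")

# Create a file-backed array and fill it chunk by chunk,
# so the full matrix never has to sit in memory at once.
writer = np.memmap(path, dtype=np.int8, mode="w+", shape=(n_sites, n_samples))
rng = np.random.default_rng(0)
for start in range(0, n_sites, 100):
    writer[start:start + 100] = rng.integers(0, 3, size=(100, n_samples), dtype=np.int8)
writer.flush()
del writer

# Reopen read-only: data is paged in from disk only when a slice is accessed.
reader = np.memmap(path, dtype=np.int8, mode="r", shape=(n_sites, n_samples))
chunk = np.asarray(reader[200:300])  # copies just these 100 rows into memory
print(chunk.shape, chunk.dtype)
```

The same slicing pattern scales to matrices far larger than RAM, because the operating system loads only the pages that back the requested rows.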
  1. Extract useful submatrices by sample list or by variant site quality control

shell

PyVMO extractor sample.id.list test.vmo filter.vmo

python

# extract by sample list
# sample_id_list is a list of the sample IDs to keep (e.g. the IDs in sample.id.list)
spl_idx_list = vmo.get_samples_index(sample_id_list)
spl_vmo_path = "give_me_your_sample_list_filtered_vmo_path"
spl_vmo = vmo.extract_sub_vmo(spl_vmo_path, spl_idx_list=spl_idx_list)

# extract by variant site quality control
var_idx_list = spl_vmo.site_filter(maf_thr=0.05, mis_thr=0.5, chunk_size=1000, n_jobs=20)
var_vmo_path = "give_me_your_variant_site_filtered_vmo_path"
var_vmo = spl_vmo.extract_sub_vmo(var_vmo_path, var_idx_list=var_idx_list)
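
The kind of filtering that `maf_thr` and `mis_thr` suggest can be sketched in plain NumPy. This is an illustration, not pyVMO's implementation: it assumes a sites-by-samples matrix of diploid alternate-allele counts (0, 1, 2) with `-1` for missing calls, and it assumes a site is kept when its minor allele frequency is at least `maf_thr` and its missing rate is at most `mis_thr`. The function name `site_filter_sketch` is hypothetical.

```python
import numpy as np


def site_filter_sketch(genotypes, maf_thr=0.05, mis_thr=0.5, missing=-1):
    """Return the indices of sites that pass the MAF and missing-rate thresholds."""
    genotypes = np.asarray(genotypes)
    called = genotypes != missing
    n_called = called.sum(axis=1)

    # Fraction of samples without a call at each site.
    mis_rate = 1.0 - n_called / genotypes.shape[1]

    # Alternate allele frequency among called diploid genotypes;
    # sites with no calls at all keep a frequency of 0.
    alt_count = np.where(called, genotypes, 0).sum(axis=1)
    alt_freq = np.divide(
        alt_count, 2 * n_called,
        out=np.zeros(genotypes.shape[0]), where=n_called > 0,
    )
    maf = np.minimum(alt_freq, 1.0 - alt_freq)

    keep = (maf >= maf_thr) & (mis_rate <= mis_thr)
    return np.flatnonzero(keep)


toy = np.array([
    [0, 1, 2, 1],      # alt frequency 0.5, nothing missing: kept
    [0, 0, 0, 0],      # monomorphic, MAF 0: dropped
    [-1, -1, -1, 1],   # 75% missing: dropped
    [0, 0, 1, -1],     # alt frequency 1/6, 25% missing: kept
])
print(site_filter_sketch(toy))  # [0 3]
```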
  1. Convert the vmo file into bimbam format

shell

PyVMO converter -vmo2bimbam filter.vmo filter.bimbam

python

bimbam_file = "give_me_your_bimbam_file_path"
var_vmo.to_bimbam(bimbam_file)
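
For orientation, the BIMBAM mean genotype layout used by tools such as GEMMA puts one variant per line: an ID, two alleles, then one dosage per sample. The sketch below writes that layout from a NumPy matrix. It is not what `to_bimbam` does internally: the helper name, the site IDs, the `-1` missing code, the `NA` output for missing calls, and the choice of which allele the dosage counts are all assumptions to verify against your downstream tool.

```python
import io

import numpy as np


def write_bimbam_sketch(handle, genotypes, site_ids, alleles, missing=-1):
    """Write one line per site: id, allele 1, allele 2, then one dosage per sample."""
    for gt_row, site_id, (allele1, allele2) in zip(genotypes, site_ids, alleles):
        # Missing calls are written as NA (an assumption for this sketch).
        dosages = ["NA" if g == missing else str(int(g)) for g in gt_row]
        handle.write(", ".join([site_id, allele1, allele2] + dosages) + "\n")


toy = np.array([
    [0, 1, 2],
    [2, -1, 0],
])
site_ids = ["chr1_100", "chr1_250"]   # hypothetical site identifiers
alleles = [("A", "G"), ("T", "C")]    # hypothetical allele pairs

buffer = io.StringIO()
write_bimbam_sketch(buffer, toy, site_ids, alleles)
print(buffer.getvalue(), end="")
```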
  1. Get the genetic distance matrix

shell

PyVMO distance filter.vmo ibs.matrix
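
The output name `ibs.matrix` suggests an identity-by-state (IBS) distance. As a rough sketch of one common definition, and not necessarily the one the `distance` command uses, the distance between two samples can be taken as the mean absolute difference in allele counts over sites called in both, divided by 2. The sites-by-samples layout, the `-1` missing code, and the function name are assumptions.

```python
import numpy as np


def ibs_distance_sketch(genotypes, missing=-1):
    """Pairwise IBS distance between samples (columns) of a sites-by-samples matrix."""
    genotypes = np.asarray(genotypes, dtype=float)
    n_samples = genotypes.shape[1]
    dist = np.zeros((n_samples, n_samples))
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            # Compare only the sites where both samples have a call.
            both = (genotypes[:, i] != missing) & (genotypes[:, j] != missing)
            if both.any():
                diff = np.abs(genotypes[both, i] - genotypes[both, j])
                d = diff.mean() / 2.0
            else:
                d = np.nan  # no shared sites, distance undefined
            dist[i, j] = dist[j, i] = d
    return dist


toy = np.array([
    [0, 0, 2],
    [1, 1, 1],
    [2, -1, 0],
    [0, 0, 2],
])
ibs = ibs_distance_sketch(toy)
print(np.round(ibs, 3))
```

The double loop keeps the sketch readable; for thousands of samples a chunked or vectorised version would be needed.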

For other uses

  1. Get a NumPy array from the vmo
m = var_vmo.get_matrix()
  1. Get the sample list
sample_list = var_vmo.get_sample_info()
  1. Get the variant information in a pandas dataframe
from toolbiox.lib.common.sqlite_command import pickle_load_obj, pickle_dump_obj

var_info_df = var_vmo.get_variant_info()

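# row index of the variant to inspect
var_index = 1

chr_id = var_info_df.iloc[var_index]['CHROM']
pos = int(var_info_df.iloc[var_index]['POS'])
ref = var_info_df.iloc[var_index]['REF']
# ALT, QUAL, FILTER and INFO are stored pickled and must be loaded first
alt = pickle_load_obj(var_info_df.iloc[var_index]['ALT']) # alt is a list
qual = pickle_load_obj(var_info_df.iloc[var_index]['QUAL'])
filter_ = pickle_load_obj(var_info_df.iloc[var_index]['FILTER']) # trailing underscore avoids shadowing the built-in filter()
info = pickle_load_obj(var_info_df.iloc[var_index]['INFO']) # info is a dict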

Download files

Source Distribution

pyvmo-0.0.6.tar.gz (11.2 kB)

Uploaded Source

Built Distribution

pyVMO-0.0.6-py3-none-any.whl (13.9 kB)

Uploaded Python 3
