
A test python toolkit for variant site analysis


pyVMO

pyVMO (Python Variant Memmap Object) is a Python toolkit for working with very large variant matrices.

Installation

pip install pyvmo

Usage

For GWAS

  1. Compress and index your raw VCF file
bgzip your_raw.vcf
tabix -p vcf your_raw.vcf.gz
  2. Convert the VCF file to a VMO file, a NumPy-based memmap (memory-mapped) file that lets you read oversized matrices and get the job done in limited memory

shell

PyVMO converter -vcf2vmo your_raw.vcf.gz test.vmo

python

from pyvmo import VMO

vmo_path = "give_me_your_vmo_path"
raw_vcf_file = "your_raw.vcf.gz"

vmo = VMO(vmo_path)
vmo.store_vcf_to_vmo(raw_vcf_file)
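
To illustrate why a memory-mapped file helps with oversized matrices, here is a minimal sketch that uses `numpy.memmap` directly. It does not use pyVMO and says nothing about the actual VMO on-disk layout; the file name, the matrix shape, and the `int8` dtype are assumptions chosen only for this example.

```python
import os
import tempfile

import numpy as np

# Hypothetical dimensions: 1,000 variant sites x 50 samples.
n_sites, n_samples = 1000, 50

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_matrix.dat")

    # Create the on-disk matrix and fill it one chunk of rows at a time.
    writer = np.memmap(path, dtype=np.int8, mode="w+", shape=(n_sites, n_samples))
    for start in range(0, n_sites, 100):
        writer[start:start + 100] = (start // 100) % 3
    writer.flush()
    del writer  # release the mapping

    # Reopen read-only: only the rows that are sliced are read from disk.
    reader = np.memmap(path, dtype=np.int8, mode="r", shape=(n_sites, n_samples))
    chunk = np.array(reader[200:300])  # copy one chunk into ordinary memory
    del reader

print(chunk.shape, int(chunk[0, 0]))
```

Because only the requested slice is copied into memory, the same pattern scales to matrices far larger than available RAM.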
  3. Extract useful submatrices by sample list and by variant-site quality control

shell

PyVMO extractor sample.id.list test.vmo filter.vmo

python

# extract by sample list
# sample_id_list: the sample IDs to keep, defined by you (e.g. read from sample.id.list)
spl_idx_list = vmo.get_samples_index(sample_id_list)
spl_vmo_path = "give_me_your_sample_list_filtered_vmo_path"
spl_vmo = vmo.extract_sub_vmo(spl_vmo_path, spl_idx_list=spl_idx_list)

# extract by variant site quality control
var_idx_list = spl_vmo.site_filter(maf_thr=0.05, mis_thr=0.5, chunk_size=1000, n_jobs=20)
var_vmo_path = "give_me_your_variant_site_filtered_vmo_path"
var_vmo = spl_vmo.extract_sub_vmo(var_vmo_path, var_idx_list=var_idx_list)
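
The two thresholds above are a minor allele frequency cutoff (`maf_thr`) and a missing-rate cutoff (`mis_thr`). The sketch below shows one common way to compute such a filter with plain NumPy. It is not pyVMO's implementation: the genotype encoding (alternate-allele counts 0/1/2, with -1 for missing), the direction of the comparisons, and the function name `site_filter_sketch` are all assumptions, and it ignores `chunk_size` and `n_jobs`.

```python
import numpy as np


def site_filter_sketch(genotypes, maf_thr=0.05, mis_thr=0.5):
    """Return indices of sites passing MAF and missing-rate thresholds.

    Assumed encoding (not taken from pyVMO): rows are variant sites, columns
    are samples, values are alternate-allele counts 0/1/2, and -1 is missing.
    """
    genotypes = np.asarray(genotypes)
    missing = genotypes < 0
    n_called = (~missing).sum(axis=1)

    # Fraction of samples with no call at each site.
    mis_rate = missing.mean(axis=1)

    # Alternate-allele frequency over called diploid genotypes.
    alt_count = np.where(missing, 0, genotypes).sum(axis=1)
    alt_freq = np.divide(alt_count, 2 * n_called,
                         out=np.zeros(len(genotypes), dtype=float),
                         where=n_called > 0)
    maf = np.minimum(alt_freq, 1.0 - alt_freq)

    # Assumption: keep sites with MAF at or above, and missing rate at or below, the cutoffs.
    keep = (maf >= maf_thr) & (mis_rate <= mis_thr)
    return np.flatnonzero(keep)


demo = np.array([
    [0, 1, 2, 1],     # common variant, fully called
    [0, 0, 0, 0],     # monomorphic, MAF = 0
    [1, -1, -1, -1],  # 75% missing
    [0, 0, 1, -1],    # one missing call, MAF = 1/6
])
kept = site_filter_sketch(demo)
print(kept.tolist())
```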
  4. Convert the VMO file to BIMBAM format

shell

PyVMO converter -vmo2bimbam filter.vmo filter.bimbam

python

bimbam_file = "give_me_your_bimbam_file_path"
var_vmo.to_bimbam(bimbam_file)
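
For orientation, the BIMBAM mean genotype format described in the GEMMA manual has one line per variant: an ID, the minor allele, the major allele, and then one dosage between 0 and 2 per sample, with `NA` for missing values. The sketch below writes that layout by hand. Whether `to_bimbam` produces exactly this layout is an assumption, and `write_mean_genotype` and its arguments are hypothetical names used only here.

```python
import io

import numpy as np


def write_mean_genotype(handle, site_ids, alleles, genotypes):
    """Write one line per site: id, minor allele, major allele, genotypes.

    Assumed inputs (hypothetical, not pyVMO's API): `alleles` holds
    (minor, major) pairs and `genotypes` holds minor-allele dosages in
    [0, 2] with NaN for missing calls.
    """
    for site_id, (minor, major), row in zip(site_ids, alleles, genotypes):
        values = ["NA" if np.isnan(g) else f"{g:g}" for g in row]
        handle.write(", ".join([site_id, minor, major] + values) + "\n")


buffer = io.StringIO()
write_mean_genotype(
    buffer,
    site_ids=["rs1", "rs2"],
    alleles=[("A", "T"), ("G", "C")],
    genotypes=np.array([[0.0, 1.0, 2.0], [1.0, np.nan, 0.0]]),
)
bimbam_text = buffer.getvalue()
print(bimbam_text, end="")
```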
  5. Get the genetic distance matrix

shell

PyVMO distance filter.vmo ibs.matrix
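
The output name `ibs.matrix` suggests an identity-by-state (IBS) measure. As a rough illustration of what an IBS-based distance is, the sketch below computes 1 minus the mean proportion of shared alleles for every pair of samples. This is a generic definition, not necessarily what `PyVMO distance` computes; the genotype encoding (allele counts 0/1/2, with -1 for missing) and the function name `ibs_distance` are assumptions.

```python
import numpy as np


def ibs_distance(genotypes):
    """Pairwise 1 - IBS between samples.

    Assumed encoding (not taken from pyVMO): rows are variant sites, columns
    are samples, values are allele counts 0/1/2, and -1 is missing.
    """
    g = np.asarray(genotypes, dtype=float)
    n_samples = g.shape[1]
    dist = np.zeros((n_samples, n_samples))
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            both = (g[:, i] >= 0) & (g[:, j] >= 0)  # sites called in both samples
            if not both.any():
                dist[i, j] = dist[j, i] = np.nan
                continue
            # Alleles shared at each site: 2 - |g_i - g_j|, out of 2.
            shared = 2.0 - np.abs(g[both, i] - g[both, j])
            dist[i, j] = dist[j, i] = 1.0 - shared.mean() / 2.0
    return dist


demo = np.array([
    [0, 0, 2],
    [1, 1, 1],
    [2, 2, 0],
    [0, -1, 2],
])
d = ibs_distance(demo)
print(d)
```

The double loop keeps the sketch readable; for many samples a chunked, vectorized version over the memmap would be the natural next step.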

For other uses

  1. Get a NumPy array from the VMO
m = var_vmo.get_matrix()
  2. Get the sample list
sample_list = var_vmo.get_sample_info()
  3. Get the variant information as a pandas DataFrame
from toolbiox.lib.common.sqlite_command import pickle_load_obj

var_info_df = var_vmo.get_variant_info()

# pick one variant site by its row index
var_index = 1
row = var_info_df.iloc[var_index]

chr_id = row['CHROM']
pos = int(row['POS'])
ref = row['REF']
alt = pickle_load_obj(row['ALT'])  # alt is a list
qual = pickle_load_obj(row['QUAL'])
filter_ = pickle_load_obj(row['FILTER'])  # trailing underscore avoids shadowing the built-in filter()
info = pickle_load_obj(row['INFO'])  # info is a dict
