
A test Python toolkit for variant site analysis

Project description

pyVMO

pyVMO (Python Variant Memmap Object) is a Python toolkit for working with very large variant matrices.

Installation

```shell
pip install pyvmo
```

Usage

For GWAS

  1. Compress and index your raw VCF file:

```shell
bgzip your_raw.vcf
tabix -p vcf your_raw.vcf.gz
```
  2. Convert the VCF file to VMO format. A VMO is a numpy-based memmap (memory-mapped) file, which lets you process matrices too large to fit in RAM with limited memory.

```shell
PyVMO converter -vcf2vmo your_raw.vcf.gz test.vmo
```

```python
from pyvmo import VMO

vmo_path = "give_me_your_vmo_path"  # where the VMO will be stored
raw_vcf_file = "your_raw.vcf.gz"    # bgzipped VCF from step 1

vmo = VMO(vmo_path)
vmo.store_vcf_to_vmo(raw_vcf_file)
```
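The idea behind a memmap-backed matrix can be sketched with plain numpy. This is not pyVMO's actual on-disk layout: the file name, the `int8` dtype, the 0/1/2 genotype coding and the chunked per-variant sum are assumptions made for illustration. The point is that the full matrix lives on disk and only the rows being processed are paged into memory.

```python
import os
import tempfile

import numpy as np


def write_genotype_memmap(path, genotypes):
    """Write a 2-D genotype array (variants x samples) to a raw memmap file."""
    mm = np.memmap(path, dtype=np.int8, mode="w+", shape=genotypes.shape)
    mm[:] = genotypes
    mm.flush()  # push the data to disk
    del mm      # release the mapping
    return genotypes.shape


def alt_allele_counts(path, shape, chunk_size=1000):
    """Sum genotypes per variant, reading `chunk_size` rows at a time."""
    # mode="r" maps the file read-only; data is only read when sliced
    mm = np.memmap(path, dtype=np.int8, mode="r", shape=shape)
    counts = np.empty(shape[0], dtype=np.int64)
    for start in range(0, shape[0], chunk_size):
        block = np.asarray(mm[start:start + chunk_size])  # touch only this slice
        counts[start:start + chunk_size] = block.sum(axis=1)
    return counts


if __name__ == "__main__":
    geno = np.array([[0, 1, 2], [2, 2, 1], [0, 0, 0]], dtype=np.int8)
    path = os.path.join(tempfile.mkdtemp(), "toy.mm")
    shape = write_genotype_memmap(path, geno)
    print(alt_allele_counts(path, shape, chunk_size=2))  # [3 5 0]
```

A smaller `chunk_size` lowers peak memory at the cost of more iterations; the same trade-off applies to the `chunk_size` argument of `site_filter` below.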
  3. Extract useful submatrices by sample list and by variant-site quality control:

```shell
PyVMO extractor sample.id.list test.vmo filter.vmo
```

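```python
# extract by sample list
# sample_id_list holds the IDs of the samples to keep; here it is read from
# sample.id.list, assumed to contain one sample ID per line
with open("sample.id.list") as f:
    sample_id_list = [line.strip() for line in f if line.strip()]

spl_idx_list = vmo.get_samples_index(sample_id_list)
spl_vmo_path = "give_me_your_sample_list_filtered_vmo_path"
spl_vmo = vmo.extract_sub_vmo(spl_vmo_path, spl_idx_list=spl_idx_list)

# extract by variant site quality control
# maf_thr / mis_thr: minor allele frequency and missing-rate thresholds;
# chunk_size / n_jobs: sites per chunk and parallel jobs (inferred from the names)
var_idx_list = spl_vmo.site_filter(maf_thr=0.05, mis_thr=0.5, chunk_size=1000, n_jobs=20)
var_vmo_path = "give_me_your_variant_site_filtered_vmo_path"
var_vmo = spl_vmo.extract_sub_vmo(var_vmo_path, var_idx_list=var_idx_list)
```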
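For intuition, both extraction steps can be mimicked on an in-memory numpy array. This is a hypothetical sketch, not pyVMO's `get_samples_index` or `site_filter`: it assumes genotypes coded as 0/1/2 alternate-allele counts with `-1` for missing calls, and it assumes a site is kept when its MAF is at least `maf_thr` and its missing rate is at most `mis_thr`. pyVMO's exact coding and comparison rules may differ.

```python
import numpy as np

MISSING = -1  # hypothetical missing-genotype code


def sample_indices(all_samples, wanted):
    """Map sample IDs to their column indices, in the order requested."""
    pos = {sid: i for i, sid in enumerate(all_samples)}
    return [pos[sid] for sid in wanted]


def site_filter_sketch(matrix, maf_thr=0.05, mis_thr=0.5, chunk_size=1000):
    """Return row indices of variants passing MAF and missing-rate thresholds."""
    keep = []
    for start in range(0, matrix.shape[0], chunk_size):
        block = np.asarray(matrix[start:start + chunk_size])
        called = block != MISSING
        n_called = called.sum(axis=1)
        mis_rate = 1.0 - n_called / block.shape[1]
        # alt allele frequency among called genotypes (2 alleles per diploid call)
        alt = np.where(called, block, 0).sum(axis=1)
        with np.errstate(invalid="ignore", divide="ignore"):
            af = alt / (2.0 * n_called)  # NaN for sites with no calls
        maf = np.minimum(af, 1.0 - af)
        ok = (n_called > 0) & (maf >= maf_thr) & (mis_rate <= mis_thr)
        keep.extend((start + np.flatnonzero(ok)).tolist())
    return keep
```

Usage mirrors the two-step flow above: subset columns first, then filter rows, e.g. `sub = geno[:, sample_indices(all_ids, wanted_ids)]` followed by `sub[site_filter_sketch(sub)]`.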
  4. Convert the VMO file into BIMBAM format:

```shell
PyVMO converter -vmo2bimbam filter.vmo filter.bimbam
```

```python
bimbam_file = "give_me_your_bimbam_file_path"
var_vmo.to_bimbam(bimbam_file)
```
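BIMBAM here presumably refers to the mean-genotype format read by tools such as GEMMA: one line per variant with the variant ID, two allele columns, then one genotype dosage (0 to 2) per sample. The sketch below writes that layout from a numpy array. It is not pyVMO's `to_bimbam`; the variant IDs, the choice of ALT as the counted (first-listed) allele, the `", "` separator and the `NA` missing code are assumptions.

```python
import io

import numpy as np

MISSING = -1  # hypothetical missing-genotype code


def write_bimbam_mean_genotype(out, var_ids, alts, refs, matrix):
    """Write one BIMBAM mean-genotype line per variant to a text stream.

    Each line: variant ID, counted allele (ALT here), other allele (REF),
    then one dosage per sample; missing calls are written as NA.
    """
    for vid, alt, ref, row in zip(var_ids, alts, refs, matrix):
        dosages = ["NA" if g == MISSING else str(int(g)) for g in row]
        out.write(", ".join([vid, alt, ref] + dosages) + "\n")


buf = io.StringIO()
geno = np.array([[0, 1, 2], [2, -1, 1]])
write_bimbam_mean_genotype(buf, ["var1", "var2"], ["T", "G"], ["A", "C"], geno)
print(buf.getvalue(), end="")
# var1, T, A, 0, 1, 2
# var2, G, C, 2, NA, 1
```

Writing to a text stream rather than a path keeps the function easy to test; pass an open file handle to write a real `.bimbam` file.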
  5. Compute the genetic distance matrix:

```shell
PyVMO distance filter.vmo ibs.matrix
```
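The output name `ibs.matrix` suggests an identity-by-state (IBS) distance, though the exact definition pyVMO uses is not stated here. Assuming 0/1/2 genotypes with `-1` for missing calls, one common definition is the mean of |g_i − g_j| / 2 over sites called in both samples. A minimal numpy sketch of that definition:

```python
import numpy as np

MISSING = -1  # hypothetical missing-genotype code


def ibs_distance_matrix(matrix):
    """Pairwise IBS distance between samples (columns) of a 0/1/2 matrix.

    d(i, j) = mean over sites called in both samples of |g_i - g_j| / 2,
    so 0 means identical genotypes and 1 means opposite homozygotes everywhere.
    """
    g = np.asarray(matrix, dtype=float)
    called = g != MISSING
    g = np.where(called, g, 0.0)
    n = g.shape[1]
    dist = np.zeros((n, n))
    for i in range(n):
        both = called & called[:, [i]]          # sites called in sample i and in each j
        diff = np.abs(g - g[:, [i]]) / 2.0 * both
        n_both = both.sum(axis=0)
        with np.errstate(invalid="ignore", divide="ignore"):
            dist[i] = diff.sum(axis=0) / n_both  # NaN where no sites are shared
    return dist
```

Looping over one sample at a time keeps memory at one variants-by-samples array per step instead of a full three-dimensional broadcast.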

Other common tasks

  1. Get a numpy array from a VMO:

```python
m = var_vmo.get_matrix()
```
  2. Get the sample list:

```python
sample_list = var_vmo.get_sample_info()
```
  3. Get the variant information as a pandas DataFrame:

```python
# pickle_load_obj comes from the separate toolbiox package
from toolbiox.lib.common.sqlite_command import pickle_load_obj

var_info_df = var_vmo.get_variant_info()

var_index = 1  # row index of the variant to inspect
row = var_info_df.iloc[var_index]

chr_id = row['CHROM']
pos = int(row['POS'])
ref = row['REF']
alt = pickle_load_obj(row['ALT'])         # alt is a list
qual = pickle_load_obj(row['QUAL'])
filter_ = pickle_load_obj(row['FILTER'])  # trailing underscore avoids shadowing built-in filter()
info = pickle_load_obj(row['INFO'])       # info is a dict
```
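To show what these fields hold, the sketch below parses the fixed columns of one VCF data line into the same shapes (ALT as a list, INFO as a dict) using only the standard library. It does not use pyVMO or toolbiox, so the exact types pyVMO stores (for example for QUAL or FILTER) may differ; mapping flag-type INFO keys to `True` is a choice of this sketch.

```python
def parse_vcf_fixed_fields(line):
    """Parse CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO from a VCF data line."""
    chrom, pos, vid, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True  # flags carry no value
    return {
        "CHROM": chrom,
        "POS": int(pos),
        "ID": vid,
        "REF": ref,
        "ALT": alt.split(","),     # multi-allelic sites list several ALT alleles
        "QUAL": None if qual == "." else float(qual),
        "FILTER": filt.split(";"),  # e.g. ["PASS"] or ["q10", "s50"]
        "INFO": info_dict,
    }


rec = parse_vcf_fixed_fields("chr1\t12345\t.\tA\tG,T\t50\tPASS\tDP=14;AF=0.5,0.1;DB\tGT\t0/1\n")
```

Here `rec["ALT"]` is `["G", "T"]` and `rec["INFO"]` maps `DP` and `AF` to their raw string values and `DB` to `True`.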



Download files


Source Distribution

pyvmo-0.0.4.tar.gz (11.1 kB)

Uploaded Source

Built Distribution

pyVMO-0.0.4-py3-none-any.whl (13.9 kB)

Uploaded Python 3

File details

Details for the file pyvmo-0.0.4.tar.gz.

File metadata

  • Download URL: pyvmo-0.0.4.tar.gz
  • Upload date:
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for pyvmo-0.0.4.tar.gz
Algorithm Hash digest
SHA256 d1009d0d9cdc4dc01f1aa8ebd238ef8f993f72b933abb63636adf9122e591311
MD5 f2ec9e8d7bc4579b3f6398fc21a52703
BLAKE2b-256 d1be10ce577f11afb38044e211a218b6381d1ba2214bd71d298bdc49b7e0de10


File details

Details for the file pyVMO-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: pyVMO-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 13.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for pyVMO-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 99228a01da616b801d277f4c540cc75abc77fe620da1e726a0e61f6ac07b82ca
MD5 ded8c991d20f40c35ff8c2fbbb2d8972
BLAKE2b-256 e5a6a2efb4c18d5025e5719f2a1d81d194cb3ffadf35539af39a6019b9ae5579

