A test Python toolkit for variant site analysis
Project description
pyVMO
pyVMO (python Variant Memmap Object) is a Python toolkit that helps you work with very large variant matrices.
Installation
pip install pyvmo
Usage
For GWAS
- Compress and index your raw vcf files
bgzip your_raw.vcf
tabix -p vcf your_raw.vcf.gz
- Convert the vcf file to a vmo file, a numpy-based memmap (memory-mapped) file that lets you read oversized matrices and get the job done with limited memory
shell
PyVMO converter -vcf2vmo your_raw.vcf.gz test.vmo
python
from pyvmo import VMO
vmo_path = "give_me_your_vmo_path"
raw_vcf_file = "your_raw.vcf.gz"
vmo = VMO(vmo_path)
vmo.store_vcf_to_vmo(raw_vcf_file)
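To illustrate why a memory-mapped file helps with oversized matrices, here is a minimal sketch that uses `numpy.memmap` directly. It does not reproduce pyVMO's internal file layout, which is not described here; the file name, matrix shape, `int8` dtype, and 0/1/2 genotype coding are all assumptions made for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical sizes and file name, chosen only for illustration.
n_variants, n_samples = 1000, 50
path = os.path.join(tempfile.mkdtemp(), "genotypes.dat")

# Create a disk-backed array and fill it block by block,
# so the full matrix never has to sit in RAM at once.
writer = np.memmap(path, dtype=np.int8, mode="w+", shape=(n_variants, n_samples))
rng = np.random.default_rng(0)
for start in range(0, n_variants, 100):
    writer[start:start + 100] = rng.integers(0, 3, size=(100, n_samples), dtype=np.int8)
writer.flush()
del writer

# Reopen read-only; only the slices that are touched are read from disk.
reader = np.memmap(path, dtype=np.int8, mode="r", shape=(n_variants, n_samples))
block = np.asarray(reader[200:300])  # copy one 100-variant block into RAM
alt_counts = block.sum(axis=1)       # per-variant sum over samples
```

Processing one block at a time like this keeps peak memory proportional to the block size rather than to the whole matrix.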
- Extract useful submatrices by sample list and by variant site quality control
shell
PyVMO extractor sample.id.list test.vmo filter.vmo
python
# extract by sample list
sample_id_list = ["sample_1", "sample_2"]  # placeholder IDs; replace with your own sample IDs
spl_idx_list = vmo.get_samples_index(sample_id_list)
spl_vmo_path = "give_me_your_sample_list_filtered_vmo_path"
spl_vmo = vmo.extract_sub_vmo(spl_vmo_path, spl_idx_list=spl_idx_list)
# extract by variant site quality control
var_idx_list = spl_vmo.site_filter(maf_thr=0.05, mis_thr=0.5, chunk_size=1000, n_jobs=20)
var_vmo_path = "give_me_your_variant_site_filtered_vmo_path"
var_vmo = spl_vmo.extract_sub_vmo(var_vmo_path, var_idx_list=var_idx_list)
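As a rough picture of what thresholds such as `maf_thr` and `mis_thr` usually mean, the sketch below filters variants by minor allele frequency and missing rate with plain numpy. It is not pyVMO's implementation: the function name `site_filter_sketch`, the 0/1/2 genotype coding, the `-1` missing code, and the direction of the comparisons (keep MAF at or above the threshold, missing rate at or below it) are assumptions.

```python
import numpy as np


def site_filter_sketch(genotypes, maf_thr=0.05, mis_thr=0.5, missing=-1):
    """Return row indices of variants that pass both thresholds.

    genotypes: variants x samples array of alternate-allele counts (0/1/2),
    with `missing` marking uncalled genotypes (assumed coding).
    """
    g = np.asarray(genotypes)
    called = g != missing
    n_called = called.sum(axis=1)

    # Fraction of samples without a genotype call at each variant.
    mis_rate = 1.0 - n_called / g.shape[1]

    # Alternate-allele frequency among called diploid genotypes.
    alt_sum = np.where(called, g, 0).sum(axis=1)
    alt_freq = np.divide(
        alt_sum, 2 * n_called, out=np.zeros(g.shape[0]), where=n_called > 0
    )
    maf = np.minimum(alt_freq, 1.0 - alt_freq)

    keep = (maf >= maf_thr) & (mis_rate <= mis_thr)
    return np.flatnonzero(keep)


toy = np.array([
    [0, 1, 2, 1],     # common variant, no missing calls
    [0, 0, 0, 0],     # monomorphic, MAF = 0
    [-1, -1, -1, 1],  # 75% missing
    [0, 0, 1, -1],    # MAF = 1/6, 25% missing
])
kept = site_filter_sketch(toy, maf_thr=0.05, mis_thr=0.5)
```

With the toy matrix, the monomorphic row and the mostly-missing row are dropped and the other two are kept.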
- Convert the vmo file into bimbam format
shell
PyVMO converter -vmo2bimbam filter.vmo filter.bimbam
python
bimbam_file = "give_me_your_bimbam_file_path"
var_vmo.to_bimbam(bimbam_file)
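For orientation, a BIMBAM mean genotype file (as read by tools such as GEMMA) has one line per variant: the variant ID, the two allele types, and then one dosage value between 0 and 2 per individual. The sketch below writes that layout by hand. Whether `to_bimbam` produces exactly this variant of the format is an assumption, and the helper name `write_mean_genotypes` and the example IDs are hypothetical.

```python
import io


def write_mean_genotypes(handle, variant_ids, alleles, dosages):
    """Write one comma-separated line per variant.

    alleles: list of (first_allele, second_allele) tuples.
    dosages: per-variant sequences of values in [0, 2], one per individual.
    """
    for vid, (a1, a2), row in zip(variant_ids, alleles, dosages):
        values = ", ".join(f"{v:g}" for v in row)
        handle.write(f"{vid}, {a1}, {a2}, {values}\n")


# Hypothetical toy data: two variants, three individuals.
buffer = io.StringIO()
write_mean_genotypes(
    buffer,
    variant_ids=["rs1", "rs2"],
    alleles=[("A", "T"), ("G", "C")],
    dosages=[[0.0, 1.0, 2.0], [2.0, 0.5, 0.0]],
)
text = buffer.getvalue()
```

Replacing the `io.StringIO` buffer with an open file handle writes the same lines to disk.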
- Get the genetic distance matrix
shell
PyVMO distance filter.vmo ibs.matrix
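The output name `ibs.matrix` suggests an identity-by-state (IBS) distance. One common definition, sketched below, is the mean absolute genotype difference between two samples divided by 2, computed over sites where both samples are called. This is an assumed definition, not necessarily the one pyVMO uses; the function name, the 0/1/2 coding, and the `-1` missing code are also assumptions.

```python
import numpy as np


def ibs_distance_sketch(genotypes, missing=-1):
    """Pairwise IBS distance between samples (columns) of a variants x samples matrix."""
    g = np.asarray(genotypes, dtype=float)
    n_samples = g.shape[1]
    dist = np.zeros((n_samples, n_samples))
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            # Use only sites where both samples have a genotype call.
            both = (g[:, i] != missing) & (g[:, j] != missing)
            if both.any():
                d = np.abs(g[both, i] - g[both, j]).mean() / 2.0
            else:
                d = np.nan  # no shared sites, distance undefined
            dist[i, j] = dist[j, i] = d
    return dist


toy = np.array([
    [0, 0, 2],
    [1, 1, 1],
    [2, 2, 0],
    [0, -1, 2],
])
dist = ibs_distance_sketch(toy)
```

A distance of 0 means the two samples carry identical genotypes at every shared site, and 1 means they are opposite homozygotes everywhere. The double loop is quadratic in the number of samples, so for large cohorts the variants would need to be processed in chunks.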
For other uses
- Get numpy array from vmo
m = var_vmo.get_matrix()
- Get the sample list
sample_list = var_vmo.get_sample_info()
- Get the variant information in a pandas dataframe
from toolbiox.lib.common.sqlite_command import pickle_load_obj
var_info_df = var_vmo.get_variant_info()
var_index = 1  # row index of the variant to inspect
chr_id = var_info_df.iloc[var_index]['CHROM']
pos = int(var_info_df.iloc[var_index]['POS'])
ref = var_info_df.iloc[var_index]['REF']
alt = pickle_load_obj(var_info_df.iloc[var_index]['ALT']) # alt is a list
qual = pickle_load_obj(var_info_df.iloc[var_index]['QUAL'])
filter_value = pickle_load_obj(var_info_df.iloc[var_index]['FILTER']) # renamed to avoid shadowing the builtin filter
info = pickle_load_obj(var_info_df.iloc[var_index]['INFO']) # info is a dict
Download files
Source Distribution
pyvmo-0.0.5.tar.gz (11.1 kB)
Built Distribution
pyVMO-0.0.5-py3-none-any.whl (13.9 kB)
File details
Details for the file pyvmo-0.0.5.tar.gz.
File metadata
- Download URL: pyvmo-0.0.5.tar.gz
- Size: 11.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 67f215708205886c925bb1d2e339f5f19cc02704f1355ae730e199460b7e380b
MD5 | 044ba87538340b5fe2febe0f6e1cd6c6
BLAKE2b-256 | 8c9a2c7636068c430512c983a218c1c6e8be5f498c8c5c1965f8c55ed7b67db8
File details
Details for the file pyVMO-0.0.5-py3-none-any.whl.
File metadata
- Download URL: pyVMO-0.0.5-py3-none-any.whl
- Size: 13.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | a739950b5cb6428018142c167d9b4dbbf3ee7f2ba00ecc21b953079c13bc58c8
MD5 | 0d31cf0a5603f08988380e1c7f4d8914
BLAKE2b-256 | 3d11f2a9c527d8d9d61a6bbd78911243ac9142ccf05264e0b7207713d90a628f