Xu Yuxing's personal quantitative genomic tools
yxquantgene
Installation
pip install yxquantgene
Usage
1. Read VCF file and get variant statistics
from yxquantgene import build_var_stat_table
vcf_file = 'path/to/vcf_file'  # the VCF file must be bgzip-compressed and tabix-indexed
ref_genome_file = 'path/to/ref_genome_file'
stat_h5_file = 'path/to/output_stat_h5_file'
build_var_stat_table(vcf_file, ref_genome_file, stat_h5_file)
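For orientation, the sketch below shows one way the per-variant statistics used for filtering in step 2 could be computed from diploid genotype calls. It assumes that `MISSF`, `MAF`, and `HETF` stand for missing-call fraction, minor allele frequency, and heterozygote fraction (among called samples); those meanings are inferred from the column names, not confirmed by the package. The helper `variant_stats` is hypothetical and is not part of yxquantgene.

```python
from typing import Dict, Optional, Sequence, Tuple

# (allele, allele) with 0 = reference and 1 = alternate; None marks a missing call
Genotype = Optional[Tuple[int, int]]


def variant_stats(genotypes: Sequence[Genotype]) -> Dict[str, float]:
    """Statistics for one biallelic variant (hypothetical helper, not the yxquantgene API)."""
    n_samples = len(genotypes)
    called = [g for g in genotypes if g is not None]
    # Fraction of samples without a genotype call
    missf = (n_samples - len(called)) / n_samples if n_samples else 0.0
    if not called:
        return {"MISSF": missf, "MAF": 0.0, "HETF": 0.0}
    # Alternate allele frequency over called chromosomes, folded to the minor allele
    alt_freq = sum(a + b for a, b in called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    # Assumption: heterozygote fraction is taken over called samples only
    hetf = sum(1 for a, b in called if a != b) / len(called)
    return {"MISSF": missf, "MAF": maf, "HETF": hetf}


# Five samples: 0/0, 0/1, 1/1, missing, 0/0
stats = variant_stats([(0, 0), (0, 1), (1, 1), None, (0, 0)])
```

With these five samples, one call is missing, three of the eight called alleles are alternate, and one of the four called samples is heterozygous.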
2. Filter variants by statistics
import pandas as pd
from yxquantgene import extract_subvcf_by_varIDs
chr_id = 'Chr01'
max_missing_rate = 0.5
min_maf = 0.01
max_het_rate = 0.1
var_df = pd.read_hdf(stat_h5_file, chr_id)
var_df = var_df[(var_df['MISSF'] <= max_missing_rate)]
var_df = var_df[(var_df['MAF'] >= min_maf)]
var_df = var_df[(var_df['HETF'] <= max_het_rate)]
filtered_varID_list_file = 'path/to/filtered_varID_list_file'
var_df['ID'].to_csv(filtered_varID_list_file, index=False)
filtered_vcf_file = 'path/to/filtered_vcf_file'
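# Assumes extract_subvcf_by_varIDs accepts a list of variant IDs; vcf_file is the input VCF from step 1
varID_list = var_df['ID'].tolist()
extract_subvcf_by_varIDs(vcf_file, varID_list, filtered_vcf_file)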
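To see how much each threshold removes before committing to a filtered VCF, the three criteria can be kept as separate boolean masks. The sketch below runs on a small synthetic table with made-up values; only the column names `ID`, `MISSF`, `MAF`, and `HETF` come from the example above, and the names `checks`, `failed_per_filter`, and `kept_ids` are illustrative.

```python
import pandas as pd

# Synthetic stand-in for one chromosome's statistics table (values are made up)
var_df = pd.DataFrame(
    {
        "ID": ["v1", "v2", "v3", "v4"],
        "MISSF": [0.10, 0.60, 0.05, 0.20],
        "MAF": [0.20, 0.30, 0.005, 0.15],
        "HETF": [0.05, 0.02, 0.01, 0.40],
    }
)

max_missing_rate = 0.5
min_maf = 0.01
max_het_rate = 0.1

# One boolean column per criterion, so each filter can be inspected on its own
checks = pd.DataFrame(
    {
        "missing_ok": var_df["MISSF"] <= max_missing_rate,
        "maf_ok": var_df["MAF"] >= min_maf,
        "het_ok": var_df["HETF"] <= max_het_rate,
    }
)

# Number of variants rejected by each criterion
failed_per_filter = (~checks).sum().to_dict()

# IDs of variants that pass every criterion
kept_ids = var_df.loc[checks.all(axis=1), "ID"].tolist()
```

The combined mask selects the same rows as the three successive filters; keeping the masks separate only makes it easier to report which threshold is responsible for most of the loss.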
3. Prune variants by LD
You can prune variants by LD and filter low-quality variants at the same time.
from yxquantgene import build_LD_db, build_var_stat_table, psa_snp_pruner
input_vcf_file = 'path/to/vcf_file'
ref_genome_file = 'path/to/ref_genome_file'
stat_h5_file = 'path/to/output_stat_h5_file'
snp_ld_dir = 'path/to/snp_ld_dir'
ld_db_win_size = 150000
ld_decay_size = 150000
ld_r2_threshold = 0.5
max_missing_rate = 0.5
min_maf = 0.01
max_het_rate = 0.1
threads = 20
output_prefix = 'path/to/output_prefix'
build_var_stat_table(input_vcf_file, ref_genome_file, stat_h5_file)
build_LD_db(input_vcf_file, stat_h5_file, snp_ld_dir, window_size=ld_db_win_size)
# snp_ld_dir is the LD database directory written by build_LD_db above
psa_snp_pruner(
    input_vcf_file, stat_h5_file, snp_ld_dir, output_prefix,
    ld_db_win_size=ld_db_win_size, ld_decay_size=ld_decay_size,
    ld_r2_threshold=ld_r2_threshold, max_missing_rate=max_missing_rate,
    min_maf=min_maf, max_het_rate=max_het_rate, threads=threads,
)
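As background on what an r2 threshold and a distance window do during pruning, here is a minimal sketch of a generic greedy LD pruner. It is not the algorithm implemented by `psa_snp_pruner`, which is not described here; the functions `r_squared` and `greedy_ld_prune` are hypothetical, and their `window` and `r2_threshold` arguments only loosely mirror `ld_decay_size` and `ld_r2_threshold`. Genotypes are assumed to be encoded as alternate-allele dosages (0, 1, 2).

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic site carries no LD information
    return cov * cov / (var_x * var_y)


def greedy_ld_prune(
    positions: Sequence[int],
    dosages: Sequence[Sequence[float]],
    window: int,
    r2_threshold: float,
) -> List[int]:
    """Return indices of SNPs to keep. Positions must be sorted in ascending order.

    A SNP is dropped when it lies within `window` bp of an already kept SNP
    and its r^2 with that SNP exceeds `r2_threshold`.
    """
    kept: List[int] = []
    for i, pos in enumerate(positions):
        redundant = any(
            pos - positions[j] <= window
            and r_squared(dosages[i], dosages[j]) > r2_threshold
            for j in kept
        )
        if not redundant:
            kept.append(i)
    return kept


positions = [100, 200, 300, 500000]
dosages = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],  # identical to the first SNP, so in complete LD with it
    [2, 0, 1, 1, 0, 2],  # uncorrelated with the first SNP
    [0, 1, 2, 0, 1, 2],  # identical to the first SNP but outside the window
]
kept = greedy_ld_prune(positions, dosages, window=150000, r2_threshold=0.5)
```

In this toy input the second SNP is dropped as redundant, while the fourth is kept despite identical genotypes because it lies beyond the window.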