
Xu Yuxing's personal quantitative genomic tools


yxquantgene


Installation

pip install yxquantgene

Usage

1. Read a VCF file and compute variant statistics

from yxquantgene import build_var_stat_table

vcf_file = 'path/to/vcf_file'  # the VCF file must be bgzip-compressed and tabix-indexed
ref_genome_file = 'path/to/ref_genome_file'
stat_h5_file = 'path/to/output_stat_h5_file'

build_var_stat_table(vcf_file, ref_genome_file, stat_h5_file)
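The README does not spell out how the per-variant statistics are defined. As a rough illustration only, the sketch below computes a missing rate, a minor allele frequency, and a heterozygosity rate for one biallelic variant from diploid genotype calls. The helper name `variant_stats` and the exact definitions are assumptions, not yxquantgene's implementation; the keys reuse the `MISSF`, `MAF`, and `HETF` column names that appear in the filtering step.

```python
from typing import Optional, Sequence, Tuple

# None marks a missing call; alleles are coded 0 (reference) or 1 (alternate).
Genotype = Optional[Tuple[int, int]]


def variant_stats(genotypes: Sequence[Genotype]) -> dict:
    """Hypothetical helper: summary statistics for one biallelic variant."""
    called = [g for g in genotypes if g is not None]
    n_total = len(genotypes)
    n_called = len(called)

    # Fraction of samples without a genotype call.
    missf = (n_total - n_called) / n_total if n_total else float("nan")
    if n_called == 0:
        return {"MISSF": missf, "MAF": float("nan"), "HETF": float("nan")}

    # Alternate allele frequency among called chromosomes (2 per diploid sample).
    alt_freq = sum(a + b for a, b in called) / (2 * n_called)
    maf = min(alt_freq, 1 - alt_freq)

    # Fraction of called samples that are heterozygous.
    hetf = sum(1 for a, b in called if a != b) / n_called
    return {"MISSF": missf, "MAF": maf, "HETF": hetf}


stats = variant_stats([(0, 0), (0, 1), (1, 1), None, (0, 0)])
print(stats)
```

Here the denominators for `MAF` and `HETF` are the called samples only; a tool could equally well divide by all samples, so check the values in the generated table before relying on a particular definition.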

2. Filter variants by statistics

import pandas as pd
from yxquantgene import extract_subvcf_by_varIDs

chr_id = 'Chr01'
max_missing_rate = 0.5
min_maf = 0.01
max_het_rate = 0.1

var_df = pd.read_hdf(stat_h5_file, chr_id)

var_df = var_df[(var_df['MISSF'] <= max_missing_rate)]
var_df = var_df[(var_df['MAF'] >= min_maf)]
var_df = var_df[(var_df['HETF'] <= max_het_rate)]

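filtered_varID_list_file = 'path/to/filtered_varID_list_file'
var_df['ID'].to_csv(filtered_varID_list_file, index=False)

# vcf_file is the input VCF from step 1. Passing the IDs as a Python list is an
# assumption based on the argument name (varID_list) used in the original call.
varID_list = var_df['ID'].tolist()
filtered_vcf_file = 'path/to/filtered_vcf_file'
extract_subvcf_by_varIDs(vcf_file, varID_list, filtered_vcf_file)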
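To make the threshold semantics explicit, here is a small standard-library sketch that applies the same three comparisons as the pandas filters above to toy records. The record values and the helper name `passes_filters` are made up for illustration; only the column names and thresholds come from the example above.

```python
# Toy per-variant records; the values are invented for illustration.
variants = [
    {"ID": "var1", "MISSF": 0.10, "MAF": 0.20, "HETF": 0.05},   # passes all filters
    {"ID": "var2", "MISSF": 0.60, "MAF": 0.20, "HETF": 0.05},   # too many missing calls
    {"ID": "var3", "MISSF": 0.10, "MAF": 0.005, "HETF": 0.05},  # minor allele too rare
    {"ID": "var4", "MISSF": 0.10, "MAF": 0.20, "HETF": 0.30},   # too heterozygous
    {"ID": "var5", "MISSF": 0.50, "MAF": 0.01, "HETF": 0.10},   # exactly on every threshold
]

max_missing_rate = 0.5
min_maf = 0.01
max_het_rate = 0.1


def passes_filters(rec: dict) -> bool:
    """Hypothetical helper: same inclusive comparisons as the pandas filters."""
    return (
        rec["MISSF"] <= max_missing_rate
        and rec["MAF"] >= min_maf
        and rec["HETF"] <= max_het_rate
    )


kept_ids = [rec["ID"] for rec in variants if passes_filters(rec)]
print(kept_ids)
```

Because the comparisons are inclusive, a variant that sits exactly on a threshold (`var5`) is kept.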

3. Prune variants by LD

You can prune variants by LD and filter out low-quality variants in the same run.

from yxquantgene import build_LD_db, build_var_stat_table, psa_snp_pruner

input_vcf_file = 'path/to/vcf_file'
ref_genome_file = 'path/to/ref_genome_file'
stat_h5_file = 'path/to/output_stat_h5_file'
snp_ld_dir = 'path/to/snp_ld_dir'
ld_db_win_size = 150000
ld_decay_size = 150000
ld_r2_threshold = 0.5
max_missing_rate = 0.5
min_maf = 0.01
max_het_rate = 0.1
threads = 20
output_prefix = 'path/to/output_prefix'

build_var_stat_table(input_vcf_file, ref_genome_file, stat_h5_file)
build_LD_db(input_vcf_file, stat_h5_file, snp_ld_dir, window_size=ld_db_win_size)
# snp_ld_dir is the LD database directory written by build_LD_db above
psa_snp_pruner(input_vcf_file, stat_h5_file, snp_ld_dir, output_prefix,
               ld_db_win_size=ld_db_win_size, ld_decay_size=ld_decay_size,
               ld_r2_threshold=ld_r2_threshold, max_missing_rate=max_missing_rate,
               min_maf=min_maf, max_het_rate=max_het_rate, threads=threads)
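For readers unfamiliar with LD pruning, the sketch below shows the general idea on toy data: estimate r² between two variants as the squared Pearson correlation of their genotype dosages (0/1/2), then greedily drop any variant that is in high LD with an already kept variant inside a distance window. This is a generic illustration, not the algorithm used by `psa_snp_pruner`; the function names `r_squared` and `greedy_ld_prune` are hypothetical.

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic site carries no LD information
    return cov * cov / (var_x * var_y)


def greedy_ld_prune(
    positions: Sequence[int],
    dosages: Sequence[Sequence[float]],
    window_size: int,
    r2_threshold: float,
) -> List[int]:
    """Return indices of kept variants; positions must be sorted ascending."""
    kept: List[int] = []
    for i, pos in enumerate(positions):
        # A variant is redundant if a kept variant within the window tags it.
        redundant = any(
            pos - positions[j] <= window_size
            and r_squared(dosages[i], dosages[j]) > r2_threshold
            for j in kept
        )
        if not redundant:
            kept.append(i)
    return kept


positions = [100, 200, 300, 500000]
dosages = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],  # identical to the first variant, so r2 = 1
    [2, 0, 1, 1, 2, 0],  # weakly correlated with the first variant
    [0, 1, 2, 0, 1, 2],  # identical again, but outside the window
]
kept = greedy_ld_prune(positions, dosages, window_size=150000, r2_threshold=0.5)
print(kept)
```

The second variant is dropped because it duplicates the first within the window, while the fourth is kept because it lies farther away than `window_size`, which mirrors the role of the window and r² threshold parameters in the call above.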
