a very fast Y-haplogroup caller

# Ypredict
**Ypredict** is a Python-based software package that predicts Y-chromosome haplogroups. It uses a rank-based method to find the most likely Y haplogroup automatically. Each haplogroup receives one of two marks, T or F, according to the calling state of its SNPs. For example, suppose haplogroup O2a1a1a2a1 in the ISOGG haplogroup tree (<https://isogg.org/tree/>) has six SNPs and five of them were observed. The ratio is 5/6, and because 5/6 >= 0.2 the haplogroup gets the T mark; otherwise it would get the F mark. For each haplogroup, I count the T marks (n_T) and the F marks (n_F) along the path from the root 'Y' haplogroup to that haplogroup, and compute `rank = (n_T**2)/(n_T + n_F)`. The haplogroup with the highest rank is reported as the most likely one. If several haplogroups have the same rank, the one with the largest n_T is chosen. If both rank and n_T are tied, one of the matching haplogroups is selected at random.
* The current version is 0.0.1
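
The ranking rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in ypredict.py: the function names (`mark`, `rank`, `most_likely`), the short haplogroup labels, and the toy T/F paths are hypothetical, and treating a haplogroup with no usable SNPs as F is an assumption.

```python
import random

THRESHOLD = 0.2  # ratio cutoff quoted in the description


def mark(observed, total, threshold=THRESHOLD):
    """Return 'T' if observed/total reaches the threshold, else 'F'."""
    if total == 0:
        return "F"  # assumption: no usable SNPs counts as F
    return "T" if observed / total >= threshold else "F"


def rank(marks):
    """Return (rank, n_T) for the marks along one root-to-haplogroup path."""
    n_t = marks.count("T")
    n_f = marks.count("F")
    if n_t + n_f == 0:
        return 0.0, 0
    return n_t ** 2 / (n_t + n_f), n_t


def most_likely(paths, rng=None):
    """Pick the haplogroup with the highest rank, then highest n_T, then at random."""
    rng = rng or random.Random()
    scores = {name: rank(marks) for name, marks in paths.items()}
    best = max(scores.values())  # tuples compare by rank first, then by n_T
    tied = [name for name, score in scores.items() if score == best]
    return rng.choice(tied)


# Toy input: each value lists the marks from 'Y' down to that haplogroup.
paths = {
    "O2a1": ["T", "T", "F"],
    "O2a1a": ["T", "T", "F", "T"],
    "R1b": ["T", "F", "F"],
}
print(mark(5, 6))          # the 5-of-6 example from the description
print(most_likely(paths))
```

Squaring n_T in the numerator rewards long paths that are mostly supported, so a deep haplogroup with one unsupported ancestor can still outrank a shallow one.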

# Dependencies
* Biopython (<https://biopython.org/wiki/Download>)
* GATK (<https://software.broadinstitute.org/gatk/download/>)

# Getting Started
***
## Step1
Download the Y haplogroup tree from ISOGG, then filter the SNPs with snpfilter.py. This step removes hotspot and back-mutation SNPs and generates two files, map.json and ref_vcf.gz. It also requires a Y chromosome FASTA file (test/Y.fasta), which can be downloaded from NCBI or extracted from the hg38 reference genome using bedtools.
`snpfilter.py -snp snp14.3.csv -y Y.fasta`
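
If bedtools is not at hand, the Y chromosome record can also be copied out of a reference FASTA with plain Python. The sketch below is an alternative to the bedtools route, not part of Ypredict. The function name `extract_record` is hypothetical, and it assumes the record ID in the reference is `chrY`, which should be checked against the FASTA headers actually in use.

```python
def extract_record(fasta_in, fasta_out, name="chrY"):
    """Copy the first FASTA record whose ID equals `name`; return True if found."""
    found = False
    writing = False
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                if found:
                    break  # the wanted record has ended, stop reading
                # The record ID is the first whitespace-delimited token after '>'.
                fields = line[1:].split()
                writing = bool(fields) and fields[0] == name
                found = writing
            if writing:
                dst.write(line)
    return found


# Example call (file names are placeholders):
# extract_record("hg38.fa", "Y.fasta")
```

The file is streamed line by line, so the whole genome never has to fit in memory.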
## Step2
In this step, the ref_vcf.gz file generated in Step1 is used for SNP calling with the GATK 3.8 UnifiedGenotyper module. The hg38 reference genome must be used in this step.

`java -Xmx32g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper
-R hg38.fa -I *.bam -o y.vcf.gz
--intervals chrY
-ploidy 1
--output_mode EMIT_ALL_SITES --genotyping_mode GENOTYPE_GIVEN_ALLELES
--alleles ref_vcf.gz `
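
When many samples have to be processed, the command above can be assembled from Python. This is a minimal sketch: the helper names `build_gatk_command` and `run_gatk` and the file name `sample.bam` are hypothetical, and running one BAM per call is an assumption. The flags themselves are copied unchanged from the command above.

```python
import subprocess


def build_gatk_command(bam, out_vcf, reference="hg38.fa", alleles="ref_vcf.gz",
                       jar="GenomeAnalysisTK.jar", memory="32g"):
    """Return the UnifiedGenotyper command from Step2 as an argument list."""
    return [
        "java", f"-Xmx{memory}", "-jar", jar,
        "-T", "UnifiedGenotyper",
        "-R", reference, "-I", bam, "-o", out_vcf,
        "--intervals", "chrY",
        "-ploidy", "1",
        "--output_mode", "EMIT_ALL_SITES",
        "--genotyping_mode", "GENOTYPE_GIVEN_ALLELES",
        "--alleles", alleles,
    ]


def run_gatk(bam, out_vcf, **kwargs):
    """Run the command and raise CalledProcessError if GATK exits with an error."""
    subprocess.run(build_gatk_command(bam, out_vcf, **kwargs), check=True)


# Print the command for inspection without running GATK.
print(" ".join(build_gatk_command("sample.bam", "y.vcf.gz")))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.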
## Step3
The Y chromosome haplogroup is predicted by ypredict.py. In this step, the script automatically outputs the most likely haplogroup. The final result is written to ypredict.txt, and more detailed output is written to ystatistics.csv.

`ypredict.py -vcf y.vcf.gz -s hfspecial.xlsx -m map.json`

If you need to update the Y haplogroup tree file downloaded from ISOGG, redo Step1 to get an updated map.json.
# Install
`git clone https://github.com/N-damo/ypredict-master.git`

`python setup.py install`
or `pip install ypredict`

Source Distribution

Ypredict-0.1.3.tar.gz (9.2 MB)
