a very fast Y-haplogroup caller

# Ypredict
**Ypredict** is a Python-based software package that predicts Y chromosome haplogroups. It uses a rank calculation to find the most likely Y haplogroup automatically. Each haplogroup receives one of two marks (T or F) according to the calling state of its SNPs. For example, suppose the haplogroup O2a1a1a2a1 in the ISOGG (<https://isogg.org/tree/>) haplogroup tree has six SNPs and five of them are observed. The ratio is 5/6, and because 5/6 >= 0.2, I give it the T mark; otherwise it would get the F mark. For each haplogroup, I count the T marks as n_T and the F marks as n_F along the path from the 'Y' haplogroup to that haplogroup, and compute `rank = (n_T**2)/(n_T + n_F)`. If two haplogroups have the same rank, the one with the larger n_T is taken as the most likely haplogroup. If both rank and n_T are the same, one of the matching haplogroups is selected at random.
* The current version is 0.0.1
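
The ranking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in ypredict.py: the function names `mark`, `rank` and `best_haplogroup`, and the `paths` dictionary (haplogroup name mapped to the list of marks from 'Y' down to it), are hypothetical. Only the 0.2 threshold, the rank formula and the tie-breaking order come from the description.

```python
from __future__ import division  # true division on Python 2 as well

import random

THRESHOLD = 0.2  # ratio cutoff quoted in the description


def mark(observed, total, threshold=THRESHOLD):
    """Return 'T' if observed/total reaches the threshold, else 'F'."""
    if total == 0:
        return "F"  # assumption: a haplogroup without SNPs cannot be supported
    return "T" if observed / total >= threshold else "F"


def rank(marks):
    """Score one root-to-haplogroup path as (n_T**2 / (n_T + n_F), n_T)."""
    n_t = marks.count("T")
    n_f = marks.count("F")
    if n_t + n_f == 0:
        return (0.0, 0)
    return (n_t ** 2 / (n_t + n_f), n_t)


def best_haplogroup(paths, rng=random):
    """Pick the haplogroup with the highest rank, then the highest n_T.

    Remaining ties are broken at random, as in the description.
    """
    scored = {name: rank(marks) for name, marks in paths.items()}
    top = max(scored.values())  # tuples compare by rank first, then n_T
    candidates = sorted(name for name, score in scored.items() if score == top)
    return rng.choice(candidates)


# Hypothetical marks along three nested paths.
paths = {
    "O": ["T", "T"],
    "O2": ["T", "T", "T"],
    "O2a": ["T", "T", "T", "F"],
}
print(mark(5, 6), best_haplogroup(paths))
```

Squaring n_T rewards long, well-supported paths, while dividing by the path length penalises paths that only get deeper by passing through unsupported (F) haplogroups.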

# Dependencies
* biopython(<https://biopython.org/wiki/Download>)
* GATK(<https://software.broadinstitute.org/gatk/download/>)

# Getting Started
***
## Step1
Download the Y haplogroup tree from ISOGG, then filter its SNPs with snpfilter.py. This step removes hotspot and back-mutation SNPs and generates two files, map.json and ref_vcf.gz. It also requires a Y chromosome FASTA file (test/Y.fasta), which can be downloaded from NCBI or extracted from the hg38 reference genome using bedtools.
`snpfilter.py -snp snp14.3.csv -y Y.fasta`
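
If bedtools is not available, the Y chromosome record can also be copied out of the reference with plain Python. The sketch below is an alternative to the bedtools route, not part of Ypredict. The function name `extract_sequence` is hypothetical, and it assumes the record is named `chrY` in hg38.fa, as in the `--intervals chrY` option used in step 2.

```python
def extract_sequence(fasta_in, fasta_out, name="chrY"):
    """Copy the FASTA record whose header ID equals `name`.

    Returns the number of bases written (0 if the record was not found).
    """
    n_bases = 0
    keep = False
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # The record ID is the first whitespace-separated header token.
                fields = line[1:].split()
                keep = bool(fields) and fields[0] == name
            if keep:
                dst.write(line)
                if not line.startswith(">"):
                    n_bases += len(line.strip())
    return n_bases


if __name__ == "__main__":
    import os

    # File names follow the commands in this README; adjust as needed.
    if os.path.exists("hg38.fa"):
        print(extract_sequence("hg38.fa", "Y.fasta"))
```

The file is streamed line by line, so the whole genome is never held in memory.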
## Step2
This step uses the ref_vcf.gz file generated in step 1 to call SNPs with the UnifiedGenotyper module of GATK 3.8. Note that the hg38 reference genome must be used in this step.

`java -Xmx32g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R hg38.fa -I *.bam -o y.vcf.gz --intervals chrY -ploidy 1 --output_mode EMIT_ALL_SITES --genotyping_mode GENOTYPE_GIVEN_ALLELES --alleles ref_vcf.gz`
## Step3
The Y chromosome haplogroup is predicted by ypredict.py. In this step, the script automatically outputs the most likely haplogroup. The final result is written to ypredict.txt, and more detailed output is written to ystatistics.csv.

`ypredict.py -vcf y.vcf.gz -s hfspecial.xlsx -m map.json`

If you need to update the Y haplogroup tree file downloaded from ISOGG, redo step 1 to get an updated map.json.
# Install
`git clone https://github.com/N-damo/ypredict-master.git`

`python setup.py install`
or `pip install ypredict`
