
Collapsed Haplotype Pattern Method for Linkage Analysis of Next-Generation Sequencing Data

SEQLinkage

Pre-requisites

Make sure you install the prerequisites before running seqlink:

conda install -c conda-forge xeus-cling
conda install -c anaconda swig 
conda install -c conda-forge gsl
pip install egglib
git clone https://github.com/statgenetics/cstatgen.git
cd cstatgen
python setup.py install

Install

pip install SEQLinkage
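Before running seqlink, it can help to confirm that the Python-side prerequisites are importable and that the package is installed. The sketch below is a hypothetical check, not part of SEQLinkage; the module names `egglib` and `cstatgen` are assumptions inferred from the install commands above, and the distribution name `SEQLinkage` comes from the pip command.

```python
import importlib.util
from importlib import metadata


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed module names, based on the prerequisite install steps.
    missing = missing_modules(["egglib", "cstatgen"])
    print("missing prerequisites:", missing or "none")
    print("SEQLinkage version:", installed_version("SEQLinkage"))
```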

How to use

1. Test on seqlinkage-example

seqlink --fam seqlinkage-example.fam --vcf seqlinkage-example.vcf.gz -f MERLIN --output RMBPt8 --jobs 8

seqlink --fam seqlinkage-example.fam --vcf seqlinkage-example.vcf.gz -f MERLIN --output RMB0 --jobs 8 --bin 0

seqlink --fam seqlinkage-example.fam --vcf seqlinkage-example.vcf.gz -f MERLIN --output RMB1 --jobs 8 --bin 1

seqlink --fam seqlinkage-example.fam --vcf seqlinkage-example.vcf.gz --freq EVSEAAF -o LinkageAnalysis -K 0.001 --moi AR -W 0 -M 1 --theta-max 0.5 --theta-inc 0.05 -j 8 --run-linkage
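The first three runs above differ only in `--output` and `--bin`. A small Python wrapper can generate such runs from one place; `seqlink_cmd` and `run_all` are hypothetical helpers (not part of SEQLinkage) that use only the flags shown in these examples, and `run_all` defaults to a dry run that just prints each command.

```python
import shlex
import subprocess


def seqlink_cmd(fam, vcf, output, formats=("MERLIN",), jobs=8, bin_value=None):
    """Build a seqlink argument list from flags used in the examples above."""
    cmd = ["seqlink", "--fam", fam, "--vcf", vcf, "-f", *formats,
           "--output", output, "--jobs", str(jobs)]
    if bin_value is not None:
        cmd += ["--bin", str(bin_value)]
    return cmd


def run_all(configs, dry_run=True):
    """Print (and, if dry_run is False, execute) one call per (output, bin_value) pair."""
    for output, bin_value in configs:
        cmd = seqlink_cmd("seqlinkage-example.fam", "seqlinkage-example.vcf.gz",
                          output, bin_value=bin_value)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Mirrors the RMBPt8 / RMB0 / RMB1 runs above.
    run_all([("RMBPt8", None), ("RMB0", 0), ("RMB1", 1)])
```

Passing `formats` as a tuple reflects that `-f` accepts several output formats at once, as in the `-f MERLIN MEGA2 PLINK LINKAGE` examples further down.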

2. Test on AD family

seqlink --fam data/mwe_normal_fam.csv --vcf data/first1000snp_full_samples.vcf.gz -f LINKAGE --blueprint data/genemap.hg38.txt --freq AF -K 0.001 --moi AD -W 0 -M 1

seqlink --fam data/mwe_normal_fam.csv --vcf data/first1000snp_full_samples.vcf.gz -f MERLIN --blueprint data/genemap.hg38.txt --freq AF

./seqlink --fam seqlinkage-example/seqlinkage-example.fam --vcf seqlinkage-example/seqlinkage-example.vcf.gz -f MERLIN --blueprint data/genemap.txt --freq EVSEAAF -o seqtest

./seqlink --fam data/new_trim_ped_famless17.fam --vcf data/first1000snp_full_samples.vcf.gz -f MERLIN --blueprint data/genemap.hg38.txt --freq AF

./seqlink --fam data/new_trim_ped_famless17.fam --vcf data/first1000snp_full_samples.vcf.gz -f MERLIN --blueprint data/genemap.hg38.txt --freq AF -K 0.001 --moi AD -W 0 -M 1 --run-linkage

./seqlink --fam data/Example_data/pedigree.fam --vcf data/Example_data/example.vcf.gz -f MERLIN MEGA2 PLINK LINKAGE --build hg38 --chrom-prefix chr --freq AF -o data/Example_data/output -K 0.001 --moi AD -W 0 -M 1

./seqlink --fam data/mwe_normal_fam.csv --vcf data/first1000snp_full_samples.vcf.gz --anno data/first1000_chr1_multianno.csv --pop data/full_sample_fam_pop.txt -f MERLIN MEGA2 PLINK LINKAGE --build hg38 --freq AF -o data/first1000test -K 0.001 --moi AD -W 0 -M 1

./seqlink --fam data/new_trim_ped_famless17_no:xx.fam --vcf /mnt/mfs/statgen/alzheimers-family/linkage_files/geno/full_sample/vcf/full_sample.vcf.gz --anno MWE/annotation/EFIGA_NIALOAD_chr1.hg38.hg38_multianno.csv --pop data/full_sample_fam_pop.txt -f MERLIN MEGA2 PLINK LINKAGE --build hg38 --freq AF -o data/fullchr1data -K 0.001 --moi AD -W 0 -M 1 -j 4

Testing output

seqlink --fam seqlinkage-example/seqlinkage-example.fam --vcf seqlinkage-example/seqlinkage-example.vcf.gz -f MERLIN MEGA2 PLINK LINKAGE --blueprint data/genemap.txt --freq EVSEAAF -o data/seqtest_20220221 -K 0.001 --moi AD -W 0 -M 1 -j 4

seqlink --fam data/new_trim_ped_famless17_no:xx.fam --vcf /mnt/mfs/statgen/alzheimers-family/linkage_files/geno/full_sample/vcf/full_sample.vcf.gz --anno MWE/annotation/EFIGA_NIALOAD_chr22.hg38.hg38_multianno.csv --pop data/full_sample_fam_pop.txt -f MERLIN MEGA2 PLINK LINKAGE --build hg38 --freq AF -o data/fullchr22data -K 0.001 --moi AD -W 0 -M 1 -j 8

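import pandas as pd
# Load the hg38 gene map passed to seqlink via --blueprint (tab-separated, no header row).
# Reading column 0 as text keeps labels such as '1', 'X' and '17_KI270909v1_alt' comparable as strings.
tmp = pd.read_csv('../data/genemap.hg38.txt', sep='\t', header=None, dtype={0: str})
tmp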
       0   1          2          3                                                  4             5           6
0      1   11868      14362      LOC102725121@1                                     9.177127e-07  0.000001    6.814189e-07
1      1   11873      14409      DDX11L1                                            9.195321e-07  0.000001    6.827698e-07
2      1   14361      29370      WASH7P                                             1.529988e-06  0.000002    1.136045e-06
3      1   17368      17436      MIR6859-1@1,MIR6859-2@1,MIR6859-3@1,MIR6859-4@1    1.217693e-06  0.000002    9.041595e-07
4      1   30365      30503      MIR1302-10@1,MIR1302-11@1,MIR1302-2@1,MIR1302-9@1  2.129597e-06  0.000003    1.581266e-06
...    ... ...        ...        ...                                                ...           ...         ...
28320  X   155612564  155782457  SPRY3                                              NaN           196.056662  NaN
28321  X   155881344  155943769  VAMP7                                              NaN           196.190010  5.600000e+01
28322  X   155997695  156010817  IL9R                                               NaN           196.305985  NaN
28323  X   156014563  156016830  WASIR1                                             NaN           196.320452  NaN
28324  X   156025657  156028183  DDX11L16                                           NaN           196.334645  NaN

28325 rows × 7 columns
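Because the blueprint file has no header, the cells below refer to columns by position (0 to 6). For readability one could attach names at load time. In the sketch below the names are hypothetical labels inferred from the values shown (chromosome, start, end, gene name(s), then three numeric columns that look like genetic-map positions); SEQLinkage does not define them, and `read_genemap` is an illustrative helper.

```python
import io

import pandas as pd

# Hypothetical labels; columns 4-6 are left generic because their meaning is not documented here.
GENEMAP_COLUMNS = ["chrom", "start", "end", "gene", "map_a", "map_b", "map_c"]


def read_genemap(path_or_buffer):
    """Read a tab-separated, header-less gene map, keeping the chromosome column as text."""
    return pd.read_csv(path_or_buffer, sep="\t", header=None,
                       names=GENEMAP_COLUMNS, dtype={"chrom": str})


# Two rows copied from the output above, used as a tiny in-memory example.
sample = io.StringIO(
    "1\t11868\t14362\tLOC102725121@1\t9.177127e-07\t0.000001\t6.814189e-07\n"
    "X\t155881344\t155943769\tVAMP7\tNaN\t196.190010\t56\n"
)
gm = read_genemap(sample)
print(gm[["chrom", "start", "gene"]])
```

With named columns the later filters would read `tmp['chrom'] == '22'` instead of `tmp[0] == '22'`; the positional form used below works equally well.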

tmp[0].value_counts()
1                      2809
2                      1816
19                     1779
11                     1678
17                     1583
3                      1560
6                      1453
12                     1386
7                      1359
5                      1314
X                      1209
16                     1177
9                      1119
10                     1107
4                      1091
8                      1056
15                     1012
14                      946
20                      774
22                      646
13                      629
18                      436
21                      374
17_KI270909v1_alt         3
22_KI270879v1_alt         3
7_KI270803v1_alt          2
15_KI270850v1_alt         2
4_GL000008v2_random       1
1_KI270706v1_random       1
Name: 0, dtype: int64
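The counts show that, besides chromosomes 1 to 22 and X, the gene map lists a few genes on alternate (`_alt`) and unplaced (`_random`) hg38 contigs. If only primary chromosomes are wanted, one way to drop those rows is sketched below; `primary_only` is a hypothetical helper, and the rule "keep labels without an underscore" is an assumption checked only against the labels listed above.

```python
import pandas as pd


def primary_only(df, chrom_col=0):
    """Keep rows whose chromosome label contains no '_' (drops *_alt and *_random contigs)."""
    chrom = df[chrom_col].astype(str)
    return df[~chrom.str.contains("_", regex=False)]


example = pd.DataFrame({0: ["1", "22", "X", "17_KI270909v1_alt", "4_GL000008v2_random"]})
print(primary_only(example)[0].tolist())  # ['1', '22', 'X']
```

Applied to `tmp`, this should remove the 12 contig rows counted above (3 + 3 + 2 + 2 + 1 + 1).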
tmp1 = tmp[tmp[0]=='22']
tmp1[tmp1[3]=='LINC01664']
       0   1         2         3                    4          5          6
26484  22  17121594  17132104  LINC01664            3.496778   5.134655   1.946244
tmp1[tmp1[3]=='BID']
       0   1         2         3                    4          5          6
26493  22  17734139  17774665  BID                  7.132236   10.196359  4.192282
tmp1[tmp1[3]=='RTL10']
       0   1         2         3                    4          5          6
26535  22  19846145  19854874  RTL10                12.136091  16.519006  8.210012
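The LINC01664, BID and RTL10 lookups repeat the same boolean-mask pattern. A pair of hypothetical helpers (not part of SEQLinkage) can wrap it: `gene_rows` matches column 3 exactly, while `gene_rows_token` also finds a gene inside a comma-separated entry such as `MIR6859-1@1,MIR6859-2@1`. Names are matched including their `@` suffix, since that is how they appear in the file.

```python
import pandas as pd


def gene_rows(df, chrom, gene):
    """Rows on `chrom` whose name column (3) equals `gene` exactly."""
    return df[(df[0].astype(str) == str(chrom)) & (df[3] == gene)]


def gene_rows_token(df, chrom, gene):
    """Rows on `chrom` where `gene` is one of the comma-separated names in column 3."""
    on_chrom = df[df[0].astype(str) == str(chrom)]
    # Missing names (NaN) do not split into a list, so they never match.
    mask = on_chrom[3].str.split(",").apply(
        lambda names: isinstance(names, list) and gene in names
    ).astype(bool)
    return on_chrom[mask]
```

For example, `gene_rows(tmp, "22", "RTL10")` should return the same row as `tmp1[tmp1[3]=='RTL10']`.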
tmp1
       0   1         2         3                    4          5          6
26467  22  15784953  15827434  DUXAP8               0.000000   0.000000   0.000000
26468  22  15805697  15820884  BMS1P22@3            0.000000   0.000000   0.000000
26469  22  15805697  15815897  BMS1P17@3,BMS1P18@3  0.000000   0.000000   0.000000
26470  22  15740892  15778287  PSLNR                0.000000   0.000000   0.000000
26471  22  15690025  15721631  POTEH                0.000000   0.000000   0.000000
...    ... ...       ...       ...                  ...        ...        ...
27111  22  50674390  50733212  SHANK3               79.970067  90.409669  70.999029
27112  22  50735828  50738169  LOC105373100         80.037493  90.485907  71.073657
27113  22  50738203  50745339  ACR                  80.044958  90.494347  71.080286
27114  22  50757085  50799637  RPL23AP82            80.102184  90.559043  71.131103
27115  22  50767506  50783636  RABL2B               80.097821  90.554110  71.127228

646 rows × 7 columns
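In this chromosome 22 subset the rows are not strictly ordered by start coordinate: DUXAP8 (start 15784953) is listed before PSLNR (15740892) and POTEH (15690025). If positional order matters, the rows can be re-sorted; `sort_by_position` is a hypothetical helper, shown here on start/end values copied from the first rows above.

```python
import pandas as pd


def sort_by_position(df, start_col=1, end_col=2):
    """Return a copy ordered by start, then end coordinate, keeping the original row labels."""
    return df.sort_values([start_col, end_col], kind="stable")


chr22_head = pd.DataFrame(
    {1: [15784953, 15805697, 15805697, 15740892, 15690025],
     2: [15827434, 15820884, 15815897, 15778287, 15721631],
     3: ["DUXAP8", "BMS1P22@3", "BMS1P17@3,BMS1P18@3", "PSLNR", "POTEH"]},
    index=[26467, 26468, 26469, 26470, 26471],
)
print(sort_by_position(chr22_head)[3].tolist())
# ['POTEH', 'PSLNR', 'DUXAP8', 'BMS1P17@3,BMS1P18@3', 'BMS1P22@3']
```

Sorting on the end coordinate as a tie-breaker makes the order deterministic when two genes share a start, as BMS1P22@3 and BMS1P17@3,BMS1P18@3 do.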

535-467  # row-label difference between RTL10 (26535) and the first chr22 row (26467)
68
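The `535-467` cell subtracts the row labels of RTL10 (26535) and of the first chr22 row (26467) by hand. The sketch below computes both that label difference and the 0-based position within the filtered frame; `row_offset` and `label_offset` are hypothetical helpers. The two numbers agree only if no rows with other chromosome labels (for example the three `22_KI270879v1_alt` entries) fall between those labels in `tmp`.

```python
import pandas as pd


def row_offset(df, gene, name_col=3):
    """0-based position of the first row of `df` whose name column equals `gene`."""
    hits = (df[name_col] == gene).to_numpy().nonzero()[0]
    if len(hits) == 0:
        raise KeyError(gene)
    return int(hits[0])


def label_offset(df, gene, name_col=3):
    """Difference between the row label of `gene` and the first row label of `df`."""
    first_label = df.index[0]
    gene_label = df.index[row_offset(df, gene, name_col)]
    return int(gene_label - first_label)
```

On `tmp1`, `label_offset(tmp1, 'RTL10')` reproduces the 68 above, while `row_offset(tmp1, 'RTL10')` counts only chromosome 22 rows.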

Download files

Source Distribution

SEQLinkage-1.0.6.tar.gz (4.8 MB)

Built Distribution

SEQLinkage-1.0.6-py3-none-any.whl (36.7 kB, Python 3)
