Collapsed Haplotype Pattern Method for Linkage Analysis of Next-Generation Sequencing Data
SEQLinkage
Pre-requisites
Make sure you install the prerequisites before running seqlink:
conda install -c conda-forge xeus-cling
conda install -c anaconda swig
conda install -c conda-forge gsl
pip install egglib
git clone https://github.com/statgenetics/cstatgen.git
cd cstatgen
python setup.py install
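Once the steps above have run, a quick import check can confirm that the Python-level prerequisites are visible to the interpreter. This is a minimal sketch: the helper `missing_modules` is hypothetical, and the module names `egglib` and `cstatgen` are assumed to match the package names installed above.

```python
import importlib.util


def missing_modules(names):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: the packages installed above import as `egglib` and `cstatgen`.
    missing = missing_modules(["egglib", "cstatgen"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All checked prerequisites are importable.")
```

The check only looks up top-level modules; it does not verify that compiled extensions load correctly.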
Install
pip install SEQLinkage
How to use
1. Test on seqlinkage-example
seqlink --fam seqlinkage-example.fam --vcf seqlinkage-example.vcf.gz -f MERLIN --output RMBPt8 --jobs 8
seqlink --fam seqlinkage-example.fam --vcf seqlinkage-example.vcf.gz -f MERLIN --output RMB0 --jobs 8 --bin 0
seqlink --fam seqlinkage-example.fam --vcf seqlinkage-example.vcf.gz -f MERLIN --output RMB1 --jobs 8 --bin 1
seqlink --fam seqlinkage-example.fam --vcf seqlinkage-example.vcf.gz --freq EVSEAAF -o LinkageAnalysis -K 0.001 --moi AR -W 0 -M 1 --theta-max 0.5 --theta-inc 0.05 -j 8 --run-linkage
2. Test on AD family
seqlink --fam data/mwe_normal_fam.csv --vcf data/first1000snp_full_samples.vcf.gz -f LINKAGE --blueprint data/genemap.hg38.txt --freq AF -K 0.001 --moi AD -W 0 -M 1
seqlink --fam data/mwe_normal_fam.csv --vcf data/first1000snp_full_samples.vcf.gz -f MERLIN --blueprint data/genemap.hg38.txt --freq AF
./seqlink --fam seqlinkage-example/seqlinkage-example.fam --vcf seqlinkage-example/seqlinkage-example.vcf.gz -f MERLIN --blueprint data/genemap.txt --freq EVSEAAF -o seqtest
./seqlink --fam data/new_trim_ped_famless17.fam --vcf data/first1000snp_full_samples.vcf.gz -f MERLIN --blueprint data/genemap.hg38.txt --freq AF
./seqlink --fam data/new_trim_ped_famless17.fam --vcf data/first1000snp_full_samples.vcf.gz -f MERLIN --blueprint data/genemap.hg38.txt --freq AF -K 0.001 --moi AD -W 0 -M 1 --run-linkage
./seqlink --fam data/Example_data/pedigree.fam --vcf data/Example_data/example.vcf.gz -f MERLIN MEGA2 PLINK LINKAGE --build hg38 --chrom-prefix chr --freq AF -o data/Example_data/output -K 0.001 --moi AD -W 0 -M 1
./seqlink --fam data/mwe_normal_fam.csv --vcf data/first1000snp_full_samples.vcf.gz --anno data/first1000_chr1_multianno.csv --pop data/full_sample_fam_pop.txt -f MERLIN MEGA2 PLINK LINKAGE --build hg38 --freq AF -o data/first1000test -K 0.001 --moi AD -W 0 -M 1
./seqlink --fam data/new_trim_ped_famless17_no:xx.fam --vcf /mnt/mfs/statgen/alzheimers-family/linkage_files/geno/full_sample/vcf/full_sample.vcf.gz --anno MWE/annotation/EFIGA_NIALOAD_chr1.hg38.hg38_multianno.csv --pop data/full_sample_fam_pop.txt -f MERLIN MEGA2 PLINK LINKAGE --build hg38 --freq AF -o data/fullchr1data -K 0.001 --moi AD -W 0 -M 1 -j 4
Testing output
seqlink --fam seqlinkage-example/seqlinkage-example.fam --vcf seqlinkage-example/seqlinkage-example.vcf.gz -f MERLIN MEGA2 PLINK LINKAGE --blueprint data/genemap.txt --freq EVSEAAF -o data/seqtest_20220221 -K 0.001 --moi AD -W 0 -M 1 -j 4
seqlink --fam data/new_trim_ped_famless17_no:xx.fam --vcf /mnt/mfs/statgen/alzheimers-family/linkage_files/geno/full_sample/vcf/full_sample.vcf.gz --anno MWE/annotation/EFIGA_NIALOAD_chr22.hg38.hg38_multianno.csv --pop data/full_sample_fam_pop.txt -f MERLIN MEGA2 PLINK LINKAGE --build hg38 --freq AF -o data/fullchr22data -K 0.001 --moi AD -W 0 -M 1 -j 8
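The commands above share most of their arguments, so when scripting many runs it may be convenient to assemble them in Python. The sketch below only uses flags that appear in the commands above; the helper `build_seqlink_cmd` and its parameter names are hypothetical and not part of SEQLinkage.

```python
import shlex


def build_seqlink_cmd(fam, vcf, formats, output, freq=None, blueprint=None,
                      build=None, jobs=None, run_linkage=False, extra=()):
    """Assemble a seqlink argument list from flags shown in the examples above."""
    cmd = ["seqlink", "--fam", fam, "--vcf", vcf, "-f", *formats, "-o", output]
    if freq is not None:
        cmd += ["--freq", freq]
    if blueprint is not None:
        cmd += ["--blueprint", blueprint]
    if build is not None:
        cmd += ["--build", build]
    if jobs is not None:
        cmd += ["-j", str(jobs)]
    cmd += list(extra)  # any further flags, passed through unchanged
    if run_linkage:
        cmd.append("--run-linkage")
    return cmd


# Rebuild the first "Testing output" command from its parts.
cmd = build_seqlink_cmd(
    fam="seqlinkage-example/seqlinkage-example.fam",
    vcf="seqlinkage-example/seqlinkage-example.vcf.gz",
    formats=["MERLIN", "MEGA2", "PLINK", "LINKAGE"],
    output="data/seqtest_20220221",
    freq="EVSEAAF",
    blueprint="data/genemap.txt",
    jobs=4,
    extra=["-K", "0.001", "--moi", "AD", "-W", "0", "-M", "1"],
)
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` to execute it; printing it first with `shlex.join` makes it easy to compare against the hand-written commands.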
import pandas as pd
tmp = pd.read_csv('../data/genemap.hg38.txt',sep='\t',header=None)
tmp
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
0 | 1 | 11868 | 14362 | LOC102725121@1 | 9.177127e-07 | 0.000001 | 6.814189e-07 |
1 | 1 | 11873 | 14409 | DDX11L1 | 9.195321e-07 | 0.000001 | 6.827698e-07 |
2 | 1 | 14361 | 29370 | WASH7P | 1.529988e-06 | 0.000002 | 1.136045e-06 |
3 | 1 | 17368 | 17436 | MIR6859-1@1,MIR6859-2@1,MIR6859-3@1,MIR6859-4@1 | 1.217693e-06 | 0.000002 | 9.041595e-07 |
4 | 1 | 30365 | 30503 | MIR1302-10@1,MIR1302-11@1,MIR1302-2@1,MIR1302-9@1 | 2.129597e-06 | 0.000003 | 1.581266e-06 |
... | ... | ... | ... | ... | ... | ... | ... |
28320 | X | 155612564 | 155782457 | SPRY3 | NaN | 196.056662 | NaN |
28321 | X | 155881344 | 155943769 | VAMP7 | NaN | 196.190010 | 5.600000e+01 |
28322 | X | 155997695 | 156010817 | IL9R | NaN | 196.305985 | NaN |
28323 | X | 156014563 | 156016830 | WASIR1 | NaN | 196.320452 | NaN |
28324 | X | 156025657 | 156028183 | DDX11L16 | NaN | 196.334645 | NaN |
28325 rows × 7 columns
tmp[0].value_counts()
1 2809
2 1816
19 1779
11 1678
17 1583
3 1560
6 1453
12 1386
7 1359
5 1314
X 1209
16 1177
9 1119
10 1107
4 1091
8 1056
15 1012
14 946
20 774
22 646
13 629
18 436
21 374
17_KI270909v1_alt 3
22_KI270879v1_alt 3
7_KI270803v1_alt 2
15_KI270850v1_alt 2
4_GL000008v2_random 1
1_KI270706v1_random 1
Name: 0, dtype: int64
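The counts above include a few genes on alternate and unplaced contigs (names ending in `_alt` or `_random`). If those should be set aside, one option is to keep only the chromosome labels 1 to 22 and X, which are the primary labels present in the counts. The sketch below uses a small stand-in Series rather than the real file, so the variable names are illustrative only.

```python
import pandas as pd

# Toy stand-in for column 0 of the gene map (chromosome labels as strings).
chrom = pd.Series(["1", "22", "X", "17_KI270909v1_alt", "4_GL000008v2_random", "22"])

# Primary chromosome labels seen in the counts above: 1..22 and X.
primary = [str(i) for i in range(1, 23)] + ["X"]

is_primary = chrom.isin(primary)
counts = chrom[is_primary].value_counts()
print(counts)
```

On the real table the same mask would be applied as `tmp[tmp[0].isin(primary)]`.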
tmp1 = tmp[tmp[0]=='22']
tmp1[tmp1[3]=='LINC01664']
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
26484 | 22 | 17121594 | 17132104 | LINC01664 | 3.496778 | 5.134655 | 1.946244 |
tmp1[tmp1[3]=='BID']
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
26493 | 22 | 17734139 | 17774665 | BID | 7.132236 | 10.196359 | 4.192282 |
tmp1[tmp1[3]=='RTL10']
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
26535 | 22 | 19846145 | 19854874 | RTL10 | 12.136091 | 16.519006 | 8.210012 |
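The three lookups above repeat the same filter, so they can be wrapped in a small helper. The sketch below is self-contained: it reads three rows copied from the outputs above instead of the real file, and `lookup_gene` is a hypothetical name. Passing `dtype={0: str}` is a precaution so that chromosome labels stay strings even when a file contains only numeric chromosomes.

```python
import io

import pandas as pd

# Three chromosome 22 rows copied from the outputs above, tab-separated.
sample = (
    "22\t17121594\t17132104\tLINC01664\t3.496778\t5.134655\t1.946244\n"
    "22\t17734139\t17774665\tBID\t7.132236\t10.196359\t4.192282\n"
    "22\t19846145\t19854874\tRTL10\t12.136091\t16.519006\t8.210012\n"
)

# Keep column 0 as strings so that '22' and 'X' are compared the same way.
genemap = pd.read_csv(io.StringIO(sample), sep="\t", header=None, dtype={0: str})


def lookup_gene(df, chrom, gene):
    """Return rows on chromosome `chrom` whose name column (3) equals `gene`."""
    return df[(df[0] == chrom) & (df[3] == gene)]


hit = lookup_gene(genemap, "22", "BID")
print(hit)
```

The match is exact, so rows whose name column lists several genes separated by commas (as in the first table) would need a different comparison.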
tmp1
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
26467 | 22 | 15784953 | 15827434 | DUXAP8 | 0.000000 | 0.000000 | 0.000000 |
26468 | 22 | 15805697 | 15820884 | BMS1P22@3 | 0.000000 | 0.000000 | 0.000000 |
26469 | 22 | 15805697 | 15815897 | BMS1P17@3,BMS1P18@3 | 0.000000 | 0.000000 | 0.000000 |
26470 | 22 | 15740892 | 15778287 | PSLNR | 0.000000 | 0.000000 | 0.000000 |
26471 | 22 | 15690025 | 15721631 | POTEH | 0.000000 | 0.000000 | 0.000000 |
... | ... | ... | ... | ... | ... | ... | ... |
27111 | 22 | 50674390 | 50733212 | SHANK3 | 79.970067 | 90.409669 | 70.999029 |
27112 | 22 | 50735828 | 50738169 | LOC105373100 | 80.037493 | 90.485907 | 71.073657 |
27113 | 22 | 50738203 | 50745339 | ACR | 80.044958 | 90.494347 | 71.080286 |
27114 | 22 | 50757085 | 50799637 | RPL23AP82 | 80.102184 | 90.559043 | 71.131103 |
27115 | 22 | 50767506 | 50783636 | RABL2B | 80.097821 | 90.554110 | 71.127228 |
646 rows × 7 columns
535-467
68
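The subtraction above appears to be the offset of RTL10 (index 26535) from the first chromosome 22 row (index 26467). Assuming that reading is right, the same number can be taken from the index labels instead of typing them by hand. The frame below is a toy stand-in with three of the rows shown above.

```python
import pandas as pd

# Toy frame carrying the index labels of three chromosome 22 rows shown above.
tmp1 = pd.DataFrame({3: ["DUXAP8", "BID", "RTL10"]}, index=[26467, 26493, 26535])

# Index label of the gene of interest, minus the label of the first row.
label = tmp1.index[tmp1[3] == "RTL10"][0]
offset = label - tmp1.index[0]
print(offset)  # 68
```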
Hashes for SEQLinkage-1.0.6-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 6b58d77b0a76d2ad8c7f1dabc2051fb5ffa48b45c9ebc69fef07e9d2b2db3e0d |
MD5 | a82e29ce6a8cc3768b207eca7bcfe52d |
BLAKE2b-256 | cd055b64866660306553dc7b83deb68ba2d469a3d0b9585970718b763c8cc236 |