Re-implementation of lostruct in Python
lostruct-py
This is a Python reimplementation of lostruct, based on the original code: Lostruct. Please cite the original paper.
Demonstration / How to use
Please see the Example Notebook
Citing
Original Lostruct Paper
Please cite the original lostruct paper:
Li, Han, and Peter Ralph. "Local PCA shows how the effect of population structure differs along the genome." Genetics 211.1 (2019): 289-304.
CyVCF2
This package also uses cyvcf2 for fast VCF processing, which should be cited as well:
Brent S Pedersen, Aaron R Quinlan, cyvcf2: fast, flexible variant analysis with Python, Bioinformatics, Volume 33, Issue 12, 15 June 2017, Pages 1867–1869, https://doi.org/10.1093/bioinformatics/btx057
Requirements
Python >= 3.6 (may work with older versions)
- numba
- numpy
- pandas
- scipy
- scikit-bio (imported as skbio)
- scikit-learn (imported as sklearn)
- cyvcf2
cyvcf2 requires the system libraries zlib-dev, libbz2-dev, libcurl-dev, and liblzma-dev, and possibly others.
The easiest way to install all of these is through conda.
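After installation, a quick check like the one below can confirm that the dependencies are importable. This is only a minimal sketch: the helper name missing_modules is hypothetical, and the module names are assumed to be the import names of the requirements listed above.

```python
import importlib.util

# Import names assumed from the requirements list (not package-index names)
REQUIRED = ["numba", "numpy", "pandas", "scipy", "skbio", "sklearn", "cyvcf2"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

Using find_spec only locates each module without importing it, so the check stays fast even for heavy packages such as numba.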
Correlation Data
The correlation comparison uses chromosome 1 of the Medicago HapMap sister taxa, processed and then run with the original lostruct.
Data
bcftools annotate chr1-filtered-set-2014Apr15.bcf -x INFO,FORMAT | bcftools view -a -i 'F_MISSING<=0.2' | bcftools view -q 0.05 -q 0.95 -m2 -M2 -a -Oz -o chr1-filtered.vcf.gz
Lostruct Processing
Rscript run_lostruct.R -t SNP -s 95 -k 10 -m 10 -i data/
This generates the mds_coords.tsv that is used in the correlation comparison.
FAQ / Notes
PCA, MDS, PCoA
PCoA returns the same results as lostruct's MDS implementation (cmdscale). In the example Jupyter notebook, the correlation between the two is R ≈ 0.998. The notebook also includes examples of other methods for clustering and for examining differences.
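To illustrate why the two agree, classical MDS (what R's cmdscale computes) and PCoA are the same procedure: double-centre the squared distances, then take the leading eigenvectors. The following is a NumPy-only sketch of that procedure, not the code used by lostruct-py or scikit-bio; the function name classical_mds and the random test points are assumptions made for illustration.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering
    # eigh is meant for symmetric matrices and returns real eigenvalues
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:k]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale each eigenvector by the square root of its (non-negative) eigenvalue
    return eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))


# Hypothetical input: Euclidean distances between random 2-D points
rng = np.random.default_rng(0)
points = rng.normal(size=(30, 2))
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(dist, k=2)

# The embedding should reproduce the input distances
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
max_error = np.abs(recovered - dist).max()

# Axis signs are arbitrary, so compare implementations by absolute correlation
flipped = -coords
r = abs(np.corrcoef(coords[:, 0], flipped[:, 0])[0, 1])
print(f"max distance error: {max_error:.2e}, |R| on axis 1: {r:.3f}")
```

Because the sign (and, for tied eigenvalues, the rotation) of each axis is arbitrary, two implementations are best compared through the absolute correlation of matching axes rather than by raw coordinate equality.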
Casting complex values to real discards the imaginary part
This NumPy warning is expected here and can be safely ignored.
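If you would like to confirm that nothing meaningful is being discarded, you can inspect the imaginary parts before converting. The sketch below uses made-up values standing in for a complex result whose imaginary parts are only floating-point noise; it does not reproduce where the warning arises inside lostruct-py.

```python
import numpy as np

# Hypothetical values: real numbers carrying negligible imaginary noise
values = np.array([3.0 + 0.0j, 1.5 + 1e-17j, 0.2 - 1e-18j])

# Largest imaginary magnitude; it should be tiny relative to the real parts
max_imag = np.abs(values.imag).max()

# real_if_close returns a real array only when every imaginary part is
# within tol machine epsilons of zero; otherwise the input comes back unchanged
cleaned = np.real_if_close(values, tol=1000)

# Taking .real explicitly drops the imaginary part without emitting the warning
explicit = values.real

print(f"largest imaginary part: {max_imag:.1e}, cleaned dtype: {cleaned.dtype}")
```

If the largest imaginary part were not negligible, real_if_close would leave the array complex, which is a useful signal that the result deserves a closer look.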