
Re-implementation of lostruct in Python


lostruct-py

This is a Python reimplementation of lostruct, based on the original code: Lostruct. Please cite the original paper.

Demonstration / How to use

Please see the Example Notebook

Citing

Original Lostruct Paper

Please cite the original lostruct paper:

Li, Han, and Peter Ralph. "Local PCA shows how the effect of population structure differs along the genome." Genetics 211.1 (2019): 289-304.

CyVCF2

This package also uses cyvcf2 for fast VCF processing, which should be cited as well:

Brent S Pedersen, Aaron R Quinlan, cyvcf2: fast, flexible variant analysis with Python, Bioinformatics, Volume 33, Issue 12, 15 June 2017, Pages 1867–1869, https://doi.org/10.1093/bioinformatics/btx057

Requirements

Python >= 3.6 (may work with older versions)

  • numba
  • numpy
  • pandas
  • scipy
  • skbio (scikit-bio)
  • sklearn (scikit-learn)
  • cyvcf2

CyVCF2 requires zlib-dev, libbz2-dev, libcurl-dev, liblzma-dev, and probably others.

The easiest way to install all of these is through conda.
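After installing, it can be handy to confirm that every dependency is visible to the interpreter. The following is a minimal sketch using only the standard library; the helper name find_missing is hypothetical and not part of lostruct-py, and the module names are simply the import names from the list above.

```python
import importlib.util

# Import names copied from the requirements list above.
REQUIRED_MODULES = ["numba", "numpy", "pandas", "scipy", "skbio", "sklearn", "cyvcf2"]


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

This only checks that the modules can be located; it does not verify versions or the system libraries that CyVCF2 needs.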

Correlation Data

The correlation data come from chromosome 1 of the Medicago HapMap sister taxa, processed as shown below and then run with lostruct.

Data

bcftools annotate chr1-filtered-set-2014Apr15.bcf -x INFO,FORMAT | bcftools view -a -i 'F_MISSING<=0.2' | bcftools view -q 0.05 -q 0.95 -m2 -M2 -a -Oz -o chr1-filtered.vcf.gz

Lostruct Processing

Rscript run_lostruct.R -t SNP -s 95 -k 10 -m 10 -i data/

This generates the mds_coords.tsv that is used in the correlation comparison.
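A correlation comparison of this kind can be sketched as follows. This is an illustrative example, not the notebook's code: the function names are hypothetical, and the inputs are assumed to be one MDS coordinate per window from each implementation, already loaded into plain lists. Because the sign of an MDS axis is arbitrary, the sketch compares the absolute value of the Pearson correlation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def axis_agreement(x, y):
    """Absolute correlation, so that a flipped axis still counts as agreement."""
    return abs(pearson_r(x, y))


# Made-up values for illustration only: the second axis is a flipped,
# rescaled copy of the first.
r_axis = [0.10, -0.25, 0.40, 0.05]
py_axis = [-0.20, 0.50, -0.80, -0.10]
print(axis_agreement(r_axis, py_axis))
```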

FAQ / Notes

PCA, MDS, PCoA

PCoA returns effectively the same results as lostruct's MDS implementation (cmdscale): in the example Jupyter notebook the correlation between the two is R ≈ 0.998. The notebook also includes some examples of other methods for clustering and for examining differences.
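For readers unfamiliar with why the two agree, classical MDS and PCoA both amount to an eigendecomposition of the double-centred squared distance matrix. The sketch below illustrates that idea with NumPy; it is a generic textbook version written for this README, not the code used by lostruct-py, cmdscale, or skbio, and the function name classical_mds is hypothetical.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical MDS / PCoA of a symmetric distance matrix.

    Returns an (n, k) array of coordinates.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh is meant for symmetric matrices and returns real eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the k largest eigenvalues and their eigenvectors.
    order = np.argsort(eigvals)[::-1][:k]
    # Clip tiny negative eigenvalues caused by floating-point error.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Four corners of a 3 x 4 rectangle; the embedding should preserve distances.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = classical_mds(dist, k=2)
print(coords.shape)
```

The recovered coordinates match the input only up to rotation and reflection, which is why comparisons between implementations are usually made on distances or on absolute correlations rather than on raw coordinates.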

Casting complex values to real discards the imaginary part

This warning is expected and can be safely ignored.
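If you would like to confirm on your own data that the discarded imaginary parts are negligible, a check along these lines is one option. It is a minimal sketch: the helper to_real_checked and its tolerance tol are assumptions made for illustration and are not part of lostruct-py.

```python
import numpy as np


def to_real_checked(values, tol=1e-9):
    """Return the real part of values, raising if the imaginary part is not negligible."""
    values = np.asarray(values)
    if not np.iscomplexobj(values):
        return values
    # Largest absolute imaginary component (0.0 for an empty array).
    max_imag = float(np.max(np.abs(values.imag))) if values.size else 0.0
    if max_imag > tol:
        raise ValueError(f"imaginary part too large to discard: {max_imag:g}")
    return values.real.copy()


# Made-up values for illustration: imaginary parts at the level of rounding error.
example = np.array([1.0 + 0.0j, 2.0 + 1e-15j])
print(to_real_checked(example))
```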

