Skip to main content
Join the official 2020 Python Developers SurveyStart the survey!

Re-implementation of lostruct in Python, used to compare local population structure across populations.

Project description

lostruct-py

This is a reimplementation of lostruct from the original code: Lostruct. Please cite the original paper

Build Status DOI

Demonstration / How to use

Please see the Example Notebook

Installation

Lostruct-py is available on PyPi pip install lostruct-py is the easiest way to get started.

Citing

Please use our DOI to cite this specific project. Also please cite the original Lostruct paper and CyVCF2.

DOI: 10.5281/zenodo.3997086

Original Lostruct Paper

Please cite the original lostruct paper:

Li, Han, and Peter Ralph. "Local PCA shows how the effect of population structure differs along the genome." Genetics 211.1 (2019): 289-304.

CyVCF2

This paper also uses cyvcf2 for fast VCF processing and should be cited:

Brent S Pedersen, Aaron R Quinlan, cyvcf2: fast, flexible variant analysis with Python, Bioinformatics, Volume 33, Issue 12, 15 June 2017, Pages 1867–1869, https://doi.org/10.1093/bioinformatics/btx057

Changes from Lostruct R package

Please note numpy and R are different when it comes to row-major vs. column-major. Essentially, many things in the python version will be transposed from R.

Requirements

Python >= 3.6 (may work with older versions). Developed on Python 3.8.5

  • numba
  • numpy
  • pandas
  • scipy
  • skbio
  • sklearn
  • cyvcf2

CyVCF2 requires zlib-dev, libbz2-dev, libcurl-dev, liblzma-dev, and probably others

Easiest to install all of these through conda

Correlation Data

Used Medicago HapMap sister taxa chromsome 1, processed, and run with LoStruct

Data

bcftools annotate chr1-filtered-set-2014Apr15.bcf -x INFO,FORMAT | bcftools view -a -i 'F_MISSING<=0.2' | bcftools view -q 0.05 -q 0.95 -m2 -M2 -a -Oz -o chr1-filtered.vcf.gz

Lostruct Processing

Rscript run_lostruct.R -t SNP -s 95 -k 10 -m 10 -i data/

This generates the mds_coords.tsv that is used in the correlation comparison.

FAQ / Notes

Future

Currently the end-user is expected to save the outputs. But could be good to save it in a similar way to lostruct R-code. Please open an issue if you need this.

PCA, MDS, PCoA

PCoA returns the same results as lostruct's MDS implementation (cmdscale). In the example Jupyter notebook you can see the correlation is R =~ 0.998. Some examples of other methods of clustering / looking at differences are included in the notebook.

Casting complex values to real discards the imaginary part

This is fine.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for lostruct-py, version 0.0.3
Filename, size File type Python version Upload date Hashes
Filename, size lostruct_py-0.0.3-py3-none-any.whl (9.3 kB) File type Wheel Python version py3 Upload date Hashes View
Filename, size lostruct-py-0.0.3.tar.gz (6.2 kB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page