
Re-implementation of lostruct in Python, used to compare local population structure across populations.


lostruct-py

This is a Python reimplementation of lostruct, based on the original R code (Lostruct). Please cite the original paper.


Demonstration / How to use

Please see the Example Notebook

Installation

Lostruct-py is available on PyPI. The easiest way to get started is: pip install lostruct-py

Citing

Please use our DOI to cite this specific project. Also please cite the original Lostruct paper and CyVCF2.

DOI: 10.5281/zenodo.3997086

Original Lostruct Paper

Please cite the original lostruct paper:

Li, Han, and Peter Ralph. "Local PCA shows how the effect of population structure differs along the genome." Genetics 211.1 (2019): 289-304.

CyVCF2

This project also uses cyvcf2 for fast VCF processing, so it should be cited as well:

Brent S Pedersen, Aaron R Quinlan, cyvcf2: fast, flexible variant analysis with Python, Bioinformatics, Volume 33, Issue 12, 15 June 2017, Pages 1867–1869, https://doi.org/10.1093/bioinformatics/btx057

Changes from Lostruct R package

Please note that numpy and R differ in memory layout: numpy arrays are row-major by default, while R matrices are column-major. As a result, many inputs and outputs in the Python version are transposed relative to R.
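The layout difference can be seen with a small array. This is only an illustrative sketch using plain numpy; it does not use any lostruct-py function, and the variable names are made up for the example.

```python
import numpy as np

# Six values, as in R's matrix(1:6, nrow = 2), which fills column by column.
values = np.arange(1, 7)

# numpy's default (C order, row-major) fills row by row.
row_major = values.reshape(2, 3)             # [[1, 2, 3], [4, 5, 6]]

# Fortran order (column-major) reproduces R's fill pattern.
col_major = values.reshape(2, 3, order="F")  # [[1, 3, 5], [2, 4, 6]]

# Filling the transposed shape in C order and transposing back gives the same
# matrix, which is why results often look transposed between the two languages.
assert np.array_equal(col_major, values.reshape(3, 2).T)
```

When comparing against R output, checking the array shape first is usually enough to tell whether a transpose is needed.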

Requirements

Python >= 3.6 (may work with older versions). Developed on Python 3.8.5

  • numba
  • numpy
  • pandas
  • scipy
  • scikit-bio (imported as skbio)
  • scikit-learn (imported as sklearn)
  • cyvcf2

CyVCF2 requires zlib-dev, libbz2-dev, libcurl-dev, liblzma-dev, and probably others.

The easiest way to install all of these is through conda.
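To check whether an environment already has the requirements, a short script along these lines may help. It is a sketch that relies only on the standard library; the helper name missing_modules is hypothetical, and the module names are the import names from the list above.

```python
import importlib.util

# Import names taken from the requirements list above.
REQUIRED_MODULES = ["numba", "numpy", "pandas", "scipy", "skbio", "sklearn", "cyvcf2"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that each module can be located; it does not verify versions or that compiled extensions load correctly.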

Correlation Data

The correlation comparison uses chromosome 1 of the Medicago HapMap sister taxa data, filtered as shown below and then run through the original LoStruct R code.

Data

bcftools annotate chr1-filtered-set-2014Apr15.bcf -x INFO,FORMAT | bcftools view -a -i 'F_MISSING<=0.2' | bcftools view -q 0.05 -Q 0.95 -m2 -M2 -a -Oz -o chr1-filtered.vcf.gz
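For readers less familiar with bcftools, the per-site logic of this pipeline can be sketched in Python. This is an illustration, not part of the actual processing. It assumes diploid samples, a site that is already biallelic, genotypes encoded as alternate-allele counts with -1 for missing, and that the intent is to keep alternate allele frequencies between 0.05 and 0.95. The function name keep_site is hypothetical.

```python
import numpy as np


def keep_site(genotypes, max_missing=0.2, min_af=0.05, max_af=0.95):
    """Decide whether one biallelic site passes the missingness and frequency filters."""
    # genotypes: alternate-allele counts per sample (0, 1 or 2), -1 for missing.
    g = np.asarray(genotypes)
    if g.size == 0:
        return False
    called = g[g >= 0]
    missing_fraction = 1.0 - called.size / g.size
    if called.size == 0 or missing_fraction > max_missing:
        return False
    # Each called diploid sample contributes two alleles.
    alt_frequency = called.sum() / (2 * called.size)
    return bool(min_af <= alt_frequency <= max_af)
```

bcftools computes these quantities from the VCF records directly, so small differences from this simplified version are possible.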

Lostruct Processing

Rscript run_lostruct.R -t SNP -s 95 -k 10 -m 10 -i data/

This generates the mds_coords.tsv that is used in the correlation comparison.

FAQ / Notes

Future

Currently the end user is expected to save the outputs themselves. It may be useful to save them in a way similar to the lostruct R code. Please open an issue if you need this.
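Until then, saving results by hand is straightforward. The sketch below writes a table of MDS coordinates to a tab-separated file using only the standard library. The function name save_coords_tsv and the column names are hypothetical, and the layout is not guaranteed to match the mds_coords.tsv written by the R code.

```python
import csv


def save_coords_tsv(path, coords, window_ids=None):
    """Write one row per window: an identifier followed by its MDS coordinates."""
    coords = [list(row) for row in coords]
    if window_ids is None:
        window_ids = range(len(coords))
    n_axes = len(coords[0]) if coords else 0
    header = ["window"] + [f"MDS{i + 1}" for i in range(n_axes)]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        for window_id, row in zip(window_ids, coords):
            writer.writerow([window_id, *row])
```

For example, save_coords_tsv("my_coords.tsv", coords) accepts a list of lists or a 2D numpy array, since both iterate row by row.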

PCA, MDS, PCoA

PCoA returns effectively the same results as lostruct's MDS implementation (cmdscale). In the example Jupyter notebook, the correlation between the two is R ≈ 0.998. The notebook also includes examples of other methods for clustering and examining differences.
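The agreement is expected because PCoA and classical MDS are the same computation: double-center the squared distances and take the leading eigenvectors. The numpy sketch below illustrates this. It is not the code used by lostruct-py or by the notebook, and the names classical_mds and axis_correlation are hypothetical.

```python
import numpy as np


def classical_mds(distances, k=2):
    """Classical MDS / PCoA on a square matrix of pairwise distances."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # B is symmetric, so eigh returns real eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(b)
    order = np.argsort(eigenvalues)[::-1][:k]
    # Scale each eigenvector by the square root of its eigenvalue (clipped at 0).
    scale = np.sqrt(np.clip(eigenvalues[order], 0.0, None))
    return eigenvectors[:, order] * scale


def axis_correlation(a, b):
    """Absolute Pearson correlation; the sign of an MDS axis is arbitrary."""
    return abs(np.corrcoef(a, b)[0, 1])


# Toy check: points in a plane are recovered up to rotation and reflection,
# so the pairwise distances of the embedding match the input distances.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0], [1.0, 1.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = classical_mds(dist, k=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
assert np.allclose(recovered, dist)
```

Because each axis is only defined up to sign, comparing two implementations with the absolute correlation per axis avoids spurious values near -1.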

Casting complex values to real discards the imaginary part

This warning is expected and can be safely ignored.
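If you would rather confirm this than take it on trust, the discarded imaginary parts can be checked directly. The sketch below assumes the warning comes from values that are real in theory but carry rounding-level imaginary parts, which the source does not state explicitly. The helper name real_if_negligible_imag and the tolerance are hypothetical.

```python
import numpy as np


def real_if_negligible_imag(values, tol=1e-8):
    """Return the real part, refusing if any imaginary part exceeds tol."""
    values = np.asarray(values)
    largest = float(np.max(np.abs(values.imag))) if values.size else 0.0
    if largest > tol:
        raise ValueError(f"imaginary part {largest:g} is not negligible")
    return values.real


# Values that are real in theory can pick up tiny imaginary parts from rounding.
eigenvalues = np.array([2.5 + 1e-17j, 1.0 - 3e-18j, 0.25 + 0j])
print(real_if_negligible_imag(eigenvalues))
```

numpy also provides np.real_if_close, which performs a similar check with a tolerance expressed in machine epsilons.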
