Re-implementation of lostruct in Python, used to compare local population structure across populations.
Project description
lostruct-py
This is a Python reimplementation of lostruct, based on the original code: Lostruct. Please cite the original paper.
Demonstration / How to use
Please see the Example Notebook.
Installation
Lostruct-py is available on PyPI, and installing it with pip is the easiest way to get started:
pip install lostruct-py
Citing
Please use our DOI to cite this specific project. Also please cite the original Lostruct paper and CyVCF2.
DOI: 10.5281/zenodo.3997086
Original Lostruct Paper
Please cite the original lostruct paper:
Li, Han, and Peter Ralph. "Local PCA shows how the effect of population structure differs along the genome." Genetics 211.1 (2019): 289-304.
CyVCF2
This project also uses cyvcf2 for fast VCF processing, which should be cited:
Brent S Pedersen, Aaron R Quinlan, cyvcf2: fast, flexible variant analysis with Python, Bioinformatics, Volume 33, Issue 12, 15 June 2017, Pages 1867–1869, https://doi.org/10.1093/bioinformatics/btx057
Changes from Lostruct R package
Please note that NumPy and R differ in memory layout: NumPy arrays are row-major (C order) by default, while R stores matrices in column-major order. Essentially, many things in the Python version will be transposed relative to R.
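A minimal NumPy illustration of the layout difference: R's `matrix(1:6, nrow = 2)` fills values column by column, which NumPy reproduces with `order="F"`, while NumPy's default `order="C"` fills row by row. This is a generic example, not code from lostruct-py.

```python
import numpy as np

values = np.arange(1, 7)

# NumPy's default (C, row-major) order fills each row before moving to the next.
row_major = values.reshape(2, 3)
# [[1 2 3]
#  [4 5 6]]

# Fortran (column-major) order fills each column first, matching R's matrix(1:6, nrow = 2).
col_major = values.reshape(2, 3, order="F")
# [[1 3 5]
#  [2 4 6]]

# Filling a 3 x 2 array row by row and transposing it gives the same R-style layout,
# which is why results often appear transposed between the two languages.
assert np.array_equal(values.reshape(3, 2).T, col_major)
```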
Requirements
Python >= 3.6 (may work with older versions). Developed on Python 3.8.5.
- numba
- numpy
- pandas
- scipy
- skbio (installed as scikit-bio)
- sklearn (installed as scikit-learn)
- cyvcf2
CyVCF2 requires zlib-dev, libbz2-dev, libcurl-dev, liblzma-dev, and probably other system libraries.
The easiest way to install all of these is through conda.
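To check an environment before running lostruct-py, a small script like the following can report which of the listed Python packages cannot be found. It is not part of lostruct-py (the helper name `missing_modules` is hypothetical), it only looks up module specs by import name, and it does not check versions or the system libraries cyvcf2 links against.

```python
import importlib.util

# Import names of the Python requirements listed above.
# skbio is installed as "scikit-bio" and sklearn as "scikit-learn".
REQUIRED_MODULES = ["numba", "numpy", "pandas", "scipy", "skbio", "sklearn", "cyvcf2"]


def missing_modules(modules):
    """Return the top-level modules that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All requirements are importable.")
```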
Correlation Data
We used chromosome 1 of the Medicago HapMap sister taxa data, processed it as below, and ran it with lostruct.
Data
bcftools annotate chr1-filtered-set-2014Apr15.bcf -x INFO,FORMAT | bcftools view -a -i 'F_MISSING<=0.2' | bcftools view -q 0.05 -q 0.95 -m2 -M2 -a -Oz -o chr1-filtered.vcf.gz
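The `-i 'F_MISSING<=0.2'` expression keeps only sites where at most 20% of genotypes are missing. As a rough illustration of what that filter computes, here is a NumPy sketch on a hypothetical sites x samples genotype matrix. The NaN encoding for missing calls and the helper name `filter_missing` are assumptions for illustration, not bcftools or cyvcf2 behavior.

```python
import numpy as np


def filter_missing(genotypes, max_missing=0.2):
    """Keep rows (sites) whose fraction of missing genotypes is <= max_missing.

    Hypothetical helper: `genotypes` is a sites x samples array with NaN for
    missing calls, loosely mirroring bcftools' F_MISSING<=0.2 expression.
    """
    f_missing = np.isnan(genotypes).mean(axis=1)  # fraction missing per site
    keep = f_missing <= max_missing
    return genotypes[keep], keep


# Toy data: 3 sites x 5 samples, alternate-allele counts 0/1/2.
nan = np.nan
genotypes = np.array([
    [0, 1, 2, 0, 1],      # 0% missing  -> kept
    [0, nan, 2, 0, 1],    # 20% missing -> kept (boundary)
    [nan, nan, 2, 0, 1],  # 40% missing -> dropped
])
filtered, keep = filter_missing(genotypes)
```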
Lostruct Processing
Rscript run_lostruct.R -t SNP -s 95 -k 10 -m 10 -i data/
This generates mds_coords.tsv, which is used in the correlation comparison.
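The core step of local PCA (Li and Ralph 2019) is a PCA computed separately within each genomic window. The NumPy sketch below illustrates the idea for non-overlapping windows of a fixed number of SNPs, keeping the top `k` components per window; we assume these roughly correspond to `-t SNP -s 95` and `-k 10` in the command above. It is a simplified illustration, not the lostruct or lostruct-py implementation, and the function name `local_pca` is hypothetical. It assumes no missing data.

```python
import numpy as np


def local_pca(genotypes, window_size, k):
    """Top-k PCA of each non-overlapping window of SNPs.

    Sketch only: `genotypes` is a SNPs x samples array with no missing data.
    Returns a list of (eigenvalues, eigenvectors) per window, largest first.
    """
    n_snps = genotypes.shape[0]
    results = []
    # Step through the SNPs in non-overlapping blocks; a trailing partial window is dropped.
    for start in range(0, n_snps - window_size + 1, window_size):
        window = genotypes[start:start + window_size].astype(float)
        # Center each SNP (row) so the covariance reflects differences between samples.
        window -= window.mean(axis=1, keepdims=True)
        cov = np.cov(window, rowvar=False)  # samples x samples covariance
        # eigh returns ascending eigenvalues for symmetric matrices; reorder to descending.
        vals, vecs = np.linalg.eigh(cov)
        order = np.argsort(vals)[::-1][:k]
        results.append((vals[order], vecs[:, order]))
    return results


# Toy usage: 20 SNPs x 6 samples, windows of 5 SNPs, top 2 PCs per window.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(20, 6))
windows = local_pca(genotypes, window_size=5, k=2)
```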
FAQ / Notes
Future
Currently, the end user is expected to save the outputs. It could be useful to save them in the same way as the lostruct R code; please open an issue if you need this.
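Until then, one way to save results is to write them to a tab-separated file with pandas. The sketch below writes MDS coordinates in a layout loosely modeled on the `mds_coords.tsv` mentioned above; the column names (`chrom`, `window`, `MDS1`, ...) and the helper name `save_mds_coords` are assumptions, not the lostruct R output format.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_mds_coords(coords, chrom, path):
    """Write an (n_windows x n_dims) array of MDS coordinates to a TSV file.

    Hypothetical helper: one row per window, columns MDS1..MDSn.
    """
    df = pd.DataFrame(coords, columns=[f"MDS{i + 1}" for i in range(coords.shape[1])])
    df.insert(0, "window", np.arange(1, len(df) + 1))  # 1-based window index
    df.insert(0, "chrom", chrom)
    df.to_csv(path, sep="\t", index=False)
    return df


# Toy usage with three windows and two MDS dimensions.
out_path = os.path.join(tempfile.mkdtemp(), "mds_coords.tsv")
saved = save_mds_coords(np.array([[0.1, -0.2], [0.3, 0.4], [-0.5, 0.0]]), "chr1", out_path)
reloaded = pd.read_csv(out_path, sep="\t")
```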
PCA, MDS, PCoA
PCoA returns essentially the same results as lostruct's MDS implementation (cmdscale): in the example Jupyter notebook, the correlation between the two is R ≈ 0.998. The notebook also includes examples of other methods for clustering and examining differences.
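Classical MDS (what R's `cmdscale` computes) and PCoA are the same procedure: double-center the squared distance matrix and take its leading eigenvectors, scaled by the square roots of their eigenvalues. The NumPy sketch below illustrates that computation together with a sign-insensitive correlation check, since each MDS axis is only defined up to a sign flip and may differ between tools. It is an illustration, not the code used in lostruct or the example notebook; the function names are hypothetical.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS / PCoA of a symmetric distance matrix."""
    n = dist.shape[0]
    # Double-centering matrix J = I - (1/n) * 1 1^T.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]  # largest eigenvalues first
    # Scale eigenvectors by sqrt(eigenvalue); clip tiny negatives caused by round-off.
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))


def axis_correlation(a, b):
    """Absolute Pearson correlation, since MDS axes can flip sign between tools."""
    return abs(np.corrcoef(a, b)[0, 1])


# Points on a line give an exactly Euclidean distance matrix.
points = np.array([0.0, 1.0, 3.0, 6.0])
dist = np.abs(points[:, None] - points[None, :])
coords = classical_mds(dist, k=1)
r = axis_correlation(coords[:, 0], points)
```

For Euclidean input like this, the recovered axis equals the centered coordinates up to sign, so the absolute correlation is 1. scikit-bio's `pcoa` (from `skbio.stats.ordination`) performs the same decomposition on a `DistanceMatrix`.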
Casting complex values to real discards the imaginary part
This warning is expected and can be ignored.
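"Casting complex values to real discards the imaginary part" is the message of NumPy's `ComplexWarning`, emitted when a complex array is cast to a real dtype. It can arise, for example, when an eigendecomposition returns complex values whose imaginary parts are only round-off noise. If you want to confirm that for your own results, a check like this sketch (the helper name `to_real_checked` and the tolerance are assumptions) compares the imaginary parts against zero before taking the real part.

```python
import numpy as np


def to_real_checked(values, atol=1e-8):
    """Return the real part after confirming the imaginary part is negligible."""
    values = np.asarray(values)
    if np.iscomplexobj(values) and not np.allclose(values.imag, 0, atol=atol):
        raise ValueError("imaginary parts are not negligible")
    return np.real(values)


# Eigenvalues with round-off-sized imaginary parts, as an eigensolver may return them.
eigenvalues = np.array([3.0 + 1e-15j, 1.5 - 2e-16j, 0.2 + 0j])
real_vals = to_real_checked(eigenvalues)
```

NumPy's built-in `np.real_if_close` performs a similar conversion, but it returns the complex array unchanged (rather than raising) when the imaginary parts are not small.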
Download files
Source Distribution
Built Distribution
File details
Details for the file lostruct-py-0.0.3.tar.gz.
File metadata
- Download URL: lostruct-py-0.0.3.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0.post20200814 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 249c815f875ebf5510f6485936cf95c664d46533cca01c61b8151459dabf8571
MD5 | b1091992ffd28b81fe6d5cffbc28ca8d
BLAKE2b-256 | 740b0000f5f016632b5be2de38e43155a9e5afe98bcdc49ec2eaf9ed526466b9
File details
Details for the file lostruct_py-0.0.3-py3-none-any.whl.
File metadata
- Download URL: lostruct_py-0.0.3-py3-none-any.whl
- Upload date:
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0.post20200814 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6d273601612ae1a2cafff289e073241d9faa13a79257e67ac543ddba4a6d39c2
MD5 | f31ea19dc14989c275f42429b257b7ac
BLAKE2b-256 | e63323faff851f0c6f718fb1c80fabb414645a8f89e3b9bbb606252cdba196e7