SPCAncestry: A Python package for inferring population ancestry.
SPCAncestry
A Python package for supervised global population ancestry inference from SNP data using stacking
Installation
SPCAncestry will soon be available through PyPI and installable using the following command:
pip3 install spcancestry
Usage
SPCAncestry assumes you have SNP training data in PLINK, VCF, or Hail MatrixTable format, along with reliable global population ancestry labels for every sample in the training data. Principal components (PCs) are computed from the training data, and these PCs (the features, X) together with the population labels (the target, Y) are used to train the model. The test SNP data is then projected onto the PCs computed from the training data, and the trained model infers population ancestry from the projected scores. The steps below show how to run ancestry inference on the HGDP1KG data provided with this package.
- Read the training and test datasets
import spcancestry
path = '/path/to/spcancestry/test_data/hgdp1kg'
inref_mt = spcancestry.Read(file=f'{path}/hgdp1kg_truth.bed', qc=False).as_matrixtable()
input_mt = spcancestry.Read(file=f'{path}/hgdp1kg_unknown.fam', qc=False).as_matrixtable()
- Intersect the two datasets, compute PCs using training data, and project test data onto training PC space
scores_df, colnames = spcancestry.PCProject(ref_mt=inref_mt, data_mt=input_mt,
ref_info=f'{path}/hgdp_1kg_truth_labels.txt').run_pca_projection()
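To make the projection step concrete, here is a minimal NumPy sketch of the general idea: PCs are fitted on the reference samples only, and new samples are centered with the reference mean before being multiplied by the reference loadings. It is not SPCAncestry's implementation, which works on Hail MatrixTables. The helper names `fit_pcs` and `project` and the simulated genotypes are hypothetical and exist only for illustration.

```python
import numpy as np


def fit_pcs(ref_genotypes, n_pcs=2):
    """Return the per-SNP mean and PC loadings of a (samples x SNPs) matrix."""
    mean = ref_genotypes.mean(axis=0)
    centered = ref_genotypes - mean
    # Rows of vt are the principal axes, ordered by variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_pcs].T  # shape: (n_snps, n_pcs)
    return mean, loadings


def project(genotypes, mean, loadings):
    """Project samples onto the PCs using the reference mean and loadings."""
    return (genotypes - mean) @ loadings


# Simulated data (an assumption for this sketch): two populations that
# differ in allele frequency at 50 SNPs, genotypes coded as 0/1/2.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.05, 0.95, size=50)
freq_b = rng.uniform(0.05, 0.95, size=50)
ref = np.vstack([
    rng.binomial(2, freq_a, size=(20, 50)),
    rng.binomial(2, freq_b, size=(20, 50)),
]).astype(float)
new = rng.binomial(2, freq_a, size=(5, 50)).astype(float)

mean, loadings = fit_pcs(ref, n_pcs=2)
ref_scores = project(ref, mean, loadings)  # training features
new_scores = project(new, mean, loadings)  # test features in the same PC space
```

Because the new samples reuse the reference mean and loadings, their scores are directly comparable with the training scores, which is what allows a model trained on the reference PCs to be applied to them.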
- Infer global population ancestry using spcancestry stacking
spcancestry_infered = spcancestry.infer_ancestry(scores_df, colnames)
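For readers unfamiliar with stacking, the sketch below shows the general technique with scikit-learn's `StackingClassifier`: several base classifiers are trained on PC scores, and a final estimator learns from their cross-validated predictions. The base learners, the final estimator, the population labels, and the simulated PC scores are all assumptions chosen for illustration. The source does not say which models SPCAncestry stacks, so this should not be read as the package's internals.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Simulated PC scores (an assumption): three well-separated clusters
# standing in for three illustrative population labels.
rng = np.random.default_rng(0)
centers = {"AFR": (-10.0, 0.0), "EUR": (10.0, 0.0), "EAS": (0.0, 10.0)}
X_train = np.vstack([rng.normal(c, 1.0, size=(30, 2)) for c in centers.values()])
y_train = np.repeat(list(centers), 30)

# Base learners make predictions; the final estimator is trained on their
# out-of-fold class probabilities (5-fold cross-validation).
stack = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

# Projected scores of samples with unknown ancestry (also simulated).
X_test = np.array([[-9.5, 0.3], [9.8, -0.4], [0.2, 10.1]])
predicted = stack.predict(X_test)
```

In this sketch `X_train` plays the role of the training PC scores and `X_test` the role of the projected test scores, so `predicted` holds one population label per test sample.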
Copyright and License
SPCAncestry is distributed under the MIT License.
File details
Details for the file spcancestry-0.1.0.tar.gz.
File metadata
- Download URL: spcancestry-0.1.0.tar.gz
- Upload date:
- Size: 5.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.7.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 198595f108bfb3c538873e4d18347451eb1992d98e113653b119bf8fffab163f |
| MD5 | 00b20cd520c94a1403e5497dd5818fde |
| BLAKE2b-256 | fd324dbf00ecf445ff41ae6d9c03313a0e184a3e329e2ae7a3ef067e8b35e3c7 |
File details
Details for the file spcancestry-0.1.0-py3-none-any.whl.
File metadata
- Download URL: spcancestry-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.7.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 899a68fe15441d2b4d5b3f12cc8d90f9f0f42a50d843bb259a825b56804f9e6a |
| MD5 | 9956ba1afdd53ce6608759ee3510e389 |
| BLAKE2b-256 | 001f4cf81868043061ea9b06448165ba52e9acb3c3c91b769ad1348c79835714 |