SPCAncestry: A Python package for inferring population ancestry.

Project description

SPCAncestry

A Python package for supervised global population ancestry inference from SNP data using stacking

Installation

SPCAncestry is available through PyPI and can be installed with the following command:

pip3 install spcancestry

Usage

SPCAncestry assumes the user has SNP training data in PLINK, VCF, or Hail MatrixTable format, along with reliable global population ancestry labels for every sample in the training data. Principal components (PCs) are computed from the training data, and these PCs (X) together with the population labels (Y) are used to train the model. The test SNP data are then projected onto the PCs computed from the training data, and the trained model infers population ancestry for the test samples. The steps below show how to use the HGDP1KG data provided with this package for ancestry inference.

  1. Read the training and test datasets
import spcancestry

path = '/path/to/spcancestry/test_data/hgdp1kg'
# Training data (known ancestry labels) and test data (ancestry to be inferred)
inref_mt = spcancestry.Read(file=f'{path}/hgdp1kg_truth.bed', qc=False).as_matrixtable()
input_mt = spcancestry.Read(file=f'{path}/hgdp1kg_unknown.fam', qc=False).as_matrixtable()
  2. Intersect the two datasets, compute PCs using the training data, and project the test data onto the training PC space
# ref_info points to the population labels for the training samples
scores_df, colnames = spcancestry.PCProject(ref_mt=inref_mt, data_mt=input_mt,
                                            ref_info=f'{path}/hgdp_1kg_truth_labels.txt').run_pca_projection()
  3. Infer global population ancestry using spcancestry stacking
spcancestry_inferred = spcancestry.infer_ancestry(scores_df, colnames)
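
The workflow described above (PCs computed from the training data, test data projected onto those PCs, and a stacked model trained on the PC scores) can be illustrated with a minimal, self-contained sketch. This is not SPCAncestry's implementation: it uses scikit-learn and NumPy on simulated genotypes, and the choice of base learners (random forest and k-nearest neighbours), the logistic-regression meta-learner, the number of PCs, and all variable names are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)


def simulate_genotypes(n_samples, allele_freqs):
    """Draw 0/1/2 allele counts per sample from per-SNP allele frequencies."""
    return rng.binomial(2, allele_freqs, size=(n_samples, allele_freqs.size))


# Two synthetic populations with different allele frequencies
# (toy data standing in for real SNP data; not HGDP1KG).
n_snps = 200
freq_a = rng.uniform(0.05, 0.95, n_snps)
freq_b = rng.uniform(0.05, 0.95, n_snps)

X_train = np.vstack([simulate_genotypes(60, freq_a), simulate_genotypes(60, freq_b)])
y_train = np.array(["A"] * 60 + ["B"] * 60)
X_test = np.vstack([simulate_genotypes(10, freq_a), simulate_genotypes(10, freq_b)])
y_test = np.array(["A"] * 10 + ["B"] * 10)

# Compute PCs on the training data only, then project both datasets onto them.
pca = PCA(n_components=5).fit(X_train)
train_scores = pca.transform(X_train)
test_scores = pca.transform(X_test)

# Stacking: base learners are combined by a meta-learner (assumed model choices).
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(train_scores, y_train)

# Infer population labels for the projected test samples.
predicted = stack.predict(test_scores)
accuracy = float((predicted == y_test).mean())
```

Fitting the PCA on the training samples alone and only transforming the test samples mirrors the projection step: the test data never influence the PC axes on which the model is trained.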

Copyright and License

SPCAncestry is distributed under the MIT License.

Download files

Source Distribution

spcancestry-0.1.0.tar.gz (5.6 kB)

Uploaded Source

Built Distribution

spcancestry-0.1.0-py3-none-any.whl (6.1 kB)

Uploaded Python 3

File details

Details for the file spcancestry-0.1.0.tar.gz.

File metadata

  • Download URL: spcancestry-0.1.0.tar.gz
  • Upload date:
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.7.9

File hashes

Hashes for spcancestry-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        198595f108bfb3c538873e4d18347451eb1992d98e113653b119bf8fffab163f
MD5           00b20cd520c94a1403e5497dd5818fde
BLAKE2b-256   fd324dbf00ecf445ff41ae6d9c03313a0e184a3e329e2ae7a3ef067e8b35e3c7
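
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected`, and the assumption that the archive sits in the current directory, are illustrative and not part of SPCAncestry.

```python
import hashlib
import os

# SHA256 digest listed above for spcancestry-0.1.0.tar.gz
EXPECTED_SHA256 = "198595f108bfb3c538873e4d18347451eb1992d98e113653b119bf8fffab163f"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.lower()


if __name__ == "__main__":
    archive = "spcancestry-0.1.0.tar.gz"  # assumed local download location
    if os.path.exists(archive):
        ok = matches_expected(archive, EXPECTED_SHA256)
        print("hash OK" if ok else "hash MISMATCH")
```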


File details

Details for the file spcancestry-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: spcancestry-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.7.9

File hashes

Hashes for spcancestry-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        899a68fe15441d2b4d5b3f12cc8d90f9f0f42a50d843bb259a825b56804f9e6a
MD5           9956ba1afdd53ce6608759ee3510e389
BLAKE2b-256   001f4cf81868043061ea9b06448165ba52e9acb3c3c91b769ad1348c79835714
