
Project (ancient) human genomes onto pre-computed standard PCA

Project description

projectPCA

Project genomes onto pre-computed principal components widely used in ancient DNA. This enables fast analysis without re-computing the principal components. The software accepts ancient DNA data in eigenstrat or PLINK format as input. No modern samples are required, as the package includes the pre-computed PCA weights and PC coordinates for relevant modern samples (based on publicly available Human Origins array data).

Installation

The package projectPCA is available as a Python package via pip. To install it, run a version of:

python3 -m pip install projectPCA
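To confirm that the installation succeeded, one option is to query the installed version from Python. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of projectPCA.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "projectPCA" is the distribution name used in the pip command above.
    print(installed_version("projectPCA"))
```

If this prints None, the package is not visible to the interpreter you are running, which usually means pip installed it into a different environment.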

List of available PCAs

As of early 2026, two pre-computed PCAs are officially bundled into projectPCA. The code in parentheses is the value to pass as the pca keyword to select that PCA.

  • HO Westeurasia (HO): Standard Western Eurasian PCA, widely used in aDNA studies. PC1 corresponds to West-East, and PC2 to North-South.

  • HO Eurasian (EUAS): Standard whole-Eurasian PCA, widely used in aDNA studies. Well suited to resolving West versus East Asian ancestry (on PC1). PC2 generally corresponds to North-South.

Usage

Project single samples

To project onto a PCA, the key function is project_eigenstrat. To import it and run a single sample, use:

from projectPCA.run import project_eigenstrat

project_eigenstrat(es_path="/mnt/archgen/Autorun_eager/eager_outputs/TF/SUA/SUA002/genotyping/pileupcaller.double",
                   pca="HO", es_type="default")

This function also returns the dataframe with PCA coordinates. Note that es_path is the shared prefix of the eigenstrat files, that is, the path to the .geno file without the .geno suffix.
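Because the path is given without a suffix, a typo in the prefix is easy to miss. A minimal sketch of a pre-flight check, assuming the usual eigenstrat trio of .geno, .snp and .ind files next to each other; the helper missing_eigenstrat_files is hypothetical and not part of projectPCA:

```python
import os
import tempfile

EIGENSTRAT_SUFFIXES = (".geno", ".snp", ".ind")


def missing_eigenstrat_files(prefix):
    """Return the eigenstrat files expected for `prefix` that do not exist."""
    # Append suffixes by string concatenation: prefixes such as
    # "pileupcaller.double" already contain a dot, so Path.with_suffix
    # would replace ".double" instead of appending.
    return [prefix + s for s in EIGENSTRAT_SUFFIXES
            if not os.path.isfile(prefix + s)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_prefix = os.path.join(tmp, "pileupcaller.double")
        for suffix in (".geno", ".snp"):
            open(demo_prefix + suffix, "w").close()
        # Only the .ind file is reported as missing in this demo.
        print(missing_eigenstrat_files(demo_prefix))
```

An empty list means all three files were found and the prefix can be passed as es_path.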

The keyword pca denotes which PCA type to project onto (see above).

To save the figure, set the keyword fig_path (an empty string by default, fig_path=""). If this string is non-empty, the program saves the resulting figure at that path. If the path ends in .html, the figure is saved as an interactive plot, where you can hover over the individuals to see their labels (both ancient and modern reference samples). Otherwise, the standard matplotlib libraries are used to plot and save the figure, in the format given by the extension you provide (for example .png or .pdf).

project_eigenstrat(es_path="/mnt/archgen/Autorun_eager/eager_outputs/TF/SUA/SUA002/genotyping/pileupcaller.double",
                   pca="EUAS", es_type="unpacked_fast", plot_bgrd_c=False, fig_path='./figs/SUA002_EUAS.html')
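When the same sample should be projected onto both bundled PCAs, the keyword arguments can be prepared in a loop. The sketch below only builds the arguments and the figure paths; the helpers plot_kind and projection_kwargs are hypothetical, and creating the figure directory up front is a precaution, since it is not stated whether projectPCA creates missing directories.

```python
import os


def plot_kind(fig_path):
    """Classify a figure path following the rule described above."""
    return "interactive" if fig_path.endswith(".html") else "static"


def projection_kwargs(es_path, sample, fig_dir, pcas=("HO", "EUAS"), ext=".html"):
    """Build one keyword dict per PCA code, each with its own figure path."""
    os.makedirs(fig_dir, exist_ok=True)  # make sure the output folder exists
    return [
        {
            "es_path": es_path,
            "pca": pca,
            "fig_path": os.path.join(fig_dir, f"{sample}_{pca}{ext}"),
        }
        for pca in pcas
    ]


# Usage, assuming projectPCA is installed and the input files exist:
#
#   from projectPCA.run import project_eigenstrat
#   for kwargs in projection_kwargs(es_path, "SUA002", "./figs"):
#       project_eigenstrat(**kwargs)
```

Keeping the PCA code in the file name avoids one projection overwriting the figure of the other.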

Project multiple samples

It is also possible to project multiple samples. For this, use the keyword iids=[]. If the list is empty (the default), all samples in the file are projected and plotted. If you specify a list of individual IDs, only individuals with these IDs are projected.
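To assemble a list for iids, the IDs can be read from the eigenstrat .ind file, which has three whitespace-separated columns (individual ID, sex, population label). The following is a minimal sketch; read_ind_ids is a hypothetical helper, and it assumes that iids expects the IDs from the first column of that file.

```python
import os
import tempfile


def read_ind_ids(ind_path, population=None):
    """Return individual IDs from an eigenstrat .ind file.

    If `population` is given, keep only individuals with that label.
    """
    ids = []
    with open(ind_path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            iid, _sex, pop = fields[:3]
            if population is None or pop == population:
                ids.append(iid)
    return ids


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.ind")
        with open(demo_path, "w") as fh:
            fh.write("  ID001 M PopA\n  ID002 F PopB\n  ID003 U PopA\n")
        print(read_ind_ids(demo_path, population="PopA"))
```

The resulting list can then be passed as iids=... to project_eigenstrat.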

Project PLINK files

To project PLINK files, use the keyword es_type="plink" and provide the path of the PLINK files without the suffix:

project_eigenstrat(es_path="/mnt/archgen/users/hringbauer/git/EPIDEMIC/output/plink/bd_ptn_335",
                   pca="EUAS", es_type="plink", iids=[],
                   plot_bgrd_c=False, verbose=True, flip=True, 
                   fig_path='/mnt/archgen/users/hringbauer/git/projectPCA/figs/ptn335PLINK_EUAS.html')

@Harald Ringbauer, 2026
