
Consensus prediction of cell type labels with popV


popV


PopV uses a popular vote across a variety of cell-type transfer tools to classify cell types in a query dataset based on a reference dataset. We compute the agreement between these algorithms and use it to predict which cell types in the query are, with high likelihood, the same cell types observed in the reference.

Algorithms

Currently implemented algorithms are:

  • K-nearest neighbor classification after dataset integration with BBKNN
  • K-nearest neighbor classification after dataset integration with SCANORAMA
  • K-nearest neighbor classification after dataset integration with scVI
  • K-nearest neighbor classification after dataset integration with Harmony
  • Random forest classification
  • Support vector machine classification
  • XGBoost classification
  • OnClass cell type classification
  • scANVI label transfer
  • Celltypist cell type classification

All algorithms are implemented as a class in popv/algorithms.

All algorithms that allow for pre-training are pre-trained. BBKNN, Harmony, and SCANORAMA are excluded by design because each of them constructs a new embedding space. To still provide pretrained variants of BBKNN and Harmony, we use a nearest-neighbor index in PCA space and position query cells at the average position of their 5 nearest neighbors.
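The nearest-neighbor placement described above can be sketched as follows. This is a minimal illustration, not PopV's implementation: the function name `place_query_cells`, the brute-force distance computation, and the assumption that a precomputed reference embedding is available are all hypothetical choices made for this example. The source does not state which space the averaged positions live in.

```python
import numpy as np


def place_query_cells(ref_pca, ref_embedding, query_pca, k=5):
    """Place each query cell at the mean embedding position of its
    k nearest reference cells, with neighbors found in PCA space.

    Hypothetical helper for illustration; not part of the popV API.
    """
    # Squared Euclidean distances, shape (n_query, n_reference).
    diff = query_pca[:, None, :] - ref_pca[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    # Indices of the k closest reference cells for every query cell.
    nn_idx = np.argpartition(dist2, kth=k - 1, axis=1)[:, :k]
    # Average the reference embedding coordinates of those neighbors.
    return ref_embedding[nn_idx].mean(axis=1)


# Toy data: two well-separated groups of reference cells.
ref_pca = np.array(
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
)
ref_embedding = 2.0 * ref_pca  # stand-in for an integrated embedding
query_pca = np.array([[0.2, 0.2], [10.2, 10.2]])

placed = place_query_cells(ref_pca, ref_embedding, query_pca, k=3)
```

For real datasets, a brute-force distance matrix would be too large, which is presumably why an index is used; the sketch only shows the averaging step.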

Pretrained models are stored on HuggingFace.

PopV offers three prediction modes that differ in complexity and runtime:

  • retrain: Will train all classifiers from scratch. For 50k cells, this takes up to an hour of computing time using a GPU.
  • inference: Uses pretrained classifiers to annotate query and reference cells and construct a joint embedding using all integration methods. For 50k cells, this takes up to half an hour of GPU time.
  • fast: Uses only methods with pretrained classifiers to annotate only query cells. For 50k cells, this takes 5 minutes without a GPU (without UMAP embedding).

Output

PopV outputs a cell-type classification for each classifier used, as well as the majority vote across all classifiers. Additionally, PopV uses the ontology to go through the full set of ontology descendants for the OnClass prediction (disabled in fast mode). This method will be described further when PopV is published. PopV also outputs a score that counts the number of classifiers agreeing with the PopV prediction. This score can be read as the certainty of the prediction for each individual cell in the query data.
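As a rough illustration of the majority vote and the agreement score, consider the sketch below. It is an assumption-laden toy, not PopV's code: the function `consensus_vote`, the classifier keys, and the labels are hypothetical, ties are broken by whichever label is seen first, and the ontology-aware handling of the OnClass prediction is not reproduced.

```python
from collections import Counter


def consensus_vote(predictions):
    """Return the majority label and the number of agreeing classifiers
    for every cell.

    predictions maps a classifier name to a list of labels, one per cell.
    Hypothetical helper for illustration; not part of the popV API.
    """
    n_cells = len(next(iter(predictions.values())))
    consensus, agreement = [], []
    for i in range(n_cells):
        # Count how many classifiers voted for each label on cell i.
        votes = Counter(labels[i] for labels in predictions.values())
        label, count = votes.most_common(1)[0]
        consensus.append(label)
        agreement.append(count)
    return consensus, agreement


# Toy predictions from four classifiers on three cells (made-up labels).
predictions = {
    "knn_scvi": ["T cell", "B cell", "monocyte"],
    "random_forest": ["T cell", "B cell", "macrophage"],
    "svm": ["T cell", "NK cell", "monocyte"],
    "xgboost": ["T cell", "B cell", "dendritic cell"],
}

consensus, agreement = consensus_vote(predictions)
# consensus → ["T cell", "B cell", "monocyte"]
# agreement → [4, 3, 2]
```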

We found that predictions remain highly reliable when a single classifier disagrees, while disagreement of more than two classifiers signifies less reliable results. The aim of PopV is not to fully annotate a dataset but to highlight cells that may require further manual annotation. PopV also outputs UMAP embeddings of all integrated latent spaces if popv.settings.compute_embedding == True and computes certainties for every classifier used if popv.settings.return_probabilities == True.
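One way to act on this rule of thumb is to flag cells where more than two classifiers disagree with the consensus label. The sketch below assumes a per-cell agreement count and a known number of classifiers. The function name `flag_for_review` and the example numbers are hypothetical, and the default threshold of two is taken from the statement above rather than from PopV's code.

```python
def flag_for_review(agreement, n_classifiers, max_disagreeing=2):
    """Mark cells where more than max_disagreeing classifiers
    disagree with the consensus label.

    Hypothetical helper for illustration; not part of the popV API.
    """
    return [n_classifiers - score > max_disagreeing for score in agreement]


# Made-up agreement counts for five cells scored by ten classifiers.
agreement_scores = [10, 9, 8, 7, 4]
needs_review = flag_for_review(agreement_scores, n_classifiers=10)
# needs_review → [False, False, False, True, True]
```

Flagged cells are candidates for manual annotation, in line with the stated aim of highlighting uncertain cells rather than annotating everything automatically.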

Resources

  • Tutorials, API reference, and installation guides are available in the documentation.

Installation

We suggest using a package manager like conda or mamba to install the package. OnClass files for annotation based on Tabula sapiens are deposited in popv/resources/ontology. We use Cell Ontology as an ontology throughout our experiments. PopV will automatically look for the ontology in this folder. If you want to provide your user-edited ontology, our tutorials demonstrate how to generate the Natural Language Model used in OnClass for this user-defined ontology.

conda create -n yourenv python=3.11
conda activate yourenv
pip install popv

Example notebook

We provide an example notebook in Google Colab.

This notebook guides you through annotating a dataset based on the annotated Tabula sapiens reference and demonstrates how to run annotation on your own query dataset. It requires that all cells are annotated based on a cell ontology. We strongly encourage the use of a common cell ontology; see also Osumi-Sutherland et al. Using a cell ontology is a requirement for running OnClass as a prediction algorithm. Setting ontology to false disables this step and allows running popV without a cell ontology.
