A hypothesis test for velocity embeddings.

Project description

Velotest is a hypothesis test for how well a 2D embedding of positional and velocity data represents the original high-dimensional data. Its purpose is to help practitioners who use 2D embeddings of single-cell RNA sequencing data with RNA velocity decide which 2D velocity vectors faithfully represent the high-dimensional data.

Installation

You can simply install the package via pip:

pip install velotest

If you want to change bits of the code, install it in editable mode:

pip install -e "."

In both cases you’ll need additional dependencies to build the docs, run tests, or reproduce the figures from the paper, which you can install via the extras docs, dev, or experiment, either separately or in combination. For example, to install the docs extra, run pip install velotest[docs], or to install both the docs and dev extras, run pip install velotest[docs,dev]. Similarly, if you installed in editable mode, you can run pip install -e ".[docs]".

Usage

If you have the embeddings and original data as individual arrays/tensors (see below for use with an anndata object), you can use our general interface:

from velotest.hypothesis_testing import run_hypothesis_test

uncorrected_p_values, h0_rejected, _, _, _ = run_hypothesis_test(high_d_position, high_d_velocity, low_d_position, low_d_velocity_position)

where low_d_velocity_position is the position of the tip of the 2D velocity vector, NOT the velocity vector itself (which originates in low_d_position).
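If your 2D velocities are stored as displacement vectors rather than tip positions, the conversion is a single addition. The following is a minimal sketch on toy data; the array name low_d_velocity and the random inputs are hypothetical and only illustrate the expected relationship between the arguments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 5 points in a 2D embedding.
low_d_position = rng.normal(size=(5, 2))
# Hypothetical 2D velocity vectors (displacements), one per point.
low_d_velocity = rng.normal(size=(5, 2))

# The test expects the arrow tips, i.e. position plus displacement.
low_d_velocity_position = low_d_position + low_d_velocity
```

The resulting low_d_velocity_position has the same shape as low_d_position and can be passed as the fourth argument of run_hypothesis_test.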

An application on single-cell sequencing data (runnable notebook: notebooks/demo.ipynb) could look like this (following scvelo’s tutorial):

from velotest.hypothesis_testing import run_hypothesis_test_on
import scvelo

adata = scvelo.datasets.pancreas()
scvelo.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scvelo.pp.moments(adata, n_pcs=30, n_neighbors=30)

# Compute velocity
scvelo.tl.velocity(adata)

# Compute 2D embedding of velocity vectors
scvelo.tl.velocity_graph(adata)
scvelo.pl.velocity_embedding(adata)

# Run test
uncorrected_p_values, h0_rejected, _ = run_hypothesis_test_on(adata)

For plotting, you can use the plotting module; see notebooks/demo.ipynb for an example. Refer to Read the Docs for more detailed API documentation.

Details

Next, we briefly summarize how the test works; for details, see our paper. The test assesses how well the 2D velocity vectors represent the high-dimensional velocity vectors. We quantify this by computing the mean cosine similarity between the high-dimensional velocity vector and the difference vectors to a set of neighbors in the high-dimensional space. For a data point \(i\), the test statistic is the mean cosine similarity between the velocity \(v_i\) and the difference vectors \(x_j-x_i\) for all \(x_j\) in a set of neighbors of \(\tilde{x}_i\). This set of neighbors is chosen based on the points the velocity \(\tilde{v}_i\) points to in 2D. \(\tilde{v}_i\) and \(\tilde{x}_i\) are the 2D embeddings of \(v_i\) and \(x_i\), respectively.
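The statistic described above can be sketched in a few lines of NumPy. This is an illustration only, not the package's implementation: the function name mean_cosine_similarity, the toy arrays, and the hand-picked neighbor indices are assumptions, and the sketch does not handle zero-length vectors.

```python
import numpy as np

def mean_cosine_similarity(x, v, i, neighbors):
    """Mean cosine similarity between v[i] and x[j] - x[i] for j in neighbors."""
    diffs = x[neighbors] - x[i]          # difference vectors x_j - x_i
    dots = diffs @ v[i]                  # dot products with the velocity v_i
    norms = np.linalg.norm(diffs, axis=1) * np.linalg.norm(v[i])
    return float(np.mean(dots / norms))

# Hypothetical high-dimensional positions (3D here for readability).
x = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0]])
# Hypothetical velocities: every point moves along the first axis.
v = np.array([[1.0, 0.0, 0.0]] * 4)

aligned = mean_cosine_similarity(x, v, 0, [1])      # neighbor straight ahead
mixed = mean_cosine_similarity(x, v, 0, [1, 2])     # one ahead, one orthogonal
opposite = mean_cosine_similarity(x, v, 0, [3])     # neighbor directly behind
```

A neighborhood that lies in the direction of the velocity gives a statistic close to 1, while a neighborhood behind the point gives a value close to -1.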

The null hypothesis is that the visualised 2D velocity vector is no more aligned with the high-dimensional velocity than a visually distinct random 2D direction. It is rejected if the number of random neighborhoods with a statistic higher than that of the velocity-based neighborhood is smaller than we would expect by chance at the chosen significance level.
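One common way to turn such a comparison into a decision is an empirical p-value: the fraction of random neighborhoods whose statistic is at least as high as the observed one. The sketch below assumes this permutation-style rule, with ties counted against the observed value; the function name, the example statistics, and the significance level are hypothetical, and the package may compute its p-values differently.

```python
def empirical_p_value(observed, random_statistics):
    """Fraction of random statistics that are at least as high as the observed one."""
    at_least_as_high = sum(1 for s in random_statistics if s >= observed)
    return at_least_as_high / len(random_statistics)

# Hypothetical statistics from ten random neighborhoods of one data point.
random_statistics = [0.1, 0.9, 0.3, -0.2, 0.5, 0.0, 0.4, 0.2, 0.6, 0.7]
alpha = 0.05  # assumed significance level

p_moderate = empirical_p_value(0.8, random_statistics)   # one random value is higher
p_strong = empirical_p_value(0.95, random_statistics)    # no random value is higher

reject_moderate = p_moderate < alpha
reject_strong = p_strong < alpha
```

With only ten random neighborhoods, the smallest nonzero p-value is 0.1, so more random neighborhoods are needed to resolve small significance levels.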

The test was originally developed for the analysis of single-cell RNA sequencing data, but it can be applied to any data with positions and velocities.

Reproducing plots from paper

Make sure that you have the experiment extra installed (see Installation section above).

Then, you can reproduce all figures by simply running make_all_figures.py in the experiments folder:

cd experiments
python make_all_figures.py --multirun=dataset=pancreas_stochastic,pancreas_dynamical,dentateyrus,bonemarrow,covid,gastrulation_erythroid,nystroem,developing_mouse_brain,organogenesis,veloviz

This will create a fig folder in the experiments folder with all figures based on the configuration in configs/. This uses hydra to manage the configurations, so you can also modify individual configurations using the command line with python make_all_figures.py dataset=pancreas_stochastic dataset.number_neighbors_to_sample_from=300.
