
Interactive visualization of distortion in nonlinear embeddings.

Project description

Distortions

The distortions package provides functions to compute and visualize the distortions introduced by nonlinear dimensionality reduction algorithms. It is designed to wrap arbitrary embedding methods and builds on the distortion estimation routines from the megaman package. The resulting visualizations let you interactively query properties that the embedding does not preserve well. For example, the image below shows us selecting distances that are larger in the embedding space than in the original data space. This lets us describe the large-scale distortions induced by the embedding: we can see that some of the T cells have many neighbors among the monocytes (to run this yourself, see the PBMC Atlas article).
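The distance query above compares, for pairs of nearby samples, how far apart they are in the embedding versus the original data. The sketch below is one minimal way to compute that comparison with NumPy alone; it is not the package's implementation. It assumes X holds the original data and Y the embedding coordinates (rows aligned), builds brute-force k-nearest-neighbor edges from X, and flags edges with the usual 1.5 × IQR boxplot rule. The helper names (knn_edges, edge_log_ratios, iqr_outliers) are hypothetical.

```python
import numpy as np


def knn_edges(X, k):
    """Hypothetical helper: (i, j) pairs linking each row of X to its k nearest rows."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # brute-force O(n^2) distances
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]
    return np.repeat(np.arange(X.shape[0]), k), nbrs.ravel()


def edge_log_ratios(X, Y, k=10, eps=1e-12):
    """Log of (embedding distance / original distance) for each kNN edge of X.

    Values are centered at their median because embedding units are arbitrary:
    positive means the edge is stretched relative to a typical edge, negative
    means it is compressed.
    """
    i, j = knn_edges(X, k)
    d_orig = np.maximum(np.linalg.norm(X[i] - X[j], axis=1), eps)  # guard duplicate rows
    d_emb = np.maximum(np.linalg.norm(Y[i] - Y[j], axis=1), eps)
    log_ratio = np.log(d_emb / d_orig)
    return i, j, log_ratio - np.median(log_ratio)


def iqr_outliers(values, whisker=1.5):
    """Masks for values above / below the standard boxplot whiskers."""
    q1, q3 = np.percentile(values, [25, 75])
    spread = q3 - q1
    return values > q3 + whisker * spread, values < q1 - whisker * spread


# Usage: a crude "embedding" that simply drops the third coordinate
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = X[:, :2]
i, j, r = edge_log_ratios(X, Y, k=10)
stretched, compressed = iqr_outliers(r)
```

Working on log ratios keeps stretching and compression symmetric, so a boxplot of r treats a doubling and a halving of distance as equally extreme.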

Alternatively, we can study how the embedding warps distances locally. Each ellipse in the figure below represents how distances in the original data manifold are warped. Hovering over a region of the map inverts the warping in the neighborhood of the mouse. For example, this shows that the T cells near the top and bottom of the T cell cluster are in fact farther from each other than the static embedding suggests.
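To make the ellipse idea concrete, here is a simplified stand-in for local distortion estimation; it is not the estimator used by megaman or distortions. For each sample, it fits a least-squares linear map A from embedding offsets to original-data offsets over the sample's k nearest neighbors in the original space (X and Y are assumed to have aligned rows). An embedding displacement v then corresponds to an original squared distance of roughly vᵀ(AAᵀ)v, so G = AAᵀ acts as a local metric, and the ellipse {v : vᵀGv = 1} shows which embedding displacements correspond to one unit of original distance. A circle means the embedding is locally isometric; an elongated ellipse means distances are stretched along its major axis. The function names are hypothetical, and the sketch assumes each neighborhood spans the 2D embedding (non-collinear neighbors).

```python
import numpy as np


def local_metric(X, Y, k=10):
    """Hypothetical estimator: per-sample pullback metric G_i = A_i A_i^T.

    A_i is the least-squares linear map sending embedding offsets to original
    offsets over the k nearest neighbors (in X) of sample i.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]
    n, p = Y.shape
    G = np.empty((n, p, p))
    for idx in range(n):
        dX = X[nbrs[idx]] - X[idx]  # (k, D) original offsets
        dY = Y[nbrs[idx]] - Y[idx]  # (k, p) embedding offsets
        A, *_ = np.linalg.lstsq(dY, dX, rcond=None)  # dY @ A ~= dX, A is (p, D)
        G[idx] = A @ A.T
    return G


def ellipse_axes(G):
    """For 2x2 metrics: semi-axis lengths (major first) and major-axis angle of {v : v^T G v = 1}."""
    evals, evecs = np.linalg.eigh(G)  # eigenvalues in ascending order
    semi_axes = 1.0 / np.sqrt(evals)  # hence semi-axes in descending order
    major = evecs[..., :, 0]  # smallest eigenvalue <-> longest axis
    angle = np.mod(np.arctan2(major[..., 1], major[..., 0]), np.pi)  # an axis, not an arrow
    return semi_axes, angle
```

For example, if Y = X @ np.diag([2.0, 0.5]) for 2D data X, every sample's ellipse has semi-axes close to (2, 0.5) and lies along the x axis, recovering the stretch.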

Quickstart

You can install the package using:

python -m pip install distortions

Here's a small example applying UMAP to a simulated AnnData object. First, we generate some random count data and compute its embedding.

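import anndata as ad
import scanpy as sc
import numpy as np

# Simulate 100 samples x 5 features of Poisson(2) counts
adata = ad.AnnData(np.random.poisson(2, size=(100, 5)))
# Build a 15-nearest-neighbor graph, then compute a 2D UMAP (stored in adata.obsm["X_umap"])
sc.pp.neighbors(adata, n_neighbors=15)
sc.tl.umap(adata)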

Next, we estimate the local distortions and bind the resulting ellipse information to the embedding coordinates. The Geometry object comes from the megaman package and represents the intrinsic geometry of a manifold.

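from distortions.geometry import Geometry, bind_metric, local_distortions, neighborhoods

# Graph settings: kernel radius for the affinity, number of neighbors for the adjacency
geom = Geometry(affinity_kwds={"radius": 2}, adjacency_kwds={"n_neighbors": 15})
# Estimate local distortions of the UMAP coordinates relative to the original data
_, Hvv, Hs = local_distortions(adata.obsm["X_umap"], adata.X, geom)
# Attach the per-sample ellipse information to the embedding coordinates
embedding = bind_metric(adata.obsm["X_umap"], Hvv, Hs)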
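The Geometry settings above control how the neighborhood graph is built: a kNN adjacency (n_neighbors) weighted by a kernel with bandwidth radius. As a rough illustration of how such a graph over the original data can be combined with embedding coordinates into per-sample ellipse information, the sketch below builds a symmetric kNN graph with Gaussian weights exp(-d²/radius²) and then computes, for each sample k, H_k = ½ Σ_l P_kl (y_l - y_k)(y_l - y_k)ᵀ, where P is the row-normalized affinity. This follows the general shape of Laplacian-based (dual) metric estimates but omits the kernel scale factors and density renormalization a full estimator would use. The kernel form and normalization used by megaman and distortions are assumptions here, the function names are hypothetical, and we make no claim about how H_k relates to the Hvv and Hs returned by local_distortions.

```python
import numpy as np


def gaussian_knn_affinity(X, radius, n_neighbors):
    """Symmetric kNN graph on X with assumed Gaussian weights exp(-(d / radius)^2)."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_off = d.copy()
    np.fill_diagonal(d_off, np.inf)  # exclude self-loops
    nbrs = np.argsort(d_off, axis=1)[:, :n_neighbors]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], nbrs] = True
    mask = mask | mask.T  # keep an edge if either endpoint lists the other
    return np.where(mask, np.exp(-((d / radius) ** 2)), 0.0)


def dual_metric(W, Y):
    """Per-sample matrices H_k = 0.5 * sum_l P_kl (y_l - y_k)(y_l - y_k)^T."""
    P = W / W.sum(axis=1, keepdims=True)  # random-walk normalization of the graph
    diffs = Y[None, :, :] - Y[:, None, :]  # diffs[k, l] = y_l - y_k
    return 0.5 * np.einsum("kl,kli,klj->kij", P, diffs, diffs)
```

Each H_k is a kernel-weighted covariance of embedding offsets around sample k, so it is symmetric and positive semidefinite; its eigenvectors and eigenvalues give ellipse orientations and axis lengths.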

Now we can make the visualization.

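from distortions.visualization import dplot

# Identify neighborhoods fragmented by the embedding
N = neighborhoods(adata, 1)

# Map embedding coordinates, add the edge-link interaction, then draw the ellipses
(
    dplot(embedding)
    .mapping(x="embedding_0", y="embedding_1")
    .inter_edge_link(N=N)
    .geom_ellipse()
)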

At a high level, the main functions exported by this package are:

  • local_distortions: Estimate the local distortion associated with each sample.
  • neighborhoods: Identify neighborhoods that have been fragmented by the embedding method. These are sets of points that were close together in the original space but are spread far apart in the embedding.
  • dplot: Initialize a distortion plot object. Different encodings and interactions can be layered on top of this initial call.
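One simple way to operationalize "close together in the original space but spread far apart in the embedding" is sketched below. It is a hypothetical criterion, not the algorithm behind the neighborhoods function: for each sample, take its k nearest neighbors in the original data X, measure how far those neighbors land from it in the embedding Y, and flag the ones that land more than threshold times the typical embedding neighbor distance away. The parameters k and threshold are illustrative choices.

```python
import numpy as np


def _knn(Z, k):
    """Indices of the k nearest rows of Z (excluding self) and the full distance matrix."""
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k], d


def fragmented_neighborhoods(X, Y, k=10, threshold=3.0):
    """Hypothetical criterion: map sample i -> its original-space neighbors that land far away in Y."""
    nbrs_x, _ = _knn(X, k)
    nbrs_y, d_y = _knn(Y, k)
    scale = np.median(np.take_along_axis(d_y, nbrs_y, axis=1))  # typical embedding kNN distance
    spread = np.take_along_axis(d_y, nbrs_x, axis=1)  # embedding distance to original neighbors
    flagged = {}
    for i in range(X.shape[0]):
        far = nbrs_x[i][spread[i] > threshold * scale]
        if far.size:
            flagged[i] = far
    return flagged


# Usage: a line of 40 points that the "embedding" tears in two between samples 19 and 20
X = np.column_stack([np.arange(40.0), np.zeros(40)])
Y = X.copy()
Y[20:, 0] += 100.0
flagged = fragmented_neighborhoods(X, Y, k=4, threshold=3.0)
# flagged has keys 18, 19, 20, 21; flagged[19] holds samples 20 and 21
```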

Each dplot object supports a few static (geom) and interactive (inter) layers, which we can combine to build a distortion plot.

  • geom_ellipse: Draw an ellipse layer that encodes the local distortion associated with each sample.
  • geom_hair: Draw a line segment layer that encodes the local distortion associated with each sample. It's visually more compact than geom_ellipse, at the cost of only showing the ratio between the ellipse axis lengths.
  • inter_isometry: Interactively isometrize the region surrounding the mouse. This reduces the distortion near the mouse position, potentially at the cost of increased distortion elsewhere.
  • inter_edge_link: Highlight distorted neighborhoods. This expects the output of neighborhoods as input. Hovering over one distorted neighborhood reveals all the edges that it's made up of.
  • inter_boxplot: Allow selection of outlying edges whose embedding distance is much larger or much smaller than their original distance.
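The isometrization idea can be sketched with a local metric G like the one behind the ellipses: if original squared distances near a point c are approximately vᵀGv for an embedding offset v, then mapping nearby points y to c + G^{1/2}(y - c) makes Euclidean distances in the transformed coordinates approximate original distances. The hypothetical function below applies that map to points within a fixed radius of the mouse position. It is an illustration only, not how inter_isometry is implemented; a hard radius cutoff like this one creates a discontinuity at the boundary, which a practical implementation might avoid by blending the transform smoothly.

```python
import numpy as np


def isometrize_near(Y, G_center, center, radius):
    """Map embedding points within `radius` of `center` through the symmetric square root of G_center."""
    evals, evecs = np.linalg.eigh(G_center)
    root = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T  # G^{1/2}
    offsets = Y - center
    inside = np.linalg.norm(offsets, axis=1) <= radius
    out = np.array(Y, dtype=float)  # copy; points outside the radius stay put
    out[inside] = center + offsets[inside] @ root.T
    return out


# Usage: undo a known axis-aligned stretch around the first sample
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
Y = X @ np.diag([2.0, 0.5])
G = np.diag([0.25, 4.0])  # metric of that stretch: original d^2 = v^T G v
Z = isometrize_near(Y, G, center=Y[0], radius=np.inf)
```

Here Z - Z[0] matches X - X[0] up to rounding, because the square root of G exactly inverts the stretch.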

The full function reference is available in the package documentation, and the accompanying articles apply the package to more realistic examples.

Help

You can reach us by creating an Issue in the package repository or sending an email to ksankaran@wisc.edu. We appreciate your trying out the package and will try our best to reply promptly.



Download files


Source Distribution

distortions-0.0.5.tar.gz (2.7 MB)

Uploaded Source

Built Distribution


distortions-0.0.5-py3-none-any.whl (113.1 kB)

Uploaded Python 3

File details

Details for the file distortions-0.0.5.tar.gz.

File metadata

  • Download URL: distortions-0.0.5.tar.gz
  • Size: 2.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for distortions-0.0.5.tar.gz
Algorithm Hash digest
SHA256 12c03fab5243f92e3acdb7ce26aae89daf2b8b81da572b7191483f6572fe6f5b
MD5 71233318bdf4305f74d01a08f6efc503
BLAKE2b-256 ef3ba70337e56bb25b4f6d64556b6efe1467ab4a70f11bb640210978c07c3b33


File details

Details for the file distortions-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: distortions-0.0.5-py3-none-any.whl
  • Size: 113.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for distortions-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 20a180109418f48e770371a2efffd3b6e61427459f120dcc2d9d5ab19f70d1d6
MD5 9b6a20543eedaaff7cdf3371f7657b88
BLAKE2b-256 5066a11c10fc1e457294d88a4a552247a043b523aac6858a73e86fcae0ad6e18

