Supervised OT with Global Ground Metric Learning

ggml-ot: Supervised Optimal Transport

Learning the Ground Metrics of Optimal Transport (OT) improves downstream applications, such as classification, clustering, trajectory inference and embeddings. Global Ground Metric Learning (GGML) learns low-dimensional subspaces that capture class relations between distributions under OT.

The focus of this package is to improve OT analyses on single-cell data in the AnnData format, which keeps it interoperable with Scanpy and other tools from the scverse ecosystem. It also supports generic data as numpy.ndarray.

Improve OT on Single-cell data

A common analysis on single-cell data is investigating genetic differences between patient groups (e.g. disease states). One approach is to capture these differences as OT distances, where each patient/sample is considered as a distribution of cells over the measured expressed/regulated genes. However, computing OT with common Ground Metrics, such as the Euclidean or Cosine distance, often fails to capture the relations between patient groups due to noise, biological fluctuations and disease-unrelated genetic differences, as shown below:

Figure: Optimal Transport between Patients [^1] with three ground metrics (panels: GGML, Euclidean Ground Metric, Cosine Ground Metric).

GGML uses the patient groups as distribution labels to learn a suitable ground metric that captures the patient groups under OT. It is weakly supervised and does not require any cell type annotation or filtering, which makes it possible to investigate group-related biological processes (e.g. disease mechanisms) in the learned gene subspaces, across known and unknown cell subtypes.
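To make the role of the ground metric concrete, the following toy sketch compares OT costs between three tiny "patients" under a plain Euclidean ground metric and under a Euclidean metric applied after a linear projection. It is not the ggml-ot implementation: the function `ot_cost`, the data, and the projection `W` are hypothetical, and `W` is picked by hand here, whereas GGML would learn it from the patient group labels. The brute-force solver only works for a handful of equally sized, uniformly weighted samples; real data would need a dedicated OT solver, such as those in the POT library.

```python
import itertools
import numpy as np


def ot_cost(cells_a, cells_b, W=None):
    """Exact OT cost between two equally sized, uniformly weighted cell sets.

    W is a hypothetical (genes x components) linear map. The ground metric is
    the Euclidean distance after projecting with W (no projection if W is None).
    Brute force over permutations, so only suitable for tiny toy inputs.
    """
    if W is not None:
        cells_a, cells_b = cells_a @ W, cells_b @ W
    # Pairwise ground distances between all cells of the two samples
    cost = np.linalg.norm(cells_a[:, None, :] - cells_b[None, :, :], axis=-1)
    n = len(cells_a)
    # With uniform weights and equal sizes, an optimal plan is a permutation
    return min(
        cost[np.arange(n), list(perm)].sum()
        for perm in itertools.permutations(range(n))
    ) / n


# Toy data: gene 0 separates the groups, gene 1 is group-unrelated variation
healthy_1 = np.array([[0.0, 0.0], [0.1, 1.0], [0.2, 2.0]])
healthy_2 = np.array([[0.0, 5.0], [0.1, 6.0], [0.2, 7.0]])
diseased = np.array([[1.0, 0.0], [1.1, 1.0], [1.2, 2.0]])

# Hand-picked projection that keeps only gene 0 (an assumption, not a learned metric)
W = np.array([[1.0], [0.0]])

euclid_same = ot_cost(healthy_1, healthy_2)
euclid_diff = ot_cost(healthy_1, diseased)
proj_same = ot_cost(healthy_1, healthy_2, W)
proj_diff = ot_cost(healthy_1, diseased, W)

print("Euclidean:", euclid_same, euclid_diff)
print("Projected:", proj_same, proj_diff)
```

Under the Euclidean ground metric the two healthy samples end up farther apart than the healthy and diseased sample, because the group-unrelated gene dominates the cost. After the projection the ordering flips, which is the kind of behaviour a learned ground metric is meant to achieve.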

This method was first introduced in "Global Ground Metric Learning with Applications to scRNA data", published at AISTATS 2025.

Installation

The easiest way to install ggml-ot is from PyPI via pip:

pip install ggml-ot
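A quick way to confirm that the installation succeeded is to query the installed distribution metadata with the standard library. This is a minimal sketch; the helper name `installed_version` is made up for illustration and is not part of ggml-ot.

```python
from importlib import metadata


def installed_version(dist_name="ggml-ot"):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if ggml-ot is installed, otherwise None
print(installed_version())
```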

Getting Started on AnnData

In this small example, we demonstrate how to train ggml-ot to perform Supervised Optimal Transport on AnnData. To use Cross-Validation, Hyperparameter Tuning and more, refer to the tutorials.

import ggml_ot

# Load AnnData from CELLxGENE
dataset_id = "c1f6034b-7973-45e1-85e7-16933d0550bc.h5ad"
adata = ggml_ot.data.load_cellxgene(dataset_id)

# Set up the dataset from AnnData
dataset = ggml_ot.from_anndata(adata, patient_col="sample", label_col="patient_group")

# Train GGML on all patients
dataset.train()

Clustermap and Embedding of Patients with OT distances using the learned ground metric.

The dataset contains the AnnData object in .adata. After training on the dataset, the AnnData object contains the learned gene subspace of the ground metric in .varm["W_ggml"] and the cells embedded in this subspace in .obsm["X_ggml"].
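The relation between the two stored arrays can be sketched with plain NumPy. By AnnData convention, an entry of .varm has one row per gene and an entry of .obsm has one row per cell. The sketch below assumes that the embedding is the plain linear projection of the expression matrix onto the learned components; this is an assumption for illustration and the package may apply additional preprocessing. All arrays are random stand-ins, not real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, n_components = 6, 4, 2

X = rng.normal(size=(n_cells, n_genes))            # stand-in for adata.X (cells x genes)
W_ggml = rng.normal(size=(n_genes, n_components))  # stand-in for adata.varm["W_ggml"]

# Assumption: the embedding is the linear projection of the expression matrix
X_ggml = X @ W_ggml                                # stand-in for adata.obsm["X_ggml"]

# Euclidean distances in the subspace equal a Mahalanobis-type distance in
# gene space with the positive semi-definite matrix M = W W^T
M = W_ggml @ W_ggml.T
diff = X[0] - X[1]
d_subspace = np.linalg.norm(X_ggml[0] - X_ggml[1])
d_gene_space = np.sqrt(diff @ M @ diff)

print(X_ggml.shape, np.isclose(d_subspace, d_gene_space))
```

Under this assumption, computing OT on the embedded cells with a Euclidean ground metric is the same as computing OT on the original cells with the ground metric induced by the learned components.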

import scanpy as sc

# Access adata with learned metric
adata = dataset.adata

# Rank genes of learned components (adata.varm["W_ggml"])
ggml_ot.gene.ranking(adata, gene_symbols="feature_name")

# Show cells embedded in low-dimensional gene subspace (adata.obsm["X_ggml"])
sc.pl.embedding(adata, basis="X_ggml", color=["patient_group", "CDH19", "STAB1"],
                gene_symbols="feature_name", use_raw=False, legend_loc="on data")

Ranking of loadings in learned gene subspace

Embedding of cells in learned gene subspace with 2 example marker genes from ranking W_ggml1
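The idea behind a ranking of loadings can be illustrated with a few lines of NumPy: for each learned component, genes are ordered by the magnitude of their loading. This is a simplified sketch and not the implementation of ggml_ot.gene.ranking, which may order or score genes differently. The gene names, the loading values, and the helper `top_genes` are hypothetical.

```python
import numpy as np

gene_names = np.array(["gene_a", "gene_b", "gene_c", "gene_d"])  # hypothetical symbols

# Hypothetical loadings (genes x components), shaped like adata.varm["W_ggml"]
W = np.array([
    [0.1, -0.7],
    [-0.9, 0.2],
    [0.3, 0.05],
    [0.0, 0.6],
])


def top_genes(W, gene_names, component, n_top=2):
    """Return the n_top gene names with the largest absolute loading on a component."""
    order = np.argsort(-np.abs(W[:, component]))
    return gene_names[order[:n_top]].tolist()


for component in range(W.shape[1]):
    print(component, top_genes(W, gene_names, component))
```

Ranking by absolute value treats strongly negative and strongly positive loadings alike, since both indicate genes that contribute strongly to the component.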

Citation

If you use this package in your research and find it helpful, please cite it with this reference:

Kühn, Damin, and Michael T. Schaub. "Global Ground Metric Learning with Applications to scRNA data." Proceedings of the 28th International Conference on Artificial Intelligence and Statistics. PMLR, 2025.

In BibTeX format:

@inproceedings{kuehn2025ggml,
  title     = {Global Ground Metric Learning with Applications to scRNA data},
  author    = {Kühn, Damin and Schaub, Michael T.},
  booktitle = {Proceedings of the 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {3295--3303},
  year      = {2025},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/kuhn25a/kuhn25a.pdf},
  url       = {https://proceedings.mlr.press/v258/kuhn25a.html},
}

[^1]: Patient-level scRNA-seq dataset from Kuppe, Christoph, et al. "Spatial multi-omic map of human myocardial infarction." Nature (2022).
