A tool for evaluating single-cell embeddings using graph-based relationships

scgraph-eval

Stand-alone scGraph package for evaluations. Originally written by Hanchen Wang (wang.hanchen@gene.com), Jure Leskovec, and Aviv Regev.

A tool for evaluating single-cell embeddings using graph-based relationships. This package helps analyze the consistency of cell type relationships across different batches in single-cell data.

I modified the API for my own convenience, under the terms of the MIT license: (1) an AnnData object can be passed directly, and (2) the obsm keys to evaluate can be specified.

Installation

# conda create -n scgraph python=3.10 # to create another conda environment if necessary
pip install scgraph-bench

Usage

Python API

from scgraph import scGraph

# Using a file path
scgraph = scGraph(
    adata_path="path/to/your/data.h5ad",   # Path to AnnData object
    batch_key="batch",                     # Column name for batch information
    label_key="cell_type",                 # Column name for cell type labels
    trim_rate=0.05,                        # Trim rate for robust mean calculation
    thres_batch=100,                       # Minimum number of cells per batch
    thres_celltype=10,                     # Minimum number of cells per cell type
)

# Run the analysis, return a pandas dataframe
results = scgraph.main()

# Save the results
results.to_csv("embedding_evaluation_results.csv")
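
The thres_batch and thres_celltype arguments are described above only as minimum cell counts. As a rough illustration of what such filters do, here is a minimal pandas sketch on a toy obs table. It is not the package's code: the toy data, the small threshold values, and the choice to apply the cell-type threshold within each batch are all assumptions made for this example.

```python
import pandas as pd

# Toy stand-in for adata.obs (hypothetical data, not from the package)
obs = pd.DataFrame({
    "batch": ["b1"] * 6 + ["b2"] * 2,
    "cell_type": ["T", "T", "T", "B", "B", "NK", "T", "B"],
})

# Toy thresholds, far smaller than the values in the example above
thres_batch, thres_celltype = 3, 2

# Keep only batches with at least thres_batch cells
batch_sizes = obs["batch"].value_counts()
kept_batches = batch_sizes[batch_sizes >= thres_batch].index
obs_kept = obs[obs["batch"].isin(kept_batches)]

# Assumption: the cell-type threshold is applied within each kept batch
type_sizes = obs_kept.groupby(["batch", "cell_type"]).size()
kept_pairs = type_sizes[type_sizes >= thres_celltype]

print(list(kept_batches))
print(kept_pairs)
```

In this toy table, batch b2 is dropped for having too few cells, and within b1 only the T and B groups are large enough to get a centroid.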

You can also pass an AnnData object directly via adata, and use obsm_keys to select which embeddings to evaluate:

import scanpy as sc
from scgraph import scGraph

adata = sc.read("path/to/your/data.h5ad")

# Evaluate only specific embeddings
scgraph = scGraph(
    adata=adata,                           # Pass AnnData object directly
    batch_key="batch",
    label_key="cell_type",
    obsm_keys=["X_umap", "X_scVI"],       # Only evaluate specific embeddings
)

results = scgraph.main()

If obsm_keys is not specified, all embeddings in adata.obsm will be evaluated.

Command Line Interface

# Evaluate all embeddings
scgraph-bench --adata_path path/to/data.h5ad --batch_key batch --label_key cell_type --savename results

# Evaluate specific embeddings only
scgraph-bench --adata_path path/to/data.h5ad --obsm_keys X_umap X_scVI --savename results

Output

For each evaluated embedding, the package reports metrics that compare it against the PCA-based consensus reference:

  • Rank-PCA: Spearman correlation with PCA-based relationships
  • Corr-PCA: Pearson correlation with PCA-based relationships
  • Corr-Weighted: Weighted correlation considering distance-based importance
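
To make the three metrics concrete, the following sketch computes them on two small vectors of centroid-to-centroid distances. The numbers are made up, weighted_pearson is a helper written for this example, and the inverse-distance weighting is an assumption: the text above only says the weights reflect distance-based importance, so the package's exact weighting may differ.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with non-negative weights w."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return cov / np.sqrt(var_x * var_y)


# Hypothetical distances for the same five cell-type pairs
ref = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # PCA consensus reference
emb = np.array([1.1, 1.9, 3.5, 3.8, 5.2])  # embedding under evaluation

rank_pca, _ = spearmanr(ref, emb)   # Rank-PCA: agreement of rank order
corr_pca, _ = pearsonr(ref, emb)    # Corr-PCA: linear agreement

# Assumption: pairs that are close in the reference get larger weights
weights = 1.0 / ref
corr_weighted = weighted_pearson(ref, emb, weights)

print(rank_pca, corr_pca, corr_weighted)
```

Because emb preserves the ordering of ref, the rank-based score is 1 here, while the two Pearson-style scores fall slightly below 1.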

How It Works

  1. Build a PCA-based consensus reference: For each batch, the top 1000 highly variable genes (HVG) are selected, and PCA (10 components) is computed on these HVGs. Trimmed-mean centroids for each cell type are calculated in this PCA space, and pairwise distances between centroids are recorded. The per-batch distance matrices are then averaged across all batches to form a consensus reference that captures robust cell-type relationships.
  2. Evaluate embeddings against the consensus: For each embedding in adata.obsm (or those specified by obsm_keys), the same centroid and pairwise distance procedure is applied. The resulting distance matrix is compared to the PCA consensus via Spearman correlation (Rank-PCA), Pearson correlation (Corr-PCA), and distance-weighted Pearson correlation (Corr-Weighted).

Note: HVG selection and PCA are only used to build the consensus reference. The embeddings being evaluated are used as-is from adata.obsm.
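
The centroid-and-distance step described above can be sketched with NumPy and SciPy. This is an illustrative approximation rather than the package's implementation: centroid_distances and make_batch are names invented here, the toy 2-D coordinates stand in for a per-batch PCA space, and any normalization the package may apply to the distance matrices is omitted.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform
from scipy.stats import trim_mean


def centroid_distances(coords, labels, trim_rate=0.05):
    """Trimmed-mean centroid per label, then pairwise Euclidean distances."""
    labels = np.asarray(labels)
    names = np.unique(labels)  # sorted unique cell-type names
    centroids = np.vstack([
        trim_mean(coords[labels == name], proportiontocut=trim_rate, axis=0)
        for name in names
    ])
    dist = squareform(pdist(centroids, metric="euclidean"))
    return pd.DataFrame(dist, index=names, columns=names)


# Toy data: three cell types with 50 cells each, in two simulated batches
rng = np.random.default_rng(0)
labels = np.repeat(["B", "NK", "T"], 50)
centers = {"B": [0.0, 0.0], "NK": [5.0, 0.0], "T": [0.0, 5.0]}


def make_batch():
    """Draw noisy 2-D coordinates around each cell type's center."""
    offsets = np.array([centers[name] for name in labels])
    return offsets + rng.normal(scale=0.5, size=offsets.shape)


# One distance matrix per batch, averaged into a consensus
per_batch = [centroid_distances(make_batch(), labels) for _ in range(2)]
consensus = sum(per_batch) / len(per_batch)

print(consensus.round(2))
```

The same centroid_distances call could be applied to an embedding's coordinates, and the upper triangle of the result compared against the consensus with the correlations listed under Output.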

Requirements

  • numpy
  • pandas
  • scanpy
  • tqdm
  • scipy

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this package in your research, please cite:

@article{wang2024metric,
  title={Metric mirages in cell embeddings},
  author={Wang, Hanchen and Leskovec, Jure and Regev, Aviv},
  journal={BioRxiv},
  pages={2024--04},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}

Contact

For questions and feedback:
