A tool for evaluating single-cell embeddings using graph-based relationships
scgraph-eval
Stand-alone scGraph package for evaluations. Originally written by Hanchen Wang (wang.hanchen@gene.com), Jure Leskovec, and Aviv Regev.
A tool for evaluating single-cell embeddings using graph-based relationships. This package helps analyze the consistency of cell type relationships across different batches in single-cell data.
I modified the API for my own convenience, as permitted by the MIT license: (1) an AnnData object can be passed directly, and (2) the obsm keys to evaluate can be specified.
Installation
# conda create -n scgraph python=3.10 # to create another conda environment if necessary
pip install scgraph-bench
Usage
Python API
from scgraph import scGraph
# Using a file path
scgraph = scGraph(
    adata_path="path/to/your/data.h5ad",  # Path to AnnData object
    batch_key="batch",                    # Column name for batch information
    label_key="cell_type",                # Column name for cell type labels
    trim_rate=0.05,                       # Trim rate for robust mean calculation
    thres_batch=100,                      # Minimum number of cells per batch
    thres_celltype=10,                    # Minimum number of cells per cell type
)
# Run the analysis, return a pandas dataframe
results = scgraph.main()
# Save the results
results.to_csv("embedding_evaluation_results.csv")
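The thres_batch and thres_celltype arguments set minimum sizes for batches and cell types. The sketch below shows one plausible reading of those thresholds on plain Python lists. It is an illustration only: the helper name filter_cells and the exact filtering rule (drop small batches first, then drop small cell types within each remaining batch) are assumptions, not the package's confirmed behavior.

```python
from collections import Counter


def filter_cells(batches, labels, thres_batch=100, thres_celltype=10):
    """Return indices of cells kept under an assumed reading of the thresholds.

    Assumption (hypothetical, not taken from the package source):
    - a batch is dropped when it has fewer than `thres_batch` cells;
    - within a kept batch, a cell type is dropped when it has fewer than
      `thres_celltype` cells in that batch.
    """
    batch_sizes = Counter(batches)
    pair_sizes = Counter(zip(batches, labels))
    keep = []
    for i, (b, l) in enumerate(zip(batches, labels)):
        if batch_sizes[b] >= thres_batch and pair_sizes[(b, l)] >= thres_celltype:
            keep.append(i)
    return keep


# Toy data: batch "b1" has 5 cells, batch "b2" has only 2.
batches = ["b1"] * 5 + ["b2"] * 2
labels = ["T", "T", "T", "B", "B", "T", "T"]

# With small thresholds for the toy example, "b2" is dropped entirely.
kept = filter_cells(batches, labels, thres_batch=3, thres_celltype=2)
print(kept)
```

Raising either threshold removes more cells from the analysis, so very small datasets may need lower values than the defaults shown above.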
You can also pass an AnnData object directly via adata, and use obsm_keys to select which embeddings to evaluate:
import scanpy as sc
from scgraph import scGraph
adata = sc.read("path/to/your/data.h5ad")
# Evaluate only specific embeddings
scgraph = scGraph(
    adata=adata,                     # Pass AnnData object directly
    batch_key="batch",
    label_key="cell_type",
    obsm_keys=["X_umap", "X_scVI"],  # Only evaluate specific embeddings
)
results = scgraph.main()
If obsm_keys is not specified, all embeddings in adata.obsm will be evaluated.
Command Line Interface
# Evaluate all embeddings
scgraph-bench --adata_path path/to/data.h5ad --batch_key batch --label_key cell_type --savename results
# Evaluate specific embeddings only
scgraph-bench --adata_path path/to/data.h5ad --obsm_keys X_umap X_scVI --savename results
Output
The package outputs comparison metrics between different embeddings:
- Rank-PCA: Spearman correlation with PCA-based relationships
- Corr-PCA: Pearson correlation with PCA-based relationships
- Corr-Weighted: Weighted correlation considering distance-based importance
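To make the three metrics concrete, here is a minimal sketch that compares two cell-type distance matrices. Several details are assumptions made for illustration and may differ from the package: the function names, the flattening of the upper triangle into one vector of pairs, and in particular the weighting scheme (inverse reference distance, so that close cell types count more).

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def weighted_pearson(x, y, w):
    """Pearson correlation where each pair contributes with weight w."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return cov / np.sqrt(var_x * var_y)


def compare_distances(ref, emb):
    """Compare two symmetric distance matrices over the same cell types."""
    iu = np.triu_indices_from(ref, k=1)  # each unordered pair once
    r, e = ref[iu], emb[iu]
    weights = 1.0 / r  # ASSUMPTION: closer pairs in the reference weigh more
    return {
        "Rank-PCA": float(spearmanr(r, e)[0]),
        "Corr-PCA": float(pearsonr(r, e)[0]),
        "Corr-Weighted": float(weighted_pearson(r, e, weights)),
    }


# Toy reference: four cell types on a line, all pairwise distances distinct.
pts = np.array([0.0, 1.0, 3.0, 7.0])
ref = np.abs(pts[:, None] - pts[None, :])

# An embedding whose distances are a rescaled copy of the reference
# preserves all relationships, so every score equals 1 up to rounding.
scores = compare_distances(ref, 2.0 * ref)
print(scores)
```

Under this reading, all three scores are scale-invariant: they reward embeddings that preserve the relative arrangement of cell types, not their absolute distances.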
How It Works
- Build a PCA-based consensus reference: For each batch, the top 1000 highly variable genes (HVG) are selected, and PCA (10 components) is computed on these HVGs. Trimmed-mean centroids for each cell type are calculated in this PCA space, and pairwise distances between centroids are recorded. The per-batch distance matrices are then averaged across all batches to form a consensus reference that captures robust cell-type relationships.
- Evaluate embeddings against the consensus: For each embedding in adata.obsm (or those specified by obsm_keys), the same centroid and pairwise distance procedure is applied. The resulting distance matrix is compared to the PCA consensus via Spearman correlation (Rank-PCA), Pearson correlation (Corr-PCA), and distance-weighted Pearson correlation (Corr-Weighted).
Note: HVG selection and PCA are only used to build the consensus reference. The embeddings being evaluated are used as-is from adata.obsm.
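The centroid and distance steps described above can be sketched with NumPy alone. This is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, the trimmed mean is applied per dimension, distances are plain Euclidean with no normalization, and every batch is assumed to contain the same cell types.

```python
import numpy as np


def trimmed_mean(x, trim_rate=0.05):
    """Per-dimension mean after dropping the lowest and highest trim_rate fraction."""
    x = np.sort(np.asarray(x, dtype=float), axis=0)
    k = int(trim_rate * x.shape[0])
    return x[k : x.shape[0] - k].mean(axis=0)


def centroid_distances(emb, labels, trim_rate=0.05):
    """Trimmed-mean centroid per cell type, then pairwise Euclidean distances."""
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels)
    names = sorted(set(labels.tolist()))
    cents = np.stack([trimmed_mean(emb[labels == n], trim_rate) for n in names])
    diff = cents[:, None, :] - cents[None, :, :]
    return names, np.sqrt((diff ** 2).sum(axis=-1))


def consensus(dist_mats):
    """Average per-batch distance matrices (assumes identical cell-type order)."""
    return np.mean(np.stack(dist_mats), axis=0)


# Toy embedding: type "A" sits at the origin with one outlier, type "B" at (3, 4).
emb = np.array([[0.0, 0.0]] * 19 + [[100.0, 100.0]] + [[3.0, 4.0]] * 20)
labels = ["A"] * 20 + ["B"] * 20

names, dist = centroid_distances(emb, labels, trim_rate=0.05)
print(names, dist[0, 1])  # the outlier is trimmed away before averaging

# Averaging two per-batch matrices gives the consensus reference.
ref = consensus([dist, 3.0 * dist])
```

The trimmed mean is what keeps the single outlier in type "A" from dragging its centroid away from the origin; a plain mean would shift it by 5 units in each dimension.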
Requirements
- numpy
- pandas
- scanpy
- tqdm
- scipy
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use this package in your research, please cite:
@article{wang2024metric,
  title={Metric mirages in cell embeddings},
  author={Wang, Hanchen and Leskovec, Jure and Regev, Aviv},
  journal={BioRxiv},
  pages={2024--04},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}
Contact
For questions and feedback:
- Hanchen Wang
- Email: hanchen.wang.sc@gmail.com