An automated cell type annotation algorithm for unmatched spatial transcriptomics data
SANNO
The official implementation for "SANNO".
Datasets
We provide preprocessed datasets for easy reproduction.
Download datasets from: Dataset Link
Installation
To use SANNO, follow these steps:
- Create a conda environment:
conda create -n SANNO python=3.7
conda activate SANNO
- Install dependencies:
pip install -r requirements.txt
- Install PyG and PyTorch according to your CUDA version. For example, with torch-1.13.1+cu117 (Ubuntu 20.04.4 LTS):
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install torch_geometric==2.3.0 # must be this version
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
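After installation, a quick check can confirm that the pinned versions are the ones the environment actually imports. The sketch below is not part of SANNO: the helper names `installed_version` and `check_pins` are hypothetical, and it assumes only that each package exposes a `__version__` attribute, as torch and torch_geometric do.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_pins(pins):
    """Map each module name to (found_version, matches_pin).

    A pin matches when the found version starts with it, so a build tag
    such as "1.13.1+cu117" still matches the pin "1.13.1".
    """
    report = {}
    for name, pin in pins.items():
        found = installed_version(name)
        report[name] = (found, found is not None and found.startswith(pin))
    return report


if __name__ == "__main__":
    # Versions taken from the example install commands in this section.
    pins = {"torch": "1.13.1", "torch_geometric": "2.3.0"}
    for name, (found, ok) in check_pins(pins).items():
        print(f"{name}: found={found} ok={ok}")
```

If a different CUDA build is installed, the torch pin would need to change to match the wheel index used for the PyG extensions.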
Usage
Data Preprocessing
To run SANNO, first create AnnData objects from the raw data.
We require two types of datasets for this project: reference data and query data. Both datasets should be provided in .h5ad format, with cells stored in obs and genes/features stored in var.
Reference Data
- Format: .h5ad
- Content:
  - obs: Cell metadata, including a mandatory cell_type column indicating the true cell type labels.
  - var: Gene/feature metadata.
  - obsm: Spatial coordinates stored under the key pos, representing the relative positions of cells as a 2D numpy array (n_cells x 2).
Query Data
- Format: .h5ad
- Content:
  - obs: Cell metadata (cell type labels are not required).
  - var: Gene/feature metadata.
  - obsm: Spatial coordinates stored under the key pos, representing the relative positions of cells as a 2D numpy array (n_cells x 2).
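The requirements for reference and query data can be checked before training. The following is a minimal sketch, not a SANNO function: `validate_sanno_input` is a hypothetical helper that uses only attribute and key access, so it should work on any object that exposes `obs` and `obsm` the way AnnData does.

```python
def validate_sanno_input(adata, require_labels):
    """Return a list of problems found in an AnnData-like object.

    Set require_labels=True for reference data and False for query data.
    """
    problems = []
    # Reference data must carry ground-truth labels in obs["cell_type"].
    if require_labels and "cell_type" not in adata.obs:
        problems.append("obs is missing the 'cell_type' column")
    # Both reference and query data need 2D coordinates in obsm["pos"].
    if "pos" not in adata.obsm:
        problems.append("obsm is missing the 'pos' key")
    else:
        shape = tuple(adata.obsm["pos"].shape)
        n_cells = len(adata.obs)
        if len(shape) != 2 or shape[1] != 2:
            problems.append(f"obsm['pos'] has shape {shape}, expected (n_cells, 2)")
        elif shape[0] != n_cells:
            problems.append(f"obsm['pos'] has {shape[0]} rows but obs has {n_cells} cells")
    return problems
```

With real data, this could be called as `validate_sanno_input(anndata.read_h5ad("path/to/train_adata.h5ad"), require_labels=True)`; an empty list means none of the checked requirements failed.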
Cell Type Annotation
The processed data are used as input to SANNO, which uses the reference dataset to extract embeddings and annotate the query cells, incorporating the reference spatial transcriptomics information:
cd SANNO/SANNO
# Flags, in order: GPU index, project type, project name, reference data, query data, log path
python main_xy_adj.py --gpu_index 3 \
    --type st2st \
    --dataset "Project name" \
    --train_dataset path/to/train_adata.h5ad \
    --test_dataset path/to/test_adata.h5ad \
    --log log
The project type must be selected based on the nature of the reference and query datasets. The following modes are supported:
- st2st: For cases where both the reference and query datasets are spatial transcriptomics.
- st2sc: For cases where the reference dataset is spatial transcriptomics, and the query dataset is single-cell transcriptomics.
- sc2sc: For cases where both the reference and query datasets are single-cell transcriptomics.
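When launching runs from a script, it can help to reject an unsupported project type before starting. The sketch below builds the argument list for main_xy_adj.py from the flags documented in this section. `build_sanno_command` is a hypothetical helper, not part of SANNO, and the default values for `log_path` and `gpu_index` are assumptions.

```python
import shlex

SUPPORTED_TYPES = ("st2st", "st2sc", "sc2sc")


def build_sanno_command(project_type, dataset, train_path, test_path,
                        log_path="log", gpu_index=0):
    """Build the argument list for main_xy_adj.py from the documented flags."""
    if project_type not in SUPPORTED_TYPES:
        raise ValueError(
            f"unknown project type {project_type!r}; expected one of {SUPPORTED_TYPES}"
        )
    return [
        "python", "main_xy_adj.py",
        "--gpu_index", str(gpu_index),
        "--type", project_type,
        "--dataset", dataset,
        "--train_dataset", train_path,
        "--test_dataset", test_path,
        "--log", log_path,
    ]


if __name__ == "__main__":
    cmd = build_sanno_command(
        "st2st", "my_project",  # "my_project" is a placeholder name
        "path/to/train_adata.h5ad", "path/to/test_adata.h5ad",
    )
    # Print a copy-pasteable shell line; it is meant to be run from SANNO/SANNO.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The list form can also be passed to `subprocess.run(cmd, check=True, cwd="SANNO/SANNO")`, which avoids shell quoting issues with project names that contain spaces.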
Running the above command will generate three output files in the output path:
- acc.csv: Contains the overall accuracy of SANNO's predictions on the query data.
- embedding.h5ad: An AnnData file storing the embeddings extracted by SANNO.
- Reports: A set of logs recorded during the training process.
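The accuracy file can be inspected with the standard library. This is a sketch under assumptions: the column layout of acc.csv is not documented here, so the code only prints whatever rows it finds, and the output path is a placeholder.

```python
import csv
import os


def read_accuracy_rows(csv_path):
    """Return the non-empty rows of a CSV file as lists of strings."""
    with open(csv_path, newline="") as handle:
        return [row for row in csv.reader(handle) if row]


if __name__ == "__main__":
    # Placeholder path: substitute the actual output location of your run.
    acc_path = os.path.join("path", "to", "output", "acc.csv")
    if os.path.exists(acc_path):
        for row in read_accuracy_rows(acc_path):
            print(row)
    else:
        print(f"{acc_path} not found; run SANNO first")
```

The embedding file is a regular AnnData file, so it can be opened with `anndata.read_h5ad`.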
Tutorial
Tutorial 1: Cell annotations within samples (HubMap CL A & HubMap CL B)
- Install the required environment according to Installation.
- Download the datasets from HubMap CL.
- Preprocess the datasets according to the Data Preprocessing standards.
- For details on data preprocessing and training, run the tutorial notebook HubMap_CL_intra.ipynb.
Tutorial 2: Cell annotations across samples (Tonsil & BE)
- Install the required environment according to Installation.
- Download the datasets from Tonsil_BE.
- Preprocess the datasets according to the Data Preprocessing standards.
- For details on data preprocessing and training, run the tutorial notebook HubMap_CL_intra.ipynb.
Citation
If you use SANNO in your research, please cite:
@article{yourcitation,
title={{Your Paper Title}},
author={Your Name, Coauthor Name},
journal={Journal Name},
volume={00},
pages={1--10},
year={2024}
}