
An automated cell type annotation algorithm for unmatched spatial transcriptomics data


SANNO

The official implementation for "SANNO".

Datasets

We provide preprocessed datasets for easy reproduction.

Download datasets from: Dataset Link


Installation

To use SANNO, follow these steps:

  1. Create a conda environment:
    conda create -n SANNO python=3.7
    conda activate SANNO
    
  2. Install dependencies:
    pip install -r requirements.txt
    
  3. Install PyG and PyTorch builds that match your CUDA version. Taking torch-1.13.1+cu117 (Ubuntu 20.04.4 LTS) as an example:
    conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
    pip install torch_geometric==2.3.0 # must be this version
    pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
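
The wheel index URL in step 3 encodes both the PyTorch version and the CUDA tag. For a different combination, the URL can be derived as in the minimal sketch below. `pyg_wheel_index` is a hypothetical helper, not part of SANNO, and it assumes the index follows the same pattern as the example URL above; confirm that the resulting page exists before installing from it.

```python
def pyg_wheel_index(torch_version, cuda_tag):
    """Build the PyG wheel index URL for a PyTorch version and CUDA tag.

    cuda_tag is the suffix used by PyTorch builds, e.g. "cu117" for CUDA 11.7.
    """
    return "https://data.pyg.org/whl/torch-{}+{}.html".format(torch_version, cuda_tag)


if __name__ == "__main__":
    # Reproduces the install command from step 3 for torch 1.13.1 + CUDA 11.7.
    packages = "pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv"
    print("pip install {} -f {}".format(packages, pyg_wheel_index("1.13.1", "cu117")))
```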
    

Usage

Data Preprocessing

To run SANNO, first create AnnData objects from the raw data.

SANNO requires two datasets: reference data and query data. Both must be provided in .h5ad format, with cells stored in obs and genes/features stored in var.

Reference Data

  • Format: .h5ad
  • Content:
    • obs: Cell metadata, including a mandatory cell_type column indicating the true cell type labels.
    • var: Gene/feature metadata.
    • obsm: Spatial coordinates stored under the key pos, representing the relative positions of cells as a 2D numpy array (n_cells x 2).

Query Data

  • Format: .h5ad
  • Content:
    • obs: Cell metadata (cell type labels are not required).
    • var: Gene/feature metadata.
    • obsm: Spatial coordinates stored under the key pos, representing the relative positions of cells as a 2D numpy array (n_cells x 2).
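
The format requirements above can be checked before training with a small helper. This is a sketch only: `check_sanno_input` is a hypothetical function, not part of SANNO, and it relies solely on the `obs.columns`, `obsm`, and `n_obs` attributes that AnnData objects expose.

```python
def check_sanno_input(adata, require_labels):
    """Return a list of format problems found in an AnnData-like object.

    Only .obs.columns, .obsm and .n_obs are accessed, so any object
    exposing those attributes (such as anndata.AnnData) can be checked.
    """
    problems = []
    # Reference data needs ground-truth labels; query data does not.
    if require_labels and "cell_type" not in adata.obs.columns:
        problems.append("obs has no 'cell_type' column")
    # Both datasets store spatial coordinates under obsm['pos'].
    if "pos" not in adata.obsm:
        problems.append("obsm has no 'pos' entry")
    else:
        shape = tuple(adata.obsm["pos"].shape)
        if shape != (adata.n_obs, 2):
            problems.append(
                "obsm['pos'] has shape {}, expected ({}, 2)".format(shape, adata.n_obs)
            )
    return problems
```

With real data, an object loaded via `anndata.read_h5ad` can be passed in directly: use `require_labels=True` for the reference data and `require_labels=False` for the query data. An empty list means no problems were found.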

Cell Type Annotation

The preprocessed query data are passed to SANNO together with the reference dataset. SANNO extracts embeddings and annotations that incorporate the reference spatial transcriptomics information:

cd SANNO/SANNO

# --gpu_index: GPU index; --type: project type; --dataset: project name
# --train_dataset: reference data; --test_dataset: query data; --log: log path
python main_xy_adj.py   --gpu_index 3 \
                        --type st2st \
                        --dataset "Project name" \
                        --train_dataset path/to/train_adata.h5ad \
                        --test_dataset path/to/test_adata.h5ad \
                        --log log

The project type must be selected based on the nature of the reference and query datasets. The following modes are supported:

  • st2st – For cases where both the reference and query datasets are spatial transcriptomics.
  • st2sc – For cases where the reference dataset is spatial transcriptomics, and the query dataset is single-cell transcriptomics.
  • sc2sc – For cases where both the reference and query datasets are single-cell transcriptomics.
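
When launching runs from a script, the command can be assembled programmatically so that an unsupported project type is rejected before training starts. The sketch below uses only the flags shown in the command above; `build_sanno_command`, the default `gpu_index=0`, and the dataset name `my_project` are illustrative assumptions, not part of SANNO.

```python
import subprocess
import sys

# The three modes listed in this README.
PROJECT_TYPES = ("st2st", "st2sc", "sc2sc")


def build_sanno_command(project_type, dataset, train_path, test_path,
                        gpu_index=0, log_dir="log"):
    """Assemble the argument list for main_xy_adj.py."""
    if project_type not in PROJECT_TYPES:
        raise ValueError(
            "type must be one of {}, got {!r}".format(PROJECT_TYPES, project_type)
        )
    return [
        sys.executable, "main_xy_adj.py",
        "--gpu_index", str(gpu_index),
        "--type", project_type,
        "--dataset", dataset,
        "--train_dataset", train_path,
        "--test_dataset", test_path,
        "--log", log_dir,
    ]


if __name__ == "__main__":
    cmd = build_sanno_command(
        "st2st", "my_project",
        "path/to/train_adata.h5ad", "path/to/test_adata.h5ad",
        gpu_index=3,
    )
    print(" ".join(cmd))
    # Uncomment to launch training from inside the SANNO/SANNO directory:
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the dataset name or a path contains spaces.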

Running the above command will generate three output files in the output path:

  • acc.csv: Contains the overall accuracy of SANNO predictions on the query data.
  • embedding.h5ad: An AnnData file storing the embeddings extracted by SANNO.
  • Reports: A set of logs recorded during the training process.
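
The accuracy file can be inspected with the standard library. This sketch assumes that acc.csv starts with a header row; its column names are not documented here, so the helper returns every column as-is. `load_accuracy_rows` is a hypothetical name.

```python
import csv


def load_accuracy_rows(path):
    """Read a CSV file with a header row into a list of dicts (one per row)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

The embedding.h5ad file can be opened with `anndata.read_h5ad`; where the embeddings are stored inside that object is not specified in this README.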

Tutorial

Tutorial 1: Cell annotations within samples (HubMap CL A & HubMap CL B)

  1. Install the required environment according to Installation.
  2. Download the datasets from HubMap CL.
  3. Preprocess the datasets according to the Data Preprocessing standards.
  4. For details, run the tutorial notebook HubMap_CL_intra.ipynb, which walks through data preprocessing and training.

Tutorial 2: Cell annotations across samples (Tonsil & BE)

  1. Install the required environment according to Installation.
  2. Download the datasets from Tonsil_BE.
  3. Preprocess the datasets according to the Data Preprocessing standards.
  4. For details, run the tutorial notebook HubMap_CL_intra.ipynb, which walks through data preprocessing and training.

Citation

If you use SANNO in your research, please cite:

@article{yourcitation,
  title={{Your Paper Title}},
  author={Your Name, Coauthor Name},
  journal={Journal Name},
  volume={00},
  pages={1--10},
  year={2024}
}
