Skip to main content

A Method for Mapping Single-cell and Spatial Transcriptomics Data with Transfer Learning

Project description

Welcome!

1. Introduction and installation

STEM is a tool for building single-cell level spatial transcriptomic landscapes using SC data with ST data. STEM extracts the spatial information from the gene expressions and eliminates the domain gap between spatial transcriptomics and single-cell RNA-seq data.

Installation

conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia ## more info: https://pytorch.org/get-started/locally/
conda install -c conda-forge scanpy python-igraph leidenalg
conda install seaborn

Since STEM is a light model, you can use the code from the STEM folder directly. But you can also install STEM from PYPI:pip install STEM. To verify the installation, run python test.py

Then we will demonstrate the workflow based on the mouse embryo data as part of our paper Figure 2.

%pylab inline
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"
import scanpy as sc
import pandas as pd
import torch
import scipy
import time
from STEM.model import *
from STEM.utils import *

2. Load and Preprocess data

First, we load the simulated ST data as ST data and the raw SeqFISH data as SC data. Then we normalized and log-scaled these data.

scdata = pd.read_csv('./data/mousedata_2020/E1z2/simu_sc_counts.csv',index_col=0)
scdata = scdata.T
stdata = pd.read_csv('data/mousedata_2020/E1z2/simu_st_counts.csv',index_col=0)
stdata = stdata.T
stgtcelltype = pd.read_csv('./data/mousedata_2020/E1z2/simu_st_celltype.csv',index_col=0)
spcoor = pd.read_csv('./data/mousedata_2020/E1z2/simu_st_metadata.csv',index_col=0)
scmetadata = pd.read_csv('./data/mousedata_2020/E1z2/metadata.csv',index_col=0)

adata = sc.AnnData(scdata,obs=scmetadata)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
scdata = pd.DataFrame(adata.X,index=adata.obs_names,columns=adata.var_names)
stadata = sc.AnnData(stdata)
sc.pp.normalize_total(stadata)
sc.pp.log1p(stadata)
stdata = pd.DataFrame(stadata.X,index=stadata.obs_names,columns=stadata.var_names)

adata.obsm['spatial'] = scmetadata[['x_global','y_global']].values
stadata.obsm['spatial'] = spcoor

Next we calculate the ratio between the median total counts value of SC and ST data as the dropout rate.

sc.pp.calculate_qc_metrics(adata,percent_top=None, log1p=False, inplace=True)
adata.obs['n_genes_by_counts'].median()

sc.pp.calculate_qc_metrics(stadata,percent_top=None, log1p=False, inplace=True)
stadata.obs['n_genes_by_counts'].median()

dp = 1- adata.obs['n_genes_by_counts'].median()/stadata.obs['n_genes_by_counts'].median()
#0.5836734693877551

3. Train the STEM model

We first config the STEM model and then train it. Empirically we found by setting the sigma as half of the ST spot adjacent distance, STEM achieves the best performance.

class setting( object ):
    pass
seed_all(2022)
opt= setting()
setattr(opt, 'device', 'cuda:0')
setattr(opt, 'outf', 'log/Mouse_E1z2')
setattr(opt, 'n_genes', 351)
setattr(opt, 'no_bn', False)
setattr(opt, 'lr', 0.002)
setattr(opt, 'sigma', 3)
setattr(opt, 'alpha', 0.8)
setattr(opt, 'verbose', True)
setattr(opt, 'mmdbatch', 1000)
setattr(opt, 'dp', dp)

testmodel = SOmodel(opt)
testmodel.togpu()
loss_curve = testmodel.train_wholedata(400,torch.tensor(scdata.values).float(),torch.tensor(stdata.values).float(),torch.tensor(spcoor.values).float())

The loss curve will be like this: loss

4. Get embeddings and reconstruct spatial adjacency

We first get the embeddings and build the mapping matrix We first get the embeddings and build the mapping matrix

testmodel.modeleval()
scembedding = testmodel.netE(torch.tensor(scdata.values,dtype=torch.float32).cuda())
stembedding = testmodel.netE(torch.tensor(stdata.values,dtype=torch.float32).cuda())
netst2sc = F.softmax(stembedding.mm(scembedding.t()),dim=1).detach().cpu().numpy()
netsc2st = F.softmax(scembedding.mm(stembedding.t()),dim=1).detach().cpu().numpy()

The matrix netst2sc and netsc2st are ST-SC and SC-ST mapping matrices, respectively. In the ST-SC mapping matrix, the probability of one spot to other cells summarizes to 1. In the SC-ST mapping matrix, the probability of one cell to other spots summarizes to 1.

Then we can get the spatial coordinate for every single cell.

adata.obsm['spatialDA'] = all_coord(pd.DataFrame(netsc2st,index=adata.obs_names,columns=stadata.obs_names),spcoor)

Compared with other methods, STEM is the only method that preserves the original topology structure of all single cells. loss

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scSTEM-0.0.1.tar.gz (9.2 kB view details)

Uploaded Source

Built Distribution

scSTEM-0.0.1-py3-none-any.whl (19.9 kB view details)

Uploaded Python 3

File details

Details for the file scSTEM-0.0.1.tar.gz.

File metadata

  • Download URL: scSTEM-0.0.1.tar.gz
  • Upload date:
  • Size: 9.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for scSTEM-0.0.1.tar.gz
Algorithm Hash digest
SHA256 ad40d5a05c595a89c62cce398692efca8c2a06947ff40c32d05ade6d53f7f861
MD5 cc11c0c5c9dd0bd3fdfe7f3633255e47
BLAKE2b-256 ada6851240c337b9a03b9a8d6bf394ece681d5179df3bdcd6a1e6a350cfb4e29

See more details on using hashes here.

File details

Details for the file scSTEM-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: scSTEM-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 19.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for scSTEM-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 149b0d948ad869651d7b0c15550d6692d23311075d897bbf6e048209ed78a28b
MD5 56d7a29f21910e28ed552681e8e4bc50
BLAKE2b-256 14e5a0becd01f748c1199ec8276944ec1c3cbaf0c2b177275055f315fb14862a

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page