A computational method to align and integrate spatial transcriptomics experiments.
PASTE
PASTE is a computational method that leverages both gene expression similarity and spatial distances between spots to align and integrate spatial transcriptomics data. In particular, there are two methods:
- `pairwise_align`: align spots across a pair of slices.
- `center_align`: integrate multiple slices into one center slice.
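As we read the PASTE preprint, `pairwise_align` looks for a coupling π between the spots of two slices that balances two signals through a parameter α (the command-line default is 0.1): a linear term (1 − α) Σ_ij M_ij π_ij on expression dissimilarity, and a quadratic, Gromov-Wasserstein style term α Σ_ijkl (D_A[i,k] − D_B[j,l])² π_ij π_kl that rewards preserving distances between spots. The stdlib sketch below only *evaluates* that objective for a given π on toy inputs. The function name `paste_objective` is hypothetical; PASTE itself relies on POT to optimize π and does not use this code.

```python
def paste_objective(M, D_A, D_B, pi, alpha=0.1):
    """Evaluate a fused Gromov-Wasserstein style objective for a coupling pi.

    M[i][j]   : expression dissimilarity between spot i (slice A) and spot j (slice B)
    D_A[i][k] : spatial distance between spots i and k in slice A
    D_B[j][l] : spatial distance between spots j and l in slice B
    pi[i][j]  : mass mapped from spot i of slice A to spot j of slice B
    alpha     : weight of the spatial term (1 - alpha weights the expression term)
    """
    n, m = len(D_A), len(D_B)
    # Linear term: expression cost of the mapping
    expression_term = sum(M[i][j] * pi[i][j] for i in range(n) for j in range(m))
    # Quadratic term: penalize pairs of mappings that distort spot-to-spot distances
    spatial_term = sum(
        (D_A[i][k] - D_B[j][l]) ** 2 * pi[i][j] * pi[k][l]
        for i in range(n) for j in range(m)
        for k in range(n) for l in range(m)
    )
    return (1 - alpha) * expression_term + alpha * spatial_term
```

The naive quadruple loop costs O(n²m²) and is only meant for toy examples; on two spots per slice with identical distances, the identity coupling scores 0 while the swapped coupling pays only the expression term.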
You can read our preprint here.
PASTE is under active development, with future updates planned.
Recent News
As of version 1.1.0, PASTE runs on AnnData, making it easy to integrate with Scanpy for downstream analysis. Hooray!
This also means that the old version that uses the `STLayer` object is now deprecated.
Dependencies
To run PASTE, you will need the following Python packages:
- POT: Python Optimal Transport (https://PythonOT.github.io/)
- Scanpy (https://scanpy.readthedocs.io/en/stable/)
- NumPy
- Pandas
- SciPy (`scipy.spatial`)
- scikit-learn (`sklearn.preprocessing`)
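Two of these packages are imported under names that differ from their package names: POT is imported as `ot` and scikit-learn as `sklearn`. A small stdlib check like the one below can report which dependencies are missing from the current environment. It is a hypothetical helper, not part of PASTE, and it only checks that the modules can be found, not their versions.

```python
import importlib.util

# Map each dependency to the top-level module it is imported under.
REQUIRED_MODULES = {
    "POT": "ot",
    "Scanpy": "scanpy",
    "NumPy": "numpy",
    "Pandas": "pandas",
    "SciPy": "scipy",
    "scikit-learn": "sklearn",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the package names whose top-level module cannot be found."""
    return [name for name, module in modules.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All PASTE dependencies are installed.")
```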
Installation
The easiest way to install PASTE is from PyPI: https://pypi.org/project/paste-bio/.
```bash
pip install paste-bio
```
You can also install PASTE from Bioconda: https://anaconda.org/bioconda/paste-bio.
```bash
conda install -c bioconda paste-bio
```
Check out Tutorial.ipynb for an example of how to use PASTE.
Alternatively, you can clone the repository and try the following example in a notebook or from the command line.
Quick Start
To use PASTE, you need at least two slices of spatial-omics data (both expression and coordinates) in AnnData format (e.g. read in by Scanpy or Squidpy). We have included a breast cancer dataset from [1] in the sample_data folder of this repo, which we use below as an example of how to use PASTE.
```python
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
import scanpy as sc
import paste as pst

# Load Slices
data_dir = './sample_data/'  # change this path to the data you wish to analyze

# Assume that the coordinates of slices are named slice_name + "_coor.csv"
def load_slices(data_dir, slice_names=("slice1", "slice2")):
    slices = []
    for slice_name in slice_names:
        slice_i = sc.read_csv(data_dir + slice_name + ".csv")
        slice_i_coor = np.genfromtxt(data_dir + slice_name + "_coor.csv", delimiter=',')
        slice_i.obsm['spatial'] = slice_i_coor
        # Preprocess slices
        sc.pp.filter_genes(slice_i, min_counts=15)
        sc.pp.filter_cells(slice_i, min_counts=100)
        slices.append(slice_i)
    return slices

slices = load_slices(data_dir)
slice1, slice2 = slices

# Pairwise align the slices
pi12 = pst.pairwise_align(slice1, slice2)

# To visualize the alignment you can stack the slices
# according to the alignment pi
slices, pis = [slice1, slice2], [pi12]
new_slices = pst.stack_slices_pairwise(slices, pis)

slice_colors = ['#e41a1c', '#377eb8']
plt.figure(figsize=(7, 7))
for i in range(len(new_slices)):
    pst.plot_slice(new_slices[i], slice_colors[i], s=400)
plt.legend(handles=[mpatches.Patch(color=slice_colors[0], label='1'),
                    mpatches.Patch(color=slice_colors[1], label='2')])
plt.gca().invert_yaxis()
plt.axis('off')
plt.show()
```
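`stack_slices_pairwise` uses the alignment π to place both slices in a common coordinate frame. Our understanding is that this is a weighted Procrustes step: each slice is centered on its π-weighted centroid, and the second slice is then rotated to best match the first, with π_ij weighting how strongly spot i should land near spot j. The stdlib-only sketch below illustrates that idea for 2D coordinates using the closed-form optimal angle. The function name `weighted_procrustes_2d` is hypothetical, and this is not PASTE's implementation (for example, it only considers rotations, never reflections).

```python
import math


def weighted_procrustes_2d(X, Y, pi):
    """Rigidly move slice-2 coordinates Y onto slice-1 coordinates X.

    X: list of (x, y) spots in slice 1; Y: list of (x, y) spots in slice 2.
    pi[i][j]: alignment weight between spot i of slice 1 and spot j of slice 2.
    Returns (X_centered, Y_aligned), both centered on their weighted centroids.
    """
    total = sum(sum(row) for row in pi)
    # Marginal weights of each spot, taken from the alignment
    row_w = [sum(row) / total for row in pi]
    col_w = [sum(pi[i][j] for i in range(len(X))) / total for j in range(len(Y))]
    # Weighted centroids of each slice
    cx = (sum(w * p[0] for w, p in zip(row_w, X)), sum(w * p[1] for w, p in zip(row_w, X)))
    cy = (sum(w * p[0] for w, p in zip(col_w, Y)), sum(w * p[1] for w, p in zip(col_w, Y)))
    Xc = [(p[0] - cx[0], p[1] - cx[1]) for p in X]
    Yc = [(p[0] - cy[0], p[1] - cy[1]) for p in Y]
    # Angle theta maximizing sum_ij pi_ij * <Xc_i, R(theta) Yc_j>, which
    # minimizes the pi-weighted squared distance between matched spots.
    s = c = 0.0
    for i, (ax, ay) in enumerate(Xc):
        for j, (bx, by) in enumerate(Yc):
            s += pi[i][j] * (bx * ay - by * ax)
            c += pi[i][j] * (bx * ax + by * ay)
    theta = math.atan2(s, c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    Y_aligned = [(cos_t * bx - sin_t * by, sin_t * bx + cos_t * by) for bx, by in Yc]
    return Xc, Y_aligned
```

For instance, if slice 2 is slice 1 rotated by 90° and shifted, and π pairs each spot with its counterpart, the returned slice-2 coordinates coincide with the centered slice-1 coordinates.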
```python
# Center align slices
## We have to reload the slices as pairwise_align modifies the slices.
slices = load_slices(data_dir)
slice1, slice2 = slices

# Construct a center slice
## Choose one of the slices as the coordinate reference for the center slice,
## i.e. the center slice will have the same number of spots as this slice and
## the same coordinates.
initial_slice = slice1.copy()
slices = [slice1, slice2]
lmbda = len(slices) * [1 / len(slices)]  # set hyperparameter to be uniform

## It is possible to pass in an initial pi (as keyword argument pis_init)
## to improve performance; see the Tutorial.ipynb notebook for more details.
center_slice, pis = pst.center_align(initial_slice, slices, lmbda)

## The low dimensional representation of our center slice is held
## in the matrices W and H, which can be used for downstream analyses
W = center_slice.uns['paste_W']
H = center_slice.uns['paste_H']
```
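The NMF step approximates the center slice's expression matrix by the product W·H. Assuming `W` has one row per center-slice spot and `H` one column per gene (the usual scikit-learn NMF convention of `fit_transform` and `components_`), with NumPy you can simply compute `W @ H`. The stdlib sketch below spells out that product on plain nested lists; the helper name `nmf_reconstruct` is hypothetical.

```python
def nmf_reconstruct(W, H):
    """Multiply W (spots x components) by H (components x genes)."""
    k = len(H)
    if any(len(row) != k for row in W):
        raise ValueError("W must have as many columns as H has rows")
    n_genes = len(H[0])
    # Entry (spot, gene) is the sum over components of W[spot][c] * H[c][gene]
    return [[sum(w_row[c] * H[c][g] for c in range(k)) for g in range(n_genes)]
            for w_row in W]


print(nmf_reconstruct([[1, 0], [0, 2]], [[1, 2, 3], [4, 5, 6]]))  # → [[1, 2, 3], [8, 10, 12]]
```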
Command Line
We provide the option of running PASTE from the command line.
First, clone the repository:
```bash
git clone https://github.com/raphael-group/paste.git
```
Next, provide two .csv files for each slice: the gene expression data followed by the spatial coordinates. The code uses each pair of files to initialize one slice object.

Sample execution (based on this repo):
```bash
python paste-cmd-line.py -m center -f ./sample_data/slice1.csv ./sample_data/slice1_coor.csv ./sample_data/slice2.csv ./sample_data/slice2_coor.csv ./sample_data/slice3.csv ./sample_data/slice3_coor.csv
```
Note: `pairwise` mode returns the pairwise alignment between each consecutive pair of slices (e.g. [slice1, slice2], [slice2, slice3]).
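When scripting runs over many slices, it can help to build the `-f` argument list programmatically so that each expression file is immediately followed by its coordinate file, in the same order as the sample execution above. The sketch below assumes the `slice_name.csv` / `slice_name_coor.csv` naming used by sample_data; `build_paste_command` and `consecutive_pairs` are hypothetical helpers, the latter only listing which slices `pairwise` mode would pair up.

```python
def build_paste_command(slice_names, data_dir="./sample_data/", mode="center"):
    """Build the argument list for paste-cmd-line.py.

    Each slice contributes its expression file followed by its coordinate file.
    """
    files = []
    for name in slice_names:
        files.append(f"{data_dir}{name}.csv")       # gene expression
        files.append(f"{data_dir}{name}_coor.csv")  # spatial coordinates
    return ["python", "paste-cmd-line.py", "-m", mode, "-f", *files]


def consecutive_pairs(slice_names):
    """Return the (earlier, later) slice pairs aligned in pairwise mode."""
    return list(zip(slice_names, slice_names[1:]))
```

From the repository root, the list can be passed directly to `subprocess.run(build_paste_command(["slice1", "slice2", "slice3"]), check=True)`, which avoids shell quoting issues with unusual paths.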
| Flag | Name | Description | Default Value |
|---|---|---|---|
| -m | mode | Select either `pairwise` or `center` | (str) `pairwise` |
| -f | files | Path to data files (.csv) | None |
| -d | direc | Directory to store output files | Current directory |
| -a | alpha | Alpha parameter for PASTE | (float) 0.1 |
| -c | cost | Expression dissimilarity cost (`kl` or `Euclidean`) | (str) `kl` |
| -p | n_components | n_components for NMF step in `center_align` | (int) 15 |
| -l | lmbda | Lambda parameter in `center_align` | (floats) probability vector of length n |
| -i | intial_slice | Specify which file is also the initial slice in `center_align` | (int) 1 |
| -t | threshold | Convergence threshold for `center_align` | (float) 0.001 |
| -x | coordinates | Output new coordinates (toggle to turn on) | False |
| -w | weights | Weights files of spots in each slice (.csv) | None |
| -s | start | Initial alignments for OT. If not given, uses uniform (.csv structure similar to alignment output) | None |
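The `lmbda` weights must form a probability vector with one entry per slice, and the uniform choice mirrors `lmbda = len(slices)*[1/len(slices)]` from the Quick Start. The sketch below builds and validates such a vector. All three helpers are hypothetical, and `lambda_cli_args` additionally assumes that `-l` accepts the weights as space-separated floats, which you should confirm against `paste-cmd-line.py`.

```python
def uniform_lambda(n_slices):
    """Uniform weights, one per slice."""
    if n_slices < 1:
        raise ValueError("need at least one slice")
    return [1.0 / n_slices] * n_slices


def validate_lambda(lmbda, n_slices, tol=1e-9):
    """Raise ValueError unless lmbda is a probability vector of length n_slices."""
    if len(lmbda) != n_slices:
        raise ValueError(f"expected {n_slices} weights, got {len(lmbda)}")
    if any(w < 0 for w in lmbda):
        raise ValueError("weights must be non-negative")
    if abs(sum(lmbda) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    return list(lmbda)


def lambda_cli_args(lmbda):
    """Format weights as ['-l', w1, w2, ...] (assumes space-separated floats)."""
    return ["-l", *(str(w) for w in lmbda)]
```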
`pairwise_align` outputs a .csv file containing the mapping of spots for each consecutive pair of slices. The rows correspond to spots of the first slice, and the columns to spots of the second.
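Because each row of the mapping describes how one spot of the first slice is distributed over the spots of the second, a common way to read it is to take, for every row, the column with the largest value. The sketch below does that with the standard library. The function name `best_matches` is hypothetical, and whether the file includes a header row of spot IDs and an index column is an assumption to check against your own output (set `has_labels=False` for a bare numeric matrix).

```python
import csv


def best_matches(handle, has_labels=True):
    """Map each first-slice spot (row) to the second-slice spot (column) with the most mass.

    handle: an open text file, or any iterable of CSV lines.
    has_labels: if True, assume the first row holds column spot IDs and the
    first column holds row spot IDs; if False, use 0-based indices instead.
    """
    rows = [r for r in csv.reader(handle) if r]
    if has_labels:
        col_labels = rows[0][1:]
        body = rows[1:]
        row_labels = [r[0] for r in body]
        values = [[float(x) for x in r[1:]] for r in body]
    else:
        values = [[float(x) for x in r] for r in rows]
        row_labels = list(range(len(values)))
        col_labels = list(range(len(values[0]))) if values else []
    matches = {}
    for label, row in zip(row_labels, values):
        # Index of the largest entry in this row (first one wins on ties)
        j = max(range(len(row)), key=row.__getitem__)
        matches[label] = col_labels[j]
    return matches
```

Typical use is `with open(path, newline="") as f: matches = best_matches(f)`; the same reading applies to the center-to-slice mappings written by `center_align`.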
`center_align` outputs two files containing the low-dimensional representation (NMF decomposition) of the center slice gene expression, plus files containing the mapping of spots between the center slice (rows) and each input slice (columns).
Sample Dataset
We added a sample spatial transcriptomics dataset consisting of four breast cancer slices, courtesy of:
[1] Ståhl, Patrik & Salmén, Fredrik & Vickovic, Sanja & Lundmark, Anna & Fernandez Navarro, Jose & Magnusson, Jens & Giacomello, Stefania & Asp, Michaela & Westholm, Jakub & Huss, Mikael & Mollbrink, Annelie & Linnarsson, Sten & Codeluppi, Simone & Borg, Åke & Pontén, Fredrik & Costea, Paul & Sahlén, Pelin Akan & Mulder, Jan & Bergmann, Olaf & Frisén, Jonas. (2016). Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 353. 78-82. 10.1126/science.aaf2403.
Note: The original data is in .tsv format; we converted it to .csv.
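If you start from the original .tsv files, a conversion along these lines with Python's `csv` module is one way to produce comma-separated inputs. The helper name `tsv_to_csv` is hypothetical, and we have not confirmed that it matches the exact conversion used to build sample_data (for example, line endings may differ).

```python
import csv


def tsv_to_csv(src, dst):
    """Rewrite a tab-separated file as a comma-separated file, row by row."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        csv.writer(fout).writerows(csv.reader(fin, delimiter="\t"))
```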