A computational method to align and integrate spatial transcriptomics experiments.
PASTE
PASTE is a computational method that leverages both gene expression similarity and spatial distances between spots to align and integrate spatial transcriptomics (ST) data. It provides two methods:
- pairwise_align: aligns spots between a pair of ST layers.
- center_align: integrates multiple ST layers into one center layer.
You can read our preprint here.
PASTE is actively being worked on with future updates coming.
Dependencies
To run PASTE, you will need the following Python packages:
- POT: Python Optimal Transport (https://PythonOT.github.io/)
- NumPy
- pandas
- SciPy (scipy.spatial)
- scikit-learn (sklearn.preprocessing)
Installation
The easiest way is to install PASTE from PyPI: https://pypi.org/project/paste-bio/.
pip install paste-bio
Check out Tutorial.ipynb for an example of how to use PASTE.
Alternatively, you can clone the repository and run PASTE from the command line (see below).
Command Line
We provide the option of running PASTE from the command line.
First, clone the repository:
git clone https://github.com/raphael-group/paste.git
Sample execution: python paste-cmd-line.py -m pairwise -f file1.csv file2.csv file3.csv
Note: pairwise will return the pairwise alignment between each consecutive pair of files (e.g. [file1,file2], [file2,file3]).
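The consecutive pairing described in the note can be illustrated with a short sketch. The helper name `consecutive_pairs` is hypothetical and is not part of PASTE; it only shows which file pairs would be aligned for a given input order.

```python
def consecutive_pairs(files):
    """Return each file paired with the one that follows it in the list."""
    # zip stops at the shorter sequence, so the last file only appears
    # as the second element of the final pair.
    return list(zip(files, files[1:]))


pairs = consecutive_pairs(["file1.csv", "file2.csv", "file3.csv"])
for first, second in pairs:
    print(first, "->", second)
```

Under this reading, n input files yield n - 1 alignments, and the order of the files given to -f determines which layers are compared.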
Flag | Name | Description | Default Value |
---|---|---|---|
-m | mode | Select either pairwise or center | (str) pairwise |
-f | files | Path to data files (.csv) | None |
-d | direc | Directory to store output files | Current Directory |
-a | alpha | alpha parameter for PASTE | (float) 0.1 |
-p | n_components | n_components for NMF step in center_align | (int) 15 |
-l | lmbda | lambda parameter in center_align | (floats) probability vector of length n |
-i | intial_layer | Specify which file is also the initial layer in center_align | (int) 1 |
-t | threshold | Convergence threshold for center_align | (float) 0.001 |
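As a rough sketch of how the flags in the table combine, the snippet below assembles a command line in Python. The function `build_paste_command` is a hypothetical helper, not part of PASTE. It uses only the short flags listed above and leaves out -l, since the table does not say how the probability vector is written on the command line. Passing the center_align options only in center mode, and requiring at least two files, are assumptions made for this example.

```python
def build_paste_command(files, mode="pairwise", out_dir=None, alpha=0.1,
                        n_components=15, initial_layer=1, threshold=0.001):
    """Assemble an argument list for paste-cmd-line.py from the documented flags."""
    if mode not in ("pairwise", "center"):
        raise ValueError("mode must be 'pairwise' or 'center'")
    if len(files) < 2:
        # Assumption: an alignment needs at least two layers.
        raise ValueError("need at least two input files")

    cmd = ["python", "paste-cmd-line.py", "-m", mode, "-f", *files,
           "-a", str(alpha)]
    if out_dir is not None:
        cmd += ["-d", out_dir]
    if mode == "center":
        # These flags are described as center_align options in the table.
        cmd += ["-p", str(n_components),
                "-i", str(initial_layer),
                "-t", str(threshold)]
    return cmd


cmd = build_paste_command(["file1.csv", "file2.csv", "file3.csv"],
                          mode="center", out_dir="results")
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from inside the cloned repository, which avoids shell quoting issues with file paths.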
Input files are .csv files of the form:
'gene_a' 'gene_b'
'2x5' 0 9
'2x7' 2 6
where the column indexes are gene names (str), the row indexes are spatial coordinates (str), and the entries are gene counts (int). In particular, row indexes are of the form AxB, where A and B are floats.
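To make the input format concrete, here is a minimal sketch that parses such a file into gene names, numeric coordinates, and a count matrix. It assumes a comma-delimited file whose first header cell is empty (the layout pandas writes for an indexed table). The function `read_layer` and the in-memory sample are illustrative only and are not part of PASTE.

```python
import csv
import io

# In-memory stand-in for a small input file (values taken from the example above).
SAMPLE = """,gene_a,gene_b
2x5,0,9
2x7,2,6
"""


def read_layer(handle):
    """Parse a layer file into (genes, coords, counts)."""
    reader = csv.reader(handle)
    header = next(reader)
    genes = header[1:]  # the first header cell labels the index column
    coords, counts = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        a, b = row[0].split("x")  # row index has the form AxB
        coords.append((float(a), float(b)))
        counts.append([int(value) for value in row[1:]])
    return genes, coords, counts


genes, coords, counts = read_layer(io.StringIO(SAMPLE))
print(genes)
print(coords)
print(counts)
```

For a file on disk, the same function can be called as `read_layer(open(path, newline=""))`.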
pairwise_align outputs a (.csv) file containing the mapping of spots between each consecutive pair of layers. The rows correspond to spots of the first layer, and the columns to spots of the second.
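One way to read such a mapping is to pick, for every spot in the first layer, the second-layer spot that receives the largest weight. The sketch below assumes the entries are non-negative alignment weights. The spot labels and numbers are made up for illustration, and `best_matches` is a hypothetical helper rather than a PASTE function.

```python
def best_matches(row_spots, col_spots, mapping):
    """For each first-layer spot, return the second-layer spot with the largest weight."""
    matches = {}
    for spot, weights in zip(row_spots, mapping):
        # Index of the largest entry in this row of the mapping.
        j = max(range(len(weights)), key=weights.__getitem__)
        matches[spot] = col_spots[j]
    return matches


# Toy mapping: 2 spots in the first layer (rows), 3 in the second (columns).
rows = ["2x5", "2x7"]
cols = ["3x5", "3x7", "3x9"]
pi = [
    [0.40, 0.10, 0.00],
    [0.05, 0.15, 0.30],
]

matches = best_matches(rows, cols, pi)
print(matches)
```

Taking the row-wise maximum discards the rest of each row, so it is best treated as a quick summary of the alignment rather than a replacement for the full mapping.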
center_align outputs two files containing the low-dimensional representation (NMF decomposition) of the center layer gene expression, and files containing a mapping of spots between the center layer (rows) and each input layer (cols).
Sample Dataset
We include a sample spatial transcriptomics dataset consisting of four breast cancer layers, courtesy of:
Ståhl, Patrik & Salmén, Fredrik & Vickovic, Sanja & Lundmark, Anna & Fernandez Navarro, Jose & Magnusson, Jens & Giacomello, Stefania & Asp, Michaela & Westholm, Jakub & Huss, Mikael & Mollbrink, Annelie & Linnarsson, Sten & Codeluppi, Simone & Borg, Åke & Pontén, Fredrik & Costea, Paul & Sahlén, Pelin Akan & Mulder, Jan & Bergmann, Olaf & Frisén, Jonas. (2016). Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 353. 78-82. 10.1126/science.aaf2403.
Note: The original data is in .tsv format; we converted it to .csv.
Built Distribution
Hashes for paste_bio-1.0.2-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 2457d195a9d2380266db2e5855e370064b6f8966c404f4a93701aa4b8a7f1de7 |
MD5 | a9a02b52343686ee3bd5274a6bb837b7 |
BLAKE2b-256 | 1877ec06ba37ec56550169270a139b168373b8bc4ea86109c8a5a4f4470ce2de |