A package for computation of continuous spatial structure scores in spatial omics data.
PersiST
PersiST is an exploratory method for analysing spatial transcriptomics (and other spatial 'omics) datasets. Given a spatial transcriptomics dataset containing expression data on multiple genes, resolved to a shared set of co-ordinates (for example, Visium type spatial transcriptomics data), PersiST computes a single score for each gene, called the Coefficient of Spatial Structure (CoSS), that measures the amount of spatial structure in that gene's expression pattern. This score can be used for multiple analytical tasks, as we show below.
Installation
PersiST can be installed using pip:
python3 -m pip install persist_spatial
Spatially Variable Gene Identification
For this tutorial, we shall be looking at Visium type spatial transcriptomics data from a sample in the Kidney Precision Medicine Project (KPMP) [1].
import pandas as pd
df = pd.read_csv('data/kpmp_30-10125_spatial_expression.csv')
df.head()
| | x_position | y_position | TSPAN6 | TNMD | DPM1 | SCYL3 | C1orf112 | FGR | CFH | FUCA2 | ... | ENSG00000288156 | ENSG00000288162 | ENSG00000288172 | ENSG00000288187 | ENSG00000288234 | ENSG00000288253 | ENSG00000288302 | ENSG00000288380 | ENSG00000288398 | SOD2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.548810 | 0.834208 | 0.00000 | 0.0 | 0.000000 | 0.0 | 0.00000 | 117.633220 | 0.00000 | 0.00000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1058.6990 |
| 1 | 0.589610 | 0.809106 | 0.00000 | 0.0 | 0.000000 | 0.0 | 0.00000 | 86.865880 | 173.73177 | 86.86588 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1737.3176 |
| 2 | 0.571644 | 0.166174 | 75.90709 | 0.0 | 75.907090 | 0.0 | 0.00000 | 0.000000 | 151.81418 | 0.00000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2201.3057 |
| 3 | 0.539074 | 0.714422 | 382.89725 | 0.0 | 127.632416 | 0.0 | 0.00000 | 127.632416 | 0.00000 | 0.00000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1148.6918 |
| 4 | 0.570493 | 0.468741 | 82.88438 | 0.0 | 0.000000 | 0.0 | 82.88438 | 0.000000 | 82.88438 | 0.00000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1989.2250 |
5 rows × 26026 columns
This is a pandas DataFrame where the first two columns correspond to the well co-ordinates, and the remaining columns contain the expression of each gene in each well (in this case measured in counts per million). This is the format PersiST expects spatial transcriptomics data to come in.
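If your data are not yet in this layout, the following is a minimal sketch of how such a frame could be assembled with pandas. The gene names and values are made up for illustration, and the column names `x_position` and `y_position` are simply copied from the table above; whether PersiST requires these exact names is an assumption not confirmed here.

```python
import pandas as pd

# Hypothetical well co-ordinates (one row per well).
coords = pd.DataFrame({
    "x_position": [0.1, 0.2, 0.3, 0.4],
    "y_position": [0.9, 0.8, 0.7, 0.6],
})

# Hypothetical raw counts: one row per well, one column per gene.
counts = pd.DataFrame({
    "GENE_A": [3, 0, 5, 1],
    "GENE_B": [1, 2, 0, 4],
    "GENE_C": [0, 6, 5, 5],
})

# Counts-per-million normalisation: scale each well so its counts sum to 1e6.
cpm = counts.div(counts.sum(axis=1), axis=0) * 1e6

# Co-ordinates first, then one column per gene.
toy_df = pd.concat([coords, cpm], axis=1)
```

Any other normalisation could be substituted for the counts-per-million step; it is used here only because the tutorial data are in counts per million.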
We can compute CoSS values for all the genes in this sample using the function run_persistence(), which takes as input a data frame like the above. This should take about 10 - 20 minutes, depending on the system you are running this on.
from persist_spatial.compute_persistence import run_persistence
metrics = run_persistence(df)
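Because a full run can take a while, it may be convenient to try things out on a handful of genes first. The sketch below uses a hypothetical helper, `subset_genes()`, which is not part of PersiST, and toy data. It assumes that run_persistence() accepts any data frame in the format above, including one with only a few gene columns; results such as the `svg` calls may well differ from those of a full run.

```python
import pandas as pd

def subset_genes(df, genes, coord_cols=("x_position", "y_position")):
    """Return the co-ordinate columns plus only the requested gene columns."""
    missing = [g for g in genes if g not in df.columns]
    if missing:
        raise KeyError(f"Genes not found in data: {missing}")
    return df[list(coord_cols) + list(genes)]

# Toy frame in the expected layout (made-up values).
toy = pd.DataFrame({
    "x_position": [0.1, 0.2, 0.3],
    "y_position": [0.9, 0.8, 0.7],
    "GENE_A": [0.0, 5.0, 2.0],
    "GENE_B": [1.0, 1.0, 0.0],
    "GENE_C": [3.0, 0.0, 4.0],
})
small = subset_genes(toy, ["GENE_A", "GENE_C"])

# With the real data, a trial run might then look like (assumption, untested):
# trial_metrics = run_persistence(subset_genes(df, ["UMOD", "PODXL"]))
```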
The CoSS is a measure of the amount of spatial structure in a gene's expression pattern. Let's take a look at those genes with the highest CoSS scores:
metrics = metrics.sort_values('CoSS', ascending=False)
metrics.iloc[:10,:]
| | gene | CoSS | ratio | gene_rank | possible_artefact | svg |
|---|---|---|---|---|---|---|
| 16443 | IGLC1 | 0.141620 | 0.651803 | 1.0 | No | Yes |
| 16483 | IGHG1 | 0.114255 | 0.467722 | 2.0 | No | Yes |
| 5372 | MT1G | 0.105850 | 0.335738 | 3.0 | No | Yes |
| 10798 | DEFB1 | 0.103534 | 0.376595 | 4.0 | No | Yes |
| 12467 | CCL19 | 0.101025 | 0.649770 | 5.0 | No | Yes |
| 22516 | C17orf113 | 0.098336 | 0.574433 | 6.0 | No | Yes |
| 6980 | ALDOB | 0.096201 | 0.271491 | 7.0 | No | Yes |
| 5750 | PODXL | 0.095475 | 0.327815 | 8.0 | No | Yes |
| 1102 | SLC12A3 | 0.095306 | 0.352575 | 9.0 | No | Yes |
| 11812 | UMOD | 0.094709 | 0.401716 | 10.0 | No | Yes |
PersiST outputs a number of quantities for each gene:
- CoSS: The Coefficient of Spatial Structure, a continuous quantity that can serve as a proxy for the amount of spatial structure in a gene's expression.
- ratio: Roughly, this measures how much of a gene's CoSS is down to a single spatial feature. Genes with a high ratio may be technical artefacts; see [2] for details.
- gene_rank: The rank of each gene, where genes are ranked from highest to lowest CoSS (so a rank of 1 is given to the gene with the highest CoSS).
- possible_artefact: Based on the ratio, PersiST automatically flags genes as possible artefacts [2]. We emphasise that this is only a suggestion; manual inspection should be performed before dismissing any genes.
- svg: Based on the CoSS scores, PersiST automatically calls genes as spatially variable or not [2].
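As a rough illustration of how these columns might be combined, the sketch below builds a shortlist of genes called spatially variable and not flagged as possible artefacts, and keeps the flagged ones aside for manual inspection. The table is a made-up stand-in for the output of run_persistence(); the 'Yes'/'No' string encoding is assumed from the output shown above.

```python
import pandas as pd

# Made-up stand-in for the metrics table (hypothetical genes and values).
toy_metrics = pd.DataFrame({
    "gene": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "CoSS": [0.14, 0.11, 0.09, 0.01],
    "ratio": [0.65, 0.95, 0.40, 0.30],
    "gene_rank": [1.0, 2.0, 3.0, 4.0],
    "possible_artefact": ["No", "Yes", "No", "No"],
    "svg": ["Yes", "Yes", "Yes", "No"],
})

is_svg = toy_metrics["svg"] == "Yes"
is_flagged = toy_metrics["possible_artefact"] == "Yes"

# Spatially variable genes that were not flagged, best-ranked first.
shortlist = toy_metrics[is_svg & ~is_flagged].sort_values("gene_rank")

# Flagged genes are set aside for manual inspection rather than discarded.
to_inspect = toy_metrics[is_svg & is_flagged]
```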
We can plot the expression of those genes for which the CoSS is highest using the function plot_many_genes(), to which we need to provide a dataframe containing spatial expression data, and a list of genes to plot.
from persist_spatial.plotting_utils import plot_many_genes
plot_many_genes(df, list(metrics.gene)[:20])
We can see that PersiST effectively surfaces those genes with notable spatial structure.
From the CoSS scores, PersiST automatically calls genes as spatially variable (SV) or not (this is the 'svg' column in the results). This provides a triaged list of genes that can be selected for further analysis.
For example, one can search for genes with spatially similar expression patterns. Reducing the search to the comparatively small number of genes that PersiST typically calls as SV makes this task much easier; in our experience, simple clustering methods such as hierarchical clustering were effective at picking out groups of spatially variable genes (SVGs) with co-localised expression.
As an illustration, we plot a group of genes that are all expressed in the glomeruli of this particular sample [2]. This group was obtained by running simple hierarchical clustering on the list of SVGs identified by PersiST and manually inspecting the results.
plot_many_genes(df, ['PODXL', 'PTGDS', 'IGFBP5', 'TGFBR2', 'IFI27', 'HTRA1'], numcols=3)
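The clustering step mentioned above is not part of the PersiST functions shown here, so the following is only a minimal sketch of one way it could be done, using SciPy's hierarchical clustering on correlation distances between gene expression profiles. The function name `cluster_genes()`, the choice of average linkage, and the synthetic data are all assumptions for illustration, not the procedure used in [2].

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_genes(expr, n_clusters):
    """Group the gene columns of `expr` by similarity of expression across wells."""
    profiles = expr.to_numpy(dtype=float).T  # shape: genes x wells
    # Correlation distance = 1 - Pearson correlation between two gene profiles.
    # Genes with constant expression give undefined distances and should be dropped first.
    distances = pdist(profiles, metric="correlation")
    tree = linkage(distances, method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return pd.Series(labels, index=expr.columns, name="cluster")

# Synthetic example: two genes high in the first 50 wells, two high in the last 50.
rng = np.random.default_rng(0)
first_half = np.repeat([1.0, 0.0], 50)
second_half = 1.0 - first_half
toy_expr = pd.DataFrame({
    "GENE_A": first_half + 0.05 * rng.standard_normal(100),
    "GENE_B": first_half + 0.05 * rng.standard_normal(100),
    "GENE_C": second_half + 0.05 * rng.standard_normal(100),
    "GENE_D": second_half + 0.05 * rng.standard_normal(100),
})
clusters = cluster_genes(toy_expr, n_clusters=2)
```

With real data, `expr` would be the gene columns of `df` restricted to the genes called as SV, and each resulting cluster could be passed to plot_many_genes() for manual inspection.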
Differential Spatial Expression Testing
If you are working with multiple spatial transcriptomics samples, and there are defined subgroups present within these samples, the CoSS scores can be used to look for genes whose spatial pattern of expression differs between the subgroups.
In the KPMP dataset, there are Acute Kidney Injury (AKI) and Chronic Kidney Disease (CKD) samples. For each gene, we computed the average CoSS score within the AKI samples and within the CKD samples. The gene with the largest difference between the two averages was UMOD. Below we plot the expression of UMOD in all the KPMP samples.
In the AKI samples, UMOD displays well-defined regions of higher expression, whereas in the CKD samples the expression of UMOD is much more diffuse.
UMOD is a marker gene for tubules, a key structural component of the kidney. It is plausible that this difference in expression between the AKI and CKD samples reflects the structural breakdown that is characteristic of progressed kidney disease. Using PersiST, we are able to automatically detect and quantify this structural breakdown.
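The group comparison described in this section can be sketched with pandas as follows. The long-format table, sample names, gene names and CoSS values are all made up; in practice one would presumably build such a table by running run_persistence() on each sample and concatenating the results with sample and group labels attached, which is an assumption about workflow rather than a documented feature.

```python
import pandas as pd

# Made-up CoSS scores: one row per (sample, gene), with the sample's group label.
coss_long = pd.DataFrame({
    "sample": ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"],
    "group": ["AKI", "AKI", "AKI", "AKI", "CKD", "CKD", "CKD", "CKD"],
    "gene": ["GENE_A", "GENE_B"] * 4,
    "CoSS": [0.10, 0.05, 0.12, 0.04, 0.03, 0.05, 0.05, 0.06],
})

# Average CoSS per gene within each group (rows: genes, columns: groups).
mean_coss = coss_long.pivot_table(
    index="gene", columns="group", values="CoSS", aggfunc="mean"
)

# Rank genes by the absolute difference between the group averages.
mean_coss["abs_diff"] = (mean_coss["AKI"] - mean_coss["CKD"]).abs()
ranked = mean_coss.sort_values("abs_diff", ascending=False)
```

This ranks genes by a simple difference of means and involves no statistical test; genes at the top of the ranking are candidates for visual inspection, as with UMOD above.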
References
[1] Blue B Lake et al. “An atlas of healthy and injured cell states and niches in the human kidney”. In: Nature 619.7970 (2023), pp. 585–594.
[2] PersiST paper (not yet published)