A package for computation of continuous spatial structure scores in spatial omics data.

PersiST

PersiST is an exploratory method for analysing spatial transcriptomics (and other spatial 'omics) datasets. Given a spatial transcriptomics dataset containing expression data on multiple genes, resolved to a shared set of co-ordinates (for example, Visium-type spatial transcriptomics data), PersiST computes a single score for each gene, called the Coefficient of Spatial Structure (CoSS), that measures the amount of spatial structure in that gene's expression pattern. This score can be used for multiple analytical tasks, as we show below.

Installation

PersiST can be installed using pip:

python3 -m pip install persist_spatial

Spatially Variable Gene Identification

For this tutorial, we shall look at Visium-type spatial transcriptomics data from a sample in the Kidney Precision Medicine Project (KPMP) [1].

import pandas as pd
df = pd.read_csv('data/kpmp_30-10125_spatial_expression.csv')
df.head()
x_position y_position TSPAN6 TNMD DPM1 SCYL3 C1orf112 FGR CFH FUCA2 ... ENSG00000288156 ENSG00000288162 ENSG00000288172 ENSG00000288187 ENSG00000288234 ENSG00000288253 ENSG00000288302 ENSG00000288380 ENSG00000288398 SOD2
0 0.548810 0.834208 0.00000 0.0 0.000000 0.0 0.00000 117.633220 0.00000 0.00000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1058.6990
1 0.589610 0.809106 0.00000 0.0 0.000000 0.0 0.00000 86.865880 173.73177 86.86588 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1737.3176
2 0.571644 0.166174 75.90709 0.0 75.907090 0.0 0.00000 0.000000 151.81418 0.00000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2201.3057
3 0.539074 0.714422 382.89725 0.0 127.632416 0.0 0.00000 127.632416 0.00000 0.00000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1148.6918
4 0.570493 0.468741 82.88438 0.0 0.000000 0.0 82.88438 0.000000 82.88438 0.00000 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1989.2250

5 rows × 26026 columns

This is a pandas DataFrame where the first two columns correspond to the well co-ordinates, and the remaining columns contain the expression of each gene in each well (in this case measured in counts per million). This is the format PersiST expects spatial transcriptomics data to come in.
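As a minimal sketch of this layout, the snippet below builds a tiny made-up data frame and checks it against the description above. The helper check_spatial_frame() and the gene names are hypothetical and not part of PersiST. The checks encode only what this section states (two leading co-ordinate columns, then one numeric column per gene), plus an assumed requirement that there are no missing values; they do not reflect whatever validation PersiST performs internally.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_spatial_frame(df: pd.DataFrame) -> list[str]:
    """Return the gene column names if df matches the layout described above.

    Hypothetical helper, not part of PersiST.
    """
    if df.shape[1] < 3:
        raise ValueError("expected two co-ordinate columns plus at least one gene")
    non_numeric = [col for col in df.columns if not is_numeric_dtype(df[col])]
    if non_numeric:
        raise ValueError(f"non-numeric columns: {non_numeric}")
    # Assumption: missing values are not allowed.
    if df.isna().any().any():
        raise ValueError("missing values found")
    # Everything after the first two (co-ordinate) columns is treated as a gene.
    return list(df.columns[2:])


# Made-up values for illustration only: one row per well.
toy = pd.DataFrame(
    {
        "x_position": [0.1, 0.2, 0.3],
        "y_position": [0.5, 0.4, 0.9],
        "gene_a": [0.0, 12.5, 3.0],
        "gene_b": [7.0, 0.0, 0.0],
    }
)
genes = check_spatial_frame(toy)
print(genes)
```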

We can compute CoSS values for all the genes in this sample using the function run_persistence(), which takes as input a data frame like the one above. This should take about 10 to 20 minutes, depending on your system.

from persist_spatial.compute_persistence import run_persistence
metrics = run_persistence(df)
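Because a full run takes several minutes, it can be convenient to try things out on a handful of genes first. The sketch below uses only pandas to build a smaller data frame in the same layout. subset_genes() is a hypothetical helper, and we assume (without having confirmed it) that run_persistence() accepts any data frame of this shape regardless of how many gene columns it has.

```python
import pandas as pd


def subset_genes(df: pd.DataFrame, genes: list[str]) -> pd.DataFrame:
    """Keep the two leading co-ordinate columns plus the requested genes.

    Hypothetical helper, not part of PersiST.
    """
    coord_cols = list(df.columns[:2])
    missing = [g for g in genes if g not in df.columns]
    if missing:
        raise KeyError(f"genes not found: {missing}")
    return df[coord_cols + genes].copy()


# Made-up values for illustration only.
toy = pd.DataFrame(
    {
        "x_position": [0.1, 0.2],
        "y_position": [0.5, 0.4],
        "gene_a": [0.0, 12.5],
        "gene_b": [7.0, 0.0],
        "gene_c": [1.0, 2.0],
    }
)
small = subset_genes(toy, ["gene_c", "gene_a"])
print(list(small.columns))
```

The smaller frame could then be passed to run_persistence() in place of df. Quantities such as gene_rank are relative to the genes supplied, so results from a subset are best treated as a smoke test rather than a substitute for the full run.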

The CoSS is a measure of the amount of spatial structure in a gene's expression pattern. Let's take a look at those genes with the highest CoSS scores:

metrics = metrics.sort_values('CoSS', ascending=False)
metrics.iloc[:10,:]
gene CoSS ratio gene_rank possible_artefact svg
16443 IGLC1 0.141620 0.651803 1.0 No Yes
16483 IGHG1 0.114255 0.467722 2.0 No Yes
5372 MT1G 0.105850 0.335738 3.0 No Yes
10798 DEFB1 0.103534 0.376595 4.0 No Yes
12467 CCL19 0.101025 0.649770 5.0 No Yes
22516 C17orf113 0.098336 0.574433 6.0 No Yes
6980 ALDOB 0.096201 0.271491 7.0 No Yes
5750 PODXL 0.095475 0.327815 8.0 No Yes
1102 SLC12A3 0.095306 0.352575 9.0 No Yes
11812 UMOD 0.094709 0.401716 10.0 No Yes

PersiST outputs a number of quantities for each gene:

  • CoSS: The Coefficient of Spatial Structure, a continuous quantity that can serve as a proxy for the amount of spatial structure in a gene's expression.
  • ratio: Roughly, this measures how much of a gene's CoSS is down to a single spatial feature. Genes with a high ratio may be technical artefacts; see [2] for details.
  • gene_rank: The rank of each gene, where genes are ranked from highest to lowest CoSS (so a rank of 1 is given to the gene with the highest CoSS).
  • possible_artefact: Based on the ratio, PersiST automatically flags genes as possible artefacts [2]. We emphasise that this is only a suggestion; manual inspection should be performed before dismissing any genes.
  • svg: Based on the CoSS scores, PersiST automatically calls genes as spatially variable or not [2].
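The columns described above can be combined with ordinary pandas filtering. The sketch below keeps the genes called as spatially variable and, optionally, sets aside those flagged as possible artefacts so they can be inspected by hand. It assumes the flags are the strings 'Yes' and 'No', as in the output shown earlier; select_svgs() is a hypothetical helper and the values in the toy table are made up.

```python
import pandas as pd


def select_svgs(metrics: pd.DataFrame, drop_flagged: bool = True) -> pd.DataFrame:
    """Return genes called as spatially variable, ordered by gene_rank.

    Hypothetical helper, not part of PersiST. Assumes the 'svg' and
    'possible_artefact' columns hold the strings 'Yes' / 'No'.
    """
    keep = metrics["svg"] == "Yes"
    if drop_flagged:
        # Flagged genes are set aside for manual inspection, not dismissed.
        keep &= metrics["possible_artefact"] == "No"
    return metrics[keep].sort_values("gene_rank")


# Made-up values for illustration only.
toy_metrics = pd.DataFrame(
    {
        "gene": ["gene_a", "gene_b", "gene_c", "gene_d"],
        "CoSS": [0.14, 0.11, 0.10, 0.01],
        "ratio": [0.30, 0.95, 0.40, 0.20],
        "gene_rank": [1.0, 2.0, 3.0, 4.0],
        "possible_artefact": ["No", "Yes", "No", "No"],
        "svg": ["Yes", "Yes", "Yes", "No"],
    }
)
svgs = select_svgs(toy_metrics)
to_inspect = toy_metrics[toy_metrics["possible_artefact"] == "Yes"]
print(list(svgs["gene"]), list(to_inspect["gene"]))
```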

We can plot the expression of those genes for which the CoSS is highest using the function plot_many_genes(), to which we need to provide a dataframe containing spatial expression data, and a list of genes to plot.

from persist_spatial.plotting_utils import plot_many_genes
plot_many_genes(df, list(metrics.gene)[:20])

[Figure: spatial expression of the 20 genes with the highest CoSS]

We can see that PersiST effectively surfaces those genes with notable spatial structure.

From the CoSS scores, PersiST automatically calls genes as spatially variable or not (this is the 'svg' column in the results). This provides a triaged list of genes that can be selected for further analysis.

For example, one can search for genes with spatially similar expression patterns. Reducing the data to the comparatively small number of genes PersiST typically calls as spatially variable (SVGs) makes this task much easier; in our experience, simple clustering methods such as hierarchical clustering were effective at picking out groups of SVGs with co-localised expression.

As an illustration, we plot a group of genes that are all expressed in the glomeruli of this particular sample [2]. This group was obtained by running simple hierarchical clustering on the list of SVGs identified by PersiST and manually inspecting the results.

plot_many_genes(df, ['PODXL', 'PTGDS', 'IGFBP5', 'TGFBR2', 'IFI27', 'HTRA1'], numcols=3)

[Figure: spatial expression of PODXL, PTGDS, IGFBP5, TGFBR2, IFI27 and HTRA1]
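The clustering step mentioned above is not part of the code shown in this tutorial. One possible way to do it, as a sketch under stated assumptions, is average-linkage hierarchical clustering on correlation distance (1 minus the Pearson correlation of two genes' expression across wells) using SciPy. The choice of distance and linkage, the helper cluster_genes(), and the toy data are all our assumptions here, not necessarily what was used to obtain the group plotted above.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_genes(df: pd.DataFrame, genes: list[str], n_clusters: int) -> pd.Series:
    """Group genes by the similarity of their expression across wells.

    Hypothetical helper, not part of PersiST. Genes with constant
    expression have undefined correlation and should be removed first.
    """
    # Rows = genes, columns = wells.
    profiles = df[genes].to_numpy(dtype=float).T
    # Correlation distance, clipped at 0 to absorb rounding error.
    dist = np.clip(1.0 - np.corrcoef(profiles), 0.0, None)
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return pd.Series(labels, index=genes, name="cluster")


# Made-up expression patterns along a line of wells, for illustration only.
x = np.linspace(0.0, 1.0, 51)
toy = pd.DataFrame(
    {
        "x_position": x,
        "y_position": np.zeros_like(x),
        "gene_a": 10.0 * x,                    # gradient
        "gene_b": 5.0 * x + 1.0,               # same gradient, different scale
        "gene_c": (x - 0.5) ** 2,              # high at both ends
        "gene_d": 3.0 * (x - 0.5) ** 2 + 2.0,  # same shape, different scale
    }
)
clusters = cluster_genes(toy, ["gene_a", "gene_b", "gene_c", "gene_d"], n_clusters=2)
print(clusters)
```

With real data, the genes argument would be the list of SVGs from the PersiST results, and each resulting cluster could be passed to plot_many_genes() for manual inspection.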

Differential Spatial Expression Testing

If you are working with multiple spatial transcriptomics samples, and there are defined subgroups within these samples, the CoSS scores can be used to look for genes whose spatial pattern of expression differs between the subgroups.

In the KPMP dataset, there are Acute Kidney Injury (AKI) and Chronic Kidney Disease (CKD) samples. For each gene, we computed the average CoSS within the AKI samples and within the CKD samples. The gene with the largest difference between the two was UMOD. Below we plot the expression of UMOD in all the KPMP samples.

[Figure: expression of UMOD in all KPMP samples]

In the AKI samples, UMOD displays well-defined regions of higher expression, whereas in the CKD samples the expression of UMOD is much more diffuse.

UMOD is a marker gene for tubules, a key structural component of the kidney. It is plausible that this difference in expression between the AKI and CKD samples reflects the structural breakdown that is characteristic of advanced kidney disease. Using PersiST, we are able to automatically detect and quantify this structural breakdown.
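The group comparison described in this section (average CoSS per gene within each subgroup, then the difference between subgroups) can be sketched with pandas. We assume run_persistence() has been run once per sample and the results collected in a dictionary keyed by sample name; rank_differential_coss(), the sample names and the CoSS values below are hypothetical and for illustration only.

```python
import pandas as pd


def rank_differential_coss(
    metrics_by_sample: dict[str, pd.DataFrame],
    group_by_sample: dict[str, str],
) -> pd.DataFrame:
    """Rank genes by the absolute difference in mean CoSS between two groups.

    Hypothetical helper, not part of PersiST. Each metrics frame needs
    'gene' and 'CoSS' columns, as in the PersiST output shown earlier.
    """
    frames = []
    for sample, metrics in metrics_by_sample.items():
        frame = metrics[["gene", "CoSS"]].copy()
        frame["group"] = group_by_sample[sample]
        frames.append(frame)
    stacked = pd.concat(frames, ignore_index=True)
    # One row per gene, one column per group, holding the mean CoSS.
    means = stacked.groupby(["gene", "group"])["CoSS"].mean().unstack("group")
    groups = list(means.columns)
    if len(groups) != 2:
        raise ValueError(f"expected exactly two groups, got {groups}")
    means["abs_diff"] = (means[groups[0]] - means[groups[1]]).abs()
    return means.sort_values("abs_diff", ascending=False)


def toy_metrics(coss_a: float, coss_b: float) -> pd.DataFrame:
    """Build a made-up two-gene metrics table."""
    return pd.DataFrame({"gene": ["gene_a", "gene_b"], "CoSS": [coss_a, coss_b]})


# Made-up CoSS values for four hypothetical samples.
metrics_by_sample = {
    "s1": toy_metrics(0.10, 0.05),
    "s2": toy_metrics(0.12, 0.05),
    "s3": toy_metrics(0.02, 0.05),
    "s4": toy_metrics(0.04, 0.07),
}
group_by_sample = {"s1": "AKI", "s2": "AKI", "s3": "CKD", "s4": "CKD"}
ranked = rank_differential_coss(metrics_by_sample, group_by_sample)
print(ranked)
```

A difference in group means is only a ranking heuristic; with few samples per group, the top genes are worth plotting across all samples before drawing conclusions.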

References

[1] Blue B Lake et al. “An atlas of healthy and injured cell states and niches in the human kidney”. In: Nature 619.7970 (2023), pp. 585–594.

[2] PersiST paper (not yet published)
