
Python package for spatial transcriptomics data analysis

Project description


📖 Documents | 🚀 Tutorial | 💬 Contact me

👩‍🏫 Introduction

Why STMiner?

Spatial transcriptomics (ST) data presents challenges such as uneven cell density, low sampling rates, and complex spatial structures. Traditional spot-based analysis strategies struggle to address these issues effectively. STMiner instead explores ST data through the spatial distribution of each gene, which avoids the biases that these conditions can introduce into the results.

In addition, STMiner integrates seamlessly with AnnData/Scanpy and can be installed easily from PyPI.

Method detail

Here we propose STMiner. The three key steps of analyzing ST data with STMiner are depicted in the overview figure:

(Top left) STMiner first uses Gaussian mixture models (GMMs) to represent the spatial distribution of each gene as well as the overall spatial distribution. (Bottom left) STMiner then identifies spatially variable genes (SVGs) by calculating the cost of transporting the overall spatial distribution to each gene's spatial distribution. Genes with high costs exhibit significant spatial variation, meaning that their expression differs considerably across regions of the tissue. A distance array between SVGs is built in the same way: genes with similar spatial structures have a low cost of transport to each other, and genes with dissimilar structures have a high cost. (Right) The distance array is embedded into a low-dimensional space by multidimensional scaling (MDS), so that genes with similar spatial expression patterns can be clustered into distinct functional gene sets and the spatial structure of each set can be recovered.
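The transport cost in the bottom-left step is computed between GMMs. As a hedged illustration of one common building block (not necessarily STMiner's exact formula), the squared 2-Wasserstein distance between two Gaussian components N(m1, S1) and N(m2, S2) has a closed form: W2^2 = |m1 - m2|^2 + tr(S1 + S2 - 2 (S2^(1/2) S1 S2^(1/2))^(1/2)). The helper names below are hypothetical, and the step that combines component costs into a cost between two whole mixtures is not shown.

```python
import numpy as np


def _sqrtm_spd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def gaussian_w2_squared(m1, s1, m2, s2) -> float:
    """Squared 2-Wasserstein distance between N(m1, s1) and N(m2, s2).

    Hypothetical helper for illustration; not part of the STMiner API.
    """
    m1, m2 = np.asarray(m1, dtype=float), np.asarray(m2, dtype=float)
    s1, s2 = np.asarray(s1, dtype=float), np.asarray(s2, dtype=float)
    root_s2 = _sqrtm_spd(s2)
    cross = _sqrtm_spd(root_s2 @ s1 @ root_s2)
    mean_term = float(np.sum((m1 - m2) ** 2))       # cost of moving the centers
    cov_term = float(np.trace(s1 + s2 - 2.0 * cross))  # cost of reshaping the spread
    return mean_term + cov_term


# Same covariance, means shifted by (3, 4): the cost is the squared shift.
print(gaussian_w2_squared([0, 0], np.eye(2), [3, 4], np.eye(2)))
```

Because the cost grows with both the shift in location and the change in shape, a gene concentrated in one tissue region will be expensive to reach from the broad overall distribution, which matches the intuition in the paragraph above.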

🚀 Quick start by example

Please visit the STMiner documentation for installation instructions and detailed usage.

Import the package

from STMiner import SPFinder

Load data

You can download the demo dataset (for example, GSM4838133_10xvisium.h5ad) from GEO or from STMOMICS. STMiner can read spatial transcriptomics data in various formats, including gem, bmk, and h5ad (see the STMiner documentation).
We recommend the h5ad format because it is currently the most widely used and is supported by most algorithms and software in the spatial transcriptomics field.

sp = SPFinder()
file_path = 'Path/to/your/h5ad/file'
sp.read_h5ad(file=file_path, bin_size=1)

Find spatially variable genes

sp.get_genes_csr_array(min_cells=500, log1p=False)
sp.spatial_high_variable_genes()
  • min_cells filters out genes that are expressed in too few spots to yield a reliable spatial distribution.
  • log1p controls whether a log(1 + x) transform is applied to reduce the influence of extreme values. Most open-source h5ad files are already log1p-transformed, so it is set to False here.
  • You can run STMiner on a gene set of interest by passing it through the gene_list parameter. STMiner will then compute results only for the given genes.
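The bullets above describe gene filtering in words. The sketch below mirrors that behavior on a small dense (spots × genes) count matrix with NumPy. `filter_genes` is a hypothetical helper, not STMiner's API; STMiner itself works on sparse CSR arrays, and its internal order of operations may differ.

```python
import numpy as np


def filter_genes(counts, gene_names, min_cells=500, log1p=False, gene_list=None):
    """Keep genes detected in at least `min_cells` spots; optionally log1p-transform.

    Hypothetical helper illustrating the parameters described above.
    `counts` is a (spots x genes) array.
    """
    counts = np.asarray(counts, dtype=float)
    names = np.asarray(gene_names)
    if gene_list is not None:
        # Restrict the analysis to a user-supplied gene set.
        keep = np.isin(names, list(gene_list))
        counts, names = counts[:, keep], names[keep]
    # Number of spots in which each gene has non-zero expression.
    n_cells = np.count_nonzero(counts, axis=0)
    keep = n_cells >= min_cells
    counts, names = counts[:, keep], names[keep]
    if log1p:
        # log(1 + x) compresses extreme values while keeping zeros at zero.
        counts = np.log1p(counts)
    return counts, names.tolist()


counts = np.array([[0, 5, 1],
                   [0, 3, 0],
                   [2, 7, 0]])
# Gene "a" is detected in 1 spot, "b" in 3, "c" in 1, so only "b" passes min_cells=2.
mat, kept = filter_genes(counts, ["a", "b", "c"], min_cells=2)
print(kept)
```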

You can check the distance of each gene by:

sp.global_distance
Gene Distance z-score
myha 1.35E+08 2.771493
vmhcl 1.01E+08 2.470881
zgc:101560 9.95E+07 2.458787
pvalb1 9.82E+07 2.445257
myhz2 9.75E+07 2.437787
... ... ...
rps17 2.61E+05 -3.63207
rpl13 2.48E+05 -3.68506
rpl32 2.43E+05 -3.70327
rsl24d1 2.27E+05 -3.7757
rpl22 1.83E+05 -3.99332

The Gene column gives the gene name, and the Distance column gives the difference (transport cost) between the spatial distribution of the gene and the overall background distribution; the z-score column standardizes this distance across genes.
A larger distance indicates a more pronounced spatial pattern of the gene.
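The rows above are sorted by Distance in descending order, so the top of the table holds the most spatially patterned genes. The sketch below shows that ranking and one way to standardize distances. The printed z-scores appear consistent with standardization on a log scale (for example, rsl24d1 and rpl22 differ by about 0.2 in z-score even though their raw distances differ by only about 4.4 × 10^4), so the sketch standardizes log distances. Treat that choice, the use of the population standard deviation, and the helper names `log_distance_zscores` and `top_genes` as assumptions. With only four genes, the z-scores will not match the table, which was computed over all genes.

```python
import math
import statistics


def log_distance_zscores(distances):
    """Z-scores of log-transformed distances, keyed by gene.

    Assumption: standardization on the log scale with the population standard
    deviation; STMiner's exact formula is not stated in this README.
    """
    logs = {gene: math.log(d) for gene, d in distances.items()}
    values = list(logs.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return {gene: (v - mean) / sd for gene, v in logs.items()}


def top_genes(distances, n):
    """Names of the n genes with the largest distances (strongest spatial patterns)."""
    return sorted(distances, key=distances.get, reverse=True)[:n]


# A few (gene, distance) pairs copied from the table above.
distances = {"myha": 1.35e8, "vmhcl": 1.01e8, "rps17": 2.61e5, "rpl22": 1.83e5}
print(top_genes(distances, 2))  # ['myha', 'vmhcl']
z = log_distance_zscores(distances)
```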

Preprocess and Fit GMM

sp.fit_pattern(n_comp=20, gene_list=list(sp.global_distance[:1000]['Gene']))

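n_comp=20 means that each GMM has 20 components, and gene_list restricts fitting to the first 1,000 genes in global_distance, i.e. those with the largest distances.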
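To make the GMM step concrete, here is a minimal expectation-maximization (EM) sketch that fits a 2-D Gaussian mixture to spot coordinates, treating one gene's expression at each spot as a sample weight. The weighting scheme, the farthest-point initialization, and the name `fit_weighted_gmm` are assumptions for illustration; STMiner's `fit_pattern` may fit its models differently.

```python
import numpy as np


def fit_weighted_gmm(coords, weights, n_comp, n_iter=100):
    """Fit a 2-D Gaussian mixture to spot coordinates weighted by expression (EM).

    Hypothetical helper: a minimal sketch that treats each spot's expression
    as a sample weight. STMiner's own fitting procedure may differ.
    """
    x = np.asarray(coords, dtype=float)          # (n_spots, 2) spatial coordinates
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                              # expression as a distribution over spots
    n, d = x.shape
    eps = 1e-6 * np.eye(d)                       # keeps covariances invertible

    # Farthest-point initialization: start at the highest-weight spot, then
    # repeatedly add the spot farthest from the means chosen so far.
    idx = [int(np.argmax(w))]
    for _ in range(1, n_comp):
        sq = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(axis=-1)
        idx.append(int(np.argmax(sq.min(axis=1))))
    means = x[idx].copy()
    # Isotropic starting covariance from the expression-weighted spread of spots.
    center = np.average(x, axis=0, weights=w)
    spread = np.average((x - center) ** 2, axis=0, weights=w).mean()
    covs = np.array([spread * np.eye(d) + eps for _ in range(n_comp)])
    mix = np.full(n_comp, 1.0 / n_comp)

    for _ in range(n_iter):
        # E-step: log joint density of every spot under every component.
        log_p = np.empty((n, n_comp))
        for k in range(n_comp):
            diff = x - means[k]
            maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(covs[k]), diff)
            _, logdet = np.linalg.slogdet(covs[k])
            log_p[:, k] = np.log(mix[k]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: responsibilities are scaled by each spot's expression weight.
        wr = resp * w[:, None]
        nk = wr.sum(axis=0) + 1e-12
        mix = nk / nk.sum()
        means = (wr.T @ x) / nk[:, None]
        for k in range(n_comp):
            diff = x - means[k]
            covs[k] = (wr[:, k, None] * diff).T @ diff / nk[k] + eps
    return mix, means, covs
```

For example, with two well-separated groups of spots and three times more expression in the second group, `fit_weighted_gmm(coords, weights, n_comp=2)` should place one mean near each group and give the second component a mixing weight near 0.75. The example above uses n_comp=20 per gene.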

Build distance matrix & clustering

# This step calculates the distance between genes' spatial distributions.
sp.build_distance_array()
# Dimensionality reduction and clustering.
sp.cluster_gene(n_clusters=6, mds_components=20) 
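The cluster_gene step embeds the gene distance array with multidimensional scaling and then clusters the embedded genes. The sketch below illustrates that pipeline on a toy 4 × 4 distance matrix using classical (Torgerson) MDS followed by k-means. Both algorithm choices and the helper names are assumptions; STMiner's MDS variant and clustering algorithm may differ. In the call above, mds_components=20 and n_clusters=6 would play the roles of `n_components` and `n_clusters` here.

```python
import numpy as np


def classical_mds(dist, n_components):
    """Embed a symmetric distance matrix with classical (Torgerson) MDS."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    b = -0.5 * j @ d2 @ j                          # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:n_components]  # largest eigenvalues first
    vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(vals)


def kmeans(x, n_clusters, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    idx = [0]
    for _ in range(1, n_clusters):
        sq = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(axis=-1)
        idx.append(int(np.argmax(sq.min(axis=1))))
    centers = x[idx].copy()
    for _ in range(n_iter):
        labels = ((x[:, None, :] - centers[None]) ** 2).sum(axis=-1).argmin(axis=1)
        new = np.array([x[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


# Toy gene-gene distances: genes 0 and 1 share one pattern, genes 2 and 3 another.
dist = np.array([[0, 1, 9, 9],
                 [1, 0, 9, 9],
                 [9, 9, 0, 1],
                 [9, 9, 1, 0]], dtype=float)
emb = classical_mds(dist, n_components=2)
labels = kmeans(emb, n_clusters=2)
print(labels)  # genes 0 and 1 share one label, genes 2 and 3 the other
```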

Result & Visualization

The result is stored in genes_labels:

sp.genes_labels

The output looks like the following:

gene_id labels
0 Cldn5 2
1 Fyco1 2
2 Pmepa1 2
3 Arhgap5 0
4 Apc 5
.. ... ...
95 Cyp2a5 0
96 X5730403I07Rik 0
97 Ltbp2 2
98 Rbp4 4
99 Hist1h1e 4
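To list the genes that belong to each pattern, group gene_id by labels. Assuming genes_labels is the pandas DataFrame with the columns shown above, `sp.genes_labels.groupby('labels')['gene_id'].apply(list)` does this directly. The dependency-free sketch below performs the same grouping on the ten rows printed above.

```python
from collections import defaultdict

# (gene_id, label) pairs copied from the output above.
rows = [("Cldn5", 2), ("Fyco1", 2), ("Pmepa1", 2), ("Arhgap5", 0), ("Apc", 5),
        ("Cyp2a5", 0), ("X5730403I07Rik", 0), ("Ltbp2", 2), ("Rbp4", 4), ("Hist1h1e", 4)]

# Map each pattern label to the genes assigned to it, preserving row order.
genes_by_label = defaultdict(list)
for gene, label in rows:
    genes_by_label[label].append(gene)

for label in sorted(genes_by_label):
    print(label, genes_by_label[label])
```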

Visualize the distance array:

import seaborn as sns
sns.clustermap(sp.genes_distance_array)

To visualize the patterns:

Note: image_path is needed only if you want to show a background image. For this example, you can download the processed image here. image_path is optional; omitting the background image has no impact on the calculation results.

sp.get_pattern_array(vote_rate=0.3)
img_path = 'path/to/downloaded/image'
sp.plot.plot_pattern(vmax=99,
                     heatmap=False, 
                     s=5, 
                     reverse_y=True, # optional
                     reverse_x=True, # optional
                     image_path=img_path, # optional
                     rotate_img=True, # optional
                     k=4, # optional
                     aspect=0.55 # optional
                     )

Visualize the intersection between patterns 0 and 1:

sp.plot.plot_intersection(pattern_list=[0, 1],
                          image_path=img_path,
                          reverse_y=True,
                          reverse_x=True,
                          aspect=0.55,
                          s=20)

To visualize the expression of the genes assigned to a given label:

sp.plot.plot_genes(label=0, vmax=99)

Attributes of the STMiner.SPFinder object

Attribute Type Description
adata AnnData AnnData object holding the loaded spatial data
patterns dict Spatial distribution patterns of genes
genes_patterns dict GMM fitted for each gene
global_distance pd.DataFrame Distances between each gene and the background
mds_features array MDS embedding features of genes
genes_distance_array pd.DataFrame Pairwise distances between the genes' GMMs
genes_labels pd.DataFrame Gene names and their pattern labels
plot object Plotting interface used for visualization

📜 Release history

https://pypi.org/project/STMiner/#history

🔖 Reference

[1] Sun, P., Bush, S. J., Wang, S., Jia, P., Li, M., Xu, T., Zhang, P., Yang, X., Wang, C., Xu, L., Wang, T., & Ye, K. (2025). STMiner: Gene-centric spatial transcriptomics for deciphering tumor tissues. Cell Genomics, 5(2). https://doi.org/10.1016/j.xgen.2025.100771

✉️ Contact


Please ⭐ star STMiner on GitHub if you find it useful. Thank you!



Download files


Source Distribution

STMiner-0.0.7.tar.gz (45.2 kB)

Uploaded Source

Built Distribution


STMiner-0.0.7-py3-none-any.whl (48.4 kB)

Uploaded Python 3

File details

Details for the file STMiner-0.0.7.tar.gz.

File metadata

  • Download URL: STMiner-0.0.7.tar.gz
  • Size: 45.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for STMiner-0.0.7.tar.gz
Algorithm Hash digest
SHA256 556e745c43a0fd2e2db7bc448c8c349bac697f5dce99f9bdb0337146d3d64bf5
MD5 97e64db03f64a6bc50ae18436ebeaf9e
BLAKE2b-256 4bd727b315cba94ab1b36df7c9fa99a2820df9790aad791f1a14524fa9fb9f23


File details

Details for the file STMiner-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: STMiner-0.0.7-py3-none-any.whl
  • Size: 48.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for STMiner-0.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 4ad29644ad22f3c3ca16c637766595227aeaf228b7af6a6863136cab3bd1adbb
MD5 d6bc63d9c3f4e06fd4762c317f16d52c
BLAKE2b-256 3d1b27a49d65e0c9abd4a5cb8b493892c95c34dadbb7d36020eb542ea111cc59

