Python package for spatial transcriptomics data analysis
📖 Documents | 🚀 Tutorial | 💬 Contact me
👩🏫 Introduction
Why STMiner?
ST data presents challenges such as uneven cell density distribution, low sampling rates, and complex spatial structures. Traditional spot-based analysis strategies struggle to effectively address these issues. STMiner explores ST data by leveraging the spatial distribution of genes, thus avoiding the biases that these conditions can introduce into the results.
Most importantly, STMiner integrates seamlessly with AnnData/Scanpy and can be installed from PyPI.
Method detail
Here we propose “STMiner”. The three key steps of analyzing ST data in STMiner are depicted.
- (Left top) STMiner first uses Gaussian Mixture Models (GMMs) to represent the spatial distribution of each gene and the overall spatial distribution.
- (Left bottom) STMiner then identifies spatially variable genes (SVGs) by calculating the cost of transporting the overall spatial distribution to each gene's spatial distribution. Genes with high costs exhibit significant spatial variation, meaning their expression patterns differ considerably across regions of the tissue. The distance array between SVGs is built in the same way: genes with similar spatial structures have a low cost to transport to each other, and vice versa.
- (Right) The distance array is embedded into a low-dimensional space by multidimensional scaling (MDS), so that genes with similar spatial expression patterns can be clustered into distinct functional gene sets and their spatial structures recovered.
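The transport-cost idea can be illustrated with a one-dimensional toy example. The sketch below is not STMiner's implementation (STMiner compares GMMs fitted to tissue coordinates); the function name `transport_cost_1d` and the histograms are made up for illustration. It computes the earth mover's distance between two histograms on the same unit-spaced bins, which equals the summed absolute difference of their cumulative distributions.

```python
from itertools import accumulate


def transport_cost_1d(weights_a, weights_b):
    """Earth mover's distance between two histograms on identical unit-spaced bins.

    Hypothetical helper for illustration only; not part of STMiner.
    """
    if len(weights_a) != len(weights_b):
        raise ValueError("both histograms must use the same bins")
    total_a, total_b = sum(weights_a), sum(weights_b)
    # Normalize to probability mass, then build cumulative distributions.
    cdf_a = accumulate(w / total_a for w in weights_a)
    cdf_b = accumulate(w / total_b for w in weights_b)
    # In 1D, the minimal transport cost is the area between the two CDFs.
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


# Toy data: spot density along one tissue axis and two made-up genes.
background = [10, 10, 10, 10, 10]
uniform_gene = [2, 2, 2, 2, 2]      # follows the background
localized_gene = [0, 0, 0, 1, 9]    # concentrated at one end

uniform_cost = transport_cost_1d(background, uniform_gene)
localized_cost = transport_cost_1d(background, localized_gene)
print(uniform_cost, localized_cost)
```

A gene that follows the background has a cost near zero, while a gene concentrated in one region has a high cost. This is the sense in which high-cost genes are treated as spatially variable.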
🚀 Quick start by example
Please visit STMiner Documents for installation and detailed usage.
Import package
from STMiner import SPFinder
Load data
You can download the test data from STMiner-test-data, or the raw dataset from GEO. STMiner can read spatial transcriptome data in various formats, such as gem, bmk, and h5ad (see STMiner Documents).
We recommend using the h5ad format, as it is currently the most widely used and supported by most algorithms and software in the spatial transcriptomics field.
sp = SPFinder()
file_path = 'Path/to/your/h5ad/file'
sp.read_h5ad(file=file_path, bin_size=1)
Find spatially highly variable genes
sp.get_genes_csr_array(min_cells=200, log1p=False, vmax=100)
sp.spatial_high_variable_genes()
- The parameter min_cells filters out genes that are too sparse to yield a reliable spatial distribution.
- The parameter log1p reduces the influence of extreme values on the results. Most open-source h5ad files have already been log1p-transformed, so the value used here is False.
- You can restrict STMiner to gene sets of interest. Pass the gene list through the parameter gene_list, and STMiner will compute results only for those genes.
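The effect of the first two parameters can be sketched in plain Python. This is an illustration under assumptions, not STMiner's code: it assumes min_cells counts the spots with non-zero expression, and the helper name `filter_and_transform` and the toy counts are hypothetical.

```python
import math


def filter_and_transform(counts, min_cells, log1p=False):
    """Drop sparse genes and optionally log1p-transform the rest.

    Hypothetical helper; assumes `min_cells` means "number of spots
    with non-zero expression", which is an assumption about STMiner.
    """
    kept = {}
    for gene, values in counts.items():
        n_expressing = sum(1 for v in values if v > 0)
        if n_expressing < min_cells:
            continue  # too sparse for a reliable spatial distribution
        kept[gene] = [math.log1p(v) for v in values] if log1p else list(values)
    return kept


# Toy counts per spot for two made-up genes.
counts = {"geneA": [0, 3, 5, 1], "geneB": [0, 0, 0, 7]}
filtered = filter_and_transform(counts, min_cells=2, log1p=True)
print(filtered)
```

Here geneB is expressed in a single spot and is dropped, while the geneA counts are compressed by log1p so that one very high value cannot dominate.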
You can check the distance of each gene by:
sp.global_distance
| Gene | Distance | z-score |
|---|---|---|
| myha | 1.35E+08 | 2.771493 |
| vmhcl | 1.01E+08 | 2.470881 |
| zgc:101560 | 9.95E+07 | 2.458787 |
| pvalb1 | 9.82E+07 | 2.445257 |
| myhz2 | 9.75E+07 | 2.437787 |
| ... | ... | ... |
| rps17 | 2.61E+05 | -3.63207 |
| rpl13 | 2.48E+05 | -3.68506 |
| rpl32 | 2.43E+05 | -3.70327 |
| rsl24d1 | 2.27E+05 | -3.7757 |
| rpl22 | 1.83E+05 | -3.99332 |
The 'Gene' column is the gene name, and the 'Distance' column is the difference between the spatial distribution of the gene and the background.
A larger difference indicates a more pronounced spatial pattern of the gene.
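If you prefer a z-score cutoff to a fixed number of top genes, a filter along the following lines may help. Plain tuples stand in for the table rows so the sketch runs without STMiner; the helper name `genes_above` and the cutoff of 2.45 are arbitrary choices for illustration.

```python
# Rows copied from the example table above: (gene, distance, z-score).
rows = [
    ("myha", 1.35e8, 2.771493),
    ("vmhcl", 1.01e8, 2.470881),
    ("zgc:101560", 9.95e7, 2.458787),
    ("pvalb1", 9.82e7, 2.445257),
    ("rps17", 2.61e5, -3.63207),
]


def genes_above(rows, z_cutoff):
    """Return gene names whose z-score exceeds the cutoff (hypothetical helper)."""
    return [gene for gene, _distance, z in rows if z > z_cutoff]


selected = genes_above(rows, z_cutoff=2.45)
print(selected)
```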
Preprocess and Fit GMM
sp.fit_pattern(n_comp=20, gene_list=list(sp.global_distance[:2000]['Gene']))
n_comp=20 means that each GMM has 20 components.
Build distance matrix & clustering
# This step calculates the distance between genes' spatial distributions.
sp.build_distance_array()
# Dimensionality reduction and clustering.
sp.cluster_gene(n_clusters=6, mds_components=20)
Result & Visualization
The result is stored in genes_labels:
sp.genes_labels
The output looks like the following:
| | gene_id | labels |
|---|---|---|
| 0 | Cldn5 | 2 |
| 1 | Fyco1 | 2 |
| 2 | Pmepa1 | 2 |
| 3 | Arhgap5 | 0 |
| 4 | Apc | 5 |
| .. | ... | ... |
| 95 | Cyp2a5 | 0 |
| 96 | X5730403I07Rik | 0 |
| 97 | Ltbp2 | 2 |
| 98 | Rbp4 | 4 |
| 99 | Hist1h1e | 4 |
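To turn the label table into one gene set per pattern, the rows can be grouped by label. The sketch below uses plain tuples copied from the first rows of the table above instead of the actual DataFrame, and the helper name `group_by_label` is hypothetical.

```python
from collections import defaultdict

# (gene_id, label) pairs copied from the first rows of the table above.
rows = [("Cldn5", 2), ("Fyco1", 2), ("Pmepa1", 2), ("Arhgap5", 0), ("Apc", 5)]


def group_by_label(rows):
    """Collect gene names under their pattern label (hypothetical helper)."""
    groups = defaultdict(list)
    for gene, label in rows:
        groups[label].append(gene)
    return dict(groups)


gene_sets = group_by_label(rows)
for label in sorted(gene_sets):
    print(label, gene_sets[label])
```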
Visualize the distance array:
import seaborn as sns
sns.clustermap(sp.genes_distance_array)
Finding gene sets with a structure of interest
Get the patterns of a gene or gene set of interest:
interested_genes = ["mbpa", "BX957331.1", "madd"]
sp.get_pattern_of_given_genes(gene_list = interested_genes)
Compare the distances between all genes and the given gene set:
from STMiner.Algorithm.distance import compare_gmm_distance
df = compare_gmm_distance(sp.custom_pattern, sp.patterns)
df.to_csv('compare_distance.csv')
df
| Gene | distance |
|---|---|
| mbpa | 0.8914643122002152 |
| map1ab | 0.9479574709875033 |
| snap25a | 0.9801858512442632 |
| nsfa | 0.9948239449738531 |
| stxbp1a | 0.99916307128497 |
| ... | ... |
| lrrfip1b | 1.9981586323013931 |
| si:ch211-145h19.2 | 1.9995115533927301 |
| BX248122.1 | 1.9996375745511945 |
| si:dkey-7i4.24 | 1.9997052371268462 |
A lower distance indicates that the spatial expression pattern of the gene is more similar to that of the gene set of interest.
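Note that genes from the query set can themselves appear in the result (mbpa is the closest match in the example output). When looking for new candidates, it may be useful to exclude them before taking the nearest genes. The sketch below works on plain tuples copied from the output above; the helper name `nearest_new_genes` is hypothetical.

```python
import heapq

query_genes = {"mbpa", "BX957331.1", "madd"}

# (gene, distance) pairs copied from the example output above.
rows = [
    ("mbpa", 0.8914643122002152),
    ("map1ab", 0.9479574709875033),
    ("snap25a", 0.9801858512442632),
    ("nsfa", 0.9948239449738531),
    ("stxbp1a", 0.99916307128497),
    ("lrrfip1b", 1.9981586323013931),
]


def nearest_new_genes(rows, query_genes, k):
    """Return the k lowest-distance genes that are not in the query set."""
    candidates = [(gene, dist) for gene, dist in rows if gene not in query_genes]
    return heapq.nsmallest(k, candidates, key=lambda row: row[1])


top = nearest_new_genes(rows, query_genes, k=3)
print([gene for gene, _dist in top])
```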
To visualize the patterns:
Note: An image path must be passed as image_path if you want to show a background image. In this example, you can download the processed image here. image_path is optional; omitting the background image has no effect on the calculation results.
sp.get_pattern_array(vote_rate=0.3)
img_path = 'path/to/downloaded/image'
sp.plot.plot_pattern(vmax=99,
                     heatmap=False,
                     s=10,
                     reverse_y=True,       # optional
                     reverse_x=True,       # optional
                     image_path=img_path,  # optional
                     rotate_img=True,      # optional
                     k=4,                  # optional
                     aspect=0.55           # optional
                     )
Visualize the intersections between patterns 0 & 1:
sp.plot.plot_intersection(pattern_list=[0, 1],
                          image_path=img_path,
                          reverse_y=True,
                          reverse_x=True,
                          aspect=0.55,
                          s=20)
To visualize the gene expression by labels:
sp.plot.plot_genes(label=0, vmax=99)
Attributes of STMiner.SPFinder Object
| Attributes | Type | Description |
|---|---|---|
| adata | AnnData | AnnData object holding the loaded spatial data |
| patterns | dict | Spatial distribution patterns of genes |
| genes_patterns | dict | GMM model for each gene |
| global_distance | pd.DataFrame | Distances between genes and background |
| mds_features | array | Embedding features of genes |
| genes_distance_array | pd.DataFrame | Pairwise distances between gene GMMs |
| genes_labels | pd.DataFrame | Gene names and their pattern labels |
| plot | Object | Entry point for the visualization methods |
📜 Release history
| Version | Date | Description |
|---|---|---|
| 0.0.7 | 2025/2/21 | improved performance of get_pattern_array() |
pypi: https://pypi.org/project/STMiner/#history
🔖 Reference
[1] Sun, P., Bush, S. J., Wang, S., Jia, P., Li, M., Xu, T., Zhang, P., Yang, X., Wang, C., Xu, L., Wang, T., & Ye, K. (2025). STMiner: Gene-centric spatial transcriptomics for deciphering tumor tissues. Cell Genomics, 5(2). https://doi.org/10.1016/j.xgen.2025.100771
✉️ Contact
If you encounter any issues during use, please try updating STMiner to the latest version. If the issue persists, feel free to submit your problem on the issue page or contact us through the following methods:
- Peisen Sun: 📧(sunpeisen@stu.xjtu.edu.cn) / 𝕏(https://x.com/Sun_python)
- Kai Ye: 📧(kaiye@xjtu.edu.cn)
Please ⭐Star STMiner on GitHub if you find it useful, thank you!