Spatial query tools for analyzing spatial transcriptomics data

Spatial-Query

A Python package for fast spatial query and analysis of Spatial Transcriptomics (ST) data. Spatial-Query provides efficient methods to identify frequent patterns, perform motif enrichment analysis, and conduct differential expression analysis in spatial transcriptomics datasets.

Features

  • Single FOV Analysis: Analyze spatial patterns within individual fields of view
  • Multi-FOV Analysis: Compare patterns across multiple fields of view or datasets
  • Fast Spatial Queries: Built on a k-d tree for efficient spatial neighborhood queries
  • Pattern Mining: Identify frequent cell type patterns using the FP-Growth algorithm
  • Motif Enrichment: Statistical analysis of spatial motif enrichment
  • Differential Expression: Gene expression analysis with Fisher's exact test
  • Visualization: Comprehensive plotting functions for spatial data
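
To make the pattern-mining idea concrete, here is a minimal, standard-library-only sketch of the general approach: collect the cell types found in each anchor cell's k nearest neighbors, then keep the label sets that occur in at least a min_support fraction of those neighborhoods. This is not the package's implementation. A brute-force neighbor search stands in for the k-d tree, plain enumeration stands in for FP-Growth, and the function names, the toy data, and the choice to exclude the anchor cell itself are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def knn_neighborhoods(coords, labels, anchor_label, k, max_dist=float("inf")):
    """Return one set of neighbor cell types per anchor cell (brute-force search)."""
    transactions = []
    for i, (point, label) in enumerate(zip(coords, labels)):
        if label != anchor_label:
            continue
        # Distances from this anchor to every other cell, nearest first.
        others = sorted(
            (dist(point, other), j) for j, other in enumerate(coords) if j != i
        )
        # Keep the k nearest cells, dropping any that lie beyond max_dist.
        neighbors = [j for d, j in others[:k] if d <= max_dist]
        transactions.append(frozenset(labels[j] for j in neighbors))
    return transactions


def frequent_patterns(transactions, min_support):
    """Enumerate label sets whose support (fraction of neighborhoods) >= min_support.

    Naive enumeration, exponential in the number of cell types; illustration only.
    """
    items = sorted(set().union(*transactions)) if transactions else []
    results = {}
    for size in range(1, len(items) + 1):
        for pattern in combinations(items, size):
            hits = sum(1 for t in transactions if set(pattern) <= t)
            support = hits / len(transactions)
            if support >= min_support:
                results[pattern] = support
    return results


# Toy data (hypothetical): two small clusters plus one isolated cell.
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (5, 5)]
labels = ["T_cell", "B_cell", "Macrophage", "T_cell", "B_cell", "Neuron", "Neuron"]

transactions = knn_neighborhoods(coords, labels, anchor_label="T_cell", k=2)
patterns = frequent_patterns(transactions, min_support=0.5)
for pattern, support in patterns.items():
    print(pattern, support)
```

Raising min_support prunes patterns that appear around only some anchor cells; in this toy data only B_cell is a neighbor of every T_cell.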

Installation

From GitHub Repository

# Clone the repository
git clone https://github.com/ShaokunAn/Spatial-Query.git
cd Spatial-Query

# Install from the local clone (use `pip install -e .` instead for an editable/development install)
pip install .

# Or install directly from GitHub
pip install git+https://github.com/ShaokunAn/Spatial-Query.git@main

Dependencies

The package requires the following dependencies:

  • Python >= 3.8
  • numpy, pandas, scipy
  • matplotlib, seaborn
  • scikit-learn
  • scanpy, anndata
  • mlxtend
  • statsmodels
  • pybind11 (for C++ extensions)

Quick Start

Single FOV Analysis

import scanpy as sc
from SpatialQuery import spatial_query

# Load your spatial transcriptomics data
adata = sc.read_h5ad("your_data.h5ad")

# Initialize spatial query object
sq = spatial_query(
    adata=adata,
    dataset="ST_sample",
    spatial_key="X_spatial",  # spatial coordinates in adata.obsm
    label_key="predicted_label",  # cell type labels in adata.obs
    build_gene_index=False,  # if True, build an scfind index; otherwise adata.X is used directly for DE gene analysis
    feature_name="gene_ids",  # gene names in adata.var
    if_lognorm=True  # if True, log-normalize adata.X when initializing the spatial_query object
)

# Find frequent patterns around a specific cell type
fp_results = sq.find_fp_knn(
    ct="T_cell",  # anchor cell type for neighborhood analysis
    k=30,  # number of neighbors
    min_support=0.5  # minimum frequency support threshold
)

# Perform motif enrichment analysis
enrichment_results = sq.motif_enrichment_knn(
    ct="T_cell",  # center cell type as anchors
    motifs=["T_cell", "B_cell"],  # motif to test. If None, frequent patterns will be searched first for enrichment analysis
    k=30,  # number of neighbors
    min_support=0.5,  # minimum frequency support threshold
    max_dist=200,  # maximum distance for neighbors
    return_cellID=False  # whether to return cell IDs for each motif and center cells
)

# Differential expression analysis
de_results = sq.de_genes(
    ind_group1=[0, 1, 2, 3],  # indices of group 1 cells
    ind_group2=[4, 5, 6, 7],  # indices of group 2 cells
    method="fisher"  # Fisher's exact test
)

# Visualize results
sq.plot_fov(fig_size=(10, 8))  # Plot spatial data with cell types
sq.plot_motif_grid(motif=["T_cell", "B_cell"], max_dist=50)  # Plot motif around grid points
sq.plot_motif_celltype(
    ct="T_cell",  # center cell type
    motif=["T_cell", "B_cell"],  # motif to visualize
    max_dist=100,  # radius for neighborhood
    fig_size=(10, 5),
    save_path=None  # path to save figure, None for display only
)
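
For readers unfamiliar with how Fisher's exact test applies to two groups of cells, the following standard-library sketch shows one common setup: build a 2x2 table of expressing versus non-expressing cells per group and compute a two-sided p-value from the hypergeometric distribution. It is an illustration under stated assumptions, not the package's de_genes implementation. In particular, treating a cell as "expressing" when its count is above zero, the helper names, and the toy counts are all hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p for p in map(table_prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9)
    )


def expressing_counts(expression, indices):
    """Count selected cells with non-zero vs zero expression (assumed binarization)."""
    on = sum(1 for i in indices if expression[i] > 0)
    return on, len(indices) - on


# Toy counts for one gene across 8 cells (hypothetical data).
gene_expression = [3, 1, 2, 4, 0, 0, 1, 0]
a, b = expressing_counts(gene_expression, [0, 1, 2, 3])  # group 1 cell indices
c, d = expressing_counts(gene_expression, [4, 5, 6, 7])  # group 2 cell indices
p_value = fisher_exact_two_sided(a, b, c, d)
print(f"table=[[{a}, {b}], [{c}, {d}]], p={p_value:.4f}")
```

With only four cells per group, even a 4-versus-1 split does not reach conventional significance, which is a reminder to compare reasonably sized groups.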

Multi-FOV Analysis

from SpatialQuery import spatial_query_multi

# Prepare multiple datasets
adatas = [adata1, adata2, adata3]  # List of AnnData objects
datasets = ["healthy", "healthy", "disease"]  # dataset name for each AnnData, in the same order (names may repeat)

# Initialize multi-FOV spatial query
sq_multi = spatial_query_multi(
    adatas=adatas,
    datasets=datasets,
    spatial_key="X_spatial",
    label_key="predicted_label",
    build_gene_index=True,
    feature_name="gene_ids"
)

# Find frequent patterns across datasets
fp_multi = sq_multi.find_fp_knn(
    ct="T_cell",
    dataset=["healthy"],  # specific datasets
    k=30,
    min_support=0.5
)

# Motif enrichment analysis across datasets
motif_results = sq_multi.motif_enrichment_knn(
    ct="T_cell",  # center cell type
    motifs=["T_cell", "B_cell"],  # motifs to test
    dataset=["healthy", "disease"],  # datasets to compare
    k=30,
    min_support=0.5,
    max_dist=200
)

# Differential pattern analysis across datasets
diff_results = sq_multi.differential_analysis_knn(
    ct="T_cell",  # center cell type
    datasets=["healthy", "disease"],  # exactly 2 datasets for comparison
    k=30,  # number of neighbors
    min_support=0.5,  # minimum support threshold
    max_dist=200  # maximum distance for neighbors
)

# Differential gene expression analysis across specified groups using per-dataset indices
from collections import defaultdict

# Example: keys are modified dataset names (e.g., "healthy_0", "healthy_1"), values are index lists for that dataset
ind_group1 = defaultdict(list)
ind_group1["healthy_0"] = [0, 1, 2]
ind_group1["healthy_1"] = [0, 1]

ind_group2 = defaultdict(list)
ind_group2["disease_0"] = [3, 4]


de_multi = sq_multi.de_genes(
    ind_group1=ind_group1,  # group 1: dict keys as dataset names, values as indices in each dataset
    ind_group2=ind_group2,  # group 2: same structure
    genes=["Gene_1", "Gene_2"],      # Genes of interest; uses all genes if no genes are input
    method="fisher"         # method to perform differential gene analysis
)

# Cell type distribution analysis across datasets
dist_results = sq_multi.cell_type_distribution()  # overall distribution
dist_fov = sq_multi.cell_type_distribution_fov()  # per-FOV distribution

# Visualize results for each FOV
for fov_query in sq_multi.spatial_queries:
    fov_query.plot_fov(fig_size=(8, 6))
    fov_query.plot_motif_celltype(
        ct="T_cell",
        motif=["T_cell", "B_cell"],
        max_dist=50
    )
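
The per-dataset index dictionaries passed to de_genes use keys such as "healthy_0" and "healthy_1". The sketch below shows one way to build them programmatically. It assumes the suffix is a running index per dataset name, in the order the AnnData objects were passed in; that scheme is inferred from the example keys above, not confirmed from the package source, and both helper functions are hypothetical.

```python
from collections import Counter, defaultdict


def suffixed_dataset_names(datasets):
    """Append a per-name running index: ["a", "a", "b"] -> ["a_0", "a_1", "b_0"]."""
    seen = Counter()
    names = []
    for name in datasets:
        names.append(f"{name}_{seen[name]}")
        seen[name] += 1
    return names


def build_index_groups(datasets, selected):
    """Map each suffixed FOV name to the cell indices chosen for it.

    `selected` holds one list of cell indices per FOV, in input order.
    FOVs with no selected cells are left out of the result.
    """
    groups = defaultdict(list)
    for fov_name, indices in zip(suffixed_dataset_names(datasets), selected):
        if indices:
            groups[fov_name].extend(indices)
    return groups


datasets = ["healthy", "healthy", "disease"]
fov_names = suffixed_dataset_names(datasets)
ind_group1 = build_index_groups(datasets, [[0, 1, 2], [0, 1], []])
ind_group2 = build_index_groups(datasets, [[], [], [3, 4]])
print(fov_names)
print(dict(ind_group1), dict(ind_group2))
```

Before relying on this naming, compare the generated keys with the dataset names the spatial_query_multi object actually reports.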

Core Classes and Methods

spatial_query Class (Single FOV)

The main class for analyzing spatial patterns within a single field of view.

Key Methods:

  • find_fp_knn(ct, k, min_support): Find frequent patterns around a cell type using k-nearest neighbors
  • find_fp_dist(ct, max_dist, min_support): Find frequent patterns using distance-based neighborhoods
  • motif_enrichment_knn(ct, motifs, k, min_support, max_dist): Test motif enrichment using k-NN neighborhoods
  • motif_enrichment_dist(ct, motifs, max_dist, min_support): Test motif enrichment using distance-based neighborhoods
  • find_patterns_grid(max_dist, min_support): Find patterns using grid-based sampling
  • find_patterns_rand(max_dist, n_points, min_support): Find patterns using random sampling
  • de_genes(ind_group1, ind_group2, method): Differential expression analysis
  • plot_fov(fig_size): Visualize the spatial data
  • plot_motif_grid(motif, max_dist): Plot motif distribution around grid points
  • plot_motif_rand(motif, max_dist, n_points): Plot motif distribution around random sampled points
  • plot_motif_celltype(motif, ct, max_dist): Plot motif around specific cell types

Parameters:

  • adata: AnnData object containing spatial transcriptomics data
  • dataset: Dataset name (default: 'ST')
  • spatial_key: Key for spatial coordinates in adata.obsm (default: 'X_spatial')
  • label_key: Key for cell type labels in adata.obs (default: 'predicted_label')
  • build_gene_index: Whether to build gene expression index with scfind (default: False)
  • feature_name: Key in adata.var holding gene names (required if build_gene_index=True)
  • if_lognorm: Whether to log-normalize adata.X when initializing the object

spatial_query_multi Class (Multi-FOV)

The main class for analyzing spatial patterns across multiple fields of view or datasets.

Key Methods:

  • find_fp_knn(ct, dataset, k, min_support): Find frequent patterns across specified datasets
  • find_fp_dist(ct, dataset, max_dist, min_support): Find patterns using distance-based neighborhoods
  • motif_enrichment_knn(ct, motifs, dataset, k, min_support, max_dist): Test motif enrichment across datasets
  • motif_enrichment_dist(ct, motifs, dataset, max_dist, min_support): Distance-based motif enrichment
  • differential_analysis_knn(ct, datasets, k, min_support, max_dist): Compare patterns between dataset groups
  • differential_analysis_dist(ct, datasets, max_dist, min_support): Distance-based differential pattern analysis
  • de_genes(ind_group1, ind_group2, genes, method): Differential expression analysis
  • cell_type_distribution(): Analyze cell type distribution across datasets
  • cell_type_distribution_fov(): Cell type distribution per FOV

Parameters:

  • adatas: List of AnnData objects
  • datasets: List of dataset names
  • spatial_key: Key for spatial coordinates in adata.obsm
  • label_key: Key for cell type labels in adata.obs
  • build_gene_index: Whether to build gene expression indices
  • feature_name: Key in adata.var holding gene names

Data Format Requirements

AnnData Object Structure

Your AnnData object should contain:

  • adata.obsm['X_spatial']: Spatial coordinates (n_cells × 2)
  • adata.obs['predicted_label']: Cell type labels
  • adata.var['gene_ids']: Gene names (if using gene expression analysis)
  • adata.X: Gene expression matrix (if using gene expression analysis)
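
A quick pre-flight check can catch a missing key before a spatial query object is built. The helper below is a hypothetical sketch, not part of Spatial-Query: it only tests for the keys listed above and relies on duck typing, so it accepts an AnnData object or, as in the demo, a stand-in with the same attribute names.

```python
from types import SimpleNamespace


def check_spatial_inputs(adata, spatial_key="X_spatial",
                         label_key="predicted_label", feature_name=None):
    """Return a list of problems found; an empty list means the layout looks usable."""
    problems = []
    if spatial_key not in adata.obsm:
        problems.append(f"missing adata.obsm[{spatial_key!r}]")
    else:
        coords = adata.obsm[spatial_key]
        # Expect one (x, y) pair per cell.
        if len(coords) == 0 or len(coords[0]) != 2:
            problems.append(f"adata.obsm[{spatial_key!r}] should be n_cells x 2")
    if label_key not in adata.obs:
        problems.append(f"missing adata.obs[{label_key!r}]")
    # Gene names are only needed for gene expression analysis.
    if feature_name is not None and feature_name not in adata.var:
        problems.append(f"missing adata.var[{feature_name!r}]")
    return problems


# Stand-in object with AnnData-like attribute names, for illustration only.
toy = SimpleNamespace(
    obsm={"X_spatial": [(0.0, 1.0), (2.0, 3.0)]},
    obs={"predicted_label": ["T_cell", "B_cell"]},
    var={},
)
problems = check_spatial_inputs(toy, feature_name="gene_ids")
print(problems)
```

Passing feature_name=None skips the gene-name check, which matches the case where only spatial pattern analysis is needed.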

Example Data Preparation

import scanpy as sc
import pandas as pd
import numpy as np

# Create example spatial transcriptomics data
n_cells = 1000
n_genes = 2000

# Spatial coordinates (2D coordinates for each cell)
spatial_coords = np.random.rand(n_cells, 2) * 100

# Cell type labels (annotated cell types)
cell_types = np.random.choice(['T_cell', 'B_cell', 'Macrophage', 'Neuron'], n_cells)

# Gene expression matrix (cells × genes)
expression_matrix = np.random.negative_binomial(5, 0.3, (n_cells, n_genes))

# Create AnnData object
adata = sc.AnnData(X=expression_matrix)
adata.obsm['X_spatial'] = spatial_coords  # Required: spatial coordinates
adata.obs['predicted_label'] = cell_types  # Required: cell type labels
adata.var['gene_ids'] = [f'Gene_{i}' for i in range(n_genes)]  # Required for gene analysis

# Optional: Add gene names as index
adata.var_names = adata.var['gene_ids']

# Optional: Add metadata
adata.obs['sample_id'] = ['sample_1'] * n_cells
adata.obs['region'] = np.random.choice(['cortex', 'medulla'], n_cells)

Loading Real Data

# Load from common spatial transcriptomics formats
import pandas as pd
import scanpy as sc
from SpatialQuery import spatial_query

# Load 10X Visium data
adata = sc.read_10x_h5("filtered_feature_bc_matrix.h5")
adata.var_names_make_unique()

# Load spatial coordinates (from spaceranger output). The headerless file has the columns:
# barcode, in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres
spatial_coords = pd.read_csv("spatial/tissue_positions_list.csv",
                            header=None, index_col=0)
# Keep only the barcodes present in adata, in the same order; x = pixel column, y = pixel row
adata.obsm['X_spatial'] = spatial_coords.loc[adata.obs_names, [5, 4]].values

# Load cell type annotations (from external analysis); rows must be in the same order as adata.obs
cell_types = pd.read_csv("cell_type_annotations.csv")
adata.obs['predicted_label'] = cell_types['cell_type'].values

# Initialize spatial query
sq = spatial_query(adata, build_gene_index=False, feature_name="gene_ids")

Advanced Usage

Custom Spatial Analysis

# Custom neighborhood analysis (feature_name is required when build_gene_index=True)
sq = spatial_query(adata, build_gene_index=True, feature_name="gene_ids")

# Find patterns with custom parameters
fp_results = sq.find_fp_knn(
    ct="T_cell",
    k=50,  # larger neighborhood
    min_support=0.3  # lower support threshold
)

# Test specific motifs
motif_results = sq.motif_enrichment_knn(
    ct="T_cell",
    motifs=["T_cell", "B_cell", "Macrophage"],
    k=30,
    min_support=0.5,
    max_dist=200
)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

Acknowledgments

This package builds upon several excellent open-source libraries including scanpy, scikit-learn, and mlxtend.
