
A cluster-based cell-type deconvolution of spatial transcriptomic data (DECLUST)


DECLUST is a Python package developed to identify spatially coherent clusters of spots by integrating gene expression profiles with spatial coordinates in spatial transcriptomics data. It also enables accurate estimation of cell-type compositions within each cluster.


🌟 Features

Spatially-aware clustering: Combines gene expression and spatial coordinates.

Robust deconvolution: Aggregates signals over clusters to enhance cell type detection.

Easy to install: Available via pip.

Visualization: Includes modules for visualizing clustering and marker gene expression.

⏬ Installation

We recommend using a separate Conda environment. Information about Conda and how to install it can be found on the Anaconda website.

  • Create a conda environment and install the DECLUST package
   conda create -n declust_env python=3.9
   conda activate declust_env

   pip install declust
  • The following dependencies must be installed in advance: scanpy, rpy2, and R (version >= 4.3) with the dplyr R package. They can be installed using the install_dependencies.sh script:
   sh install_dependencies.sh

The DECLUST package has been successfully installed on the following operating systems:

  • macOS Sequoia 15.3.2
  • SUSE Linux Enterprise Server 15 SP5 (Dardel HPC system)

📊 Data Input

DECLUST uses .h5ad files, which are AnnData objects commonly used for storing annotated data matrices in single-cell and spatial transcriptomics analysis.

The two input files contain:

sc_adata.h5ad (Single-cell RNA-seq data)

  • .X: Gene expression matrix (cells × genes)
  • .obs: Cell type annotation of single cells

st_adata.h5ad (Spatial transcriptomics data)

  • .X: Spatial gene expression matrix (spots × genes)
  • .obs: Spot coordinates

💡 Both datasets should originate from the same tissue and share overlapping gene sets for DECLUST to work properly.
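A quick sanity check before running the pipeline is to intersect the two gene lists. The sketch below uses plain Python lists as stand-ins for `sc_adata.var_names` and `st_adata.var_names` (the gene axis of each AnnData object); the helper `shared_genes` is hypothetical and not part of DECLUST.

```python
def shared_genes(sc_genes, st_genes):
    """Return genes present in both datasets, in single-cell order (hypothetical helper)."""
    st_set = set(st_genes)
    overlap = [g for g in sc_genes if g in st_set]
    if not overlap:
        raise ValueError("No overlapping genes between single-cell and spatial data")
    return overlap


# Toy gene lists standing in for sc_adata.var_names and st_adata.var_names
sc_genes = ["DCN", "LUM", "C1S", "AGR2", "PPDPF", "CD3E"]
st_genes = ["AGR2", "DCN", "MKI67", "LUM"]
overlap = shared_genes(sc_genes, st_genes)
print(overlap)  # ['DCN', 'LUM', 'AGR2']
print(f"{len(overlap)} of {len(sc_genes)} single-cell genes are shared")
```

With real AnnData objects, the same list could be used to subset both datasets, for example `sc_adata[:, overlap]`.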

🔗 Example Data Download

⚙️ Usage

DECLUST can be embedded into Python scripts or used independently as a command-line tool. A guide to using it in Python scripts is provided in this tutorial. In this section, we describe how to use it as a bioinformatics pipeline.

Run the pipeline using the following command:

python declust.py --module <module_name> [other options]
  • Available modules:

   Module         Description
   marker         Construction of reference matrix from annotated single-cell transcriptomic data
   cluster        Identification of spatial clusters of spots from ST data
   pseudo_bulk    Generate pseudo-bulk ST profiles per cluster
   deconv         Run deconvolution by ordinary least squares (OLS)
   visualize      Visualize markers or deconvolution results

Type python declust.py --help in the terminal to see a list of available commands.

🧬 DECLUST pipeline

  1. Download DECLUST:
   wget https://github.com/Qingyueee/DECLUST/archive/refs/tags/0.1.1.tar.gz
   tar -xvf 0.1.1.tar.gz
  2. Unpack data:
   cd DECLUST-0.1.1
   unzip data.zip
  3. Marker gene selection:
   python declust.py --module marker \
   --celltype_col <celltype_column> \
   --sample_col <sample_column>

Outputs:

  • sc_data_overlapped.csv and sc_label.csv in the data/ folder

  • marker_genes.csv in the results/ folder
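The marker step builds a reference matrix from annotated single-cell data. As a rough illustration only, the sketch below picks markers with a generic rule: each gene is assigned to the cell type with the highest mean expression (its "maxgroup", matching the column name used for custom marker files), genes are ranked by log fold change against the other types, and the reference matrix holds mean marker expression per cell type. DECLUST's actual selection criteria are not described here and may differ; `select_markers` and `n_per_type` are hypothetical.

```python
import numpy as np


def select_markers(X, labels, genes, n_per_type=2):
    """Pick marker genes by fold change of the top-expressing cell type.

    Generic illustration only; DECLUST's own marker criteria may differ.
    X: cells x genes expression matrix; labels: cell-type label per cell.
    Requires at least two cell types.
    """
    labels = np.asarray(labels)
    types = np.unique(labels)
    # Mean expression of each gene within each cell type (types x genes)
    means = np.vstack([X[labels == t].mean(axis=0) for t in types])
    maxgroup_idx = means.argmax(axis=0)
    # Mean expression across all non-maximal types, per gene
    others = (means.sum(axis=0) - means.max(axis=0)) / (len(types) - 1)
    log_fc = np.log2((means.max(axis=0) + 1) / (others + 1))

    markers = []
    for k, t in enumerate(types):
        idx = np.where(maxgroup_idx == k)[0]
        top = idx[np.argsort(-log_fc[idx])][:n_per_type]
        markers.extend((genes[g], str(t)) for g in top)
    # Reference matrix: mean marker expression per cell type (markers x types)
    marker_cols = [genes.index(g) for g, _ in markers]
    reference = means[:, marker_cols].T
    return markers, reference, types


# Toy data: 4 cells x 5 genes, two annotated cell types
X = np.array([[5, 4, 0, 0, 1],
              [6, 5, 0, 1, 1],
              [0, 0, 7, 3, 1],
              [1, 0, 8, 4, 1]], dtype=float)
labels = ["CAFs", "CAFs", "Cancer Epithelial", "Cancer Epithelial"]
genes = ["DCN", "LUM", "AGR2", "PPDPF", "ACTB"]
markers, reference, types = select_markers(X, labels, genes)
print(markers)
print(reference)
```

The ubiquitously expressed toy gene ACTB has a fold change of zero, so it is not selected as a marker.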

  4. Clustering:
   python declust.py --module cluster

Performs Hierarchical Clustering → DBSCAN → Seeded Region Growing (SRG). Saves:

  • srg_df.csv and clustering plots in results/
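The last clustering stage, seeded region growing, expands initial clusters over neighbouring spots. The sketch below is a simplified SRG under stated assumptions: seed labels are taken as given (for example, clusters found by the earlier hierarchical clustering and DBSCAN stages, with -1 for unassigned spots), spots within a hypothetical `radius` are neighbours, and at each step the unassigned spot closest in expression to an adjacent region's mean is added to that region. DECLUST's own growth criterion may differ.

```python
import numpy as np


def seeded_region_growing(coords, expr, seeds, radius=1.0):
    """Grow seed clusters over neighbouring spots (simplified SRG sketch).

    coords: (n_spots, 2) spot coordinates; expr: (n_spots, n_genes) expression;
    seeds: initial label per spot, -1 for unassigned (e.g. DBSCAN noise).
    """
    labels = np.array(seeds, dtype=int)
    # Neighbours are distinct spots within `radius` in spatial coordinates
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    neighbours = (dists <= radius) & ~np.eye(len(coords), dtype=bool)

    while True:
        best = None  # (expression distance, spot index, region label)
        for i in np.where(labels == -1)[0]:
            for region in np.unique(labels[neighbours[i] & (labels != -1)]):
                centroid = expr[labels == region].mean(axis=0)
                d = np.linalg.norm(expr[i] - centroid)
                if best is None or d < best[0]:
                    best = (d, i, region)
        if best is None:  # no unassigned spot touches any region
            return labels
        labels[best[1]] = best[2]


# Toy strip of 6 spots: the left half expresses gene 1, the right half gene 2
coords = np.array([[x, 0.0] for x in range(6)])
expr = np.array([[5, 0], [4, 1], [5, 1], [1, 5], [0, 4], [0, 5]], dtype=float)
print(seeded_region_growing(coords, expr, seeds=[0, -1, -1, -1, -1, 1]))
```

Spots that never touch a seeded region keep the label -1. Recomputing region means after every assignment keeps the sketch short but is quadratic in the number of spots.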
  5. Deconvolution:
   python declust.py --module deconv

Performs OLS-based deconvolution and outputs:

  • DECLUST_result.csv in results/
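OLS deconvolution models a cluster's aggregated marker expression as a linear combination of the reference matrix columns (one per cell type). The sketch below solves that least-squares problem with NumPy for a single cluster. Clipping negative coefficients to zero and rescaling them to sum to one is an assumption made here so the output reads as proportions; how DECLUST post-processes its coefficients is not stated. The helper name `ols_deconvolve` is hypothetical.

```python
import numpy as np


def ols_deconvolve(reference, bulk):
    """Estimate cell-type proportions for one pseudo-bulk profile by OLS.

    reference: (n_markers, n_types) mean marker expression per cell type.
    bulk: (n_markers,) aggregated marker expression of one cluster.
    """
    coef, *_ = np.linalg.lstsq(reference, bulk, rcond=None)
    # Assumption: clip negative coefficients and rescale to proportions
    coef = np.clip(coef, 0, None)
    total = coef.sum()
    return coef / total if total > 0 else coef


# Toy reference: 4 marker genes x 2 cell types
reference = np.array([[4.5, 0.0], [5.5, 0.5], [0.0, 7.5], [0.5, 3.5]])
# A cluster that is a 70/30 mixture of the two cell types
bulk = reference @ np.array([0.7, 0.3])
props = ols_deconvolve(reference, bulk)
print(np.round(props, 3))  # recovers the mixing weights 0.7 and 0.3
```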

You can run each step individually, or execute the entire pipeline at once by running the deconvolution module (--module deconv), as in the quick example below.
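To run the steps one at a time from a Python script, one option is to call `declust.py` through `subprocess`. The sketch below reuses the module names and options from this README and assumes it is run from inside the unpacked DECLUST directory; it also assumes `declust.py` accepts the same options for every module, as its single `--module` switch suggests. The helpers `declust_command` and `run_steps` are hypothetical, and `dry_run=True` only prints the commands.

```python
import shlex
import subprocess
import sys


def declust_command(module, **options):
    """Build a `python declust.py --module <module>` command as an argument list."""
    cmd = [sys.executable, "declust.py", "--module", module]
    for flag, value in options.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


def run_steps(modules, options, dry_run=True):
    """Print each step's command and, unless dry_run, run it; stop on the first failure."""
    for module in modules:
        cmd = declust_command(module, **options)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits with an error
            subprocess.run(cmd, check=True)


# Options taken from the simulated-data quick example in this README
options = {
    "data_dir": "simulation_data",
    "results_dir": "simulation_results",
    "sc_file": "sc_adata_200_per_celltype.h5ad",
    "st_file": "st_simu_adata.h5ad",
    "celltype_col": "celltype_major",
    "sample_col": "Patient",
}
run_steps(["marker", "cluster", "deconv"], options)  # dry run: prints only
```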

To export pseudo-bulk profiles for external methods:

   python declust.py --module pseudo_bulk
  • Generates pseudo_bulk.csv in the results/ folder.
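A pseudo-bulk profile aggregates the expression of all spots assigned to the same spatial cluster. The sketch below sums spot expression per cluster label with NumPy. Summing (rather than averaging) is an assumption, since the aggregation used by DECLUST is not stated here, and `pseudo_bulk` is a hypothetical helper name.

```python
import numpy as np


def pseudo_bulk(expr, clusters):
    """Sum spot expression within each spatial cluster.

    expr: (n_spots, n_genes) expression; clusters: cluster label per spot.
    Returns the sorted cluster labels and an (n_clusters, n_genes) matrix.
    """
    clusters = np.asarray(clusters)
    ids = np.unique(clusters)
    profiles = np.vstack([expr[clusters == c].sum(axis=0) for c in ids])
    return ids, profiles


# Toy data: 4 spots x 3 genes in two clusters
expr = np.array([[1, 0, 2], [3, 1, 0], [0, 5, 1], [0, 2, 2]], dtype=float)
ids, profiles = pseudo_bulk(expr, clusters=[0, 0, 1, 1])
print(ids, profiles, sep="\n")
```

If unassigned spots are labelled -1, as in many clustering outputs, they would form their own group here and may need to be filtered out first.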

💡 Custom Marker Genes

Users can provide their own marker gene list in one of two formats:

  • CSV file containing two columns:
    • Gene: gene names
    • maxgroup: corresponding cell type annotations
   --custom_marker_genes file_path
  • Comma-separated gene list, along with a corresponding comma-separated list of cell types:
   --custom_marker_genes "DCN, LUM, C1S, AGR2, PPDPF, ..."
   --custom_marker_celltype "CAFs, CAFs, CAFs, Cancer Epithelial, Cancer Epithelial, ..."

⚠️ The provided marker genes and cell type annotations must exist in the single-cell dataset.
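The sketch below shows how the comma-separated form maps onto the two-column CSV form: it pairs each gene with its cell type, checks that both lists have the same length and that every entry exists in the single-cell data, and renders the Gene/maxgroup layout. The validation DECLUST performs internally is not documented here; `parse_custom_markers` and `to_marker_csv` are hypothetical helpers.

```python
import csv
import io


def parse_custom_markers(genes_arg, celltypes_arg, sc_genes, sc_celltypes):
    """Pair comma-separated genes with cell types and check them against the sc data."""
    genes = [g.strip() for g in genes_arg.split(",") if g.strip()]
    types = [t.strip() for t in celltypes_arg.split(",") if t.strip()]
    if len(genes) != len(types):
        raise ValueError(f"{len(genes)} genes but {len(types)} cell types")
    missing = sorted(set(genes) - set(sc_genes)) + sorted(set(types) - set(sc_celltypes))
    if missing:
        raise ValueError(f"Not found in single-cell data: {missing}")
    return list(zip(genes, types))


def to_marker_csv(pairs):
    """Render gene/cell-type pairs in the two-column Gene,maxgroup CSV layout."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Gene", "maxgroup"])
    writer.writerows(pairs)
    return buf.getvalue()


pairs = parse_custom_markers(
    "DCN, LUM, AGR2",
    "CAFs, CAFs, Cancer Epithelial",
    sc_genes=["DCN", "LUM", "AGR2", "PPDPF"],
    sc_celltypes=["CAFs", "Cancer Epithelial"],
)
print(to_marker_csv(pairs))
```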

📬 Quick example: running DECLUST on simulated data

# 1. Download DECLUST
   wget https://github.com/Qingyueee/DECLUST/archive/refs/tags/0.1.1.tar.gz
   tar -xvf 0.1.1.tar.gz
   cd DECLUST-0.1.1

# 2. Configure the environment and install dependencies
   conda create -n declust_env python=3.9
   conda activate declust_env
   pip install declust
   sh install_dependencies.sh

# 3. Download and unpack simulated data
   wget "https://drive.usercontent.google.com/download?id=1VY_vIuZalCBe2IhNCNBSQwo5m5Da8aFw&export=download&authuser=0&confirm=t&uuid=93730baf-2a12-49d7-b475-ab715a3644c3&at=APcmpow759exSs6opQk4zSMVbjXf%3A1744370330609" -O simulation_data.zip
   unzip simulation_data.zip

# 4. Run pipeline - it may take about 2 minutes to complete on a personal computer
   python declust.py --module deconv \
      --data_dir simulation_data \
      --results_dir simulation_results \
      --sc_file sc_adata_200_per_celltype.h5ad \
      --st_file st_simu_adata.h5ad \
      --celltype_col celltype_major \
      --sample_col Patient

# 5. Visualize the results
   python declust.py --module visualize \
      --data_dir simulation_data \
      --results_dir simulation_results \
      --sc_file sc_adata_200_per_celltype.h5ad \
      --st_file st_simu_adata.h5ad

📁 Output Structure

   project/
   ├── data/
   │   ├── sc_adata_overlapped.h5ad
   │   ├── sc_labels.csv
   │   └── ...
   └── results/
       ├── marker_genes.csv
       ├── srg_df.csv
       ├── pseudo_bulk.csv
       ├── DECLUST_result.csv
       └── [visualization plots]

License

GNU General Public License v3.0



Download files


Source Distribution

declust-1.0.1.tar.gz (20.7 kB view details)

Uploaded Source

Built Distribution


declust-1.0.1-py3-none-any.whl (20.5 kB view details)

Uploaded Python 3

File details

Details for the file declust-1.0.1.tar.gz.

File metadata

  • Download URL: declust-1.0.1.tar.gz
  • Upload date:
  • Size: 20.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for declust-1.0.1.tar.gz
Algorithm Hash digest
SHA256 bfcff67cf7009996a6c5c80ca17c7037364219d6839f9456c8d9a71bad3f5d78
MD5 8b7abc7529c244438c7a66c79944a026
BLAKE2b-256 b474298916dd09c5c5f939d8480a8ae5890ccff52ac2fda6b97e3daca84473f4


File details

Details for the file declust-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: declust-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 20.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for declust-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1ce47274579bb61032ad12759c7c41edb2b0fe9230f36b3304510e5d5cb3f2ae
MD5 b79e8d885713292a10b8f2b7dcecd29e
BLAKE2b-256 61cf77f213ed0911a6daef86038c0103a4f533038e83d044255eb27f8e6160f7


Supported by
