
A cluster-based cell-type deconvolution of spatial transcriptomic data (DECLUST)


DECLUST is a Python package developed to identify spatially coherent clusters of spots by integrating gene expression profiles with spatial coordinates in spatial transcriptomics data. It also enables accurate estimation of cell-type compositions within each cluster.


🌟 Features

Spatially-aware clustering: Combines gene expression and spatial coordinates.

Robust deconvolution: Aggregates signals over clusters to enhance cell type detection.

Easy to install: Available via pip.

Visualization: Includes modules for visualizing clustering and marker gene expression.

⏬ Installation

We recommend using a separate Conda environment. Information about Conda and how to install it can be found on the Anaconda website.

  • Create a conda environment and install the DECLUST package
   conda create -n declust_env python=3.9
   conda activate declust_env

   pip install declust
  • The following dependencies must be installed in advance: scanpy, rpy2, and R version >= 4.3 with the dplyr R package. They can be installed with the install_dependencies.sh script:
   sh install_dependencies.sh

The DECLUST package has been installed successfully on the following operating systems:

  • macOS Sequoia 15.3.2
  • SUSE Linux Enterprise Server 15 SP5 (Dardel HPC system)

📊 Data Input

DECLUST reads .h5ad files, the on-disk format for AnnData objects, which are commonly used to store annotated data matrices in single-cell and spatial transcriptomics analysis.

Each .h5ad file includes:

sc_adata.h5ad (Single-cell RNA-seq data)

  • .X: Gene expression matrix (cells × genes)
  • .obs: Cell type annotation of single cells

st_adata.h5ad (Spatial transcriptomics data)

  • .X: Spatial gene expression matrix (spots × genes)
  • .obs: Spot coordinates

💡 Both datasets should originate from the same tissue and have overlapping gene sets to ensure proper implementation of DECLUST.
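Before running the pipeline, it can help to confirm that the two datasets really share genes. The following is a minimal sketch, not part of DECLUST: the helper name overlapping_genes is hypothetical, and the gene lists are toy values. With AnnData objects, the gene names would typically be taken from var_names.

```python
def overlapping_genes(sc_genes, st_genes):
    """Return genes present in both lists, keeping the order of sc_genes."""
    st_set = set(st_genes)
    seen = set()
    shared = []
    for gene in sc_genes:
        if gene in st_set and gene not in seen:
            shared.append(gene)
            seen.add(gene)
    return shared


# Toy gene names; with AnnData objects these would typically come from
# sc_adata.var_names and st_adata.var_names.
sc_genes = ["DCN", "LUM", "C1S", "AGR2"]
st_genes = ["AGR2", "PPDPF", "DCN"]

shared = overlapping_genes(sc_genes, st_genes)
print(shared)
if not shared:
    raise ValueError("No overlapping genes between the two datasets")
```

An empty result would suggest mismatched gene identifiers (for example, symbols in one file and Ensembl IDs in the other) rather than a problem with DECLUST itself.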

🔗 Example Data Download

⚙️ Usage

DECLUST can be embedded in Python scripts or used as a standalone tool. A guide to using it in Python scripts is provided in this tutorial. This section describes how to use it as a bioinformatics pipeline.

Run the pipeline using the following command:

python declust.py --module <module_name> [other options]
  • Available Modules
    Module        Description
    marker        Construction of Reference Matrix from Annotated Single-Cell Transcriptomic Data
    cluster       Identification of spatial clusters of spots from ST data
    pseudo_bulk   Generate pseudo-bulk ST profiles per cluster
    deconv        Run deconvolution by Ordinary Least Squares
    visualize     Visualize markers or deconvolution results

Type python declust.py --help in the terminal to see a list of available commands.

🧬 DECLUST pipeline

  1. Download DECLUST:
   wget https://github.com/Qingyueee/DECLUST/archive/refs/tags/0.1.1.tar.gz
   tar -xvf 0.1.1.tar.gz
  2. Unpack data:
   cd DECLUST-0.1.1
   unzip data.zip
  3. Marker gene selection:
   python declust.py --module marker \
   --celltype_col \
   --sample_col

Outputs:

  • sc_data_overlapped.csv and sc_label.csv in the data/ folder

  • marker_genes.csv in the results/ folder

  4. Clustering:
   python declust.py --module cluster

Performs Hierarchical Clustering → DBSCAN → Seeded Region Growing (SRG). Saves:

  • srg_df.csv and clustering plots in results/
  5. Deconvolution:
   python declust.py --module deconv

Performs OLS-based deconvolution and outputs:

  • DECLUST_result.csv in results/
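To make the idea of OLS-based deconvolution concrete, here is a minimal sketch using NumPy. It is not DECLUST's implementation: the function name ols_deconvolve, the toy reference matrix, and the step that clips negative coefficients and rescales them to sum to one are all assumptions made for illustration.

```python
import numpy as np


def ols_deconvolve(reference, mixture):
    """Estimate cell-type proportions for one expression profile.

    reference: (genes x cell_types) marker expression per cell type
    mixture:   (genes,) expression profile to deconvolve
    """
    # Ordinary least squares: minimise ||reference @ beta - mixture||^2
    beta, *_ = np.linalg.lstsq(reference, mixture, rcond=None)
    # Assumption: negative coefficients are clipped to zero and the rest
    # rescaled to sum to one so they can be read as proportions.
    beta = np.clip(beta, 0.0, None)
    total = beta.sum()
    return beta / total if total > 0 else beta


# Toy reference: 4 marker genes x 2 cell types (made-up values)
reference = np.array([
    [10.0, 0.0],
    [8.0, 1.0],
    [0.0, 12.0],
    [1.0, 9.0],
])
true_props = np.array([0.3, 0.7])
mixture = reference @ true_props  # noise-free mixture of the two types

print(ols_deconvolve(reference, mixture))
```

With a noise-free mixture and a full-rank reference, the estimate recovers the proportions used to build the mixture, up to floating-point error. Real data is noisy, which is one motivation for aggregating spots into clusters before fitting.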

You can run each step individually, or execute the entire pipeline by running the deconv module.

To export pseudo-bulk profiles for external methods:

   python declust.py --module pseudo_bulk
  • Generates pseudo_bulk.csv in the results/ folder.
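Conceptually, a pseudo-bulk profile combines the expression of all spots assigned to the same cluster. The sketch below illustrates this with plain Python. It assumes the per-spot counts are summed; DECLUST may aggregate differently (for example, by averaging), and the function name and toy data are hypothetical.

```python
def pseudo_bulk(spot_expression, spot_cluster):
    """Sum per-spot expression vectors within each cluster.

    spot_expression: dict spot_id -> list of gene counts (same gene order)
    spot_cluster:    dict spot_id -> cluster label
    """
    totals = {}
    for spot, counts in spot_expression.items():
        cluster = spot_cluster[spot]
        if cluster not in totals:
            totals[cluster] = [0] * len(counts)
        # Element-wise sum keeps one value per gene
        totals[cluster] = [a + b for a, b in zip(totals[cluster], counts)]
    return totals


# Toy data: three spots, three genes, two clusters
spot_expression = {"s1": [1, 0, 3], "s2": [2, 1, 0], "s3": [0, 5, 1]}
spot_cluster = {"s1": 0, "s2": 0, "s3": 1}

profiles = pseudo_bulk(spot_expression, spot_cluster)
print(profiles)
```

Each resulting profile has one value per gene, so it can be passed to any deconvolution method that expects a bulk-like expression vector.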

💡 Custom Marker Genes

Users can provide their own marker gene list in one of two formats:

  • CSV file containing two columns:
    • Gene: gene names
    • maxgroup: corresponding cell type annotations
   --custom_marker_genes file_path
  • Comma-separated gene list, along with a corresponding comma-separated list of cell types:
   --custom_marker_genes "DCN, LUM, C1S, AGR2, PPDPF, ..."
   --custom_marker_celltype "CAFs, CAFs, CAFs, Cancer Epithelial, Cancer Epithelial, ..."

⚠️ The provided marker genes and cell type annotations must exist in the single-cell dataset.
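The two formats above carry the same information, so one can be converted into the other. The following sketch pairs a comma-separated gene list with its cell types and writes the two-column CSV layout (Gene, maxgroup). The helper names and the file name my_markers.csv are hypothetical, and the gene list is shortened for illustration.

```python
import csv
import io


def marker_rows(genes_arg, celltypes_arg):
    """Pair comma-separated genes with comma-separated cell types."""
    genes = [g.strip() for g in genes_arg.split(",") if g.strip()]
    celltypes = [c.strip() for c in celltypes_arg.split(",") if c.strip()]
    if len(genes) != len(celltypes):
        raise ValueError("Need exactly one cell type per gene")
    return list(zip(genes, celltypes))


def write_marker_csv(rows, handle):
    """Write rows under the two column names described above."""
    writer = csv.writer(handle)
    writer.writerow(["Gene", "maxgroup"])
    writer.writerows(rows)


rows = marker_rows("DCN, LUM, AGR2", "CAFs, CAFs, Cancer Epithelial")
# In-memory buffer for the demo; to write a real file, use
# open("my_markers.csv", "w", newline="") instead.
buffer = io.StringIO()
write_marker_csv(rows, buffer)
print(buffer.getvalue())
```

Checking that the two lists have the same length up front catches the most likely mistake, a gene without a matching cell type.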

📬 Quick example: running DECLUST on simulated data

# 1. Download DECLUST
   wget https://github.com/Qingyueee/DECLUST/archive/refs/tags/0.1.1.tar.gz
   tar -xvf 0.1.1.tar.gz
   cd DECLUST-0.1.1

# 2. Configure the environment and install dependencies
   conda create -n declust_env python=3.9
   conda activate declust_env
   pip install declust
   sh install_dependencies.sh

# 3. Download and unpack simulated data
   wget "https://drive.usercontent.google.com/download?id=1VY_vIuZalCBe2IhNCNBSQwo5m5Da8aFw&export=download&authuser=0&confirm=t&uuid=93730baf-2a12-49d7-b475-ab715a3644c3&at=APcmpow759exSs6opQk4zSMVbjXf%3A1744370330609" -O simulation_data.zip
   unzip simulation_data.zip

# 4. Run the pipeline (takes about 2 minutes on a personal computer)
   python declust.py --module deconv \
      --data_dir simulation_data \
      --results_dir simulation_results \
      --sc_file sc_adata_200_per_celltype.h5ad \
      --st_file st_simu_adata.h5ad \
      --celltype_col celltype_major \
      --sample_col Patient

# 5. Visualize the results
   python declust.py --module visualize \
      --data_dir simulation_data \
      --results_dir simulation_results \
      --sc_file sc_adata_200_per_celltype.h5ad \
      --st_file st_simu_adata.h5ad
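Steps 4 and 5 above can also be driven from a Python script. The sketch below only builds and prints the two command lines, using the flags shown in this example; the helper declust_command is hypothetical and not part of DECLUST. Actually executing the commands is left as a commented-out subprocess call.

```python
import shlex
import sys


def declust_command(module, **options):
    """Build the argument list for one declust.py module call."""
    cmd = [sys.executable, "declust.py", "--module", module]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Options shared by both steps, copied from the example above
shared = {
    "data_dir": "simulation_data",
    "results_dir": "simulation_results",
    "sc_file": "sc_adata_200_per_celltype.h5ad",
    "st_file": "st_simu_adata.h5ad",
}
steps = [
    declust_command("deconv", **shared,
                    celltype_col="celltype_major", sample_col="Patient"),
    declust_command("visualize", **shared),
]
for cmd in steps:
    print(shlex.join(cmd))
    # To execute for real, run from the DECLUST-0.1.1 directory:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.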

📁 Output Structure

   project/
      ├── data/
      │   ├── sc_adata_overlapped.h5ad
      │   ├── sc_labels.csv
      │   └── ...
      └── results/
          ├── marker_genes.csv
          ├── srg_df.csv
          ├── pseudo_bulk.csv
          ├── DECLUST_result.csv
          └── [visualization plots]

License

GNU General Public License v3.0

