
A cluster-based cell-type deconvolution of spatial transcriptomic data (DECLUST)


DECLUST is a Python package developed to identify spatially coherent clusters of spots by integrating gene expression profiles with spatial coordinates in spatial transcriptomics data. It also enables accurate estimation of cell-type compositions within each cluster.


🌟 Features

Spatially-aware clustering: Combines gene expression and spatial coordinates.

Robust deconvolution: Aggregates signals over clusters to enhance cell type detection.

Easy to install: Available via pip.

Visualization: Includes modules for visualizing clustering and marker gene expression.

⏬ Installation

We recommend using a separate Conda environment. Information about Conda and how to install it can be found on the Anaconda website.

  • Create a conda environment and install the DECLUST package
   conda create -n declust_env python=3.9
   conda activate declust_env

   pip install declust
  • The following dependencies must be installed in advance: scanpy, rpy2, and R version >= 4.3 with the dplyr R package. They can be installed using the install_dependencies.sh script:
   sh install_dependencies.sh

DECLUST has been installed successfully on the following operating systems:

  • macOS Sequoia 15.3.2
  • SUSE Linux Enterprise Server 15 SP5 (Dardel HPC system)

📊 Data Input

DECLUST uses .h5ad files, which are AnnData objects commonly used for storing annotated data matrices in single-cell and spatial transcriptomics analysis.

Each .h5ad file includes:

sc_adata.h5ad (Single-cell RNA-seq data)

  • .X: Gene expression matrix (cells × genes)
  • .obs: Cell type annotation of single cells

st_adata.h5ad (Spatial transcriptomics data)

  • .X: Spatial gene expression matrix (spots × genes)
  • .obs: Spot coordinates

💡 Both datasets should originate from the same tissue and have overlapping gene sets to ensure proper implementation of DECLUST.
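
The overlap requirement can be checked before running the pipeline. The following is a minimal sketch, not part of DECLUST: the helper name shared_genes and the gene lists are hypothetical, and it works on plain lists of gene names. With AnnData objects, the gene names would typically come from the var_names attribute of each object.

```python
def shared_genes(sc_genes, st_genes):
    """Return genes present in both datasets, in single-cell order."""
    st_set = set(st_genes)
    seen = set()
    shared = []
    for gene in sc_genes:
        # Keep each shared gene once, preserving the single-cell ordering.
        if gene in st_set and gene not in seen:
            shared.append(gene)
            seen.add(gene)
    return shared


# Hypothetical gene lists for illustration only.
sc_genes = ["DCN", "LUM", "C1S", "AGR2"]
st_genes = ["AGR2", "DCN", "PPDPF"]

overlap = shared_genes(sc_genes, st_genes)
print(f"{len(overlap)} shared genes: {overlap}")
```

An empty or very small overlap suggests that the two files use different gene identifiers or come from unrelated panels.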

🔗 Example Data Download

⚙️ Usage

DECLUST can be embedded in Python scripts or used independently as a command-line tool. A guide on how to use it in Python scripts is provided in this tutorial. This section describes how to use it as a bioinformatics pipeline.

Run the pipeline using the following command:

python declust.py --module <module_name> [other options]
  • Available Modules

   Module        Description
   marker        Construction of Reference Matrix from Annotated Single-Cell Transcriptomic Data
   cluster       Identification of spatial clusters of spots from ST data
   pseudo_bulk   Generate pseudo-bulk ST profiles per cluster
   deconv        Run deconvolution by Ordinary Least Squares
   visualize     Visualize markers or deconvolution results

Type python declust.py --help in the terminal to see a list of available commands.
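
When the command-line modules are driven from a Python script, it can help to assemble the argument list programmatically. The sketch below is an illustration only: build_declust_command is a hypothetical helper, not part of DECLUST, and the option names are the ones that appear in the quick example in this document.

```python
import shlex


def build_declust_command(module, **options):
    """Build an argument list for `python declust.py --module <module>`."""
    command = ["python", "declust.py", "--module", module]
    for name, value in options.items():
        # Each keyword becomes a "--name value" pair.
        command += [f"--{name}", str(value)]
    return command


command = build_declust_command(
    "deconv",
    data_dir="simulation_data",
    results_dir="simulation_results",
    celltype_col="celltype_major",
    sample_col="Patient",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

To execute the command, the list can be passed to subprocess.run(command, check=True) from the directory that contains declust.py.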

🧬 DECLUST pipeline

  1. Download DECLUST:
   wget https://github.com/Qingyueee/DECLUST/archive/refs/tags/0.1.1.tar.gz
   tar -xvf 0.1.1.tar.gz
  2. Unpack data:
   cd DECLUST-0.1.1
   unzip data.zip
  3. Marker gene selection:
   python declust.py --module marker \
   --celltype_col <celltype_column> \
   --sample_col <sample_column>

Outputs:

  • sc_data_overlapped.csv and sc_label.csv in the data/ folder

  • marker_genes.csv in the results/ folder

  4. Clustering:
   python declust.py --module cluster

Performs hierarchical clustering → DBSCAN → seeded region growing (SRG). Saves:

  • srg_df.csv and clustering plots in results/
  5. Deconvolution:
   python declust.py --module deconv

Performs OLS-based deconvolution and outputs:

  • DECLUST_result.csv in results/
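
To illustrate the idea behind OLS-based deconvolution, the sketch below fits one pseudo-bulk expression vector as a linear combination of cell-type reference profiles. It is not DECLUST's implementation: the function name, the toy numbers, and the post-processing (clipping negative coefficients and rescaling to sum to 1) are assumptions made for this example.

```python
import numpy as np


def ols_proportions(reference, bulk):
    """Estimate cell-type proportions for one pseudo-bulk profile.

    reference: (genes x cell_types) marker expression matrix
    bulk:      (genes,) pseudo-bulk expression vector
    """
    # Ordinary least squares: minimise ||reference @ coef - bulk||^2.
    coef, *_ = np.linalg.lstsq(reference, bulk, rcond=None)
    # Assumption for this sketch: negative estimates are set to zero
    # and the remainder is rescaled so the proportions sum to 1.
    coef = np.clip(coef, 0.0, None)
    total = coef.sum()
    return coef / total if total > 0 else coef


# Toy reference with 4 marker genes and 2 cell types (made-up numbers).
reference = np.array([[10.0, 0.0],
                      [8.0, 1.0],
                      [0.0, 12.0],
                      [1.0, 9.0]])
true_mix = np.array([0.7, 0.3])
bulk = reference @ true_mix  # noise-free mixture of the two profiles

# Recovers approximately [0.7, 0.3] for this noise-free input.
print(ols_proportions(reference, bulk))
```

With real data the fit is not exact, and how negative coefficients are handled affects the reported proportions.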

You can run each step individually, or execute the entire pipeline by running the deconv module.

To export pseudo-bulk profiles for external methods:

   python declust.py --module pseudo_bulk
  • Generates pseudo_bulk.csv in the results/ folder.
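
Conceptually, a pseudo-bulk profile combines the expression of all spots assigned to the same cluster. The sketch below sums spot-level vectors per cluster label. Whether DECLUST sums or averages is not stated here, so the use of a sum, the function name, and the toy values are all assumptions.

```python
def pseudo_bulk(expression, cluster_labels):
    """Sum spot-level expression vectors within each cluster."""
    if len(expression) != len(cluster_labels):
        raise ValueError("one cluster label is required per spot")
    totals = {}
    for counts, label in zip(expression, cluster_labels):
        if label not in totals:
            # First spot of this cluster: start from a copy of its counts.
            totals[label] = list(counts)
        else:
            # Add this spot's counts gene by gene.
            totals[label] = [a + b for a, b in zip(totals[label], counts)]
    return totals


# Three spots, three genes; the first two spots share cluster 0.
expression = [[1, 0, 3], [2, 1, 0], [0, 5, 1]]
labels = [0, 0, 1]

profiles = pseudo_bulk(expression, labels)
print(profiles)  # {0: [3, 1, 3], 1: [0, 5, 1]}
```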

💡 Custom Marker Genes

Users can provide their own marker gene list in one of two formats:

  • CSV file containing two columns:
    • Gene: gene names
    • maxgroup: corresponding cell type annotations
   --custom_marker_genes file_path
  • Comma-separated gene list, along with a corresponding comma-separated list of cell types:
   --custom_marker_genes "DCN, LUM, C1S, AGR2, PPDPF, ..."
   --custom_marker_celltype "CAFs, CAFs, CAFs, Cancer Epithelial, Cancer Epithelial, ..."

⚠️ The provided marker genes and cell type annotations must exist in the single-cell dataset.
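
The two formats carry the same information, so a comma-separated pair of lists can be turned into the two-column CSV described above. The sketch below uses only the Python standard library. The helper name write_marker_csv is hypothetical, and the gene and cell-type values are taken from the example in this section.

```python
import csv
import io


def write_marker_csv(genes, cell_types, handle):
    """Write a two-column marker table with headers Gene and maxgroup."""
    genes = [g.strip() for g in genes.split(",")]
    cell_types = [c.strip() for c in cell_types.split(",")]
    if len(genes) != len(cell_types):
        raise ValueError("each gene needs exactly one cell type")
    writer = csv.writer(handle)
    writer.writerow(["Gene", "maxgroup"])
    writer.writerows(zip(genes, cell_types))


# An in-memory buffer stands in for a file in this example.
buffer = io.StringIO()
write_marker_csv("DCN, LUM, AGR2", "CAFs, CAFs, Cancer Epithelial", buffer)
print(buffer.getvalue())
```

To write a real file, open it with open(path, "w", newline="") and pass the handle in place of the buffer.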

📬 Quick example: running DECLUST on simulated data

# 1. Download DECLUST
   wget https://github.com/Qingyueee/DECLUST/archive/refs/tags/0.1.1.tar.gz
   tar -xvf 0.1.1.tar.gz
   cd DECLUST-0.1.1

# 2. Configure the environment and install dependencies
   conda create -n declust_env python=3.9
   conda activate declust_env
   pip install declust
   sh install_dependencies.sh

# 3. Download and unpack simulated data
   wget "https://drive.usercontent.google.com/download?id=1VY_vIuZalCBe2IhNCNBSQwo5m5Da8aFw&export=download&authuser=0&confirm=t&uuid=93730baf-2a12-49d7-b475-ab715a3644c3&at=APcmpow759exSs6opQk4zSMVbjXf%3A1744370330609" -O simulation_data.zip
   unzip simulation_data.zip

# 4. Run pipeline - it may take about 2 minutes to complete on a personal computer
   python declust.py --module deconv \
      --data_dir simulation_data \
      --results_dir simulation_results \
      --sc_file sc_adata_200_per_celltype.h5ad \
      --st_file st_simu_adata.h5ad \
      --celltype_col celltype_major \
      --sample_col Patient

# 5. Visualize results
   python declust.py --module visualize \
      --data_dir simulation_data \
      --results_dir simulation_results \
      --sc_file sc_adata_200_per_celltype.h5ad \
      --st_file st_simu_adata.h5ad

📁 Output Structure

   project/
      ├── data/
      │   ├── sc_adata_overlapped.h5ad
      │   ├── sc_labels.csv
      │   └── ...
      └── results/
          ├── marker_genes.csv
          ├── srg_df.csv
          ├── pseudo_bulk.csv
          ├── DECLUST_result.csv
          └── [visualization plots]
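
After a run, a quick check can confirm which of the result files listed above are present. This is a small sketch using only the standard library. The function name and the default directory name "results" are assumptions, and pseudo_bulk.csv is only expected if the pseudo_bulk module was run.

```python
from pathlib import Path

# File names taken from the output structure shown above.
EXPECTED_RESULTS = [
    "marker_genes.csv",
    "srg_df.csv",
    "pseudo_bulk.csv",
    "DECLUST_result.csv",
]


def missing_results(results_dir):
    """Return the expected result files that are not in results_dir."""
    results_dir = Path(results_dir)
    return [name for name in EXPECTED_RESULTS
            if not (results_dir / name).is_file()]


# Lists every expected file if the directory does not exist yet.
print(missing_results("results"))
```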

License

GNU General Public License v3.0
