A spatial transcriptomics deconvolution tool for cell type identification and gene expression analysis

CITEgeist: Cellular Indexing of Transcriptomes and Epitopes for Guided Exploration of Intrinsic Spatial Trends

CITEgeist is a computational method for deconvolving spatial transcriptomics data using spatially resolved CITE-seq measurements. The pipeline estimates cell-type proportions and deconvolves gene expression in a two-pass approach, using protein and RNA measurements from the same spatial locations.

Quick Installation

CITEgeist can also be installed from PyPI with pip:

pip install citegeist

Table of Contents

  1. System Requirements
  2. Getting Started
  3. Benchmarking and Reproducibility
  4. Run the Analysis

System Requirements

Software Dependencies

  • Operating System:

    • Linux
    • macOS
    • Windows 10 with WSL2
  • Python: 3.10

  • Gurobi version > 3.9

Key Python Dependencies

  • scanpy==1.10.4
  • anndata==0.11.3
  • numpy==1.26.4
  • pandas==2.2.3
  • scipy==1.13.1
  • scikit-learn==1.6.1
  • gurobipy==11.0.2 (requires license)
  • matplotlib==3.10.0
  • seaborn==0.13.2
  • h5py==3.12.1
  • squidpy==1.6.2
  • spatialdata==0.2.5.post0

To run the notebooks, we recommend installing the dependencies from the CITEgeist_env.yml environment file.
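Before opening the notebooks, it can help to confirm that the active environment matches the pins listed above. The following is a minimal sketch and not part of CITEgeist: `PINNED` and `check_pins` are illustrative names, and only a subset of the pins is copied in for brevity.

```python
from importlib import metadata

# A subset of the pins listed above; extend as needed.
PINNED = {
    "scanpy": "1.10.4",
    "anndata": "0.11.3",
    "numpy": "1.26.4",
    "pandas": "2.2.3",
    "gurobipy": "11.0.2",
}


def check_pins(pins, lookup=metadata.version):
    """Return {package: (expected, found)} for every missing or mismatched pin.

    `found` is None when the package is not installed.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at the pinned version. The `lookup` argument exists only so the function can be exercised without touching the real environment.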

Hardware Requirements

  • RAM: Minimum 16GB, Recommended 64GB+
  • Storage: 16GB minimum for installation and basic analysis
  • CPU: Multi-core processor recommended (8+ cores for optimal performance)
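A quick way to compare a machine against these figures is sketched below. It is an illustration only: `detect_hardware` and `hardware_warnings` are hypothetical helper names, and the RAM lookup relies on `os.sysconf`, which is available on Linux and most Unix-like systems but not on native Windows (where the function returns None for RAM).

```python
import os


def detect_hardware():
    """Return (logical_cores, ram_gb); either may be None if it cannot be detected."""
    cores = os.cpu_count()
    try:
        # Total physical memory = page size * number of physical pages.
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        ram_gb = ram_bytes / 1024**3
    except (AttributeError, ValueError, OSError):
        ram_gb = None
    return cores, ram_gb


def hardware_warnings(cores, ram_gb, min_ram_gb=16, recommended_cores=8):
    """Compare detected values with the thresholds listed above."""
    warnings = []
    if ram_gb is not None and ram_gb < min_ram_gb:
        warnings.append(f"RAM {ram_gb:.1f} GB is below the {min_ram_gb} GB minimum")
    if cores is not None and cores < recommended_cores:
        warnings.append(f"{cores} cores is below the recommended {recommended_cores}")
    return warnings


if __name__ == "__main__":
    for message in hardware_warnings(*detect_hardware()):
        print("Warning:", message)
```

The operating system usually reports slightly less memory than the nominal amount installed, so a machine sold with 16GB may trigger the RAM warning; treat the output as a hint, not a hard gate.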

Getting Started

1. Download the Code and Data (Instructions for Peer Reviewers)

  1. Download the code from Figshare: https://figshare.com/s/34e456fd7786e5211acc
  2. Unzip the downloaded file to your preferred location.
  3. Download the data from GEO (Reviewers: see Data Availability section in the manuscript for the GEO link and Private Access Token).
  4. Run the following commands to strip the unique identifiers that GEO requires in file names:
# go to the 'data' directory
cd data

# untar the raw files
mkdir -pv ./GEO_data
tar -xvf GEO_data_RAW.tar -C ./GEO_data

# run the py preprocessing script
## round 1) aggregate the files by sample
python3 ./delete_all_but_essential.py --folder GEO_data # select option: 1

## round 2) remove the prefix from necessary files
python3 ./delete_all_but_essential.py --folder GEO_data # select option: 2

Note: When prompted, select Option 1 in the first round and Option 2 in the second, and type 'Yes' each time to confirm.
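For orientation, the prefix-removal round can be pictured as follows. This is only a sketch of the idea, not the contents of delete_all_but_essential.py: it assumes the unique identifier is a GEO sample accession prefix of the form `GSM<digits>_`, and `strip_geo_prefix`, `plan_renames`, and the file names used with them are hypothetical.

```python
import re

# Assumed pattern: a GEO sample accession followed by an underscore.
GEO_PREFIX = re.compile(r"^GSM\d+_")


def strip_geo_prefix(filename: str) -> str:
    """Return filename without a leading 'GSM<digits>_' prefix, if present."""
    return GEO_PREFIX.sub("", filename, count=1)


def plan_renames(filenames):
    """Map each prefixed name to its stripped name, refusing collisions.

    Intended for the files of a single folder; nothing is renamed on disk.
    """
    plan = {}
    targets = set()
    for old in filenames:
        new = strip_geo_prefix(old)
        if new == old:
            continue  # no prefix, nothing to do
        if new in targets:
            raise ValueError(f"two files would both be renamed to {new!r}")
        targets.add(new)
        plan[old] = new
    return plan
```

Building a rename plan first, and only then touching the files, makes it easy to inspect what would change. The collision check matters if files from several samples share one folder, which is presumably why the provided script aggregates files by sample before removing prefixes.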

2. Set Up the Environment

  • Install dependencies using the provided environment file:
conda env create -f CITEgeist_env.yml
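  • Activate the environment and set up a Jupyter kernel:
conda activate CITEgeist_env
# register the environment as a Jupyter kernel (assumes ipykernel is installed in the environment)
python -m ipykernel install --user --name CITEgeist_env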

3. Obtain Gurobi License

CITEgeist requires a Gurobi license (free for academic use):

  1. Sign up for an academic license at: https://www.gurobi.com/downloads/end-user-license-agreement-academic/
  2. Follow the instructions to download and install your license.
  3. Update the license file path in the notebooks to match your local license location.
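If you prefer not to edit the path in each notebook, Gurobi also reads the license location from the `GRB_LICENSE_FILE` environment variable. The helper below is a sketch under that assumption; `use_gurobi_license` is a hypothetical name, and whether the notebooks honor the environment variable or set the path explicitly should be checked in the notebooks themselves.

```python
import os
from pathlib import Path


def use_gurobi_license(license_path):
    """Point Gurobi at a license file via the GRB_LICENSE_FILE environment variable.

    Call this before the first Gurobi environment or model is created.
    """
    path = Path(license_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No Gurobi license file at {path}")
    os.environ["GRB_LICENSE_FILE"] = str(path)
    return path


# Example (placeholder path; replace with your own license location):
# use_gurobi_license("~/gurobi.lic")
```

Failing early with a clear message is usually easier to debug than a license error raised from inside the solver partway through a long run.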

4. Running CITEgeist

You can run CITEgeist in two ways:

A. Using Jupyter Notebooks

  • Update the data paths at the top of the notebooks to match your local directory structure.
  • Expected runtime on a standard computer (16 threads, 32GB RAM):
    • Vignette 1: ~2 hours
    • Vignette 2: ~2 hours
    • Vignette 3: ~10 hours

Key Parameters:

  • radius: Radius for neighbor detection (default: 4)
  • lambda_reg: Regularization strength for cell proportion estimation (default: 0.001)
  • alpha_elastic: Elastic net mixing parameter for cell proportion estimation (default: 0.7)
  • max_y_change: Maximum allowed change in Y values (default: 0.2)

Optional Parameters:

  • profiling_only: Enable to estimate cell-type proportions only.
  • max_workers: Number of parallel workers.
  • checkpoint_interval: Checkpoint saving interval.
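To keep these settings together and catch out-of-range values early, they can be collected in a small container. The sketch below is not CITEgeist's API: `DeconvolutionSettings` and `elastic_net_penalty` are hypothetical names, the defaults of the optional parameters are assumptions, and the penalty uses the textbook elastic-net form (as in scikit-learn), which may differ from the objective CITEgeist actually solves.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DeconvolutionSettings:
    """Illustrative container; field names mirror the parameters listed above."""
    radius: float = 4
    lambda_reg: float = 0.001
    alpha_elastic: float = 0.7
    max_y_change: float = 0.2
    profiling_only: bool = False               # default assumed, not stated above
    max_workers: Optional[int] = None          # default assumed
    checkpoint_interval: Optional[int] = None  # default assumed

    def __post_init__(self):
        if self.radius <= 0:
            raise ValueError("radius must be positive")
        if self.lambda_reg < 0:
            raise ValueError("lambda_reg must be non-negative")
        if not 0.0 <= self.alpha_elastic <= 1.0:
            raise ValueError("alpha_elastic must lie in [0, 1]")


def elastic_net_penalty(weights: Sequence[float], lambda_reg: float,
                        alpha_elastic: float) -> float:
    """Textbook elastic net: lambda * (alpha * L1 + (1 - alpha) / 2 * squared L2)."""
    l1 = sum(abs(w) for w in weights)
    l2_sq = sum(w * w for w in weights)
    return lambda_reg * (alpha_elastic * l1 + 0.5 * (1.0 - alpha_elastic) * l2_sq)
```

Under this convention, alpha_elastic = 1 gives a pure L1 (lasso) penalty and alpha_elastic = 0 a pure L2 (ridge) penalty, so the default of 0.7 leans toward sparsity while keeping some ridge-style shrinkage.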

B. Using SLURM Distribution

For large-scale analyses, you can use the provided CITEgeist/examples/sbatch_sample.sh script for distributed computing.
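For readers unfamiliar with SLURM, the general shape of a batch script is sketched below. It does not reproduce the provided sbatch_sample.sh: `render_sbatch` is a hypothetical helper, the resource values simply echo the "standard computer" figures quoted above with some headroom on wall time, and the command passed in is a placeholder.

```python
def render_sbatch(command, job_name="citegeist", cpus=16, mem_gb=32, hours=12):
    """Return the text of a minimal single-node SLURM batch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --nodes=1",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={hours:02d}:00:00",
        "",
        "set -euo pipefail",  # stop the job on the first failing command
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder command; substitute the actual entry point you want to run.
    print(render_sbatch("python my_analysis.py"))
```

Partition names, account flags, and module loads are cluster specific and are deliberately left out; the provided sample script remains the reference for how the authors ran the analysis.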


Benchmarking and Reproducibility

To reproduce the benchmarking tests, and for the detailed methodology, please refer to the 'examples' and 'benchmarking' sections of the documentation.


Run the Analysis

You can either:

A. Run the notebooks directly:

  • Update data paths in the notebooks to match your local directory structure.
  • Expected runtime on a standard computer (16 threads, 32GB RAM): ~2 hours each for Vignettes 1 and 2, ~10 hours for Vignette 3.

B. Use SLURM distribution:

  • Use the provided examples/sbatch_sample.sh script for distributed computing.

Additional System Requirements

  • RAM: 32GB minimum for the runtimes quoted above (the Hardware Requirements section lists 16GB as the absolute minimum)
  • CPU: 16 threads (recommended)
  • Storage: Sufficient space for the GEO dataset
  • Operating System: Linux/Unix recommended (Windows users may need additional configuration)
