
Cell type deconvolution using single cell data


Scaden

Single-cell assisted deconvolutional network

Scaden is a deep-learning based algorithm for cell type deconvolution of bulk RNA-seq samples. It was developed at the DZNE Tübingen and the ZMNH in Hamburg. A preprint describing the method is available on bioRxiv: Deep-learning based cell composition analysis from tissue expression profiles

Complete documentation is available here.

Figure 1

Scaden overview. a) Generation of artificial bulk samples with known cell type composition from scRNA-seq data. b) Training of Scaden model ensemble on simulated training data. c) Scaden ensemble architecture. d) A trained Scaden model can be used to deconvolve complex bulk mixtures.
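Panel a) can be illustrated with a minimal sketch: random cell type fractions are drawn, single cells are sampled according to those fractions, and their expression profiles are summed into one artificial bulk sample whose true composition is known. This is not Scaden's own simulation code; the function name, the input layout and the default of 100 cells per sample are hypothetical choices made for illustration.

```python
import random
from typing import Dict, List, Optional, Tuple

def simulate_bulk_sample(
    cells_by_type: Dict[str, List[List[float]]],
    n_cells: int = 100,  # hypothetical default, chosen only for illustration
    seed: Optional[int] = None,
) -> Tuple[List[float], Dict[str, float]]:
    """Hypothetical sketch: pool single-cell profiles into one artificial bulk sample.

    cells_by_type maps a cell type name to a list of per-cell expression vectors
    (all in the same gene order). Returns the summed expression vector and the
    true cell type fractions of the simulated sample.
    """
    rng = random.Random(seed)
    cell_types = sorted(cells_by_type)
    # Random, unnormalised weights; random.choices normalises them internally.
    weights = [rng.random() for _ in cell_types]
    # Decide the cell type of each of the n_cells cells drawn into the mixture.
    drawn_types = rng.choices(cell_types, weights=weights, k=n_cells)
    n_genes = len(cells_by_type[cell_types[0]][0])
    bulk = [0.0] * n_genes
    for cell_type in drawn_types:
        # Pick one cell of that type (with replacement) and add its expression.
        profile = rng.choice(cells_by_type[cell_type])
        for gene_idx, value in enumerate(profile):
            bulk[gene_idx] += value
    # The label is the realised fraction of each cell type among the drawn cells.
    fractions = {ct: drawn_types.count(ct) / n_cells for ct in cell_types}
    return bulk, fractions
```

Using the realised fractions (rather than the random weights) as the label keeps the ground truth exactly consistent with the cells that were actually mixed.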

1. System requirements

Scaden was developed and tested on Linux (Ubuntu 16.04 and 18.04). It has not been tested on Windows or macOS, but should also be usable on these systems when installed with pip or Bioconda. Scaden does not require any special hardware (e.g. a GPU); however, we recommend at least 16 GB of memory.

Scaden requires Python 3. All package dependencies should be handled automatically when installing with pip or conda.

2. Installation guide

The recommended way to install Scaden is using conda and the Bioconda channel:

conda install -c bioconda scaden

Installation with conda takes only a few minutes (2-5), depending on the internet connection. Alternatively, Scaden can be installed with pip:

pip install scaden

We also provide a docker image with Scaden installed: https://hub.docker.com/r/kevinmenden/scaden

3. Demo

We provide several curated training datasets for Scaden. For this demo, we will use the human PBMC training dataset, which consists of 4 different scRNA-seq datasets and 32,000 samples in total. You can download it here: https://figshare.com/s/e59a03885ec4c4d8153f.

For this demo, you will also need to download some test samples to perform deconvolution on, along with their associated labels. You can download the data we used for the Scaden paper here: https://figshare.com/articles/Publication_Figures/8234030

We'll perform deconvolution on simulated samples from the data6k dataset. You can find the samples and labels in 'paper_data/figures/figure2/data/data6k_500_*' once you have downloaded this data from the link mentioned above.

The first step is to perform preprocessing on the training data. This is done with the following command:

scaden process pbmc_data.h5ad paper_data/figures/figure2/data/data6k_500_samples.txt
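As a rough illustration of what this preprocessing involves, the sketch below restricts each training sample to the genes also present in the bulk data, applies a log2(x + 1) transform and min-max scales each sample to the range [0, 1]. These steps follow our reading of the Scaden paper and should be treated as an assumption; the function names are hypothetical, and Scaden itself works on AnnData (.h5ad) files rather than Python dictionaries.

```python
import math
from typing import Dict, Iterable, List, Tuple

def shared_genes(train_genes: Iterable[str], bulk_genes: Iterable[str]) -> List[str]:
    # Keep only genes measured in both the training data and the bulk samples.
    return sorted(set(train_genes) & set(bulk_genes))

def log_minmax_scale(values: List[float]) -> List[float]:
    # log2(x + 1) transform, then rescale this one sample to the range [0, 1].
    logged = [math.log2(v + 1.0) for v in values]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        return [0.0] * len(logged)  # constant sample: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in logged]

def preprocess_samples(samples: List[Dict[str, float]],
                       bulk_genes: Iterable[str]) -> Tuple[List[str], List[List[float]]]:
    """Hypothetical sketch: align training samples to the bulk genes and scale them."""
    # Assumes every training sample contains the same set of genes.
    genes = shared_genes(samples[0].keys(), bulk_genes)
    return genes, [log_minmax_scale([s[g] for g in genes]) for s in samples]
```

Scaling per sample (instead of per gene) removes differences in overall sequencing depth between the simulated training samples and the real bulk samples.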

Running scaden process generates a file called 'processed.h5ad', which we will use for training. The training data we downloaded also contains samples from the data6k scRNA-seq dataset, so we have to exclude them from training to get a meaningful test of Scaden's performance. The following command trains a Scaden ensemble for 5000 steps per model (recommended) and stores it in 'scaden_model'. Because only data8k, donorA and donorC are passed to --train_datasets, samples from the data6k dataset are excluded from training. Depending on your machine, this can take about 10-20 minutes.

scaden train processed.h5ad --steps 5000 --model_dir scaden_model --train_datasets 'data8k donorA donorC'
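The exclusion of data6k comes from restricting training to the datasets named in --train_datasets. The filtering logic amounts to something like the following hypothetical sketch, which assumes each training sample carries the name of the scRNA-seq dataset it was simulated from; it is not Scaden's actual code.

```python
from typing import List, Sequence

def select_training_samples(sample_datasets: Sequence[str], train_datasets: str) -> List[int]:
    """Hypothetical sketch: indices of samples whose source dataset is listed.

    train_datasets is a space-separated string, mirroring the value passed to
    --train_datasets in the command above.
    """
    allowed = set(train_datasets.split())
    return [i for i, name in enumerate(sample_datasets) if name in allowed]
```

For example, select_training_samples(["data6k", "data8k", "donorA", "data6k", "donorC"], "data8k donorA donorC") keeps indices 1, 2 and 4 and drops both data6k samples.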

Finally, we can perform deconvolution on the 500 simulated samples from the data6k dataset:

scaden predict paper_data/figures/figure2/data/data6k_500_samples.txt --model_dir scaden_model
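Prediction uses the whole ensemble from Figure 1c. As we understand the Scaden paper, each model ends in a softmax layer, so its output fractions sum to 1, and the ensemble prediction is the mean of the models' outputs. The sketch below shows that combination step for a single sample under this assumption; ensemble_fractions and its raw-score inputs are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_fractions(cell_types: Sequence[str],
                       model_logits: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Hypothetical sketch: average the softmax outputs of several models for one sample."""
    per_model = [softmax(logits) for logits in model_logits]
    n_models = len(per_model)
    return {
        ct: sum(fractions[i] for fractions in per_model) / n_models
        for i, ct in enumerate(cell_types)
    }
```

Because every model's output sums to 1, the averaged fractions also sum to 1, so the ensemble result is still a valid cell composition.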

The predict command creates a file named 'cdn_predictions.txt' (to be renamed 'scaden_predictions.txt' in a future version), which contains the deconvolution results. You can now compare these predictions with the true values contained in 'paper_data/figures/figure2/data/data6k_500_labels.txt'. This should give you the same results as we obtained in the Scaden paper (see Figure 2).
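To compare predictions with the true fractions, two common per-cell-type metrics are the root-mean-square error (RMSE) and the Pearson correlation. The sketch below computes both from in-memory columns. Loading the two files (for example with pandas.read_csv(path, sep="\t", index_col=0)) depends on their exact layout; here we assume both have samples in the same order and use cell type names as column keys. Only cell types present in both tables are compared, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / (sd_x * sd_y)

def evaluate(pred: Dict[str, List[float]],
             true: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return (RMSE, Pearson r) for every cell type present in both tables."""
    shared = sorted(set(pred) & set(true))
    return {ct: (rmse(pred[ct], true[ct]), pearson_r(pred[ct], true[ct])) for ct in shared}
```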

4. Instructions for use

For a general description on how to use Scaden, please check out our usage documentation.


Download files


Source Distribution

scaden-0.9.3.1.post0.tar.gz (16.6 kB)

Uploaded Source

Built Distribution

scaden-0.9.3.1.post0-py3-none-any.whl (17.5 kB)

Uploaded Python 3

File details

Details for the file scaden-0.9.3.1.post0.tar.gz.

File metadata

  • Download URL: scaden-0.9.3.1.post0.tar.gz
  • Upload date:
  • Size: 16.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0.post20200210 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.1

File hashes

Hashes for scaden-0.9.3.1.post0.tar.gz
Algorithm Hash digest
SHA256 c2d4fb291454c922fecc201e41a6f249015ab2efac49ecc5e63bcbc099014e3a
MD5 09ff41ed54380502dcb6ea0a8abdb160
BLAKE2b-256 5b87681cb98830a1c3b1edc04271390dce0558099df463bcef86eb552a81e162


File details

Details for the file scaden-0.9.3.1.post0-py3-none-any.whl.

File metadata

  • Download URL: scaden-0.9.3.1.post0-py3-none-any.whl
  • Upload date:
  • Size: 17.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0.post20200210 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.1

File hashes

Hashes for scaden-0.9.3.1.post0-py3-none-any.whl
Algorithm Hash digest
SHA256 ca745d39ab730ee562e932a087822f6359aea78b2b4b02d1ec23c548d980a686
MD5 47df03e9be93dcbe5c8fa368b3ad638c
BLAKE2b-256 eaeb92a0cc3ed70df708004a30011689d4b65761bc5901c58444a8fd1f253a91

