scDiffusion-X: Diffusion Model for Single-Cell Multiome Data Generation and Analysis

Welcome! This is the official implementation of scDiffusion-X.

TODO: introduction to scDiffusion-X

Installation

conda create --name scmuldiff python=3.8
conda activate scmuldiff
pip install -r requirements.txt
pip install scdiffusionX
conda install mpi4py

User guidance

Step 1: Train the Autoencoder

cd script/training_autoencoder
bash train_autoencoder_multimodal.sbatch

Adjust the data path to your local path. The dataset config files are in script/training_autoencoder/configs/dataset; see the comments in openproblem.yaml for details. The autoencoder config files are in script/training_autoencoder/configs/encoder; see the comments in encoder_multimodal.yaml for details. Checkpoints are saved to script/training_autoencoder/outputs/checkpoints and log files to script/training_autoencoder/outputs/logs.

We recommend encoder_multimodal for most datasets. If the data has more than 50,000 genes and 200,000 peaks, we recommend the larger autoencoder in encoder_multimodal_large. If it has fewer than 5,000 genes and 15,000 peaks, we recommend the smaller autoencoder in encoder_multimodal_small. The norm_type field in the encoder config YAML controls the normalization type. For data generation we recommend batch_norm; for translation we recommend layer_norm, since it generalizes better to out-of-distribution (OOD) data.
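The sizing guidance above can be sketched as a small lookup. This is only an illustration: suggest_encoder_config is a hypothetical helper that is not part of scDiffusion-X, and it assumes that both the gene count and the peak count must cross a threshold before moving away from the default config.

```python
from typing import Tuple

# Thresholds quoted from the recommendation above.
LARGE_GENES, LARGE_PEAKS = 50_000, 200_000
SMALL_GENES, SMALL_PEAKS = 5_000, 15_000


def suggest_encoder_config(n_genes: int, n_peaks: int,
                           task: str = "generation") -> Tuple[str, str]:
    """Return (encoder config name, norm_type) following the README guidance.

    Hypothetical helper, not part of scDiffusion-X. Assumes both modalities
    must cross a threshold before switching away from the default config.
    """
    if task not in ("generation", "translation"):
        raise ValueError(f"unknown task: {task!r}")

    if n_genes > LARGE_GENES and n_peaks > LARGE_PEAKS:
        config = "encoder_multimodal_large"
    elif n_genes < SMALL_GENES and n_peaks < SMALL_PEAKS:
        config = "encoder_multimodal_small"
    else:
        config = "encoder_multimodal"

    # batch_norm for generation, layer_norm for translation (OOD data).
    norm_type = "batch_norm" if task == "generation" else "layer_norm"
    return config, norm_type
```

For example, suggest_encoder_config(20_000, 100_000, task="translation") returns the default encoder_multimodal config paired with layer_norm. Datasets where only one modality crosses a threshold fall back to the default here, which is a judgment call rather than documented behavior.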

Step 2: Train the Diffusion Backbone

cd script/training_diffusion
sh ssh_scripts/multimodal_train.sh

Again, adjust the data path and output path to your own, and set ae_path and encoder_config to the autoencoder you trained in step 1. When training with a condition (such as cell type), set num_class to the number of unique labels. Training is unconditional when num_class is not set.
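As a minimal sketch of how the value for num_class could be derived, the snippet below counts the unique condition labels. infer_num_class is a hypothetical helper, not part of scDiffusion-X, and the example labels are made up. With an AnnData object the labels would typically come from a column of adata.obs, whose name depends on your dataset.

```python
from typing import Iterable, Optional


def infer_num_class(labels: Optional[Iterable[str]]) -> Optional[int]:
    """Count unique condition labels.

    Returns None when no labels are given, mirroring the unconditional
    case in which num_class is left unset. Hypothetical helper.
    """
    if labels is None:
        return None
    return len(set(labels))


# Made-up cell type annotations for illustration only.
cell_types = ["B cell", "T cell", "B cell", "Monocyte"]
print(infer_num_class(cell_types))  # 3
```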

TODO: Explain each attribute in more detail

Step 3: Generate new data

cd script/training_diffusion
sh ssh_scripts/multimodal_sample.sh

Set MULTIMODAL_MODEL_PATH to the checkpoint path from step 2, and DATA_DIR to your local data path.

The experimental results in the paper can be reproduced with evaluate_script/inference_multi_diff.ipynb

TODO: More details about the hyperparameters and about conditional and unconditional generation

Function: Modality translation

For this task, we recommend layer_norm instead of batch_norm, since it is better suited to OOD data. If the condition labels of your source modality do not overlap with those in the training data (for example, with an external dataset), you can train the model unconditionally. In that case, use a clustering method such as Leiden to obtain cluster labels, and use them as the covariate_keys for the encoder (they are needed to get the size factor).

cd script/training_diffusion
sh ssh_scripts/multimodal_train_translation.sh
sh ssh_scripts/multimodal_translation.sh

Change the file paths in both bash files to your local paths. GEN_MODE is the target modality (either "rna" or "atac" for the current model). The training logic in multimodal_train_translation.sh is the same as in multimodal_train.sh; only the dataset and other hyperparameters differ.

The experimental results in the paper can be reproduced with evaluate_script/translation_multi_diff.ipynb

TODO: Change the format of the input data file. More explanation of the hyperparameters and settings.

Function: Gene-Peak regulatory analysis

You first need to complete step 1 and step 2. The detailed implementation can be found in evaluate_script/regulatory_multi_diff.ipynb

Download files

Source Distribution

scdiffusionx-0.0.2.tar.gz (352.2 kB)

Uploaded Source

Built Distribution

scdiffusionx-0.0.2-py3-none-any.whl (126.7 kB)

Uploaded Python 3

File details

Details for the file scdiffusionx-0.0.2.tar.gz.

File metadata

  • Download URL: scdiffusionx-0.0.2.tar.gz
  • Upload date:
  • Size: 352.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.13

File hashes

Hashes for scdiffusionx-0.0.2.tar.gz
Algorithm Hash digest
SHA256 9118284f5db363f28fe8afabfb27913a8da738928fe75d7e765b2c4272cd43a7
MD5 c686520ea80c65e1d034ad1e8a5c24d7
BLAKE2b-256 3af17d5fe84a44b104a939bf60b8a066276e286e4cd4cd8635973e9363921fbd
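A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only the Python standard library. The function names are illustrative, and the local file path is an assumption that depends on where you saved the download.

```python
import hashlib

# SHA256 digest listed above for scdiffusionx-0.0.2.tar.gz.
EXPECTED_SHA256 = "9118284f5db363f28fe8afabfb27913a8da738928fe75d7e765b2c4272cd43a7"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's digest equals the expected digest."""
    return sha256_of(path) == expected.lower()
```

Calling matches_expected("scdiffusionx-0.0.2.tar.gz") on the downloaded source distribution should return True if the file is intact. For the wheel, pass its own SHA256 digest as the expected argument.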

File details

Details for the file scdiffusionx-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: scdiffusionx-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 126.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.13

File hashes

Hashes for scdiffusionx-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 40114912814b865fb6fffe0b259e56b0d7007de623e728d1a3ac9e3bdb34d17a
MD5 5f14815cb1dcbb29553b9dc29e917167
BLAKE2b-256 fde639ef44504027ce54098fcccc48e57eb02f46a43bda069649650a568a03f5
