scDiffusion-X: Diffusion Model for Single-Cell Multiome Data Generation and Analysis
Welcome! This is the official implementation of scDiffusion-X.
TODO: introduction to scDiffusion-X
Installation
conda create --name scmuldiff python=3.8
conda activate scmuldiff
pip install -r requirements.txt
pip install scdiffusionX
conda install mpi4py
User guide
Step 1: Train the Autoencoder
cd script/training_autoencoder
bash train_autoencoder_multimodal.sbatch
Adjust the data path to your local path. The dataset config files are in script/training_autoencoder/configs/dataset; see the comments in openproblem.yaml for details. The autoencoder config files are in script/training_autoencoder/configs/encoder; see the comments in encoder_multimodal.yaml for details. Checkpoints are saved to script/training_autoencoder/outputs/checkpoints and log files to script/training_autoencoder/outputs/logs.
We recommend encoder_multimodal for most datasets. If the data have more than 50,000 genes and 200,000 peaks, we recommend the larger autoencoder in encoder_multimodal_large; if they have fewer than 5,000 genes and 15,000 peaks, we recommend the smaller autoencoder in encoder_multimodal_small. The norm_type field in the encoder config YAML controls the normalization type: for data generation we recommend batch_norm, and for translation we recommend layer_norm, since it generalizes better to out-of-distribution (OOD) data.
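The sizing and normalization rules above can be summarized as a small helper. This is a minimal sketch, not part of scDiffusion-X: the function name `choose_encoder_config` is hypothetical, and it only returns the config names and `norm_type` values mentioned in this guide. Falling back to the default config when only one of the two counts crosses a threshold is an assumption.

```python
from typing import Tuple


def choose_encoder_config(n_genes: int, n_peaks: int, task: str) -> Tuple[str, str]:
    """Return (encoder config name, norm_type) following the recommendations above.

    Hypothetical helper: scDiffusion-X does not ship this function.
    """
    # Large datasets: more than 50,000 genes and more than 200,000 peaks.
    if n_genes > 50_000 and n_peaks > 200_000:
        config = "encoder_multimodal_large"
    # Small datasets: fewer than 5,000 genes and fewer than 15,000 peaks.
    elif n_genes < 5_000 and n_peaks < 15_000:
        config = "encoder_multimodal_small"
    # Everything else, including mixed cases (an assumption), uses the default.
    else:
        config = "encoder_multimodal"

    # batch_norm for generation; layer_norm for translation (better on OOD data).
    if task == "generation":
        norm_type = "batch_norm"
    elif task == "translation":
        norm_type = "layer_norm"
    else:
        raise ValueError(f"unknown task: {task!r}")
    return config, norm_type


print(choose_encoder_config(20_000, 100_000, "generation"))
# ('encoder_multimodal', 'batch_norm')
```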
Step 2: Train the Diffusion Backbone
cd script/training_diffusion
sh ssh_scripts/multimodal_train.sh
Again, adjust the data path and output path to your own, and set ae_path and encoder_config to the autoencoder you trained in Step 1. When training with a condition (such as cell type), set num_class to the number of unique labels. If num_class is not set, training is unconditional.
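Deriving num_class from a label column can be sketched as follows. This assumes you have the condition labels (for example, a cell-type column from your cell metadata) as a list of strings; the helper name `num_class_from_labels` is hypothetical, and returning None stands for leaving num_class unset, i.e. unconditional training.

```python
from typing import Iterable, Optional


def num_class_from_labels(labels: Optional[Iterable[str]]) -> Optional[int]:
    """Number of unique condition labels, or None for unconditional training.

    Hypothetical helper, not part of scDiffusion-X.
    """
    if labels is None:
        # No condition column: leave num_class unset (unconditional training).
        return None
    unique = set(labels)
    # An empty label column carries no condition either.
    return len(unique) if unique else None


cell_types = ["B cell", "T cell", "T cell", "Monocyte", "B cell"]
print(num_class_from_labels(cell_types))  # 3
```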
TODO: Explain each argument in more detail.
Step 3: Generate new data
cd script/training_diffusion
sh ssh_scripts/multimodal_sample.sh
Set MULTIMODAL_MODEL_PATH to the checkpoint path from Step 2 and DATA_DIR to your local data path.
The experimental results in the paper can be reproduced with evaluate_script/inference_multi_diff.ipynb.
TODO: More details about the hyperparameters and about conditional versus unconditional generation.
Function: Modality translation
For this task, we recommend layer_norm instead of batch_norm, since it generalizes better to OOD data. If the condition labels of your source modality do not overlap with those of the training data (for example, an external dataset), train the model unconditionally. In that case, use a clustering method such as Leiden to obtain cluster labels and use them as the covariate_keys for the encoder (to get the size factor).
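The conditional-versus-unconditional decision for translation can be sketched as a label-overlap check. This is a minimal sketch under an assumption of our own: conditioning is treated as usable only when every source label also appears in the training labels, which is stricter than any overlap. The helper name `choose_training_mode` is hypothetical, and the Leiden clustering step itself is not shown.

```python
from typing import Iterable, Set, Tuple


def choose_training_mode(
    source_labels: Iterable[str], train_labels: Iterable[str]
) -> Tuple[str, Set[str]]:
    """Return ("conditional" or "unconditional", labels unseen in training).

    Hypothetical helper, not part of scDiffusion-X.
    """
    # Labels in the source data that the model never saw during training.
    unseen = set(source_labels) - set(train_labels)
    if not unseen:
        # Every source label was seen during training: conditioning is usable.
        return "conditional", unseen
    # Unseen labels (e.g. an external dataset): train unconditionally and use
    # cluster labels (e.g. from Leiden) as the encoder's covariate_keys.
    return "unconditional", unseen


mode, unseen = choose_training_mode(["T cell", "NK cell"], ["T cell", "B cell"])
print(mode, sorted(unseen))  # unconditional ['NK cell']
```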
cd script/training_diffusion
sh ssh_scripts/multimodal_train_translation.sh
sh ssh_scripts/multimodal_translation.sh
Change the file paths in both shell scripts to your local paths. GEN_MODE sets the target modality ("rna" or "atac" for the current model). multimodal_train_translation.sh follows the same training logic as multimodal_train.sh; only the dataset and some hyperparameters differ.
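Because the current model translates between two modalities, the source modality is whichever one GEN_MODE does not name. The sketch below makes that explicit and rejects other values; the helper `source_modality` is hypothetical and only illustrates how GEN_MODE is interpreted.

```python
def source_modality(gen_mode: str) -> str:
    """Return the input modality for a given GEN_MODE (the target modality).

    Hypothetical helper: with only "rna" and "atac" supported, the source
    is the modality that is not the target.
    """
    other = {"rna": "atac", "atac": "rna"}
    if gen_mode not in other:
        raise ValueError(f'GEN_MODE must be "rna" or "atac", got {gen_mode!r}')
    return other[gen_mode]


print(source_modality("rna"))  # atac
```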
The experimental results in the paper can be reproduced with evaluate_script/translation_multi_diff.ipynb.
TODO: Change the format of the input data file. Add more explanation of the hyperparameters and settings.
Function: Gene-peak regulatory analysis
You need to complete Step 1 and Step 2 first. The detailed implementation can be found in evaluate_script/regulatory_multi_diff.ipynb.
Download files
File details
Details for the file scdiffusionx-0.0.2.tar.gz.
File metadata
- Download URL: scdiffusionx-0.0.2.tar.gz
- Upload date:
- Size: 352.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9118284f5db363f28fe8afabfb27913a8da738928fe75d7e765b2c4272cd43a7 |
| MD5 | c686520ea80c65e1d034ad1e8a5c24d7 |
| BLAKE2b-256 | 3af17d5fe84a44b104a939bf60b8a066276e286e4cd4cd8635973e9363921fbd |
File details
Details for the file scdiffusionx-0.0.2-py3-none-any.whl.
File metadata
- Download URL: scdiffusionx-0.0.2-py3-none-any.whl
- Upload date:
- Size: 126.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 40114912814b865fb6fffe0b259e56b0d7007de623e728d1a3ac9e3bdb34d17a |
| MD5 | 5f14815cb1dcbb29553b9dc29e917167 |
| BLAKE2b-256 | fde639ef44504027ce54098fcccc48e57eb02f46a43bda069649650a568a03f5 |