A torch-based integration method for single-cell multi-omic data.
MIDAS: A Deep Generative Model for Mosaic Integration and Knowledge Transfer of Single-Cell Multimodal Data
MIDAS turns raw mosaic data into both imputed, batch-corrected data and disentangled latent representations, powering robust downstream analysis.
MIDAS is a deep probabilistic framework for the mosaic integration and knowledge transfer of single-cell multimodal data. It addresses key challenges in single-cell analysis, such as modality alignment, batch effect removal, and data imputation. Using self-supervised modality alignment and information-theoretic latent disentanglement, MIDAS transforms fragmented mosaic data into a complete, harmonized dataset ready for downstream analysis.
Whether you are working with transcriptomics (RNA), proteomics (ADT), or chromatin accessibility (ATAC), MIDAS provides a versatile solution to uncover deeper biological insights from complex, multi-source datasets.
- Documentation: scmidas.readthedocs.io
- Publication: Nature Biotechnology
✨ Key Features
- Mosaic Data Integration: Seamlessly integrates datasets where different batches measure different sets of modalities (e.g., some samples have RNA and ATAC, while others have only RNA).
- Multi-Modal Support: Natively supports RNA, ADT, and ATAC data, and can be easily configured to incorporate additional modalities.
- Data Imputation: Accurately imputes missing modalities, turning incomplete data into a complete multi-modal matrix.
- Batch Correction: Removes technical variation between batches, enabling consistent and reliable analysis across datasets.
- Knowledge Transfer: Leverages a pre-trained reference atlas to enable flexible and accurate knowledge transfer to new query datasets.
- Efficient and Scalable: Built on PyTorch Lightning for highly efficient model training, with support for advanced strategies like Distributed Data Parallel (DDP).
- Advanced Visualization: Integrates with TensorBoard for real-time monitoring of training loss and UMAP visualizations.
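To make the notion of mosaic data concrete, the sketch below records which modalities each batch measures and derives a batch-by-modality presence mask, along with the modalities that would need to be imputed. The batch names and both helper functions are hypothetical and purely illustrative; this is not how MIDAS represents masks internally.

```python
# Illustrative only: batch names and helpers are hypothetical, not part of scmidas.
MODALITIES = ["rna", "adt", "atac"]

# Which modalities each batch actually measured (a mosaic layout).
batches = {
    "batch1": {"rna", "atac"},
    "batch2": {"rna"},
    "batch3": {"rna", "adt"},
}


def presence_mask(batches, modalities):
    """Return {batch: [1 if the modality was measured else 0]} in `modalities` order."""
    return {
        name: [int(m in measured) for m in modalities]
        for name, measured in batches.items()
    }


def missing_modalities(batches, modalities):
    """Return {batch: [modalities that were not measured in that batch]}."""
    return {
        name: [m for m in modalities if m not in measured]
        for name, measured in batches.items()
    }


mask = presence_mask(batches, MODALITIES)
to_impute = missing_modalities(batches, MODALITIES)
for name in batches:
    print(name, mask[name], "missing:", to_impute[name])
```

In this toy layout every batch shares RNA, while ADT and ATAC are each present in only one batch. Those gaps are what an imputation step would have to fill.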
🚀 Installation
Get started with MIDAS by setting up a conda environment.
# 1. Create and activate a new conda environment
conda create -n scmidas python=3.12
conda activate scmidas
# 2. Install MIDAS from PyPI
pip install scmidas==0.1.16
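After installing, it can be useful to confirm which version ended up in the active environment. The snippet below is a minimal sketch using only the Python standard library (`importlib.metadata`); the `installed_version` helper is a hypothetical name introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("scmidas")
if found is None:
    print("scmidas is not installed in this environment")
else:
    print(f"scmidas {found} is installed")
```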
⚡ Getting Started: A Quick Example
Here is a minimal example to get you started with a mosaic integration task. For more detailed tutorials, please refer to our documentation.
from scmidas.config import load_config
from scmidas.model import MIDAS
import lightning as L
# 1. Load the model configuration
# The configuration allows you to specify modalities, layers, and other parameters.
configs = load_config()
# 2. Load your mosaic dataset and initialize the MIDAS model
# The input should be an AnnData object where modalities are stored.
# Different batches can have different combinations of modalities.
model = MIDAS.configure_data_from_dir(configs, 'path/to/your/data', transform={'atac': 'binarize'})
# 3. Train the model on your data
model.train(max_epochs=2000)
# 4. Obtain the integrated and imputed results
# The model returns an AnnData object with a unified latent space
# and imputed values for the missing modalities.
pred = model.predict()
# 5. Visualize the results
model.get_emb_umap()
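The `transform={'atac': 'binarize'}` argument in the example suggests that ATAC counts are binarized before modeling. As a rough illustration of what binarization usually means for accessibility data, the NumPy sketch below maps every nonzero count to 1. This is an assumption about the general technique, not a description of scmidas internals; the `binarize` function and the toy matrix are hypothetical.

```python
import numpy as np


def binarize(counts):
    """Map every positive count to 1 and everything else to 0."""
    counts = np.asarray(counts)
    return (counts > 0).astype(np.float32)


# Toy cells x peaks count matrix (made-up values).
atac_counts = np.array([
    [0, 3, 1, 0],
    [2, 0, 0, 5],
])
atac_binary = binarize(atac_counts)
print(atac_binary)
```

Binarizing keeps only whether a peak is open in a cell and discards read depth at that peak. That is a common choice for sparse chromatin accessibility data.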
⚡ Update: Load data from MuData
In addition to loading data from a directory, MIDAS also supports direct initialization from a MuData object. This is useful when your multimodal dataset is already organized in memory with modality-specific AnnData objects.
A typical MuData object may look like this:
# Example MuData:
# MuData object with n_obs × n_vars = 10000 × 1200
#   2 modalities
#     rna: 10000 x 1000
#       obs: 'batch'
#       uns: 'mask_batch1', 'mask_batch2'
#     adt: 8000 x 200
#       obs: 'batch'
#       uns: 'mask_batch1', 'mask_batch2'
You can configure the model from MuData as follows:
from scmidas.config import load_config
from scmidas.model import MIDAS
import lightning as L
# 1. Load model configuration
configs = load_config()
# 2. Prepare your MuData object
# Assume `mdata` is already loaded in memory.
# Each modality should be stored in mdata.mod, for example:
# mdata.mod['rna']
# mdata.mod['adt']
#
# The `batch_key` specifies the column in .obs that indicates batch membership.
# The `dims_x` argument defines the input feature dimension for each modality.
model = MIDAS.configure_data_from_mdata(
    mdata=mdata,
    batch_key='batch',
    dims_x={
        'rna': [1000],
        'adt': [200],
    },
    configs=configs
)
# 3. Train the model
model.train(max_epochs=2000)
# 4. Run prediction
pred = model.predict()
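In the MuData example, `dims_x` lists the feature dimension of each modality. Rather than hard-coding these values, one could derive them from the modalities themselves. The sketch below is a hypothetical helper, not part of scmidas. It assumes one dimension per modality, as in the example, and uses lightweight stand-in objects with an `n_vars` attribute so that it runs without `mudata` installed. With a real MuData object, `mdata.mod` could be passed instead, since each AnnData exposes `n_vars`.

```python
from types import SimpleNamespace


def infer_dims_x(modalities):
    """Build {name: [n_features]} from a mapping of modality name to an
    object exposing `.n_vars` (hypothetical helper, one dimension per modality)."""
    return {name: [int(mod.n_vars)] for name, mod in modalities.items()}


# Stand-ins mirroring the example MuData shapes (rna: 1000 features, adt: 200).
fake_mod = {
    "rna": SimpleNamespace(n_obs=10000, n_vars=1000),
    "adt": SimpleNamespace(n_obs=8000, n_vars=200),
}
dims_x = infer_dims_x(fake_mod)
print(dims_x)
```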
📈 Reproducibility
To reproduce the results from our publication, please visit the reproducibility branch of this repository:
github.com/labomics/midas/tree/reproducibility
📜 Citation
If you use MIDAS in your research, please cite our paper:
He, Z., Hu, S., Chen, Y. et al. Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-023-02040-y
@article{he2024mosaic,
  title={Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS},
  author={He, Zhen and Hu, Shuofeng and Chen, Yaowen and An, Sijing and Zhou, Jiahao and Liu, Runyan and Shi, Junfeng and Wang, Jing and Dong, Guohua and Shi, Jinhui and others},
  journal={Nature Biotechnology},
  pages={1--12},
  year={2024},
  publisher={Nature Publishing Group US New York}
}
🙌 Contributing
We welcome contributions from the community! If you have a suggestion or a bug report, or want to contribute code, please feel free to open an issue or submit a pull request.
📝 License
MIDAS is available under the MIT License.
File details
Details for the file scmidas-0.1.16.tar.gz.
File metadata
- Download URL: scmidas-0.1.16.tar.gz
- Upload date:
- Size: 51.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.8.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9b1a0ef8ca17f36ba30be8061be7e8e4d937e3d54de411eb2783e126b0749589 |
| MD5 | 99417b6c2ffbe5fc538c05426cf7bfee |
| BLAKE2b-256 | cf7740a154963ce4a0920a6effa263f397a8abea014c35bdc6d3881ebb4e0853 |
File details
Details for the file scmidas-0.1.16-py3-none-any.whl.
File metadata
- Download URL: scmidas-0.1.16-py3-none-any.whl
- Upload date:
- Size: 49.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.8.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a16d2ca8463b6a3a05903431def0ea810c2dd984fef253c8d076c760c27640ec |
| MD5 | 78bc365f0021e4c00fdf4b430a21eb20 |
| BLAKE2b-256 | ddba7f0ac8c848e35c9645b577b364cef9c7f57f66a4b3f50da40c85c9d64a8e |