
A torch-based integration method for single-cell multi-omic data.


MIDAS: A Deep Generative Model for Mosaic Integration and Knowledge Transfer of Single-Cell Multimodal Data


MIDAS turns raw mosaic data into both imputed, batch-corrected data and disentangled latent representations, powering robust downstream analysis.



MIDAS is a deep probabilistic framework for mosaic integration and knowledge transfer of single-cell multimodal data. It addresses key challenges in single-cell analysis: modality alignment, batch effect removal, and data imputation. Using self-supervised modality alignment and information-theoretic latent disentanglement, MIDAS transforms fragmented, mosaic data into a complete and harmonized dataset ready for downstream analysis.

Whether you are working with transcriptomics (RNA), proteomics (ADT), or chromatin accessibility (ATAC), MIDAS provides a versatile solution to uncover deeper biological insights from complex, multi-source datasets.

✨ Key Features

  • Mosaic Data Integration: Seamlessly integrates datasets where different batches measure different sets of modalities (e.g., some samples have RNA and ATAC, while others have only RNA).
  • Multi-Modal Support: Natively supports RNA, ADT, and ATAC data, and can be easily configured to incorporate additional modalities.
  • Data Imputation: Imputes missing modalities, turning incomplete data into a complete multi-modal matrix.
  • Batch Correction: Removes technical variation between batches, enabling consistent analysis across datasets.
  • Knowledge Transfer: Leverages a pre-trained reference atlas to enable flexible and accurate knowledge transfer to new query datasets.
  • Efficient and Scalable: Built on PyTorch Lightning for highly efficient model training, with support for advanced strategies like Distributed Data Parallel (DDP).
  • Advanced Visualization: Integrates with TensorBoard for real-time monitoring of training loss and UMAP visualizations.
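
To make the mosaic setting concrete, the sketch below describes a hypothetical three-batch dataset as a plain mapping from batch to measured modalities, then lists which modalities each batch is missing and would need to have imputed. The batch names and the helper `missing_modalities` are illustrative assumptions and are not part of the scmidas API.

```python
# Illustrative only: none of these names come from the scmidas API.
ALL_MODALITIES = ("rna", "adt", "atac")

# Hypothetical mosaic layout: each batch measures a different subset of modalities.
measured = {
    "batch1": {"rna", "atac"},
    "batch2": {"rna"},
    "batch3": {"rna", "adt"},
}

def missing_modalities(measured, all_modalities=ALL_MODALITIES):
    """Return, per batch, the modalities that were not measured (in a fixed order)."""
    return {
        batch: [m for m in all_modalities if m not in mods]
        for batch, mods in measured.items()
    }

missing = missing_modalities(measured)
for batch, mods in missing.items():
    print(batch, "is missing:", mods)
```

In this toy layout every batch shares RNA, which is what lets the batches be related to one another even though no batch measures all three modalities.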

🚀 Installation

Get started with MIDAS by setting up a conda environment.

# 1. Create and activate a new conda environment
conda create -n scmidas python=3.12
conda activate scmidas

# 2. Install MIDAS from PyPI
pip install scmidas==0.1.16
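
As an optional check after installation, the following sketch uses the standard-library `importlib.metadata` module to report which version of the `scmidas` distribution is installed in the active environment. The helper name `installed_version` is our own and is not provided by scmidas.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

found = installed_version("scmidas")
print("scmidas version:", found if found is not None else "not installed")
```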

⚡ Getting Started: A Quick Example

Here is a minimal example to get you started with a mosaic integration task. For more detailed tutorials, please refer to our documentation.

from scmidas.config import load_config
from scmidas.model import MIDAS
import lightning as L

# 1. Load the model configuration
# The configuration lets you specify modalities, layers, and other parameters.
configs = load_config()

# 2. Configure the model with your mosaic dataset
# 'path/to/your/data' is the directory that holds the dataset.
# Different batches can have different combinations of modalities.
# Here the ATAC modality is given the 'binarize' transform.
model = MIDAS.configure_data_from_dir(configs, 'path/to/your/data', transform={'atac':'binarize'})

# 3. Train the model on your data
model.train(max_epochs=2000)

# 4. Obtain the integrated and imputed results
# The model returns an AnnData object with a unified latent space 
# and imputed values for the missing modalities.
pred = model.predict()

# 5. Visualize the results
model.get_emb_umap()
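
The `transform={'atac':'binarize'}` argument above suggests that ATAC counts are binarized before training. The exact rule is not spelled out here, so the sketch below assumes the common convention that any nonzero count becomes 1. It is pure Python, uses made-up values, and does not call scmidas.

```python
def binarize(counts):
    """Map each count to 1 if it is greater than zero, else 0 (assumed convention)."""
    return [[1 if value > 0 else 0 for value in row] for row in counts]

# Toy cells-by-peaks count matrix (hypothetical values).
atac_counts = [
    [0, 3, 1],
    [2, 0, 0],
]
atac_binary = binarize(atac_counts)
print(atac_binary)
```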

⚡ Update: Load data from MuData

In addition to loading data from a directory, MIDAS also supports direct initialization from a MuData object. This is useful when your multimodal dataset is already organized in memory with modality-specific AnnData objects.

A typical MuData object may look like this:

# Example MuData:
# MuData object with n_obs × n_vars = 10000 × 1200
#   2 modalities
#     rna: 10000 x 1000
#       obs: 'batch'
#       uns: 'mask_batch1', 'mask_batch2'
#     adt: 8000 x 200
#       obs: 'batch'
#       uns: 'mask_batch1', 'mask_batch2'

You can configure the model from MuData as follows:

from scmidas.config import load_config
from scmidas.model import MIDAS
import lightning as L

# 1. Load model configuration
configs = load_config()

# 2. Prepare your MuData object
# Assume `mdata` is already loaded in memory.
# Each modality should be stored in mdata.mod, for example:
#   mdata.mod['rna']
#   mdata.mod['adt']
#
# The `batch_key` specifies the column in .obs that indicates batch membership.
# The `dims_x` argument defines the input feature dimension for each modality.
model = MIDAS.configure_data_from_mdata(
    mdata=mdata,
    batch_key='batch',
    dims_x={
        'rna': [1000],
        'adt': [200],
    },
    configs=configs
)

# 3. Train the model
model.train(max_epochs=2000)

# 4. Run prediction
pred = model.predict()
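
The `dims_x` values above match the feature counts in the example MuData (1000 RNA features and 200 ADT features). Below is a minimal sketch of deriving such a dictionary from per-modality feature counts, assuming one single-element list per modality as in the example. `build_dims_x` is a hypothetical helper, not an scmidas function; with a real MuData object, the counts could be read from `mdata.mod[name].n_vars`.

```python
def build_dims_x(n_features):
    """Wrap each modality's feature count in a list, mirroring the example above."""
    return {name: [count] for name, count in n_features.items()}

# Feature counts taken from the example MuData printout.
n_features = {"rna": 1000, "adt": 200}
dims_x = build_dims_x(n_features)

# The per-modality counts should add up to the MuData's total n_vars (1200 here).
total = sum(sum(dims) for dims in dims_x.values())
print(dims_x, total)
```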

📈 Reproducibility

To reproduce the results from our publication, please visit the reproducibility branch of this repository: github.com/labomics/midas/tree/reproducibility

📜 Citation

If you use MIDAS in your research, please cite our paper:

He, Z., Hu, S., Chen, Y. et al. Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-023-02040-y

@article{he2024mosaic,
  title={Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS},
  author={He, Zhen and Hu, Shuofeng and Chen, Yaowen and An, Sijing and Zhou, Jiahao and Liu, Runyan and Shi, Junfeng and Wang, Jing and Dong, Guohua and Shi, Jinhui and others},
  journal={Nature Biotechnology},
  pages={1--12},
  year={2024},
  publisher={Nature Publishing Group US New York}
}

🙌 Contributing

We welcome contributions from the community! If you have a suggestion or bug report, or want to contribute code, please open an issue or submit a pull request.

📝 License

MIDAS is available under the MIT License.
