A torch-based integration method for single-cell multi-omic data.

MIDAS: A Deep Generative Model for Mosaic Integration and Knowledge Transfer of Single-Cell Multimodal Data

MIDAS turns raw mosaic data into both imputed, batch-corrected data and disentangled latent representations, powering robust downstream analysis.

MIDAS is a deep probabilistic framework for mosaic integration and knowledge transfer of single-cell multimodal data. It addresses key challenges in single-cell analysis, such as modality alignment, batch effect removal, and data imputation. Using self-supervised modality alignment and information-theoretic latent disentanglement, MIDAS transforms fragmented mosaic data into a complete, harmonized dataset ready for downstream analysis.

Whether you are working with transcriptomics (RNA), proteomics (ADT), or chromatin accessibility (ATAC), MIDAS provides a versatile solution to uncover deeper biological insights from complex, multi-source datasets.

✨ Key Features

  • Mosaic Data Integration: Seamlessly integrates datasets where different batches measure different sets of modalities (e.g., some samples have RNA and ATAC, while others have only RNA).
  • Multi-Modal Support: Natively supports RNA, ADT, and ATAC data, and can be easily configured to incorporate additional modalities.
  • Data Imputation: Accurately imputes missing modalities, turning incomplete data into a complete multi-modal matrix.
  • Batch Correction: Effectively removes technical variations between different batches, enabling consistent and reliable analysis across datasets.
  • Knowledge Transfer: Leverages a pre-trained reference atlas to enable flexible and accurate knowledge transfer to new query datasets.
  • Efficient and Scalable: Built on PyTorch Lightning for highly efficient model training, with support for advanced strategies like Distributed Data Parallel (DDP).
  • Advanced Visualization: Integrates with TensorBoard for real-time monitoring of training loss and UMAP visualizations.
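
To make the term "mosaic" concrete, the sketch below records which modalities each batch measured and derives the modalities that would need to be imputed. It is plain Python for illustration only: the batch names, the `presence_mask` helper, and the `missing_modalities` helper are hypothetical and are not part of the scmidas API.

```python
# Hypothetical batch layout, for illustration only.
MODALITIES = ("rna", "adt", "atac")

batches = {
    "batch1": {"rna", "atac"},
    "batch2": {"rna"},
    "batch3": {"rna", "adt"},
}


def presence_mask(batches, modalities=MODALITIES):
    """Return {batch: [1 or 0 per modality]} marking which modalities were measured."""
    return {
        name: [int(m in measured) for m in modalities]
        for name, measured in batches.items()
    }


def missing_modalities(batches, modalities=MODALITIES):
    """Return {batch: [modalities absent from that batch]}, i.e. imputation targets."""
    return {
        name: [m for m in modalities if m not in measured]
        for name, measured in batches.items()
    }


mask = presence_mask(batches)
todo = missing_modalities(batches)

for name in batches:
    print(name, mask[name], "missing:", todo[name])
```

A dataset is mosaic when the rows of this mask differ between batches; a fully paired dataset would have the same row for every batch.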

🚀 Installation

Get started with MIDAS by setting up a conda environment.

# 1. Create and activate a new conda environment
conda create -n scmidas python=3.12
conda activate scmidas

# 2. Install MIDAS from PyPI
pip install scmidas
# To pin a specific release instead, append its version, e.g. scmidas==0.1.17
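
After installing, you may want to confirm which version ended up in the active environment. The snippet below is a small sketch using only the Python standard library (`importlib.metadata`); the `installed_version` helper is a hypothetical name introduced here, not something scmidas provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("scmidas")
if found:
    print(f"scmidas version: {found}")
else:
    print("scmidas is not installed in this environment")
```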

⚡ Getting Started: A Quick Example

Here is a minimal example to get you started with a mosaic integration task. For more detailed tutorials, please refer to our documentation.

from scmidas.config import load_config
from scmidas.model import MIDAS
import lightning as L

# 1. Load the model configuration
# The configuration lets you specify modalities, layers, and other parameters.
configs = load_config()

# 2. Load your mosaic dataset
# The input should be an AnnData object where modalities are stored.
# Different batches can have different combinations of modalities.
model = MIDAS.configure_data_from_dir(configs, 'path/to/your/data', transform={'atac':'binarize'})

# 3. Train the model on your data
model.train(max_epochs=2000)

# 4. Obtain the integrated and imputed results
# The model returns an AnnData object with a unified latent space 
# and imputed values for the missing modalities.
pred = model.predict()

# 5. Visualize the results
model.get_emb_umap()
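
The quick example passes `transform={'atac':'binarize'}` without spelling out what that transform computes. A common convention for ATAC count matrices is to record only whether a peak is accessible in a cell, mapping every positive count to 1. The NumPy sketch below illustrates that convention under this assumption; the `binarize` function here is a stand-in written for this example and may differ from what MIDAS does internally.

```python
import numpy as np


def binarize(counts):
    """Map every positive count to 1 and everything else to 0 (assumed convention)."""
    counts = np.asarray(counts)
    return (counts > 0).astype(np.float32)


# Toy cells-by-peaks count matrix with made-up values.
atac_counts = np.array([[0, 3, 1],
                        [2, 0, 0]])
atac_binary = binarize(atac_counts)
print(atac_binary)
```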

⚡ Update: Load data from MuData

In addition to loading data from a directory, MIDAS also supports direct initialization from a MuData object. This is useful when your multimodal dataset is already organized in memory with modality-specific AnnData objects.

A typical MuData object may look like this:

# Example MuData:
# MuData object with n_obs × n_vars = 10000 × 1200
#   2 modalities
#     rna: 10000 x 1000
#       obs: 'batch'
#       uns: 'mask_batch1', 'mask_batch2'
#     adt: 8000 x 200
#       obs: 'batch'
#       uns: 'mask_batch1', 'mask_batch2'
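
The shapes in this summary fit together as follows: cells are shared across modalities, so the overall cell count is the union of the per-modality cells, while features are modality-specific, so the feature counts add up. The plain-Python sketch below reproduces the 10000 × 1200 figure under one assumption that the summary itself does not state, namely that the 8,000 ADT cells are a subset of the 10,000 RNA cells. The barcode names are made up.

```python
# Hypothetical barcodes: 10,000 cells with RNA, of which the first 8,000 also have ADT.
rna_cells = {f"cell{i}" for i in range(10_000)}
adt_cells = {f"cell{i}" for i in range(8_000)}
n_features = {"rna": 1000, "adt": 200}

n_obs = len(rna_cells | adt_cells)     # union of cells across modalities
n_vars = sum(n_features.values())      # features of all modalities, concatenated
rna_only = len(rna_cells - adt_cells)  # cells that have RNA but no ADT measurement

print(f"n_obs x n_vars = {n_obs} x {n_vars}; cells without ADT: {rna_only}")
```

Under that assumption, the 2,000 cells without ADT are the ones for which the ADT modality would be imputed.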

You can configure the model from MuData as follows:

from scmidas.config import load_config
from scmidas.model import MIDAS
import lightning as L

# 1. Load model configuration
configs = load_config()

# 2. Prepare your MuData object
# Assume `mdata` is already loaded in memory.
# Each modality should be stored in mdata.mod, for example:
#   mdata.mod['rna']
#   mdata.mod['adt']
#
# The `batch_key` specifies the column in .obs that indicates batch membership.
# The `dims_x` argument defines the input feature dimension for each modality.
model = MIDAS.configure_data_from_mdata(
    mdata=mdata,
    batch_key='batch',
    dims_x={
        'rna': [1000],
        'adt': [200],
    },
    configs=configs
)

# 3. Train the model
model.train(max_epochs=2000)

# 4. Run prediction
pred = model.predict()
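
Rather than typing the feature counts in `dims_x` by hand, you could derive them from the modality objects themselves, since each AnnData in `mdata.mod` exposes `n_vars`. The helper below is a hypothetical convenience, not part of scmidas. It mirrors the single-element lists used in the example above; whether other modalities expect a different list layout is not stated here, so treat that as an open assumption. Lightweight stand-in objects are used so the snippet runs without MuData installed.

```python
from types import SimpleNamespace


def infer_dims_x(modalities):
    """Build {modality: [n_features]} from a mapping of objects exposing `n_vars`."""
    return {name: [int(data.n_vars)] for name, data in modalities.items()}


# Stand-ins for mdata.mod['rna'] and mdata.mod['adt'];
# with a real MuData object you would pass `mdata.mod` instead.
toy_mod = {
    "rna": SimpleNamespace(n_vars=1000),
    "adt": SimpleNamespace(n_vars=200),
}

dims_x = infer_dims_x(toy_mod)
print(dims_x)
```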

📈 Reproducibility

To reproduce the results from our publication, please visit the reproducibility branch of this repository: github.com/labomics/midas/tree/reproducibility

📜 Citation

If you use MIDAS in your research, please cite our paper:

He, Z., Hu, S., Chen, Y. et al. Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-023-02040-y

@article{he2024mosaic,
  title={Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS},
  author={He, Zhen and Hu, Shuofeng and Chen, Yaowen and An, Sijing and Zhou, Jiahao and Liu, Runyan and Shi, Junfeng and Wang, Jing and Dong, Guohua and Shi, Jinhui and others},
  journal={Nature Biotechnology},
  pages={1--12},
  year={2024},
  publisher={Nature Publishing Group US New York}
}

🙌 Contributing

We welcome contributions from the community! If you have a suggestion, bug report, or want to contribute to the code, please feel free to open an issue or submit a pull request.

📝 License

MIDAS is available under the MIT License.
