mosaicMPI: Mosaic Multi-resolution Program Integration

DOI: 10.1101/2023.08.18.553919

Authors: Ted Verhey, Heewon Seo, Sorana Morrissy

mosaicMPI is a Python package enabling mosaic integration of bulk, single-cell, and spatial expression data through program-level integration. Programs are first discovered using consensus non-negative matrix factorization and then integrated using a flexible network-based approach to group similar programs together across resolutions and datasets. Program communities are then interpreted using sample/cell metadata and classical gene set analyses. Integrative program communities enable metadata transfer across datasets.

⚡ Main Features

Here are just a few of the things that mosaicMPI does well:

  • Identifies interpretable, non-negative programs at multiple resolutions
  • Mosaic integration does not require subsetting features/genes to a shared or overdispersed subset
  • Multi-omics integration does not require shared sample IDs
  • Ideal for incremental integration (adding datasets one at a time) since deconvolution is performed independently on each dataset
  • Integration performs well even when the datasets have mismatched features (e.g. microarray, RNA-Seq, proteomics) or sparsity (e.g. single-cell vs. bulk RNA-Seq and ATAC-Seq)
  • Metadata transfer across datasets
  • Command-line interface for rapid data exploration and Python interface for extensibility and flexibility

🔧 Install

✨ Latest Release

Install the package with conda (in an isolated conda environment):

conda create -n mosaicmpi -c conda-forge mosaicmpi
conda activate mosaicmpi
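
Once the environment is active, one way to confirm the install is to query the package metadata from Python. This is a minimal sketch using only the standard library: `installed_version` is a hypothetical helper, not part of mosaicMPI, and it assumes the distribution is registered under the same name as the conda package above (`mosaicmpi`).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "mosaicmpi" is assumed to be the distribution name (see lead-in).
    found = installed_version("mosaicmpi")
    if found:
        print(f"mosaicmpi version: {found}")
    else:
        print("mosaicmpi is not installed in this environment")
```

Returning `None` instead of raising keeps the check usable in setup scripts that need to branch on whether the package is present.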

📖 Documentation

🗐 Data guidelines

mosaicMPI can factorize a wide variety of datasets, but works best under these conditions:

  • Use untransformed, raw data where possible, and avoid log-transformed data
  • For single-cell, spatial, or bulk RNA-Seq data, use (in order of preference) feature counts, TPM-normalized values, or RPKM/FPKM-normalized values
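
The guidelines above can be turned into a quick sanity check before factorization. The sketch below is illustrative only and is not part of mosaicMPI: `check_expression_matrix` is a hypothetical helper, and the cutoff of 30 used to flag possibly log-transformed values is an assumed heuristic that can misfire on some normalized datasets.

```python
from math import isfinite


def check_expression_matrix(rows):
    """Heuristically flag values that look transformed rather than raw.

    `rows` is any iterable of rows of numbers (e.g. a list of lists).
    Returns a list of warning strings; an empty list means nothing was flagged.
    """
    values = [float(v) for row in rows for v in row]
    if not values:
        return ["matrix is empty"]

    warnings = []
    if not all(isfinite(v) for v in values):
        warnings.append("contains NaN or infinite values")

    finite = [v for v in values if isfinite(v)]
    if any(v < 0 for v in finite):
        # Non-negative matrix factorization requires non-negative input.
        warnings.append("contains negative values")

    # Assumed heuristic: raw counts usually span a wide range, whereas
    # log-transformed values are non-integer and rarely exceed ~30.
    if finite and max(finite) < 30 and any(v != int(v) for v in finite):
        warnings.append("small non-integer range; data may be log-transformed")
    return warnings


if __name__ == "__main__":
    raw_counts = [[0, 12, 350], [7, 0, 1024]]
    log_values = [[0.0, 3.7, 8.5], [3.0, 0.0, 10.0]]
    print(check_expression_matrix(raw_counts))
    print(check_expression_matrix(log_values))
```

A check like this only catches obvious problems; the provenance of the data remains the most reliable indicator of whether it has been transformed.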

📓 Python interface

To get started, sample datasets and a Jupyter notebook tutorial are available here.

Detailed API reference can be found on ReadTheDocs.

⌨️ Command line interface

See the command line interface documentation.

💭 Getting Help

For errors arising during use of mosaicMPI, browse existing issues or create a new one in the GitHub "issues" tab.

