scMusketeers: A tri-partite modular autoencoder for addressing imbalanced cell type annotation and batch effect reduction

Summary

We developed scMusketeers, a modular deep learning model that produces an optimal dimension-reduced representation, with a focus on imbalanced cell type annotation and batch effect reduction. The architecture of scMusketeers consists of three modules. The first module is an autoencoder that provides a reduced latent representation while removing noise, resulting in better data reconstruction. The second module is a classifier trained with a focal loss, which improves prediction for smaller cell type populations. The third module is an adversarial domain adaptation (DANN) module that corrects batch effects.
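
To illustrate why a focal loss helps with small cell populations, here is a minimal sketch of the standard focal loss formulation, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t). It is not taken from the scMusketeers code base: the function names, the default `gamma` and `alpha` values, and the example probabilities are assumptions chosen for illustration only.

```python
import math


def focal_loss(probs, true_index, gamma=2.0, alpha=1.0):
    """Standard focal loss for one cell (hypothetical helper, not the package API).

    probs: predicted class probabilities for the cell.
    true_index: index of the true cell type.
    gamma, alpha: illustrative defaults, not the values used by scMusketeers.
    """
    # Clamp to avoid log(0) on degenerate predictions.
    p_t = min(max(probs[true_index], 1e-12), 1.0)
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified cells.
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(probs, true_index):
    """Plain cross-entropy is the focal loss with gamma = 0."""
    return focal_loss(probs, true_index, gamma=0.0)


# Made-up predictions: a confidently classified cell and a poorly classified one.
confident = [0.95, 0.03, 0.02]
uncertain = [0.30, 0.60, 0.10]

for name, probs in [("confident", confident), ("uncertain", uncertain)]:
    ce = cross_entropy(probs, 0)
    fl = focal_loss(probs, 0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} ratio={fl / ce:.4f}")
```

With `gamma` above zero, cells that are already classified confidently (typically from abundant cell types) contribute far less to the loss than hard cells, so training pays relatively more attention to rare or ambiguous populations.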

scMusketeers' performance was optimized through a detailed ablation study assessing the model's hyperparameters. The model was compared to reference tools for single-cell integration and annotation. It was at least on par with state-of-the-art models and often outperformed most of them. It showed increased performance in identifying rare cell types. Despite the rather simple structure of its deep learning model, it performed on par with the UCE foundation model. Finally, scMusketeers was able to transfer cell labels from single-cell RNA-seq to spatial transcriptomics.

Our tripartite modular autoencoder demonstrates versatile capabilities while addressing key challenges in single-cell atlas reconstruction. In particular, the generic modular framework of scMusketeers should generalize easily to other large-scale biology projects that require deep learning models.

Publication

scMusketeers: Addressing imbalanced cell type annotation and batch effect reduction with a modular autoencoder. Collin A., Pelletier S., Fierville M., Droit A., Precioso F., Bécavin C., Barbry P. bioRxiv

Tutorial

Access the tutorial on Google Colab

This tutorial covers two use cases:

  • Transfer cell annotation to unlabeled cells
  • Transfer cell annotation and reduce batch effects from a query atlas to a reference atlas

Install

sc-musketeers needs a TensorFlow backend, and you choose which build to install through an extra:

  • GPU (GPU-capable TensorFlow):

    $ pip install "sc-musketeers[gpu]"
    
  • CPU only (smaller, no CUDA):

    $ pip install "sc-musketeers[cpu]"
    

Note: pick exactly one extra. sc-musketeers[gpu] pulls in tensorflow while sc-musketeers[cpu] pulls in tensorflow-cpu; both provide the same import tensorflow, so installing only one avoids a conflict. A bare pip install sc-musketeers (no extra) installs no TensorFlow and the CLI will fail to import — always specify [gpu] or [cpu].
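
As a quick sanity check after installation, the sketch below uses only the Python standard library to report whether a TensorFlow backend is importable and whether both `tensorflow` and `tensorflow-cpu` were installed side by side. The helper names and the messages are hypothetical and are not part of sc-musketeers.

```python
from importlib import metadata, util


def installed_tensorflow_distributions(candidates=("tensorflow", "tensorflow-cpu")):
    """Return {distribution name: version} for the candidates that pip has installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this distribution is not installed
    return found


def describe_backend(found, importable):
    """Summarize the backend state (hypothetical helper and wording)."""
    if not importable:
        return "missing: reinstall with sc-musketeers[gpu] or sc-musketeers[cpu]"
    if len(found) > 1:
        return "conflict: both " + " and ".join(sorted(found)) + " are installed"
    return "ok"


found = installed_tensorflow_distributions()
# find_spec locates the module without importing it, so this stays fast.
importable = util.find_spec("tensorflow") is not None
print(describe_backend(found, importable), found)
```

If the check reports a conflict, uninstalling one of the two TensorFlow distributions and reinstalling with a single extra is the likely fix.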

with conda

$ conda install -c bioconda sc-musketeers

with docker

$ docker pull quay.io/biocontainers/sc-musketeers:0.4.1--pyhdfd78af_0

with singularity

$ singularity run https://depot.galaxyproject.org/singularity/sc-musketeers:0.3.7--pyhdfd78af_0

Examples

sc-musketeers can be used for different tasks in the integration and annotation of single-cell atlases.

Here are two examples:

  • Transfer cell annotation to unlabeled cells

    $ sc-musketeers transfer my_atlas --class_key celltype --batch_key donor --unlabeled_category=Unknown

  • Transfer cell annotation and reduce batch effects from a query atlas to a reference atlas

    $ sc-musketeers transfer ref_dataset --query_path query_dataset --class_key=celltype --batch_key donor --unlabeled_category=Unknown
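
When these commands are launched from a Python script or pipeline, it can help to assemble the argument list programmatically. The sketch below only builds and prints the command line, using the flags shown in the two examples above. The `build_transfer_command` helper is hypothetical and is not provided by sc-musketeers.

```python
import shlex


def build_transfer_command(ref_path, class_key, batch_key,
                           unlabeled_category, query_path=None):
    """Assemble the `sc-musketeers transfer` argument list (hypothetical helper)."""
    cmd = ["sc-musketeers", "transfer", ref_path]
    if query_path is not None:
        # Only the reference-to-query use case passes a query dataset.
        cmd += ["--query_path", query_path]
    cmd += [
        "--class_key", class_key,
        "--batch_key", batch_key,
        f"--unlabeled_category={unlabeled_category}",
    ]
    return cmd


cmd = build_transfer_command(
    "ref_dataset", "celltype", "donor", "Unknown", query_path="query_dataset"
)
# shlex.join quotes arguments safely for display or logging.
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it, provided sc-musketeers is installed with one of the TensorFlow extras.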

Read the CONTRIBUTING.md file.
