
scE2TM improves single-cell embedding interpretability and reveals cellular perturbation signatures


$scE^2TM$: Toward Interpretable Single-Cell Embedding via Topic Modeling

The full description of $scE^2TM$ and its application to published single-cell RNA-seq datasets are available.

The repository includes detailed installation instructions, requirements, scripts, and demos.

1 Schematic overview of $scE^2TM$.

(a) To better integrate information from different modalities, the clustering and topic heads are trained on mutually refined neighborhood information, by encouraging mutual nearest neighbors of corresponding cells from different modalities to receive consistent cluster assignments in the embedding space. (b) ECR treats gene embeddings $g_j$ (•) as samples and topic embeddings $t_k$ (★) as cluster centers, with soft assignment $\pi^{*}_{\epsilon,jk}$. Here, ECR pushes $g_1$ and $g_2$ close to $t_1$ and away from $t_3$ and $t_5$. (c) Sparse linear decoders learn topic embeddings, gene embeddings, and sparse topic-gene dependencies during reconstruction, which keeps the model interpretable.
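The caption does not say how $\pi^{*}_{\epsilon,jk}$ is computed. The $\epsilon$ subscript suggests an entropy-regularized optimal-transport plan, so the NumPy sketch below assumes uniform marginals over genes and topics, a squared Euclidean cost rescaled to [0, 1], and log-domain Sinkhorn iterations; it is not taken from the $scE^2TM$ code. The decoder part is likewise an assumption: an ETM-style topic-gene matrix β built from the same embeddings with an illustrative temperature `tau`, leaving out the sparsity constraint of panel (c). All function names are hypothetical.

```python
import numpy as np


def _logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def pairwise_sq_dist(gene_emb, topic_emb):
    """Squared Euclidean distance between every gene g_j and topic t_k, shape (V, K)."""
    diff = gene_emb[:, None, :] - topic_emb[None, :, :]
    return (diff ** 2).sum(axis=-1)


def sinkhorn_assignment(gene_emb, topic_emb, epsilon=0.05, n_iters=500):
    """Assumed soft assignment pi*_eps: entropic OT plan from genes (samples) to topics (centers).

    Assumes uniform marginals: each of the V genes carries mass 1/V, each of the K topics 1/K.
    Works in the log domain so that small epsilon does not underflow.
    """
    cost = pairwise_sq_dist(gene_emb, topic_emb)
    cost = cost / max(cost.max(), 1e-12)          # rescale to [0, 1] so epsilon is scale-free
    n_genes, n_topics = cost.shape
    log_a = np.full(n_genes, -np.log(n_genes))    # log of the gene marginal
    log_b = np.full(n_topics, -np.log(n_topics))  # log of the topic marginal
    log_kernel = -cost / epsilon
    log_u = np.zeros(n_genes)
    log_v = np.zeros(n_topics)
    for _ in range(n_iters):
        # Alternately rescale rows (genes) and columns (topics) to match the marginals.
        log_u = log_a - _logsumexp(log_kernel + log_v[None, :], axis=1)
        log_v = log_b - _logsumexp(log_kernel + log_u[:, None], axis=0)
    return np.exp(log_u[:, None] + log_kernel + log_v[None, :])


def topic_gene_matrix(gene_emb, topic_emb, tau=0.1):
    """Hypothetical decoder weights: beta[k, j] = softmax over genes of -||t_k - g_j||^2 / tau."""
    logits = -pairwise_sq_dist(gene_emb, topic_emb).T / tau  # shape (K, V)
    logits -= logits.max(axis=1, keepdims=True)             # stabilize the softmax
    beta = np.exp(logits)
    return beta / beta.sum(axis=1, keepdims=True)


def top_genes(beta, gene_names, n=2):
    """The n highest-weight genes of each topic, which is how topics are usually read."""
    return [[gene_names[j] for j in np.argsort(-row)[:n]] for row in beta]
```

Row j of the plan shows how gene j's mass is spread over the topics; an ECR-style loss would then weight the gene-topic cost by this plan, pulling each $g_j$ toward the topics it is assigned to. Under the same assumptions, a cell's reconstructed gene distribution is its topic proportions times β, e.g. `theta @ beta`.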

2 Installation

Create a new Python environment and activate it:

conda create --name scE2TM_env python=3.8.8
conda activate scE2TM_env

Install the dependencies from the provided requirements.txt file.

pip install -r requirements.txt

Installation typically takes about 1.5 hours.

3 Usage

Data format

$scE^2TM$ requires three inputs in CSV format: a cell-by-gene expression matrix, an external embedding of the cells, and the true cell-type labels.
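The exact CSV layout and file names are not specified here. The standard-library sketch below assumes each file has a header row and one row per cell with the cell ID in the first column: gene values for the expression matrix, embedding coordinates for the external embedding, and a single label column for the cell types. It checks that the matrix and the embedding list the same cells in the same order. The function names and layout are hypothetical; `run.py` may expect something different.

```python
import csv


def read_matrix_csv(path):
    """Read a CSV with a header row, cell IDs in column 0 and numeric values after it."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        cell_ids, rows = [], []
        for record in reader:
            cell_ids.append(record[0])
            rows.append([float(value) for value in record[1:]])
    return header[1:], cell_ids, rows


def read_labels_csv(path):
    """Read a two-column CSV (cell ID, cell type) into a dict, skipping the header."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        next(reader)
        return {row[0]: row[1] for row in reader}


def load_inputs(expr_path, emb_path, label_path=None):
    """Load expression matrix, external embedding and (optional) labels, aligned by cell ID."""
    genes, expr_cells, expr = read_matrix_csv(expr_path)
    _, emb_cells, emb = read_matrix_csv(emb_path)
    if expr_cells != emb_cells:
        raise ValueError("expression matrix and embedding do not list the same cells in order")
    labels = None
    if label_path is not None:
        label_map = read_labels_csv(label_path)
        labels = [label_map[cell] for cell in expr_cells]  # reorder labels to match the matrix
    return genes, expr_cells, expr, emb, labels
```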

The true cell-type labels are used only to assess prediction accuracy.
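The accuracy metric is not named here. The adjusted Rand index (ARI) is a common way to compare predicted clusters with known cell types, so a dependency-free sketch follows as an illustration; `scikit-learn` provides the same quantity as `sklearn.metrics.adjusted_rand_score`. Whether $scE^2TM$ reports ARI specifically is an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two labelings of the same cells (1.0 = identical partitions)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    # Pairs of cells grouped together in both labelings (from the contingency table).
    index = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    # Pairs grouped together within each labeling on its own.
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    total_pairs = comb(n, 2)
    expected = sum_true * sum_pred / total_pairs if total_pairs else 0.0
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every cell in one cluster.
        return 1.0
    return (index - expected) / (max_index - expected)
```

Label names do not matter, only the grouping: `adjusted_rand_index(["T", "T", "B", "B"], [1, 1, 0, 0])` is 1.0.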

We provide a default dataset (Wang) to help users understand and debug the $scE^2TM$ code.

Training

python run.py

On the provided example dataset, the demo completes in about one minute.

Tutorial

We provide three tutorials in the tutorial directory that introduce the usage of $scE^2TM$ and reproduce the main quantitative results:

  • Clustering and Interpretable Evaluation
  • Pathway Enrichment
  • Topic gene embedding
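The enrichment procedure used in the tutorial is not described in this section. One common approach is to take a topic's top-weighted genes and test whether they over-represent a pathway gene set with a one-sided hypergeometric test; the sketch below implements that test with the standard library only. It is an illustrative assumption rather than the tutorial's method, the function name is hypothetical, and correction for testing many pathways is left out.

```python
from math import comb


def hypergeom_enrichment_pvalue(topic_genes, pathway_genes, background_size):
    """One-sided hypergeometric p-value that a topic's top genes over-represent a pathway.

    background_size: number of genes in the universe (e.g. all genes in the expression
    matrix); both gene sets are assumed to be drawn from that universe.
    """
    topic_genes, pathway_genes = set(topic_genes), set(pathway_genes)
    n_draws = len(topic_genes)        # genes selected from the topic
    n_success = len(pathway_genes)    # genes annotated to the pathway
    overlap = len(topic_genes & pathway_genes)
    denom = comb(background_size, n_draws)
    # P(X >= overlap) for X ~ Hypergeometric(background_size, n_success, n_draws).
    tail = sum(
        comb(n_success, i) * comb(background_size - n_success, n_draws - i)
        for i in range(overlap, min(n_success, n_draws) + 1)
    )
    return tail / denom
```

For example, 2 topic genes that both fall in a 3-gene pathway out of 10 background genes give p = 3/45 ≈ 0.067. With SciPy available, `scipy.stats.hypergeom.sf(overlap - 1, background_size, n_success, n_draws)` returns the same tail probability.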

Reference

If you use $scE^2TM$ in your work, please cite

License

This project is licensed under the MIT License.

Download files


Source Distribution

sce2tm-1.0.3.tar.gz (18.7 kB)


Built Distribution


sce2tm-1.0.3-py3-none-any.whl (19.3 kB)


File details

Details for the file sce2tm-1.0.3.tar.gz.

File metadata

  • Download URL: sce2tm-1.0.3.tar.gz
  • Upload date:
  • Size: 18.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.8

File hashes

Hashes for sce2tm-1.0.3.tar.gz
Algorithm Hash digest
SHA256 033fbbb84b67c91363c475549ad77d60e8dc4a591a0c2cb51b15179d682f6af9
MD5 cd666600b908c93383e3f178b3666d95
BLAKE2b-256 f97ab9b1ad22e3c2a9a8bf334f243b31a24d544b96784e1556e125c02a2bfaa4


File details

Details for the file sce2tm-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: sce2tm-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 19.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.8

File hashes

Hashes for sce2tm-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 fb559b2d8fb256f92957b262d699bb611fb03a6c5a730415231cbf314b308854
MD5 cb6c7a7904dfd21f59b1d95e7e624b38
BLAKE2b-256 07c0216ef3ebc03dfd76890f1d19cdfb3c27f30da80595594f00c1f0a8901816

