scE2TM improves single-cell embedding interpretability and reveals cellular perturbation signatures

$scE^2TM$: Toward Interpretable Single-Cell Embedding via Topic Modeling

The full description of $scE^2TM$ and its application to published single-cell RNA-seq datasets are available.

The repository includes detailed installation instructions, requirements, scripts, and demos.

1 Schematic overview of $scE^2TM$.

(a) To better integrate information across modalities, the cluster and topic heads are trained on mutually refined neighborhood information: cells that are mutual nearest neighbors across modalities in the embedding space are encouraged to receive consistent cluster assignments. (b) ECR clusters gene embeddings $g_j$ (•) as samples and topic embeddings $t_k$ (★) as centers with soft assignment $\pi^{*}_{\epsilon,jk}$. Here, ECR pushes $g_1$ and $g_2$ close to $t_1$, and away from $t_3$ and $t_5$. (c) Sparse linear decoders learn topic embeddings, gene embeddings, and sparse topic-gene dependencies during reconstruction, which keeps the model interpretable.
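Panel (b) does not spell out how the soft assignment $\pi^{*}_{\epsilon,jk}$ is computed. The sketch below assumes it is the solution of an entropy-regularized optimal transport problem with uniform marginals (which the $\epsilon$ subscript suggests) and approximates it with Sinkhorn iterations. The function names, the value of `epsilon`, and the toy embeddings are illustrative assumptions, not taken from the $scE^2TM$ code.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sinkhorn_plan(genes, topics, epsilon=0.5, n_iter=200):
    """Approximate an entropy-regularized transport plan (assumed form of pi*).

    Rows index gene embeddings g_j, columns index topic embeddings t_k.
    Uniform marginals over genes and topics are an assumption of this sketch.
    """
    n, k = len(genes), len(topics)
    cost = [[squared_distance(g, t) for t in topics] for g in genes]
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    a = [1.0 / n] * n  # mass carried by each gene
    b = [1.0 / k] * k  # mass received by each topic
    u = [1.0] * n
    v = [1.0] * k
    for _ in range(n_iter):
        # Alternately rescale rows and columns toward the target marginals.
        u = [a[j] / sum(kernel[j][m] * v[m] for m in range(k)) for j in range(n)]
        v = [b[m] / sum(kernel[j][m] * u[j] for j in range(n)) for m in range(k)]
    plan = [[u[j] * kernel[j][m] * v[m] for m in range(k)] for j in range(n)]
    return plan, cost


def ecr_style_loss(plan, cost):
    """Transport cost: sum over j, k of cost[j][k] * plan[j][k]."""
    return sum(p * c for p_row, c_row in zip(plan, cost) for p, c in zip(p_row, c_row))


# Toy example: two genes near the first topic, two near the second.
genes = [(0.0, 0.1), (0.1, 0.0), (2.0, 2.1), (2.1, 2.0)]
topics = [(0.0, 0.0), (2.0, 2.0)]
plan, cost = sinkhorn_plan(genes, topics)
for j, row in enumerate(plan):
    print(j, [round(p, 4) for p in row])
print("ECR-style loss:", round(ecr_style_loss(plan, cost), 4))
```

Each gene places most of its mass on the nearest topic, so minimizing the transport cost with respect to the embeddings pulls genes toward their assigned topic and away from the others, which is the behavior panel (b) describes for $g_1$, $g_2$ and $t_1$.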

2 Installation

Create a new Python environment.

conda create --name scE2TM_env python=3.8.8
conda activate scE2TM_env

Install the dependencies from the provided requirements.txt file.

pip install -r requirements.txt

Installation typically completes in approximately 1.5 hours.

3 Usage

Data format

$scE^2TM$ requires three inputs in CSV format: a cell-by-gene expression matrix, external cell embeddings, and true cell type labels.

The true cell type information is only used for prediction accuracy assessment.

We provide default data (Wang) for users to understand and debug the $scE^2TM$ code.
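Before training on your own data, it can help to confirm that the three CSV inputs describe the same cells. The following is a minimal sketch using only the standard library. It assumes each file has a header row and stores a cell identifier in the first column; that layout, the helper names, and the toy data are assumptions for illustration and may not match the files shipped with the repository.

```python
import csv
import io


def read_cell_ids(handle):
    """Return first-column values of a CSV file, skipping the header row."""
    reader = csv.reader(handle)
    next(reader, None)  # discard the header (assumed to be present)
    return [row[0] for row in reader if row]


def find_mismatches(expression_ids, other_ids):
    """Return (missing, extra) cell IDs of another file relative to the expression matrix."""
    expected, found = set(expression_ids), set(other_ids)
    return sorted(expected - found), sorted(found - expected)


# Toy stand-ins for the three inputs; for real files use open(path, newline="").
expression_csv = io.StringIO("cell,GeneA,GeneB\nc1,0,3\nc2,5,1\nc3,2,2\n")
embedding_csv = io.StringIO("cell,dim1,dim2\nc1,0.1,0.9\nc2,0.4,0.2\nc3,0.7,0.3\n")
label_csv = io.StringIO("cell,cell_type\nc1,alpha\nc2,beta\n")

expression_ids = read_cell_ids(expression_csv)
for name, handle in (("embedding", embedding_csv), ("label", label_csv)):
    missing, extra = find_mismatches(expression_ids, read_cell_ids(handle))
    print(name, "missing:", missing, "extra:", extra)
```

In this toy data the label file has no row for `c3`, so it is reported as missing, while the embedding file matches the expression matrix.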

Training

python run.py

On the provided example dataset, the demo completes in about one minute.

Tutorial

We provide three tutorials in the tutorial directory that introduce the usage of $scE^2TM$ and reproduce the main quantitative results:

  • [Clustering and Interpretable Evaluation]
  • [Pathway Enrichment]
  • [Topic gene embedding]

Reference

If you use $scE^2TM$ in your work, please cite

License

This project is licensed under the MIT License.
