scE2TM improves single-cell embedding interpretability and reveals cellular perturbation signatures

$scE^2TM$: Toward Interpretable Single-Cell Embedding via Topic Modeling

The full description of $scE^2TM$ and its application to published single-cell RNA-seq datasets are available.

The repository includes installation instructions, requirements, scripts, and demos.

1 Schematic overview of $scE^2TM$.

(a) To better integrate information across modalities, the cluster and topic heads are trained on mutually refined neighborhood information: cells that are mutual nearest neighbors in the embedding spaces of the different modalities are encouraged to receive consistent cluster assignments.

(b) ECR clusters gene embeddings $g_j$ (•) as samples around topic embeddings $t_k$ (★) as centers, with soft assignment $\pi^{*}_{\epsilon,jk}$. In the illustration, ECR pulls $g_1$ and $g_2$ toward $t_1$ and pushes them away from $t_3$ and $t_5$.

(c) Sparse linear decoders learn topic embeddings, gene embeddings, and sparse topic-gene dependencies during reconstruction, which keeps the model interpretable.
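As a rough illustration of the soft assignment in panel (b), the sketch below computes an entropic optimal-transport plan between a few toy gene embeddings and topic embeddings using Sinkhorn iterations. It assumes that ECR refers to embedding clustering regularization with a squared Euclidean cost and uniform marginals. The function names, the value of `epsilon`, the iteration count, and the toy vectors are all hypothetical and are not taken from the $scE^2TM$ code.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sinkhorn_plan(gene_emb, topic_emb, epsilon=0.5, n_iter=200):
    """Entropic transport plan between genes (rows) and topics (columns).

    Assumption: uniform mass on genes and on topics, squared Euclidean cost.
    Returns the plan (soft assignment) and the cost matrix.
    """
    n, k = len(gene_emb), len(topic_emb)
    cost = [[squared_distance(g, t) for t in topic_emb] for g in gene_emb]
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    gene_mass = [1.0 / n] * n
    topic_mass = [1.0 / k] * k
    u = [1.0] * n
    v = [1.0] * k
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [gene_mass[j] / sum(kernel[j][i] * v[i] for i in range(k))
             for j in range(n)]
        v = [topic_mass[i] / sum(kernel[j][i] * u[j] for j in range(n))
             for i in range(k)]
    plan = [[u[j] * kernel[j][i] * v[i] for i in range(k)] for j in range(n)]
    return plan, cost


# Toy data: two genes near the first topic, two genes near the second.
gene_emb = [[0.0, 0.1], [0.1, 0.0], [2.0, 2.1], [2.1, 2.0]]
topic_emb = [[0.0, 0.0], [2.0, 2.0]]

plan, cost = sinkhorn_plan(gene_emb, topic_emb)

# The regularizer is the transport cost under the plan; minimizing it with
# respect to the embeddings pulls each gene toward the topics it is assigned to.
ecr_loss = sum(plan[j][i] * cost[j][i]
               for j in range(len(gene_emb))
               for i in range(len(topic_emb)))

for j, row in enumerate(plan):
    print(f"gene {j}: " + ", ".join(f"{p:.4f}" for p in row))
print(f"transport cost: {ecr_loss:.4f}")
```

Each row of `plan` is the soft assignment of one gene over the topics. A smaller `epsilon` makes the assignment sharper, while a larger one spreads mass more evenly across topics.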

2 Installation

Create a new Python environment and activate it.

conda create --name scE2TM_env python=3.8.8
conda activate scE2TM_env

Install the dependencies from the provided requirements.txt file.

pip install -r requirements.txt

Installation typically completes in approximately 1.5 hours.
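Because installation takes a while, it can help to confirm afterwards that the environment matches the pinned requirements. The following is a minimal sketch that only understands simple `name==version` lines and compares version strings literally. The helper names `parse_pins` and `check_pins` are hypothetical, and the actual contents of requirements.txt are not assumed.

```python
from importlib import metadata  # standard library since Python 3.8
from pathlib import Path


def parse_pins(lines):
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for raw in lines:
        # Drop trailing comments and environment markers such as '; python_version<"3.9"'.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # unpinned or range requirements are skipped in this sketch
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return a list of human-readable problems; empty if everything matches."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (wanted {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, wanted {wanted}")
    return problems


requirements = Path("requirements.txt")
if requirements.exists():
    issues = check_pins(parse_pins(requirements.read_text().splitlines()))
    print("\n".join(issues) if issues else "All pinned packages match.")
```

Run it from the repository root inside the activated environment so that requirements.txt is found and the right interpreter is inspected.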

3 Usage

Data format

$scE^2TM$ takes three inputs in CSV format: a cell-by-gene expression matrix, external cell embeddings, and true cell type labels.

The true cell type information is only used for prediction accuracy assessment.
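Before training, it may be worth checking that the three inputs describe the same cells. The sketch below assumes a layout that the description above does not confirm: each CSV has a header row, the first column holds a cell ID, the expression table has one row per cell and one column per gene, and the label table has a single cell type column. The inline data, column names, and helper names are hypothetical.

```python
import csv
import io


def read_indexed_csv(handle):
    """Read a CSV whose first column is a row ID.

    Returns (column_names, {row_id: [values...]}).
    """
    reader = csv.reader(handle)
    header = next(reader)
    rows = {row[0]: row[1:] for row in reader if row}
    return header[1:], rows


def check_alignment(expression, embedding, labels):
    """Return the sorted cell IDs shared by all tables, or raise on mismatch."""
    ids = set(expression)
    mismatched = (ids ^ set(embedding)) | (ids ^ set(labels))
    if mismatched:
        raise ValueError(f"cell IDs not present in every table: {sorted(mismatched)}")
    return sorted(ids)


# Inline stand-ins for the three input files.
expression_csv = "cell,GeneA,GeneB\nc1,0,3\nc2,5,1\n"
embedding_csv = "cell,e1,e2\nc1,0.1,0.2\nc2,0.3,0.4\n"
labels_csv = "cell,cell_type\nc1,alpha\nc2,beta\n"

genes, expression = read_indexed_csv(io.StringIO(expression_csv))
_, embedding = read_indexed_csv(io.StringIO(embedding_csv))
_, labels = read_indexed_csv(io.StringIO(labels_csv))

cells = check_alignment(expression, embedding, labels)
print(f"{len(cells)} cells, {len(genes)} genes")
```

With real files, replace the `io.StringIO` objects with handles from `open(path, newline="")`.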

We provide a default dataset (Wang) so that users can explore and debug the $scE^2TM$ code.

Training

python run.py

On the provided example dataset, the demo completes in about one minute.

Tutorial

We provide three tutorials in the tutorial directory that introduce the usage of $scE^2TM$ and reproduce the main quantitative results:

  • [Clustering and Interpretable Evaluation]
  • [Pathway Enrichment]
  • [Topic gene embedding]

Reference

If you use $scE^2TM$ in your work, please cite

License

This project is licensed under the MIT License.
