scE2TM improves single-cell embedding interpretability and reveals cellular perturbation signatures

$scE^2TM$: Toward Interpretable Single-Cell Embedding via Topic Modeling

The full description of $scE^2TM$ and its application to published single-cell RNA-seq datasets are available.

The repository includes detailed installation instructions, requirements, scripts, and demos.

1 Schematic overview of $scE^2TM$.

(a) To better integrate information across modalities, the cluster and topic heads are trained on mutually refined neighborhood information: the model encourages consistent cluster assignments for the mutual nearest neighbors of corresponding cells from different modalities in the embedding space.

(b) ECR clusters gene embeddings $g_j$ (•) as samples and topic embeddings $t_k$ (★) as centers with soft assignment $\pi^{*}_{\epsilon,jk}$. Here, ECR pushes $g_1$ and $g_2$ close to $t_1$, and away from $t_3$ and $t_5$.

(c) Sparse linear decoders learn topic embeddings and gene embeddings, as well as sparse topic-gene dependencies, during reconstruction, which keeps the model interpretable.
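To make the soft assignment in panel (b) concrete, the sketch below computes a transport plan between toy gene embeddings and topic embeddings with Sinkhorn iterations. It is an illustration only, not the $scE^2TM$ implementation: it assumes that $\pi^{*}_{\epsilon}$ is an entropy-regularized optimal transport plan with uniform marginals and squared Euclidean cost, and the function name `sinkhorn_soft_assignment`, the toy coordinates, and the fixed iteration count are all hypothetical choices.

```python
import numpy as np


def sinkhorn_soft_assignment(genes, topics, epsilon=1.0, n_iters=200):
    """Return (plan, cost) for entropic optimal transport from genes to topics.

    Assumptions (not taken from the scE2TM code): uniform marginals,
    squared Euclidean cost, and a fixed number of Sinkhorn iterations.
    """
    # Pairwise squared Euclidean distances, shape (n_genes, n_topics).
    diff = genes[:, None, :] - topics[None, :, :]
    cost = (diff ** 2).sum(axis=-1)

    n_genes, n_topics = cost.shape
    a = np.full(n_genes, 1.0 / n_genes)    # mass carried by each gene
    b = np.full(n_topics, 1.0 / n_topics)  # mass received by each topic

    kernel = np.exp(-cost / epsilon)
    u = np.ones(n_genes)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)  # rescale columns toward the topic marginal
        u = a / (kernel @ v)    # rescale rows toward the gene marginal

    plan = u[:, None] * kernel * v[None, :]
    return plan, cost


# Toy 2-D embeddings: two genes placed near each of three topic centers.
topics = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
genes = np.array([
    [0.1, 0.0], [0.0, 0.1],   # near topic 0
    [2.9, 0.0], [3.0, 0.2],   # near topic 1
    [0.0, 2.9], [0.1, 3.0],   # near topic 2
])

plan, cost = sinkhorn_soft_assignment(genes, topics, epsilon=1.0)

# A clustering-style penalty: transport-weighted distance to the topic centers.
ecr_value = float((plan * cost).sum())
print(plan.argmax(axis=1), ecr_value)
```

Minimizing a penalty of this form with respect to the embeddings pulls each gene toward the topic that receives most of its mass, which is the behavior panel (b) describes for $g_1$, $g_2$, and $t_1$.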

2 Installation

Create a new Python environment and activate it.

conda create --name scE2TM_env python=3.8.8
conda activate scE2TM_env

Install the dependencies from the provided requirements.txt file.

pip install -r requirements.txt

Installation typically completes in approximately 1.5 hours.

3 Usage

Data format

$scE^2TM$ requires three inputs in CSV format: a cell-by-gene expression matrix, external cell embeddings, and true cell type labels.

The true cell type information is only used for prediction accuracy assessment.
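As a rough illustration of the three inputs, the sketch below reads toy CSV tables and checks that they list the same cells in the same order. The layout is an assumption made for the example (a header row, cell IDs in the first column) and is not taken from the $scE^2TM$ repository; the helper names `read_table` and `check_alignment` and all values are hypothetical.

```python
import csv
import io


def read_table(handle):
    """Read a CSV that has a header row and cell IDs in the first column.

    This layout is an assumption for the example, not a documented format.
    """
    rows = list(csv.reader(handle))
    header, body = rows[0][1:], rows[1:]
    ids = [row[0] for row in body]
    values = [row[1:] for row in body]
    return ids, header, values


def check_alignment(expr_ids, emb_ids, label_ids):
    """Return the cell count, or raise ValueError if the cell IDs differ."""
    if not (expr_ids == emb_ids == label_ids):
        raise ValueError(
            "cell IDs differ between the expression, embedding, and label tables"
        )
    return len(expr_ids)


# Toy in-memory stand-ins for the three CSV files.
expression_csv = io.StringIO("cell,GeneA,GeneB,GeneC\nc1,0,3,1\nc2,2,0,0\n")
embedding_csv = io.StringIO("cell,dim1,dim2\nc1,0.1,-0.4\nc2,0.7,0.2\n")
label_csv = io.StringIO("cell,cell_type\nc1,T cell\nc2,B cell\n")

expr_ids, genes, counts = read_table(expression_csv)
emb_ids, dims, embedding = read_table(embedding_csv)
label_ids, _, labels = read_table(label_csv)

n_cells = check_alignment(expr_ids, emb_ids, label_ids)
print(n_cells, "cells,", len(genes), "genes,", len(dims), "embedding dimensions")
```

Running a check like this before training can surface mismatched or reordered cells early, since the labels are only consulted at evaluation time.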

We provide a default dataset (Wang) so that users can explore and debug the $scE^2TM$ code.

Training

python run.py

On the provided example dataset, the demo completes in about one minute.

Tutorial

We provide three tutorials in the tutorial directory that introduce the usage of $scE^2TM$ and reproduce the main quantitative results:

  • [Clustering and Interpretable Evaluation]
  • [Pathway Enrichment]
  • [Topic gene embedding]

Reference

If you use $scE^2TM$ in your work, please cite

License

This project is licensed under the MIT License.
