scE2TM improves single-cell embedding interpretability and reveals cellular perturbation signatures
$scE^2TM$: Toward Interpretable Single-Cell Embedding via Topic Modeling
A full description of $scE^2TM$ and its application to published single-cell RNA-seq datasets is available.
The repository includes detailed installation instructions and requirements, scripts and demos.
1 Schematic overview of $scE^2TM$.
(a) To better integrate information across modalities, the cluster and topic heads are trained on mutually refined neighborhood information: the model encourages consistent cluster assignments for mutual nearest neighbors of corresponding cells across modalities in the embedding space.
(b) ECR clusters gene embeddings $g_j$ (•) as samples and topic embeddings $t_k$ (★) as centers with soft assignment $\pi^{*}_{\epsilon,jk}$. Here, ECR pulls $g_1$ and $g_2$ close to $t_1$ and pushes them away from $t_3$ and $t_5$.
(c) Sparse linear decoders learn topic embeddings, gene embeddings, and sparse topic-gene dependencies during reconstruction, which keeps the model interpretable.
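The soft assignment in panel (b) can be illustrated with a small sketch. It assumes that $\pi^{*}_{\epsilon,jk}$ is the solution of an entropic-regularized optimal transport problem between gene embeddings and topic embeddings, solved with Sinkhorn iterations under uniform marginals and a squared Euclidean cost. The function name, the toy coordinates, and the value of `epsilon` are illustrative assumptions, not code from the $scE^2TM$ repository.

```python
import numpy as np


def sinkhorn_soft_assignment(genes, topics, epsilon=0.5, n_iters=200):
    """Return an entropic-OT transport plan and the pairwise cost matrix.

    Assumption: uniform mass on genes and on topics, squared Euclidean cost.
    """
    n_genes, n_topics = genes.shape[0], topics.shape[0]
    # cost[j, k] = squared distance between gene j and topic k
    cost = ((genes[:, None, :] - topics[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-cost / epsilon)
    a = np.full(n_genes, 1.0 / n_genes)    # mass carried by each gene
    b = np.full(n_topics, 1.0 / n_topics)  # mass received by each topic
    u = np.ones(n_genes)
    v = np.ones(n_topics)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]
    return plan, cost


# Toy 2-D embeddings: three topics, two genes placed near each topic.
topics = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
genes = np.array([
    [0.1, 0.0], [-0.1, 0.1],   # near topic 0
    [4.1, 0.0], [3.9, 0.1],    # near topic 1
    [0.0, 4.1], [0.1, 3.9],    # near topic 2
])

plan, cost = sinkhorn_soft_assignment(genes, topics)
# Regularizer value: transport cost weighted by the soft assignment.
ecr_loss = float((plan * cost).sum())
print(plan.argmax(axis=1), ecr_loss)
```

Minimizing a loss of this form with respect to the embeddings pulls each gene toward the topic that carries most of its transport mass, while the column marginal keeps any topic from ending up with no genes.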
2 Installation
Create a new Python environment and activate it.
conda create --name scE2TM_env python=3.8.8
conda activate scE2TM_env
Install the dependencies from the provided requirements.txt file.
pip install -r requirements.txt
Installation typically completes in approximately 1.5 hours.
3 Usage
Data format
$scE^2TM$ takes three inputs in .csv format: a cell-by-gene expression matrix, external cell embeddings, and true cell type labels.
The true cell type labels are used only to assess prediction accuracy.
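As a rough sketch of how the three inputs relate, the snippet below parses toy CSV text and checks that all tables list the same cells in the same order. The layout (one row per cell, cell ID in the first column), the column names, and the toy values are assumptions made for illustration; the files shipped with the repository may be organized differently.

```python
import csv
import io


def read_table(text):
    """Parse CSV text into (row_ids, column_names, rows_of_strings)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    ids, rows = [], []
    for record in reader:
        ids.append(record[0])    # first column: cell ID (assumed layout)
        rows.append(record[1:])  # remaining columns: values for that cell
    return ids, header[1:], rows


# Toy stand-ins for the three input files (hypothetical content).
expression_csv = "cell,GeneA,GeneB,GeneC\nc1,0,3,1\nc2,5,0,2\nc3,1,1,0\n"
embedding_csv = "cell,dim0,dim1\nc1,0.12,-0.40\nc2,0.88,0.05\nc3,-0.31,0.27\n"
labels_csv = "cell,cell_type\nc1,alpha\nc2,beta\nc3,alpha\n"

expr_ids, gene_names, expr_rows = read_table(expression_csv)
emb_ids, dim_names, emb_rows = read_table(embedding_csv)
label_ids, _, label_rows = read_table(labels_csv)

# The three tables must describe the same cells in the same order.
if not (expr_ids == emb_ids == label_ids):
    raise ValueError("cell IDs differ between the input tables")

counts = [[float(x) for x in row] for row in expr_rows]     # cells x genes
embedding = [[float(x) for x in row] for row in emb_rows]   # cells x dims
cell_types = [row[0] for row in label_rows]                 # evaluation only

print(len(counts), len(gene_names), len(dim_names), sorted(set(cell_types)))
```

Keeping the labels in a separate list makes it harder to pass them to training by accident, since they are needed only when scoring the predicted clusters.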
We provide default data (Wang) for users to understand and debug the $scE^2TM$ code.
Training
python run.py
On the provided example dataset, the demo completes in about one minute.
Tutorial
We provide three tutorials in the tutorial directory that introduce the usage of $scE^2TM$ and reproduce the main quantitative results:
- [Clustering and Interpretable Evaluation]
- [Pathway Enrichment]
- [Topic gene embedding]
Reference
If you use $scE^2TM$ in your work, please cite
License
This project is licensed under the MIT License.
Download files
File details
Details for the file sce2tm-1.0.4.tar.gz.
File metadata
- Download URL: sce2tm-1.0.4.tar.gz
- Upload date:
- Size: 18.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7d0761ad8e8881e494b82627e24d7e4b6416ce785715c8e23b563a58e6cec54d |
| MD5 | fd13df7221cddba6bfe69e4441f020ee |
| BLAKE2b-256 | f3344748ef7841404eb5e73e1abd4bd1620e0cbd3d5ebc3b6753c8014fd0361c |
File details
Details for the file sce2tm-1.0.4-py3-none-any.whl.
File metadata
- Download URL: sce2tm-1.0.4-py3-none-any.whl
- Upload date:
- Size: 19.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9e0930a1fe19074d7840c996e4c1eefaa17db47d855154a2a7a01559be7a041a |
| MD5 | 502c56995e7928a943d576f7de69509f |
| BLAKE2b-256 | 45c6755a1f434a58eced60461f4d3d15f77e33cd97aa7042d88540a4689d8c0c |
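The SHA256 digests listed above can be compared against a downloaded file before installing it. The sketch below uses only the standard library `hashlib` module; the helper names are illustrative, and the self-check at the end runs against a temporary file rather than a real download.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Self-check on a throwaway file so the sketch runs without a download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_path = tmp.name
demo_ok = matches_published_digest(demo_path, hashlib.sha256(b"abc").hexdigest())
demo_bad = matches_published_digest(demo_path, "0" * 64)
os.remove(demo_path)
print(demo_ok, demo_bad)
```

For a real download, the call would look like `matches_published_digest("sce2tm-1.0.4.tar.gz", "7d0761ad8e8881e494b82627e24d7e4b6416ce785715c8e23b563a58e6cec54d")`, assuming the archive sits in the current directory.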