Topological Identification and Interpretation for High-throughput Single-cell Gene Regulation Elucidation across Multiple Platforms using scMGCA
scMGCA
Single-cell RNA sequencing (scRNA-seq) provides high-throughput gene expression information for exploring cellular heterogeneity at the level of individual cells. A major challenge in characterizing high-throughput gene expression data arises from the curse of dimensionality and the prevalence of dropout events. To address these concerns, we developed scMGCA, a single-cell clustering method based on a graph-embedding autoencoder that simultaneously learns a cell–cell topology representation and cluster assignments. scMGCA uses a graph convolutional autoencoder to preserve the topological information of cells from the embedded space under a multinomial distribution, and employs the positive pointwise mutual information (PPMI) matrix for cell graph augmentation. Experiments show that scMGCA is accurate and effective for cell segregation and outperforms other state-of-the-art models across multiple platforms. In addition, we perform genomic interpretation on the key compressed transcriptomic space of the graph-embedding autoencoder to reveal the underlying gene regulation mechanisms. In a pancreatic ductal adenocarcinoma (PDAC) dataset of 8921 individual pancreatic cells from primary PDAC tumors and control pancreases, scMGCA annotated the specific cell types and, through single-cell trajectory and gene set enrichment analysis, revealed differential gene expression levels across multiple tumor-associated and cell signalling pathways in PDAC progression.
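The PPMI graph augmentation mentioned above can be illustrated with a small NumPy sketch. It assumes that a cell–cell adjacency matrix (for example, a k-nearest-neighbour graph) is already available and that co-occurrence statistics are gathered by random surfing with restarts, a construction commonly paired with PPMI in graph embedding. Whether scMGCA builds its graph exactly this way is an assumption; the function names `random_surfing` and `ppmi` and the `steps` and `alpha` values are hypothetical and not taken from the scMGCA code.

```python
import numpy as np


def random_surfing(adj: np.ndarray, steps: int = 3, alpha: float = 0.98) -> np.ndarray:
    """Accumulate k-step transition probabilities with restarts (hypothetical helper)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    # Row-normalize the adjacency matrix into a transition matrix; isolated nodes keep zero rows.
    trans = np.divide(adj, deg, out=np.zeros_like(adj, dtype=float), where=deg > 0)
    p0 = np.eye(n)
    p = p0.copy()
    cooc = np.zeros((n, n))
    for _ in range(steps):
        # Take one more step, restarting at the source cell with probability 1 - alpha.
        p = alpha * p @ trans + (1 - alpha) * p0
        cooc += p
    return cooc


def ppmi(cooc: np.ndarray) -> np.ndarray:
    """PPMI_ij = max(0, log(M_ij * sum(M) / (rowsum_i * colsum_j)))."""
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooc * total / (row * col))
    # Zero co-occurrences give -inf (or nan); treat them as "no association".
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)


# Toy 4-cell path graph: 0 - 1 - 2 - 3
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
augmented = ppmi(random_surfing(adj))
print(augmented.round(3))
```

Clipping negative PMI values to zero keeps only cell pairs that co-occur more often than chance, which is what makes the PPMI matrix usable as a denser, non-negative replacement for the raw adjacency.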
Installation
pip
$ pip install -r requirements
$ pip install scMGCA
The requirements file is available at https://github.com/Philyzh8/scMGCA.
Usage
You can run scMGCA from Python:
>>> from scMGCA.run import train
>>> train(data, highly_genes=500, pretrain_epochs=1000, maxiter=300)
Example
>>> from scMGCA.run import train
>>> data = './dataset/Quake_10x_Limb_Muscle/data.h5'
>>> train(data, highly_genes=500, pretrain_epochs=1000, maxiter=300)
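The run reports the clustering performance against the true labels as normalized mutual information (NMI) and adjusted Rand index (ARI).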
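For reference, both reported metrics can be computed from a contingency table of true labels versus predicted clusters. The sketch below is a from-scratch NumPy illustration (NMI normalized by the arithmetic mean of the two label entropies); it is not the scMGCA evaluation code, and the function names are hypothetical. In practice, scikit-learn's `adjusted_rand_score` and `normalized_mutual_info_score` compute the same quantities.

```python
from math import comb

import numpy as np


def contingency_table(labels_true, labels_pred):
    """Count how many cells fall into each (true label, predicted cluster) pair."""
    _, t = np.unique(np.asarray(labels_true), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)
    return table


def adjusted_rand_index(labels_true, labels_pred):
    table = contingency_table(labels_true, labels_pred)
    pairs = comb(int(table.sum()), 2)
    if pairs == 0:
        return 1.0
    # Number of cell pairs placed together in both labelings, and the marginal pair counts.
    sum_cells = sum(comb(int(v), 2) for v in table.ravel())
    sum_rows = sum(comb(int(v), 2) for v in table.sum(axis=1))
    sum_cols = sum(comb(int(v), 2) for v in table.sum(axis=0))
    expected = sum_rows * sum_cols / pairs
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


def normalized_mutual_info(labels_true, labels_pred):
    table = contingency_table(labels_true, labels_pred).astype(float)
    pij = table / table.sum()
    pi = pij.sum(axis=1)
    pj = pij.sum(axis=0)
    nz = pij > 0
    mi = np.sum(pij[nz] * np.log(pij[nz] / np.outer(pi, pj)[nz]))
    h_true = -np.sum(pi * np.log(pi))
    h_pred = -np.sum(pj * np.log(pj))
    denom = (h_true + h_pred) / 2  # arithmetic-mean normalization
    if denom == 0:
        return 1.0
    return float(mi / denom)


# Cluster IDs are arbitrary, so a relabeled but identical partition scores 1.0 on both metrics.
print(adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
print(normalized_mutual_info([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
```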
Arguments
Parameter | Description |
---|---|
data | Path to an h5 (HDF5) file containing a matrix of scRNA-seq expression values, the true labels, and other information. By default, genes are assumed to be represented by columns and samples (cells) by rows. |
highly_genes | Number of highly variable genes to select |
pretrain_epochs | Number of pretraining epochs |
maxiter | Number of training epochs |
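The `highly_genes` argument limits the analysis to a subset of informative genes. How scMGCA ranks genes internally is not described here, so the sketch below uses a simple dispersion (variance-to-mean ratio) ranking as a stand-in. It operates on a cells × genes matrix oriented as in the table above (rows are cells, columns are genes); the function name `select_highly_variable` is hypothetical.

```python
import numpy as np


def select_highly_variable(counts: np.ndarray, n_genes: int) -> np.ndarray:
    """Return sorted column indices of the n_genes genes with the highest dispersion.

    counts: cells x genes matrix (rows = cells, columns = genes).
    """
    mean = counts.mean(axis=0)
    var = counts.var(axis=0)
    # Dispersion = variance / mean; genes with zero mean get dispersion 0.
    dispersion = np.divide(var, mean, out=np.zeros_like(var, dtype=float), where=mean > 0)
    top = np.argsort(dispersion)[::-1][:n_genes]
    return np.sort(top)


# 4 cells x 3 genes: gene 0 is constant, gene 1 varies most relative to its mean.
counts = np.array([[5, 0, 1], [5, 10, 3], [5, 0, 1], [5, 10, 3]])
keep = select_highly_variable(counts, 2)
filtered = counts[:, keep]  # analogous to highly_genes=2
print(keep)
```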
Download files
File details
Details for the file scMGCA-1.0.4.tar.gz
File metadata
- Download URL: scMGCA-1.0.4.tar.gz
- Size: 12.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.25.1 requests-toolbelt/0.9.1 urllib3/1.26.4 tqdm/4.60.0 importlib-metadata/3.10.0 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.7.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 65e6e015e7fa0896c9806075e7ef6c26b5814d262c3d72ed40da6c024811494c |
MD5 | 9152be99bd7742b507b7563c269b1c2d |
BLAKE2b-256 | 0cabd6986ba05f14581dea48980f18a06356ff112b55e04c598cfa6cd36bacd4 |
File details
Details for the file scMGCA-1.0.4-py3-none-any.whl
File metadata
- Download URL: scMGCA-1.0.4-py3-none-any.whl
- Size: 13.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.25.1 requests-toolbelt/0.9.1 urllib3/1.26.4 tqdm/4.60.0 importlib-metadata/3.10.0 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.7.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 78d23b3bbc705900fa8b892519bec3f1cfd24f9b0126ff315aff19eca866e046 |
MD5 | 47531bd88eae5b9f2fa546a09ebbe3fd |
BLAKE2b-256 | ee612194b22bc693d3fd492a6d9d4154c6884263cca319deae56265aa504c417 |