Graph-sc
This repository contains the PyTorch implementation of the paper "Clustering scRNA-seq data with graph neural networks" by Madalina Ciortan, under the supervision of Matthieu Defrance.
We propose graph-sc, a method that models scRNA-seq data as a graph and processes it with a graph autoencoder network to create a representation (embedding) for each cell. The resulting embeddings are clustered with a general-purpose clustering algorithm (e.g. KMeans, Leiden) to produce cell class assignments. An extensive experimental study was performed on 24 simulated and 15 real-world scRNA-seq datasets. graph-sc was compared with 11 competing state-of-the-art techniques on 4 clustering scores, reflecting both external and internal clustering performance. The results indicate that, although no method is consistently best across all analyzed datasets, graph-sc compares favorably with the competing techniques across all types of datasets. A large ablation study evaluates numerous strategies for creating the input graph, designing the graph autoencoder network and performing the clustering phase. The proposed method is stable across consecutive runs, robust to input down-sampling, generally insensitive to changes in the network architecture or training parameters, and more computationally efficient than other competing methods based on neural networks. Moreover, modeling the data as a graph provides increased flexibility to define custom features characterizing the genes, the cells and their interactions, as well as the possibility to enrich the graph with external data (e.g. gene correlations).
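To make the idea of "modeling scRNA-seq data as a graph" concrete, here is a minimal sketch of one possible construction: a bipartite graph in which every nonzero count links a cell node to a gene node. This construction, the helper name `build_cell_gene_edges` and the toy data are assumptions made for illustration; the description above does not specify how graph-sc builds its input graph.

```python
import numpy as np


def build_cell_gene_edges(counts):
    """Turn a (n_cells, n_genes) count matrix into a weighted edge list.

    Hypothetical helper, not part of graph-sc: every nonzero count becomes
    one edge between a cell node and a gene node, weighted by that count.
    """
    counts = np.asarray(counts)
    cell_idx, gene_idx = np.nonzero(counts)
    weights = counts[cell_idx, gene_idx].astype(float)
    return cell_idx, gene_idx, weights


# Synthetic toy data: 6 cells x 4 genes of Poisson counts.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(0.5, size=(6, 4))
cells, genes, weights = build_cell_gene_edges(toy_counts)
print(f"{len(weights)} edges for {toy_counts.shape[0]} cells and {toy_counts.shape[1]} genes")
```

An edge list of this kind is the usual starting point for graph libraries, which typically accept source indices, target indices and optional edge weights.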
Installation
The package requires Python >= 3.6 and can be installed by running:
pip install graph-sc
Tutorial
First, import the required libraries:
import h5py
import matplotlib.pyplot as plt
import numpy as np
import pkg_resources
from sklearn.decomposition import PCA  # used for the 2D plot at the end
import graph_sc.models as models
import graph_sc.train as train
device = train.get_device(use_cpu=True)
print(f"Running on device: {device}")
Then load the data to be analyzed. The package provides an example dataset:
DATA_PATH = pkg_resources.resource_filename("graph_sc", "data/")
data_mat = h5py.File(f"{DATA_PATH}/worm_neuron_cell.h5", "r")
X = np.array(data_mat["X"])
Y = np.array(data_mat["Y"])  # ground-truth labels; optional
n_clusters = len(np.unique(Y))  # number of clusters; required for KMeans
Run the model training:
scores = train.fit(X, Y, n_clusters, cluster_methods=["KMeans"])
print(scores)
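The description above mentions external clustering scores, which compare predicted clusters against known labels. As an illustration of what such a score measures, the following is an independent NumPy sketch of the adjusted Rand index. Which scores `train.fit` actually reports is not stated here, and the function `adjusted_rand_index` is a hypothetical helper, not part of graph-sc.

```python
import numpy as np


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    # Map arbitrary label values to consecutive integer codes.
    _, t = np.unique(np.asarray(labels_true).ravel(), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred).ravel(), return_inverse=True)

    # Contingency table: entry (i, j) counts samples in true class i
    # that were assigned to predicted cluster j.
    contingency = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(contingency, (t, p), 1)

    def pairs(n):
        # Number of unordered pairs that can be drawn from n items.
        return n * (n - 1) / 2.0

    total = pairs(len(t))
    if total == 0:
        return 1.0  # a single sample is trivially in agreement
    sum_cells = pairs(contingency).sum()
    sum_rows = pairs(contingency.sum(axis=1)).sum()
    sum_cols = pairs(contingency.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / total
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # both labelings are trivial (e.g. one cluster each)
    return float((sum_cells - expected) / (max_index - expected))


# The score ignores label names: a relabeled but identical partition is perfect.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

With real data, the ground-truth labels `Y` and a prediction array of the same length would take the place of the toy lists.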
Get the resulting embedding:
embeddings = scores["features"]
This embedding can be used as input to any other downstream task.
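As one example of a downstream task, the sketch below finds the nearest neighbors of each cell in embedding space. It assumes the embedding is a 2D array of shape (n_cells, n_features) and uses synthetic data in its place; `nearest_neighbors` is a hypothetical helper written for this illustration, not a graph-sc function.

```python
import numpy as np


def nearest_neighbors(embeddings, k=5):
    """Return the indices of the k nearest cells (Euclidean) for every cell."""
    emb = np.asarray(embeddings, dtype=float)
    # Squared distances via |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
    sq_norms = (emb ** 2).sum(axis=1)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * emb @ emb.T
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


# Synthetic stand-in for an embedding: two well-separated groups of 10 cells.
rng = np.random.default_rng(0)
toy_embeddings = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 8)),
    rng.normal(5.0, 0.1, size=(10, 8)),
])
neighbors = nearest_neighbors(toy_embeddings, k=3)
print(neighbors.shape)
```

The dense distance matrix grows quadratically with the number of cells, so for large datasets an approximate or tree-based neighbor search would be a more practical choice.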
Plot a 2D PCA projection of the latent space, colored by the ground truth and by the K-Means prediction:
pca = PCA(2).fit_transform(embeddings)  # project the embeddings to 2D
plt.figure(figsize=(12, 4))
plt.subplot(121)
plt.title("Ground truth")
plt.scatter(pca[:, 0], pca[:, 1], c=Y, s=4)
plt.subplot(122)
plt.title("K-Means pred")
plt.scatter(pca[:, 0], pca[:, 1], c=scores["kmeans_pred"], s=4)
plt.show()