Graph-sc

This repository contains the PyTorch implementation of the paper "Clustering scRNA-seq data with graph neural networks" by Madalina Ciortan, under the supervision of Matthieu Defrance.

We propose graph-sc, a method that models scRNA-seq data as a graph and processes it with a graph autoencoder network to create a representation (embedding) for each cell. The resulting embeddings are clustered with a general-purpose clustering algorithm (e.g. KMeans, Leiden) to produce cell class assignments. An extensive experimental study was performed on 24 simulated and 15 real-world scRNA-seq datasets. graph-sc was compared with 11 competing state-of-the-art techniques on 4 clustering scores, reflecting both external and internal clustering performance. The results indicate that, although no method is consistently best across all analyzed datasets, graph-sc compares favorably with the competing techniques across all types of datasets. A large ablation study evaluates numerous strategies for creating the input graph, configuring the graph autoencoder network and performing the clustering phase. The proposed method is stable across consecutive runs, robust to input down-sampling, generally insensitive to changes in the network architecture or training parameters, and more computationally efficient than other competing methods based on neural networks. Moreover, modeling the data as a graph provides increased flexibility to define custom features characterizing the genes, the cells and their interactions, as well as the possibility of enriching the graph with external data (e.g. gene correlations).
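
To make the graph formulation more concrete, the sketch below turns a small cell-by-gene count matrix into a weighted bipartite edge list, with one node per cell and one node per gene. This is only an illustrative assumption about how such a graph could be built, not necessarily the construction used inside graph-sc; the helper name `counts_to_bipartite_edges` and the toy matrix are hypothetical.

```python
import numpy as np


def counts_to_bipartite_edges(counts):
    """Return (cell_nodes, gene_nodes, weights) for every nonzero entry.

    Cells are numbered 0..n_cells-1 and genes n_cells..n_cells+n_genes-1,
    so both node types share a single index space.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells, _ = counts.shape
    cell_idx, gene_idx = np.nonzero(counts)  # one edge per expressed gene
    weights = counts[cell_idx, gene_idx]     # edge weight = expression count
    return cell_idx, gene_idx + n_cells, weights


# Toy matrix: 3 cells (rows) x 4 genes (columns), hypothetical values.
toy_counts = np.array([
    [5, 0, 0, 1],
    [0, 2, 0, 0],
    [3, 0, 4, 0],
])

src, dst, w = counts_to_bipartite_edges(toy_counts)
for s, d, weight in zip(src, dst, w):
    print(f"cell node {s} -- gene node {d} (weight {weight:g})")
```

An edge list of this kind is the usual input format for graph libraries, and extra node or edge features (for example gene correlations) could be attached to it in the same index space.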

Installation

The package requires Python >= 3.6 and can be installed by running:

pip install graph-sc
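
After installing, it can be useful to confirm which version is present. The sketch below uses the standard-library `importlib.metadata` module, which assumes Python 3.8 or newer (later than the package's own minimum); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# "graph-sc" is the distribution name used with pip; the import name is graph_sc.
print(installed_version("graph-sc"))
```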

Tutorial

First import required libraries:

import h5py
import matplotlib.pyplot as plt
import numpy as np
import pkg_resources
from sklearn.decomposition import PCA  # needed for the 2-D projection in the plotting step
import graph_sc.models as models
import graph_sc.train as train

device = train.get_device(use_cpu=True)
print(f"Running on device: {device}")

Then load the data to be analyzed. The package provides an example dataset:

DATA_PATH = pkg_resources.resource_filename("graph_sc", "data/")
data_mat = h5py.File(f"{DATA_PATH}/worm_neuron_cell.h5", "r")

X = np.array(data_mat["X"])

Y = np.array(data_mat["Y"]) # this is optional
n_clusters = len(np.unique(Y)) # this is required for KMeans

Run the model training:

scores = train.fit(X, Y, n_clusters, cluster_methods=["KMeans"])
print(scores)

Get the resulting embedding:

embeddings = scores["features"]

This embedding can be used as input to any other downstream task.
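
As one example of a downstream task, the sketch below finds the nearest neighbours of each cell in embedding space. It runs on a small hypothetical array standing in for `scores["features"]`, and the helper name `nearest_neighbors` is an assumption made for illustration.

```python
import numpy as np


def nearest_neighbors(embeddings, k):
    """Indices of the k nearest cells (Euclidean) for every cell, excluding itself."""
    emb = np.asarray(embeddings, dtype=float)
    # Pairwise squared distances via broadcasting: shape (n_cells, n_cells).
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]


# Stand-in for scores["features"]: two well-separated groups of 3 cells each.
toy_embeddings = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.3],
])

neighbors = nearest_neighbors(toy_embeddings, k=2)
print(neighbors)
```

The dense distance matrix grows quadratically with the number of cells, so for larger datasets a dedicated neighbour search such as scikit-learn's `NearestNeighbors` would likely scale better.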

Plot the latent space and the underlying prediction:

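# Project the embeddings to 2-D for visualization
pca = PCA(2).fit_transform(embeddings)
plt.figure(figsize=(12, 4))

# Left panel: cells colored by ground-truth label
plt.subplot(121)
plt.title("Ground truth")
plt.scatter(pca[:, 0], pca[:, 1], c=Y, s=4)

# Right panel: cells colored by predicted K-Means cluster
plt.subplot(122)
plt.title("K-Means pred")
plt.scatter(pca[:, 0], pca[:, 1], c=scores["kmeans_pred"], s=4)
plt.show()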
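
The two scatter plots give a visual comparison; an external clustering score puts a number on the same agreement. The sketch below computes the adjusted Rand index (ARI) from a contingency table with NumPy only. ARI is a common external score, but this README does not name the four scores used in the study, so treating it as one of them is an assumption, and the helper name `adjusted_rand_index` is hypothetical. It expects at least two samples.

```python
import numpy as np


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index computed from the contingency table."""
    _, true_ids = np.unique(np.ravel(labels_true), return_inverse=True)
    _, pred_ids = np.unique(np.ravel(labels_pred), return_inverse=True)
    # table[i, j] = number of samples in true class i and predicted cluster j.
    table = np.zeros((true_ids.max() + 1, pred_ids.max() + 1), dtype=np.int64)
    np.add.at(table, (true_ids, pred_ids), 1)

    def pairs(counts):
        # Number of unordered pairs that can be drawn from each count.
        return counts * (counts - 1) / 2.0

    sum_cells = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()
    total = pairs(np.int64(len(true_ids)))
    expected = sum_rows * sum_cols / total
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


y_true = [0, 0, 0, 1, 1, 1]
y_perm = [1, 1, 1, 0, 0, 0]   # same partition, cluster ids swapped
y_mixed = [0, 0, 1, 1, 0, 1]  # one cell of each class misassigned

print(adjusted_rand_index(y_true, y_perm))
print(adjusted_rand_index(y_true, y_mixed))
```

In the tutorial above, the equivalent call would be `adjusted_rand_index(Y, scores["kmeans_pred"])`. The score does not depend on how cluster ids are numbered, which is why the swapped labelling still counts as a perfect match.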

Download files

Source Distribution

graph_sc-0.0.3.tar.gz (2.5 MB)

Uploaded Source

Built Distribution

graph_sc-0.0.3-py3-none-any.whl (2.5 MB)

Uploaded Python 3

File details

Details for the file graph_sc-0.0.3.tar.gz.

File metadata

  • Download URL: graph_sc-0.0.3.tar.gz
  • Size: 2.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.7.11

File hashes

Hashes for graph_sc-0.0.3.tar.gz

Algorithm     Hash digest
SHA256        90837f68064f9c2ed3a3a2a9f16d0000233cfccf344aa070fff3aae54b648235
MD5           ff9156a82736999daf8adea3f9253e50
BLAKE2b-256   c86f449c221a5fa05fe5385409219762a4c6a2415e0a5e7d65ea1d275d4d0ef9


File details

Details for the file graph_sc-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: graph_sc-0.0.3-py3-none-any.whl
  • Size: 2.5 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.7.11

File hashes

Hashes for graph_sc-0.0.3-py3-none-any.whl

Algorithm     Hash digest
SHA256        003c8ee07d67a505f22700c27e6c8675a54333a25bddd73ec6fecdbfcc7f1377
MD5           604c9b019b36ed34a5a5e0a33b3c0353
BLAKE2b-256   a8a1f12fdd2d120f748f0ad1c88ea2a142e9dbaaa665b191c045c1ccf0736f26
