Graph-sc

This repository contains the PyTorch implementation of the paper "Clustering scRNA-seq data with graph neural networks" by Madalina Ciortan, under the supervision of Matthieu Defrance.

We propose graph-sc, a method that models scRNA-seq data as a graph and processes it with a graph autoencoder network to create a representation (embedding) for each cell. The resulting embeddings are clustered with a general-purpose clustering algorithm (e.g. KMeans, Leiden) to produce cell class assignments.

An extensive experimental study was performed on 24 simulated and 15 real-world scRNA-seq datasets. graph-sc was compared with 11 competing state-of-the-art techniques on 4 clustering scores, reflecting both external and internal clustering performance. The results indicate that, although no method is consistently best across all analyzed datasets, graph-sc compares favorably with the competing techniques across all types of datasets. A large ablation study evaluates numerous strategies for creating the input graph, for the graph autoencoder network and for the clustering phase.

The proposed method is stable across consecutive runs, robust to input down-sampling, generally insensitive to changes in the network architecture or training parameters, and more computationally efficient than other competing methods based on neural networks. Moreover, modeling the data as a graph provides increased flexibility to define custom features characterizing the genes, the cells and their interactions, as well as the possibility to enrich the graph with external data (e.g. gene correlations).
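
To make the idea of "data as a graph" concrete, here is a minimal sketch that turns a small count matrix into a weighted bipartite edge list connecting cells to the genes they express. The function name `expression_to_edges`, the node numbering and the choice of one edge per non-zero count are illustrative assumptions; they are not taken from the graph-sc code base and may differ from how the package builds its input graph.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (cell node id, gene node id, edge weight)


def expression_to_edges(X: Sequence[Sequence[float]]) -> Tuple[List[Edge], int]:
    """Turn a cells x genes count matrix into a weighted bipartite edge list.

    Cell i becomes node i and gene j becomes node n_cells + j. One edge is
    created per non-zero count, weighted by that count. This layout is an
    illustrative assumption, not graph-sc's internal representation.
    """
    n_cells = len(X)
    n_genes = len(X[0]) if n_cells else 0
    edges: List[Edge] = []
    for i, row in enumerate(X):
        for j, count in enumerate(row):
            if count > 0:
                edges.append((i, n_cells + j, float(count)))
    return edges, n_cells + n_genes


# Toy matrix: 2 cells (rows) x 3 genes (columns).
toy_counts = [
    [0, 3, 0],
    [1, 0, 0],
]
edges, n_nodes = expression_to_edges(toy_counts)
print(edges)    # [(0, 3, 3.0), (1, 2, 1.0)]
print(n_nodes)  # 5
```

An edge list of this kind is a common input format for graph libraries, and extra node or edge features (such as gene correlations) could in principle be attached to it.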

Installation

The package requires Python >= 3.6 and can be installed by running:

pip install graph-sc
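
After installing, a quick check like the one below can confirm that the interpreter meets the stated version requirement and that the package can be found. The helper name `environment_ok` is hypothetical; it uses only the standard library and does not import graph-sc itself.

```python
import importlib.util
import sys


def environment_ok(module_name: str = "graph_sc", min_python=(3, 6)) -> bool:
    """Return True if Python is recent enough and the module can be located."""
    if sys.version_info < min_python:
        return False
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


print(environment_ok())
```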

Tutorial

First import required libraries:

import h5py
import matplotlib.pyplot as plt
import numpy as np
import pkg_resources
import graph_sc.models as models
import graph_sc.train as train

device = train.get_device(use_cpu=True)
print(f"Running on device: {device}")

Then load the data to be analyzed. The package provides an example dataset:

DATA_PATH = pkg_resources.resource_filename("graph_sc", "data/")
data_mat = h5py.File(f"{DATA_PATH}/worm_neuron_cell.h5", "r")

X = np.array(data_mat["X"])  # expression matrix

Y = np.array(data_mat["Y"])  # ground-truth labels (optional, used for evaluation)
n_clusters = len(np.unique(Y))  # number of clusters (required for KMeans)
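
Before training, it can help to verify that the loaded arrays have consistent shapes. The sketch below assumes that rows of `X` are cells and columns are genes, and that `Y` holds one label per cell; the helper `summarize_inputs` is hypothetical and not part of graph-sc. It works on NumPy arrays as well as on nested lists.

```python
def summarize_inputs(X, Y=None):
    """Check that X is rectangular and that Y (if given) has one label per row.

    Returns (n_cells, n_genes, n_clusters); n_clusters is None without labels.
    Assumes rows of X are cells and columns are genes.
    """
    n_cells = len(X)
    if n_cells == 0:
        raise ValueError("X has no rows")
    n_genes = len(X[0])
    if any(len(row) != n_genes for row in X):
        raise ValueError("rows of X have different lengths")
    if Y is None:
        return n_cells, n_genes, None
    if len(Y) != n_cells:
        raise ValueError(f"expected {n_cells} labels, got {len(Y)}")
    # Count distinct labels, mirroring len(np.unique(Y)) for 1-D labels.
    n_clusters = len(set(Y))
    return n_cells, n_genes, n_clusters


print(summarize_inputs([[1, 0, 2], [0, 4, 0]], [0, 1]))  # (2, 3, 2)
```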

Run the model training:

scores = train.fit(X, Y, n_clusters, cluster_methods=["KMeans"])
print(scores)
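
When ground-truth labels are available, predicted clusters can also be checked independently of the library with an external score. The sketch below implements the Adjusted Rand Index in plain Python; choosing ARI here is an assumption for illustration and is not a statement about which scores `train.fit` reports.

```python
from collections import Counter


def _pairs(n: int) -> int:
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def adjusted_rand_index(labels_true, labels_pred) -> float:
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    # Sizes of each (true, predicted) cell of the contingency table.
    contingency = Counter(zip(labels_true, labels_pred))
    true_sizes = Counter(labels_true)
    pred_sizes = Counter(labels_pred)

    sum_cells = sum(_pairs(c) for c in contingency.values())
    sum_true = sum(_pairs(c) for c in true_sizes.values())
    sum_pred = sum(_pairs(c) for c in pred_sizes.values())

    total = _pairs(n)
    if total == 0:
        return 1.0  # fewer than two items: labelings trivially agree
    expected = sum_true * sum_pred / total
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Same partition with swapped label names scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

With the tutorial variables, this could be called as `adjusted_rand_index(Y, scores["kmeans_pred"])`, assuming that key holds one predicted label per cell.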

Get the resulting embedding:

embeddings = scores["features"]

This embedding can be used as input to any other downstream task.
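
As one example of a downstream task, the sketch below looks up the cells closest to a given cell in embedding space. It assumes the embedding is indexable as one vector per cell; the helper `nearest_cells` is hypothetical, and plain Euclidean distance is used only for simplicity.

```python
import math
from typing import List, Sequence


def nearest_cells(embeddings: Sequence[Sequence[float]], query: int, k: int = 3) -> List[int]:
    """Indices of the k cells closest to cell `query` by Euclidean distance."""
    target = embeddings[query]
    distances = []
    for idx, vector in enumerate(embeddings):
        if idx == query:
            continue  # skip the query cell itself
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(vector, target)))
        distances.append((dist, idx))
    distances.sort()
    return [idx for _, idx in distances[:k]]


# Toy 2-D embedding for four cells.
toy_embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.3]]
print(nearest_cells(toy_embeddings, query=0, k=2))  # [1, 3]
```

For large datasets, a vectorized or index-based nearest-neighbor search would be more appropriate than this quadratic loop.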

Plot the latent space and the underlying prediction:

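from sklearn.decomposition import PCA

# Project the embeddings onto their first two principal components.
pca = PCA(2).fit_transform(embeddings)
plt.figure(figsize=(12, 4))

# Left panel: cells colored by ground-truth label.
plt.subplot(121)
plt.title("Ground truth")
plt.scatter(pca[:, 0], pca[:, 1], c=Y, s=4)

# Right panel: cells colored by predicted KMeans cluster.
plt.subplot(122)
plt.title("K-Means pred")
plt.scatter(pca[:, 0], pca[:, 1], c=scores["kmeans_pred"], s=4)
plt.show()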
