
Graph-sc

This repository contains the PyTorch implementation of the paper "Clustering scRNA-seq data with graph neural networks" by Madalina Ciortan, under the supervision of Matthieu Defrance.

We propose graph-sc, a method that models scRNA-seq data as a graph and processes it with a graph autoencoder network to create a representation (embedding) for each cell. The resulting embeddings are clustered with a general-purpose clustering algorithm (e.g. KMeans or Leiden) to produce cell class assignments.

An extensive experimental study was performed on 24 simulated and 15 real-world scRNA-seq datasets. graph-sc was compared with 11 competing state-of-the-art techniques on 4 clustering scores, reflecting both external and internal clustering performance. The results indicate that, although no method is consistently best across all analyzed datasets, graph-sc compares favorably with the competing techniques across all types of datasets. A large ablation study evaluates numerous strategies for creating the input graph, designing the graph autoencoder network, and performing the clustering phase. The proposed method is stable across consecutive runs, robust to input down-sampling, generally insensitive to changes in the network architecture or training parameters, and more computationally efficient than other competing neural-network-based methods. Moreover, modeling the data as a graph provides increased flexibility to define custom features characterizing the genes, the cells, and their interactions, as well as the possibility of enriching the graph with external data (e.g. gene correlations).
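One common way to turn an expression matrix into a graph is a bipartite cell-gene graph, in which each cell is linked to the genes it expresses and each edge is weighted by the expression value. The sketch below builds such an edge list with NumPy. It assumes this bipartite layout purely for illustration: the helper `build_cell_gene_graph` and its `min_expr` threshold are hypothetical, not part of the graph-sc API, and the package's own graph construction may differ.

```python
import numpy as np


def build_cell_gene_graph(X, min_expr=0.0):
    """Hypothetical helper (not graph-sc API): bipartite cell-gene edge list.

    Cells get node ids 0..n_cells-1 and genes get ids n_cells..n_cells+n_genes-1.
    An edge cell -> gene is kept when the expression value exceeds min_expr,
    and that expression value is used as the edge weight.
    """
    X = np.asarray(X, dtype=float)
    n_cells, n_genes = X.shape
    # Row-major indices of all (cell, gene) pairs above the threshold
    cell_idx, gene_idx = np.nonzero(X > min_expr)
    weights = X[cell_idx, gene_idx]
    # Offset gene ids so that cell and gene nodes share one id space
    src = cell_idx
    dst = gene_idx + n_cells
    return src, dst, weights, n_cells + n_genes


X_toy = np.array([[0, 2, 0],
                  [1, 0, 3]])
src, dst, w, n_nodes = build_cell_gene_graph(X_toy)
print(src, dst, w, n_nodes)  # [0 1 1] [3 2 4] [2. 1. 3.] 5
```

An edge list in this form can be converted into the graph objects of libraries such as DGL or PyTorch Geometric; adding the reversed edges (gene -> cell) makes the graph undirected.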

Installation

The package requires Python >= 3.6 and can be installed with:

pip install graph-sc
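Before installing, you can check that the interpreter meets the stated requirement. This small standard-library sketch compares the running Python version with 3.6 and checks whether graph_sc is already importable; the helper names `python_supported` and `graph_sc_installed` are illustrative and not part of the package.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # requirement stated in the graph-sc description


def python_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is at least MIN_PYTHON."""
    # Compare tuples, not strings, so that 3.10 correctly ranks above 3.6
    return tuple(version_info[:2]) >= MIN_PYTHON


def graph_sc_installed():
    """Return True if the graph_sc package can be found on the import path."""
    return importlib.util.find_spec("graph_sc") is not None


print("Python OK:", python_supported())
print("graph-sc installed:", graph_sc_installed())
```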

Tutorial

First, import the required libraries:

import h5py
import matplotlib.pyplot as plt
import numpy as np
import pkg_resources
import graph_sc.models as models
import graph_sc.train as train

device = train.get_device(use_cpu=True)
print(f"Running on device: {device}")

Next, load the data to analyze. The package ships with an example dataset, worm_neuron_cell.h5:

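DATA_PATH = pkg_resources.resource_filename("graph_sc", "data/")

# Read the expression matrix and the labels, then close the file
with h5py.File(f"{DATA_PATH}/worm_neuron_cell.h5", "r") as data_mat:
    X = np.array(data_mat["X"])  # expression matrix (cells x genes)
    Y = np.array(data_mat["Y"])  # ground-truth labels; optional

n_clusters = len(np.unique(Y))  # the number of clusters is required for KMeans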
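If you analyze your own data, the labels may be strings (HDF5 string datasets are often read back as bytes), and string labels are not mapped to colors automatically by plt.scatter(c=...). The following NumPy sketch maps any labels to integer codes and derives the number of clusters from them; `encode_labels` is a hypothetical helper, not part of graph-sc.

```python
import numpy as np


def encode_labels(Y):
    """Hypothetical helper: map labels (ints, str, or bytes) to codes 0..k-1.

    Returns the integer codes, the number of distinct classes, and the
    sorted class names, so that codes[i] indexes into classes.
    """
    Y = np.asarray(Y).ravel()
    if Y.dtype.kind == "S":  # bytes strings, e.g. read from an HDF5 file
        Y = np.char.decode(Y, "utf-8")
    classes, codes = np.unique(Y, return_inverse=True)
    return codes, len(classes), classes


codes, k, classes = encode_labels([b"type_a", b"type_b", b"type_a"])
print(codes, k, classes)  # [0 1 0] 2 ['type_a' 'type_b']
```

The integer codes can be passed as colors, and k plays the role of n_clusters.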

Train the model and cluster the embeddings:

scores = train.fit(X, Y, n_clusters, cluster_methods=["KMeans"])
print(scores)
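The evaluation described above uses both external scores, which compare predictions with ground-truth labels, and internal ones. As an illustration of an external score, here is a NumPy sketch of the adjusted Rand index (ARI) computed from a contingency table. Whether ARI is among the keys of scores is an assumption; inspect the printed dictionary to see which metrics train.fit actually returns.

```python
import numpy as np


def _pairs(n):
    """Number of unordered pairs, n choose 2 (works element-wise on arrays)."""
    return n * (n - 1) / 2.0


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings (1.0 = identical)."""
    _, t = np.unique(np.asarray(labels_true).ravel(), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred).ravel(), return_inverse=True)
    if t.size != p.size:
        raise ValueError("label arrays must have the same length")
    if t.size < 2:
        return 1.0  # fewer than two samples: no pairs to compare
    # Contingency table: rows are true classes, columns are predicted clusters
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)
    index = _pairs(table).sum()
    sum_rows = _pairs(table.sum(axis=1)).sum()
    sum_cols = _pairs(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / _pairs(t.size)
    max_index = (sum_rows + sum_cols) / 2.0
    if max_index == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings
    return float((index - expected) / (max_index - expected))


print(round(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]), 6))  # 1.0
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]), 6))  # -0.5
```

With the tutorial variables, adjusted_rand_index(Y, scores["kmeans_pred"]) would score the K-Means assignment; the value does not depend on how the clusters are numbered.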

Retrieve the resulting cell embeddings:

embeddings = scores["features"]

These embeddings can be used as input to any downstream task.
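Because the embeddings are plain vectors, any clustering algorithm can be applied to them, for example to try a different number of clusters. The sketch below is a minimal NumPy implementation of k-means (Lloyd's algorithm with a farthest-point initialization), shown only to make the clustering phase concrete; it is a stand-in, not the KMeans implementation graph-sc uses internally, and the `kmeans` function is hypothetical.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Minimal k-means (Lloyd's algorithm); returns (labels, centers)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from a random point, then repeatedly
    # add the point that is farthest from all centers chosen so far
    centers = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        d2 = ((points[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2)
        centers.append(points[d2.min(axis=1).argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its points (keep empty ones)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Two well-separated toy blobs of 20 points each
rng = np.random.default_rng(1)
toy = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
toy_labels, toy_centers = kmeans(toy, k=2)
print(toy_labels)
```

Applied to the tutorial output, kmeans(embeddings, k=n_clusters) would cluster the learned cell embeddings; scikit-learn's KMeans adds multiple restarts and k-means++ seeding, which make it more robust on real data.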

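Plot the latent space, projected onto its first two principal components and colored by the ground-truth labels (left) and by the K-Means predictions (right):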

from sklearn.decomposition import PCA

# Project the embeddings onto their first two principal components
pca = PCA(2).fit_transform(embeddings)
plt.figure(figsize=(12, 4))
plt.subplot(121)
plt.title("Ground truth")
plt.scatter(pca[:, 0], pca[:, 1], c=Y, s=4)

plt.subplot(122)
plt.title("K-Means pred")
plt.scatter(pca[:, 0], pca[:, 1], c=scores["kmeans_pred"], s=4)
plt.show()
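PCA(2).fit_transform centers the embeddings and projects them onto the two directions of largest variance. For readers who want to see that computation, here is a NumPy sketch based on the singular value decomposition; `pca_2d` is a hypothetical helper, and its components may differ in sign from scikit-learn's output (and slightly in value if scikit-learn selects a randomized solver), which does not change the structure of the plot.

```python
import numpy as np


def pca_2d(data):
    """Project rows of `data` onto its first two principal components."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing singular value
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # equivalently u[:, :2] * s[:2]


# Points on the line y = 2x: all variance lies along the first component
line = np.array([[0, 0], [1, 2], [2, 4], [3, 6]])
print(np.round(pca_2d(line), 3))
```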


Download files

Source Distribution

graph_sc-0.0.2.tar.gz (2.5 MB)

Built Distribution

graph_sc-0.0.2-py3-none-any.whl (2.5 MB)

File details

Details for the file graph_sc-0.0.2.tar.gz.

File metadata

  • Download URL: graph_sc-0.0.2.tar.gz
  • Size: 2.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.7.11

File hashes

Hashes for graph_sc-0.0.2.tar.gz

  • SHA256: cdb45b288b239fd6057ad0af0875d8887540211bb6a38cbdf5de1549831e5b24
  • MD5: 3d6a829ecbb3c7386f07d44d223322cc
  • BLAKE2b-256: 8ef47d7e72734d328495f626a4bc1d7ed2203fb8d5fd8896b2d2a4857c530618

File details

Details for the file graph_sc-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: graph_sc-0.0.2-py3-none-any.whl
  • Size: 2.5 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.7.11

File hashes

Hashes for graph_sc-0.0.2-py3-none-any.whl

  • SHA256: 4fb2dd3763250f41c717891ea175187c40fa68bd50796c76789a7e37fda675e8
  • MD5: 54993244b653c2eee9fadd94d719ff2b
  • BLAKE2b-256: 3f9d73839df41e1b84bbb2983f6ba715b474e5e05879e80e0f02396ab6acf14b
