scReGAT

A GAT-based computational framework to predict long-range gene regulatory relationships.

Installation

Install PyTorch

Refer to: https://pytorch.org/get-started/previous-versions/

Example code:

conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

Install PyG

Refer to: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html

Example code:

pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu117.html

Install scReGAT

pip install scregat
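
After the three install steps, a quick check can confirm that the packages are importable. This is a minimal sketch using only the standard library. The module names `torch`, `torch_geometric`, and `scregat` are taken from the import statements in the Usage section; the helper name `check_install` is hypothetical and not part of scReGAT.

```python
import importlib.util


def check_install(modules=("torch", "torch_geometric", "scregat")):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_install().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

This only checks that the modules can be located; it does not verify that the CUDA build of PyTorch matches the PyG wheels.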

Usage

1. Prepare scReGAT inputs

From Seurat object

  • Suppose SeuratObj_ATAC and SeuratObj_RNA are your Seurat objects for scATAC-seq and scRNA-seq, respectively.
  • SeuratObj_ATAC must have two metadata columns named "celltype" and "celltype_rna", and SeuratObj_RNA must have a metadata column named "celltype".
library(Seurat)
library(SeuratDisk)
# Save ATAC.h5ad: rebuild a minimal object from the ATAC counts, then convert it
scregat_ATAC <- CreateSeuratObject(SeuratObj_ATAC@assays$ATAC@counts, meta.data = SeuratObj_ATAC@meta.data)
scregat_ATAC@assays$RNA@meta.features$name <- rownames(scregat_ATAC@assays$RNA@meta.features)
SaveH5Seurat(scregat_ATAC, filename = 'ATAC.h5Seurat', overwrite = TRUE)
# Convert() takes the file name as a string
Convert('ATAC.h5Seurat', dest = "h5ad", overwrite = TRUE)
# Save RNA.h5ad in the same way
scregat_RNA <- CreateSeuratObject(SeuratObj_RNA@assays$RNA@counts, meta.data = SeuratObj_RNA@meta.data)
scregat_RNA@assays$RNA@meta.features$name <- rownames(scregat_RNA@assays$RNA@meta.features)
SaveH5Seurat(scregat_RNA, filename = 'RNA.h5Seurat', overwrite = TRUE)
Convert('RNA.h5Seurat', dest = "h5ad", overwrite = TRUE)

From SingleCellExperiment

  • Suppose sce is your SingleCellExperiment object for scATAC-seq.
  • sce must have two metadata columns named "celltype" and "celltype_rna".
library(Seurat)
library(SeuratDisk)  # provides SaveH5Seurat() and Convert()
library(Signac)
library(SingleCellExperiment)
counts <- assay(sce)
# Name peaks by their genomic coordinates
rownames(counts) <- GRangesToString(rowRanges(sce))
# colData() returns a DataFrame; CreateSeuratObject() expects a data.frame
scregat_ATAC <- CreateSeuratObject(counts, meta.data = as.data.frame(colData(sce)))
scregat_ATAC@assays$RNA@meta.features$name <- rownames(scregat_ATAC@assays$RNA@meta.features)
SaveH5Seurat(scregat_ATAC, filename = 'ATAC.h5Seurat', overwrite = TRUE)
# Convert() takes the file name as a string
Convert('ATAC.h5Seurat', dest = "h5ad", overwrite = TRUE)
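
Because both export routes depend on specific metadata column names, it can help to check the exported files before running the model. The sketch below encodes the requirements stated above ("celltype" and "celltype_rna" for ATAC, "celltype" for RNA). The helper name `missing_obs_columns` is hypothetical and not part of scReGAT.

```python
# Required metadata columns, as stated in the notes above
REQUIRED_OBS = {
    "ATAC": ("celltype", "celltype_rna"),
    "RNA": ("celltype",),
}


def missing_obs_columns(columns, modality):
    """List the required metadata columns that are absent from `columns`."""
    present = set(columns)
    return [c for c in REQUIRED_OBS[modality] if c not in present]


# Typical use after export (needs anndata, so it is shown as a comment
# to keep the sketch runnable without it):
#   import anndata as ad
#   adata = ad.read_h5ad("ATAC.h5ad")
#   missing = missing_obs_columns(adata.obs.columns, "ATAC")
#   if missing:
#       raise ValueError(f"ATAC.h5ad lacks metadata columns: {missing}")
```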

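Randomly select a subset of cells (optional)

# Apply this before SaveH5Seurat() so that the exported file contains the subset
set.seed(123)
# Keep at most 20,000 cells; min() avoids an error when fewer cells are present
n_keep <- min(20000, ncol(scregat_ATAC))
scregat_ATAC <- subset(scregat_ATAC, cells = sample(colnames(scregat_ATAC), n_keep))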
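
The same downsampling can also be done on the Python side after the h5ad file has been read. This is a minimal sketch under that assumption; `sample_cells` is a hypothetical helper, and it will not select the same cells as the R call because the two random number generators differ.

```python
import random


def sample_cells(cell_names, n_max=20000, seed=123):
    """Return up to n_max cell names drawn without replacement, reproducibly."""
    names = list(cell_names)
    if len(names) <= n_max:
        return names  # nothing to drop
    rng = random.Random(seed)  # local generator; global random state is untouched
    return rng.sample(names, n_max)


# Typical use with an AnnData object (shown as a comment because it needs anndata):
#   adata = adata[sample_cells(adata.obs_names)].copy()
```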

2. Run scReGAT

Import packages

import os
import pandas as pd
import torch
import anndata as ad
from torch_geometric.loader import DataLoader
import scanpy as sc
from scregat.data_process import prepare_model_input,sum_counts,plot_edge
from scregat.model import train_scregat, explain_model_ig

Set file path

SAMPLE = 'SF11857'
path_demo = '/home/txm/txmdata/scREGION/run/NatCan_GBM/' + SAMPLE + '/run_scReGAT/'
dir_scReGAT = 'scReGAT'
atac_file = 'ATAC.h5ad'

Read in scRNA-seq data

RNA_h5ad_file = os.path.join(path_demo, "RNA.h5ad")
adata_rna = sc.read_h5ad(RNA_h5ad_file)
adata_rna.obs['celltype'] = adata_rna.obs['celltype'].astype('object')
df_rna = sum_counts(adata_rna,by = 'celltype',marker_gene_num=200)
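
The function name and the `by = 'celltype'` argument suggest that counts are aggregated per cell type. The sketch below illustrates that generic pseudo-bulk idea in plain Python. It is not scReGAT's implementation, it ignores the `marker_gene_num` argument (whose exact behaviour is not described here), and `sum_by_label` is a hypothetical name.

```python
def sum_by_label(counts, labels):
    """Sum per-cell count vectors within each label.

    counts: list of equal-length lists, one row per cell
    labels: list of labels, one per cell
    Returns {label: summed count vector}.
    """
    if len(counts) != len(labels):
        raise ValueError("counts and labels must have the same length")
    totals = {}
    for row, label in zip(counts, labels):
        if label not in totals:
            totals[label] = [0] * len(row)
        acc = totals[label]
        for j, value in enumerate(row):
            acc[j] += value
    return totals


# Three cells, three genes, two cell types
example = sum_by_label([[1, 0, 2], [0, 3, 1], [4, 1, 0]], ["T", "B", "T"])
print(example)  # {'T': [5, 1, 2], 'B': [0, 3, 1]}
```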

Prepare model input

dataset_atac, dataset_graph = prepare_model_input(
    path_data_root = path_demo, 
    file_atac = atac_file, 
    df_rna_celltype = df_rna,
    path_eqtl = '/home/txm/txmdata/scREGION/scReGAT/v0.0.0/data/all_tissue_SNP_Gene.txt',
    hg19tohg38 = False, min_percent = 0.01)
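
Since `prepare_model_input` depends on several files, checking that they exist first gives a clearer error than a failure partway through preprocessing. The sketch assumes that `file_atac` is resolved relative to `path_data_root`, which matches how the variables are set above but is not confirmed by the package documentation. `missing_inputs` is a hypothetical helper.

```python
import os


def missing_inputs(path_data_root, file_atac, path_eqtl):
    """Return the input paths that do not exist on disk."""
    candidates = [
        # Assumption: the ATAC file lives under the data root
        os.path.join(path_data_root, file_atac),
        path_eqtl,
    ]
    return [p for p in candidates if not os.path.isfile(p)]


# Typical use before calling prepare_model_input:
#   missing = missing_inputs(path_demo, atac_file, path_eqtl)
#   if missing:
#       raise FileNotFoundError(f"Missing input files: {missing}")
```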

Train the gene expression prediction model

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_scReGAT, test_dataset = \
    train_scregat(path_demo, dataset_atac, dir_scReGAT, device,
                  hidwidth=16, numhead=8, num_epoch=20, learning_rate=1e-3,
                  split_prop = 0.6,print_process = True)
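
`split_prop = 0.6` presumably sets the fraction of samples used for training, with the remainder returned as `test_dataset`. That reading is an assumption based on the argument name and the return value. The sketch below shows a generic seeded split of that kind; it is not taken from scReGAT, and `split_indices` is a hypothetical name.

```python
import random


def split_indices(n_items, split_prop=0.6, seed=0):
    """Shuffle indices 0..n_items-1 and split them into (train, test) lists."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    n_train = round(n_items * split_prop)
    return idx[:n_train], idx[n_train:]


train_idx, test_idx = split_indices(10, split_prop=0.6)
print(len(train_idx), len(test_idx))  # 6 4
```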

Predict regulatory intensity scores (RISs) with an attribution method

  • Load the trained model
path_model = os.path.join(path_demo, dir_scReGAT)
file_model = os.path.join(path_model,'Model_batch_size_16_hidwidth_16_numhead_8_lr_0.001_numepoch_20_split_0.6.pt')
# map_location lets a model saved on GPU load on a CPU-only machine.
# On PyTorch >= 2.6, a fully pickled model may additionally need weights_only=False.
model_scReGAT = torch.load(file_model, map_location=device)
  • Write the predicted RIS matrix to file_weight
file_weight = os.path.join(path_model, "mat_RIS.tsv")
df_mat_RIS = explain_model_ig(
    dataset_atac, model_scReGAT, device, test_dataset, file_weight, print_process = True)
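
The checkpoint name used above appears to encode the training hyperparameters. Assuming that pattern holds (it is inferred from this single file name, not from scReGAT's documentation), the name can be built from the same values passed to `train_scregat` instead of being typed by hand. `model_filename` is a hypothetical helper, and `batch_size=16` is taken from the file name because it is not passed explicitly in the training call.

```python
def model_filename(batch_size=16, hidwidth=16, numhead=8,
                   learning_rate=1e-3, num_epoch=20, split_prop=0.6):
    """Build a checkpoint name following the pattern seen in this README."""
    return (
        f"Model_batch_size_{batch_size}_hidwidth_{hidwidth}"
        f"_numhead_{numhead}_lr_{learning_rate}"
        f"_numepoch_{num_epoch}_split_{split_prop}.pt"
    )


print(model_filename())
# Model_batch_size_16_hidwidth_16_numhead_8_lr_0.001_numepoch_20_split_0.6.pt
```

Very small learning rates print in scientific notation in Python (for example `1e-05`); whether scReGAT formats them the same way is not known, so check the output directory if the file is not found.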

Visualize the cells profiled by the RIS matrix

plot_edge(df_mat_RIS, dataset_atac)

Built Distribution

scReGAT-0.0.13-py2.py3-none-any.whl (34.9 MB)
