
scReGAT

A GAT-based computational framework to predict long-range gene regulatory relationships.

Installation

Install PyTorch

Refer to: https://pytorch.org/get-started/previous-versions/

Example code (PyTorch 2.0.1 with CUDA 11.7):

conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

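Install PyG (PyTorch Geometric)

Refer to: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html

Example code (the core package, then the optional extension libraries built for PyTorch 2.0 with CUDA 11.7, matching the PyTorch install above):

pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu117.html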

Install scReGAT

pip install scregat
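To confirm that the three installation steps worked, you can look up the installed versions from Python. The sketch below uses only the standard library (importlib.metadata). The helper name installed_versions is ours, not part of scReGAT, and the distribution names passed to it are assumed to match the package names used above.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: installed version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in the current environment
            versions[name] = None
    return versions
```

For example, installed_versions(["torch", "torch_geometric", "scregat"]) reports None for any package that is still missing.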

Usage

1. Prepare scReGAT inputs

From Seurat objects

  • Suppose SeuratObj_ATAC and SeuratObj_RNA are your Seurat objects for the scATAC-seq and scRNA-seq data, respectively.
  • SeuratObj_ATAC must have two metadata columns named "celltype" and "celltype_rna", and SeuratObj_RNA must have a metadata column named "celltype".
library(Seurat)
library(SeuratDisk)
# Slot access below (@counts, @meta.features) assumes Seurat v4-style Assay objects.
# Save ATAC.h5ad: the peak counts go into the default "RNA" assay of the new object
scregat_ATAC <- CreateSeuratObject(SeuratObj_ATAC@assays$ATAC@counts, meta.data = SeuratObj_ATAC@meta.data)
# Keep the feature (peak) names as an explicit column
scregat_ATAC@assays$RNA@meta.features$name <- rownames(scregat_ATAC@assays$RNA@meta.features)
SaveH5Seurat(scregat_ATAC, filename = "ATAC.h5Seurat", overwrite = TRUE)
Convert("ATAC.h5Seurat", dest = "h5ad", overwrite = TRUE)  # writes ATAC.h5ad
# Save RNA.h5ad
scregat_RNA <- CreateSeuratObject(SeuratObj_RNA@assays$RNA@counts, meta.data = SeuratObj_RNA@meta.data)
scregat_RNA@assays$RNA@meta.features$name <- rownames(scregat_RNA@assays$RNA@meta.features)
SaveH5Seurat(scregat_RNA, filename = "RNA.h5Seurat", overwrite = TRUE)
Convert("RNA.h5Seurat", dest = "h5ad", overwrite = TRUE)  # writes RNA.h5ad
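Before running scReGAT it is worth confirming that the required metadata columns survived the conversion; in an .h5ad file they should appear as obs columns. The sketch below reports which required columns are missing. REQUIRED_OBS and missing_columns are our own hypothetical names, and reading the real files needs anndata, as shown in the usage comment.

```python
# Required obs columns per converted file, as listed in the requirements above
REQUIRED_OBS = {
    "ATAC.h5ad": ["celltype", "celltype_rna"],
    "RNA.h5ad": ["celltype"],
}


def missing_columns(obs_columns, required):
    """Return the required column names absent from obs_columns, in their listed order."""
    present = set(obs_columns)
    return [col for col in required if col not in present]


# Usage (requires anndata and the converted files):
#   import anndata as ad
#   obs = ad.read_h5ad("ATAC.h5ad", backed="r").obs   # backed="r" avoids loading the matrix
#   print(missing_columns(obs.columns, REQUIRED_OBS["ATAC.h5ad"]))
```

An empty list means the file carries every column scReGAT expects.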

From SingleCellExperiment

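  • Suppose sce is your SingleCellExperiment object of scATAC-seq data, with peak coordinates stored in rowRanges(sce).
  • sce must have two metadata (colData) columns named "celltype" and "celltype_rna".
library(Seurat)
library(Signac)
library(SeuratDisk)  # provides SaveH5Seurat() and Convert()
library(SingleCellExperiment)
counts <- assay(sce)  # first assay; assumed to hold the peak counts
# Name peaks "chr-start-end" from their genomic ranges
rownames(counts) <- GRangesToString(rowRanges(sce))
# colData() returns an S4 DataFrame; convert it to a data.frame for CreateSeuratObject()
scregat_ATAC <- CreateSeuratObject(counts, meta.data = as.data.frame(colData(sce)))
scregat_ATAC@assays$RNA@meta.features$name <- rownames(scregat_ATAC@assays$RNA@meta.features)
SaveH5Seurat(scregat_ATAC, filename = "ATAC.h5Seurat", overwrite = TRUE)
Convert("ATAC.h5Seurat", dest = "h5ad", overwrite = TRUE)  # writes ATAC.h5ad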
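With its default separators, Signac's GRangesToString() writes each range as chrom-start-end, e.g. chr1-10000-10500. To check that the peak names stored in ATAC.h5ad have that shape (assuming that is the form scReGAT expects), you can parse them as below. parse_peak and invalid_peaks are hypothetical helpers, not scReGAT functions.

```python
import re

# "chr1-10000-10500": chromosome, start and end joined by "-"
PEAK_PATTERN = re.compile(r"^(?P<chrom>[^-]+)-(?P<start>\d+)-(?P<end>\d+)$")


def parse_peak(name):
    """Split a 'chrom-start-end' peak name into (chrom, start, end); raise ValueError otherwise."""
    match = PEAK_PATTERN.match(name)
    if match is None:
        raise ValueError(f"peak name not in chrom-start-end form: {name!r}")
    start, end = int(match["start"]), int(match["end"])
    if start >= end:
        raise ValueError(f"start must be smaller than end: {name!r}")
    return match["chrom"], start, end


def invalid_peaks(names):
    """Return the peak names that parse_peak rejects, in their original order."""
    bad = []
    for name in names:
        try:
            parse_peak(name)
        except ValueError:
            bad.append(name)
    return bad
```

For example, invalid_peaks(ad.read_h5ad("ATAC.h5ad", backed="r").var_names) should return an empty list; names such as chr1:100-200 would be flagged.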

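(Optional) Randomly select a subset of cells, e.g. 20,000, before calling SaveH5Seurat:

set.seed(123)
n_cells <- min(20000, ncol(scregat_ATAC))  # avoid sampling more cells than exist
scregat_ATAC <- subset(scregat_ATAC, cells = sample(colnames(scregat_ATAC), n_cells))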
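If you would rather subsample in Python after conversion, the same idea (a seeded draw without replacement, capped at the number of available cells) can be sketched as follows. sample_cells is a hypothetical helper, and it will not pick the same cells as R's set.seed(123) because the two languages use different random number generators.

```python
import random


def sample_cells(cell_ids, n, seed=123):
    """Return up to n distinct cell IDs drawn at random, kept in their original order."""
    cell_ids = list(cell_ids)
    n = min(n, len(cell_ids))  # never ask for more cells than exist
    chosen = set(random.Random(seed).sample(range(len(cell_ids)), n))
    return [cid for i, cid in enumerate(cell_ids) if i in chosen]
```

With AnnData, adata = adata[sample_cells(adata.obs_names, 20000)].copy() keeps only the selected cells.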

2. Run scReGAT

Import packages

import os
import pandas as pd
import torch
import anndata as ad
from torch_geometric.loader import DataLoader
import scanpy as sc
from scregat.data_process import prepare_model_input, sum_counts, plot_edge
from scregat.model import train_scregat, explain_model_ig

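Set file paths

SAMPLE = 'SF11857'
# Directory holding ATAC.h5ad and RNA.h5ad (example path; replace with your own)
path_demo = '/home/txm/txmdata/scREGION/run/NatCan_GBM/' + SAMPLE + '/run_scReGAT/'
# Subdirectory of path_demo where the trained model and the RIS matrix are written
dir_scReGAT = 'scReGAT'
atac_file = 'ATAC.h5ad'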
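Because these paths are hard-coded, it can help to fail early if an input file is missing. A small standard-library sketch follows; check_inputs is a hypothetical helper, and the default file names are the two files produced in step 1.

```python
from pathlib import Path


def check_inputs(root, filenames=("ATAC.h5ad", "RNA.h5ad")):
    """Return the expected input files missing from root (an empty list if all exist)."""
    root = Path(root)
    return [name for name in filenames if not (root / name).is_file()]
```

For example, calling missing = check_inputs(path_demo) and raising FileNotFoundError(missing) when the list is non-empty stops the run before any model input is prepared.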

Read in scRNA-seq data

RNA_h5ad_file = os.path.join(path_demo, 'RNA.h5ad')  # RNA.h5ad sits in the same directory as ATAC.h5ad
adata_rna = sc.read_h5ad(RNA_h5ad_file)
adata_rna.obs['celltype'] = adata_rna.obs['celltype'].astype('object')
# Aggregate the RNA counts by the 'celltype' column
df_rna = sum_counts(adata_rna, by='celltype', marker_gene_num=200)
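Judging by its name and its by= argument, sum_counts aggregates adata_rna per cell type; its exact behavior (including how marker_gene_num=200 selects marker genes and the layout of df_rna) is not documented here. To illustrate the general pseudo-bulk idea of summing counts per cell type, here is a plain-Python sketch. sum_by_group is hypothetical and is not scReGAT's implementation.

```python
from collections import defaultdict


def sum_by_group(counts, labels, genes):
    """Sum per-cell count vectors by group label.

    counts: list of per-cell lists (cells x genes); labels: one label per cell.
    Returns {label: {gene: summed count}}.
    """
    if len(counts) != len(labels):
        raise ValueError("need exactly one label per cell")
    totals = defaultdict(lambda: [0] * len(genes))
    for row, label in zip(counts, labels):
        acc = totals[label]
        for j, value in enumerate(row):
            acc[j] += value
    return {label: dict(zip(genes, acc)) for label, acc in totals.items()}
```

Summing rather than averaging keeps the result on the count scale, so larger cell types contribute larger totals.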

Prepare model input

dataset_atac, dataset_graph = prepare_model_input(
    path_data_root=path_demo,
    file_atac=atac_file,
    df_rna_celltype=df_rna,
    # eQTL (SNP-gene) file; the path shown is an example local copy, replace with your own
    path_eqtl='/home/txm/txmdata/scREGION/scReGAT/v0.0.0/data/all_tissue_SNP_Gene.txt',
    hg19tohg38=False, min_percent=0.01)

Train the gene expression prediction model

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_scReGAT, test_dataset = train_scregat(
    path_demo, dataset_atac, dir_scReGAT, device,
    hidwidth=16, numhead=8, num_epoch=20, learning_rate=1e-3,
    split_prop=0.6, print_process=True)
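scReGAT is described as GAT-based, and train_scregat exposes hidwidth and numhead, which in graph attention layers usually denote the per-head hidden width and the number of attention heads (an assumption; they are not defined here). As a generic illustration of what one GAT attention head computes (Veličković et al., 2018), not scReGAT's actual network, the plain-Python sketch below updates a single node; all names and the toy inputs are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU with the negative slope used in the original GAT paper."""
    return x if x > 0 else slope * x


def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gat_node_update(h, target, neighbors, W, a, slope=0.2):
    """One attention head of a GAT layer, computed for a single target node.

    h: {node: input feature vector}; neighbors: nodes attended to (include the
    target for a self-loop); W: shared projection (out_dim x in_dim);
    a: attention vector of length 2 * out_dim.
    Returns (alpha, new_features), where alpha[j] is the attention weight on j.
    """
    z = {j: matvec(W, h[j]) for j in set(neighbors) | {target}}
    # Unnormalized scores e_ij = LeakyReLU(a . [W h_i || W h_j])
    scores = {
        j: leaky_relu(sum(p * q for p, q in zip(a, z[target] + z[j])), slope)
        for j in neighbors
    }
    # Softmax over the neighborhood (max subtracted for numerical stability)
    top = max(scores.values())
    weights = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(weights.values())
    alpha = {j: w / total for j, w in weights.items()}
    # New features (before any nonlinearity): attention-weighted sum of projected neighbors
    out_dim = len(z[target])
    new_features = [sum(alpha[j] * z[j][k] for j in neighbors) for k in range(out_dim)]
    return alpha, new_features
```

With numhead=8, a GAT layer runs eight such heads with separate W and a and typically concatenates their outputs (8 × 16 = 128 features for hidwidth=16); whether scReGAT concatenates or averages its heads is not stated here.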

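Predict regulatory intensity scores (RISs) with an attribution method

  • Load the trained model (e.g. in a new session). The file name encodes the training settings, so adjust it if you changed them in train_scregat:
path_model = os.path.join(path_demo, dir_scReGAT)
file_model = os.path.join(path_model, 'Model_batch_size_16_hidwidth_16_numhead_8_lr_0.001_numepoch_20_split_0.6.pt')
# map_location lets a GPU-trained model load on a CPU-only machine;
# weights_only=False is needed on PyTorch >= 2.6 to unpickle a full model object
model_scReGAT = torch.load(file_model, map_location=device, weights_only=False)
  • Write the predicted RIS matrix to file_weight:
file_weight = os.path.join(path_model, "mat_RIS.tsv")
df_mat_RIS = explain_model_ig(
    dataset_atac, model_scReGAT, device, test_dataset, file_weight, print_process=True)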
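The source calls this step an "attribution method"; the function name explain_model_ig suggests integrated gradients (IG), which is our assumption. IG attributes a model output F(x) to each input feature by averaging gradients along the straight path from a baseline x' to x: IG_i = (x_i - x'_i) · ∫₀¹ ∂F/∂x_i(x' + α(x - x')) dα. The plain-Python sketch below approximates that integral with a midpoint Riemann sum for any function whose gradient you can evaluate. With a real PyTorch model the gradient would come from autograd (for example via Captum's IntegratedGradients), not from a hand-written formula.

```python
def integrated_gradients(grad_f, x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum.

    grad_f(point) must return the gradient of the model output at point.
    attribution_i = (x_i - baseline_i) * mean over alpha of dF/dx_i(baseline + alpha * (x - baseline))
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoints of the interval [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            avg_grad[i] += g / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]
```

A useful sanity check is IG's completeness property: the attributions sum to F(x) - F(x'). For the toy function f(x) = Σ w_i x_i² with a zero baseline, the sketch returns w_i x_i² for each feature.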

Visualize the cells profiled by the RIS matrix

plot_edge(df_mat_RIS, dataset_atac)
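Beyond plot_edge, a quick numeric summary of the RIS matrix can help. The layout of mat_RIS.tsv is not described here, so the sketch below assumes a tab-separated file with a header row of edge names, one row per cell, and the cell ID in the first column (as pandas writes a DataFrame together with its index). top_edges_by_mean is a hypothetical helper.

```python
import csv
from statistics import mean


def top_edges_by_mean(tsv_path, k=10):
    """Rank matrix columns by their mean value across rows of a TSV file.

    Assumes a header row whose first field labels the row-ID column,
    followed by rows of: row ID, then one numeric score per column.
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        columns = [[] for _ in header[1:]]
        for row in reader:
            for j, value in enumerate(row[1:]):
                columns[j].append(float(value))
    means = {name: mean(vals) for name, vals in zip(header[1:], columns)}
    # Highest mean score first
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:k]
```

If df_mat_RIS is a pandas DataFrame with the same orientation, df_mat_RIS.mean().nlargest(10) gives the same ranking in memory.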

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


scReGAT-0.0.13-py2.py3-none-any.whl (34.9 MB)


File details

Details for the file scReGAT-0.0.13-py2.py3-none-any.whl.

File metadata

  • Download URL: scReGAT-0.0.13-py2.py3-none-any.whl
  • Size: 34.9 MB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.17

File hashes

Hashes for scReGAT-0.0.13-py2.py3-none-any.whl

  • SHA256: 9ac62008c32108290ec0af74c7529e0df60fc23597ec2be6ebae55c47855b998
  • MD5: 3a3ee441a34081c42e0d8be2654b3215
  • BLAKE2b-256: 6663e4c6e6636de65a8516bef8cf2ad8d76ae151732cafa48a2584c4aed66f3d

