scReGAT
A GAT-based (graph attention network) computational framework to predict long-range gene regulatory relationships.
Installation
Install PyTorch
Refer to: https://pytorch.org/get-started/previous-versions/
Example code:
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
Install PyG (PyTorch Geometric)
Refer to: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
Example code:
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu117.html
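PyG publishes wheels for its extension packages per PyTorch minor release and per CUDA build. The index for PyTorch 2.0.x is named `torch-2.0.0`, which is why the command above pairs torch 2.0.1 with `torch-2.0.0+cu117`. The hypothetical helper `pyg_wheel_index` below rebuilds that URL from a PyTorch version string and a CUDA version. It is a sketch that assumes PyG keeps this naming scheme, so confirm the result against the PyG installation page.

```python
def pyg_wheel_index(torch_version, cuda_version=None):
    """Build the PyG wheel index URL for a PyTorch version (hypothetical helper).

    torch_version: e.g. "2.0.1" or "2.0.1+cu117" (a local "+..." suffix is ignored).
    cuda_version:  e.g. "11.7", or None for a CPU-only build.
    """
    parts = torch_version.split("+")[0].split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected PyTorch version: {torch_version!r}")
    # Wheels are published once per minor release, labelled with patch 0.
    torch_tag = f"{parts[0]}.{parts[1]}.0"
    cuda_tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{torch_tag}+{cuda_tag}.html"
```

Inside an environment where PyTorch is already installed, the arguments can be taken from `torch.__version__` and `torch.version.cuda`. The latter is `None` on CPU-only builds.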
Install scReGAT
pip install scregat
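After installation, you can confirm that every package is visible from Python. The code in step 2 also imports `torch_geometric.loader`, so `torch_geometric` must be present, even though the pip command above only installs the PyG extension packages. The sketch below uses the standard-library `importlib.metadata`. The distribution names in `PACKAGES` are assumptions. If one is reported missing, check the installed name with `pip show`.

```python
from importlib import metadata

# Assumed distribution names for the packages installed above; adjust if needed.
PACKAGES = ["torch", "torch_geometric", "torch_scatter", "torch_sparse",
            "torch_cluster", "torch_spline_conv", "pyg_lib", "scregat"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:20s} {version or 'NOT INSTALLED'}")
```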
Usage
1. Prepare scReGAT inputs
From Seurat objects
- Suppose SeuratObj_ATAC and SeuratObj_RNA are your Seurat objects of scATAC-seq and scRNA-seq data.
- Note that SeuratObj_ATAC must have two metadata columns named "celltype" and "celltype_rna", and SeuratObj_RNA must have a metadata column named "celltype".
library(Seurat)
library(SeuratDisk)
# save ATAC.h5ad: rebuild a plain Seurat object from the ATAC peak counts
scregat_ATAC <- CreateSeuratObject(SeuratObj_ATAC@assays$ATAC@counts, meta.data = SeuratObj_ATAC@meta.data)
# CreateSeuratObject() stores counts in an assay named "RNA" by default;
# copy the feature (peak) names into a "name" column of the feature metadata
scregat_ATAC@assays$RNA@meta.features$name <- rownames(scregat_ATAC@assays$RNA@meta.features)
SaveH5Seurat(scregat_ATAC, filename = "ATAC.h5Seurat", overwrite = TRUE)
# Convert() takes the source file name as a string
Convert("ATAC.h5Seurat", dest = "h5ad", overwrite = TRUE)
# save RNA.h5ad
scregat_RNA <- CreateSeuratObject(SeuratObj_RNA@assays$RNA@counts, meta.data = SeuratObj_RNA@meta.data)
scregat_RNA@assays$RNA@meta.features$name <- rownames(scregat_RNA@assays$RNA@meta.features)
SaveH5Seurat(scregat_RNA, filename = "RNA.h5Seurat", overwrite = TRUE)
Convert("RNA.h5Seurat", dest = "h5ad", overwrite = TRUE)
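SeuratDisk's conversion writes the Seurat `meta.data` to the `obs` table of the `.h5ad` file. Before running scReGAT, a quick check in Python can confirm that the required metadata columns survived. The helper `missing_columns` and the two `REQUIRED_*` tuples below are hypothetical names. The column names themselves come from the requirements stated above.

```python
# Required metadata columns, as stated in the notes above.
REQUIRED_ATAC_COLUMNS = ("celltype", "celltype_rna")
REQUIRED_RNA_COLUMNS = ("celltype",)


def missing_columns(columns, required):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]
```

For example, `missing_columns(sc.read_h5ad("ATAC.h5ad").obs.columns, REQUIRED_ATAC_COLUMNS)` should return an empty list.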
From a SingleCellExperiment object
- Suppose sce is your SingleCellExperiment object of scATAC-seq data.
- Note that sce must have two metadata columns named "celltype" and "celltype_rna".
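library(Seurat)
library(SeuratDisk)
library(Signac)
library(SingleCellExperiment)
# peak-by-cell count matrix
counts <- assay(sce)
# name peaks "chr-start-end" from their genomic ranges
rownames(counts) <- GRangesToString(rowRanges(sce))
# colData() returns a DataFrame; convert it to a data.frame for CreateSeuratObject()
scregat_ATAC <- CreateSeuratObject(counts, meta.data = as.data.frame(colData(sce)))
scregat_ATAC@assays$RNA@meta.features$name <- rownames(scregat_ATAC@assays$RNA@meta.features)
SaveH5Seurat(scregat_ATAC, filename = "ATAC.h5Seurat", overwrite = TRUE)
Convert("ATAC.h5Seurat", dest = "h5ad", overwrite = TRUE)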
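Signac's `GRangesToString()` joins ranges with `-` by default, so peaks are named like `chr1-10000-10500`. Peaks from a Signac ChromatinAssay follow the same convention. The sketch below checks that the peak names (`var_names`) in `ATAC.h5ad` parse as chromosome, start, and end. It assumes scReGAT expects this `chrom-start-end` format, which this page does not state explicitly. `parse_peak` is a hypothetical helper.

```python
import re

# chrom-start-end, e.g. "chr1-10000-10500" (Signac's default separators)
PEAK_RE = re.compile(r"^(?P<chrom>[^-]+)-(?P<start>\d+)-(?P<end>\d+)$")


def parse_peak(name):
    """Split a peak name into (chrom, start, end); raise ValueError if malformed."""
    match = PEAK_RE.match(name)
    if match is None:
        raise ValueError(f"not a chrom-start-end peak name: {name!r}")
    start, end = int(match["start"]), int(match["end"])
    if start >= end:
        raise ValueError(f"peak start must be before end: {name!r}")
    return match["chrom"], start, end
```

For example, `[parse_peak(p) for p in sc.read_h5ad("ATAC.h5ad").var_names]` raises an error on the first malformed name.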
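Randomly select a subset of cells (optional; run this before SaveH5Seurat() so that the saved file contains only the sampled cells)
set.seed(123)
# keep at most 20,000 cells; sample() fails if asked for more cells than exist
scregat_ATAC <- subset(scregat_ATAC, cells = sample(colnames(scregat_ATAC), min(20000, ncol(scregat_ATAC))))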
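If you subsample in Python instead, for example on `ATAC.h5ad` after export, a seeded helper does the same job. The helper `subsample_cells` below is hypothetical. It uses Python's own random generator, so it will not reproduce the cells chosen by R's `set.seed(123)`.

```python
import random


def subsample_cells(cells, n=20000, seed=123):
    """Return at most n cell names, chosen uniformly without replacement.

    The sampled cells keep their original order. If there are n or fewer
    cells, all of them are returned unchanged.
    """
    cells = list(cells)
    if len(cells) <= n:
        return cells
    rng = random.Random(seed)  # local generator; leaves the global random state untouched
    keep = sorted(rng.sample(range(len(cells)), n))
    return [cells[i] for i in keep]
```

For example, `adata_atac = adata_atac[subsample_cells(adata_atac.obs_names), :].copy()` subsets an AnnData object.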
2. Run scReGAT
Import packages
import os
import pandas as pd
import torch
import anndata as ad
from torch_geometric.loader import DataLoader
import scanpy as sc
from scregat.data_process import prepare_model_input,sum_counts,plot_edge
from scregat.model import train_scregat, explain_model_ig
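Set file paths
SAMPLE = 'SF11857'
# working directory holding the inputs; replace with your own path
path_demo = '/home/txm/txmdata/scREGION/run/NatCan_GBM/' + SAMPLE + '/run_scReGAT/'
# sub-directory of path_demo where the trained model and RIS matrix are written
dir_scReGAT = 'scReGAT'
# scATAC-seq file exported in step 1, stored in path_demo
atac_file = 'ATAC.h5ad'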
Read in scRNA-seq data
# RNA.h5ad exported in step 1, stored in the same directory as ATAC.h5ad
RNA_h5ad_file = os.path.join(path_demo, "RNA.h5ad")
adata_rna = sc.read_h5ad(RNA_h5ad_file)
# cast cell-type labels from categorical to plain object dtype
adata_rna.obs['celltype'] = adata_rna.obs['celltype'].astype('object')
# summarize RNA counts per cell type
df_rna = sum_counts(adata_rna, by='celltype', marker_gene_num=200)
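Judging from its name and the `by='celltype'` argument, `sum_counts` appears to aggregate RNA counts per cell type, in the style of a pseudobulk profile. The pure-Python sketch below shows only that general idea of summing a cells-by-genes count matrix by a group label. It is not scReGAT's implementation. It does not model `marker_gene_num`, and `sum_by_group` is a hypothetical name.

```python
def sum_by_group(counts, labels):
    """Sum per-cell count vectors by group label.

    counts: list of per-cell count lists (cells x genes)
    labels: group label (e.g. cell type) for each cell
    Returns {label: summed count list}.
    """
    if len(counts) != len(labels):
        raise ValueError("need exactly one label per cell")
    totals = {}
    for row, label in zip(counts, labels):
        acc = totals.setdefault(label, [0] * len(row))
        if len(acc) != len(row):
            raise ValueError("all cells must have the same number of genes")
        for j, value in enumerate(row):
            acc[j] += value
    return totals
```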
Prepare model input
dataset_atac, dataset_graph = prepare_model_input(
    path_data_root=path_demo,
    file_atac=atac_file,
    df_rna_celltype=df_rna,
    # SNP-gene table; replace with the path to your local copy
    path_eqtl='/home/txm/txmdata/scREGION/scReGAT/v0.0.0/data/all_tissue_SNP_Gene.txt',
    hg19tohg38=False, min_percent=0.01)
Train the gene expression prediction model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_scReGAT, test_dataset = train_scregat(
    path_demo, dataset_atac, dir_scReGAT, device,
    hidwidth=16, numhead=8, num_epoch=20, learning_rate=1e-3,
    split_prop=0.6, print_process=True)
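`train_scregat` returns a `test_dataset` alongside the model. This suggests, though the page does not confirm it, that `split_prop=0.6` is the fraction of samples used for training, with the rest held out for testing. The hypothetical `split_indices` below illustrates a seeded train/test split under that assumption. scReGAT's own split may differ.

```python
import random


def split_indices(n, train_prop=0.6, seed=0):
    """Shuffle range(n) and split it into (train, test) index lists."""
    if not 0 < train_prop < 1:
        raise ValueError("train_prop must be between 0 and 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(round(n * train_prop))
    return sorted(idx[:n_train]), sorted(idx[n_train:])
```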
Predict regulatory intensity scores (RISs) with an attribution method
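- Load the trained model
path_model = os.path.join(path_demo, dir_scReGAT)
# the file name records the training settings used above
file_model = os.path.join(path_model, 'Model_batch_size_16_hidwidth_16_numhead_8_lr_0.001_numepoch_20_split_0.6.pt')
# map_location lets a model trained on a GPU be loaded on a CPU-only machine
model_scReGAT = torch.load(file_model, map_location=device)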
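The model file name appears to encode the training settings: `hidwidth`, `numhead`, learning rate, epochs, and split. The batch size of 16 is not passed to `train_scregat` above, so it is presumably a default. If you train with other settings, a helper like the hypothetical `model_filename` below can rebuild the name. The pattern is inferred from this single example. Python's float formatting may not match scReGAT's for every value (for example, `1e-5` prints as `1e-05`).

```python
def model_filename(batch_size=16, hidwidth=16, numhead=8, lr=1e-3,
                   numepoch=20, split=0.6):
    """Rebuild the saved-model file name (pattern inferred from one example)."""
    return (f"Model_batch_size_{batch_size}_hidwidth_{hidwidth}_numhead_{numhead}"
            f"_lr_{lr}_numepoch_{numepoch}_split_{split}.pt")
```

For example, use `file_model = os.path.join(path_model, model_filename())`.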
- Compute the RIS matrix and write it to file_weight
file_weight = os.path.join(path_model, "mat_RIS.tsv")
df_mat_RIS = explain_model_ig(
    dataset_atac, model_scReGAT, device, test_dataset, file_weight, print_process=True)
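The function name `explain_model_ig` suggests that RISs are derived with integrated gradients (IG). IG credits each input feature i with (x_i - x'_i) times the average gradient along the straight path from a baseline x' to the input x. Summed over features, the attributions approximate f(x) - f(x'). The sketch below applies IG to a toy quadratic with a known gradient, using a midpoint Riemann sum. It only illustrates the technique. Whether and how scReGAT applies IG to its graph inputs is not shown here.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint Riemann approximation of integrated gradients."""
    if len(x) != len(baseline):
        raise ValueError("x and baseline must have the same length")
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            total[i] += g
    # average gradient along the path, scaled by the input difference
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Toy model f(x) = sum_i w_i * x_i**2, with analytic gradient 2 * w_i * x_i
W = [0.5, -1.0, 2.0]


def f(x):
    return sum(w * xi * xi for w, xi in zip(W, x))


def grad_f(x):
    return [2 * w * xi for w, xi in zip(W, x)]
```

With a zero baseline, each attribution for this toy model equals w_i * x_i**2. The attributions therefore add up to f(x) - f(0), which is the completeness property of IG.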
Visualize the cells profiled by the RIS matrix
plot_edge(df_mat_RIS, dataset_atac)
Hashes for scReGAT-0.0.13-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9ac62008c32108290ec0af74c7529e0df60fc23597ec2be6ebae55c47855b998
MD5 | 3a3ee441a34081c42e0d8be2654b3215
BLAKE2b-256 | 6663e4c6e6636de65a8516bef8cf2ad8d76ae151732cafa48a2584c4aed66f3d
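To check a downloaded wheel against the SHA256 digest listed for it, you can hash the file with the standard-library `hashlib`. The helper names below are hypothetical. pip can also enforce digests itself through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).

```python
import hashlib


def _sha256_stream(fh, chunk_size):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def sha256_hex(source, chunk_size=1 << 20):
    """SHA256 hex digest of a file path or an open binary file object."""
    if hasattr(source, "read"):
        return _sha256_stream(source, chunk_size)
    with open(source, "rb") as fh:
        return _sha256_stream(fh, chunk_size)


def matches_sha256(source, expected):
    """True if the digest of `source` equals `expected` (case-insensitive)."""
    return sha256_hex(source) == expected.strip().lower()
```

For example, `matches_sha256("scReGAT-0.0.13-py2.py3-none-any.whl", "9ac62008c32108290ec0af74c7529e0df60fc23597ec2be6ebae55c47855b998")` should return `True` for an intact download.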