scReGAT
A GAT-based computational framework to predict long-range gene regulatory relationships.
Installation
Install PyTorch
Refer to: https://pytorch.org/get-started/previous-versions/
Example code:
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
Install PyG
Refer to: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
Example code:
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu117.html
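The command above uses the torch-2.0.0 wheel index even though PyTorch 2.0.1 is installed. This suggests, as is common for PyG, that extension wheels are shared across patch releases. The sketch below derives the index URL from a PyTorch version string and a CUDA version. pyg_wheel_index is a hypothetical helper, not part of scReGAT or PyG, and the ".0 patch" rule is an assumption based on the URL used here.

```python
from typing import Optional


def pyg_wheel_index(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build a PyG wheel index URL for a PyTorch/CUDA combination (hypothetical helper).

    Assumes one index per PyTorch minor release, so the patch number is
    replaced by 0 (2.0.1 -> 2.0.0). cuda_version=None selects CPU-only wheels.
    """
    # Drop a local version suffix such as "+cu117" that torch.__version__ may carry
    base = torch_version.split("+")[0]
    major, minor = base.split(".")[:2]
    tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{major}.{minor}.0+{tag}.html"


print(pyg_wheel_index("2.0.1", "11.7"))
```

With PyTorch installed, you could pass torch.__version__ and torch.version.cuda directly.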
Install scReGAT
pip install scregat
Usage
1. Prepare scReGAT inputs
From Seurat objects
- Suppose SeuratObj_ATAC and SeuratObj_RNA are your Seurat objects for scATAC-seq and scRNA-seq, respectively.
- Note that SeuratObj_ATAC must have two metadata columns named "celltype" and "celltype_rna", and SeuratObj_RNA must have a metadata column named "celltype".
library(Seurat)
library(SeuratDisk)
# Save ATAC.h5ad: build a plain Seurat object from the ATAC peak counts
scregat_ATAC <- CreateSeuratObject(SeuratObj_ATAC@assays$ATAC@counts, meta.data = SeuratObj_ATAC@meta.data)
# Copy feature (peak) names into a "name" column of the feature metadata
scregat_ATAC@assays$RNA@meta.features$name <- rownames(scregat_ATAC@assays$RNA@meta.features)
SaveH5Seurat(scregat_ATAC, filename = "ATAC.h5Seurat", overwrite = TRUE)
Convert("ATAC.h5Seurat", dest = "h5ad", overwrite = TRUE)
# Save RNA.h5ad
scregat_RNA <- CreateSeuratObject(SeuratObj_RNA@assays$RNA@counts, meta.data = SeuratObj_RNA@meta.data)
scregat_RNA@assays$RNA@meta.features$name <- rownames(scregat_RNA@assays$RNA@meta.features)
SaveH5Seurat(scregat_RNA, filename = "RNA.h5Seurat", overwrite = TRUE)
Convert("RNA.h5Seurat", dest = "h5ad", overwrite = TRUE)
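After conversion, it can be worth confirming in Python that the required metadata columns survived the round trip. This is a minimal sketch: missing_columns and REQUIRED_COLUMNS are hypothetical names, and the column lists simply restate the requirements above. With anndata installed, you would pass the columns of adata.obs.

```python
from typing import Dict, Iterable, List

# Metadata columns the README requires in each converted file
REQUIRED_COLUMNS: Dict[str, List[str]] = {
    "ATAC.h5ad": ["celltype", "celltype_rna"],
    "RNA.h5ad": ["celltype"],
}


def missing_columns(file_name: str, obs_columns: Iterable[str]) -> List[str]:
    """Return the required metadata columns that are absent from obs_columns."""
    present = set(obs_columns)
    return [col for col in REQUIRED_COLUMNS[file_name] if col not in present]


# In practice: missing_columns("ATAC.h5ad", anndata.read_h5ad("ATAC.h5ad").obs.columns)
print(missing_columns("ATAC.h5ad", ["nCount_RNA", "celltype"]))
```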
From SingleCellExperiment
- Suppose sce is your SingleCellExperiment object for scATAC-seq.
- Note that sce must have two metadata columns in colData(sce) named "celltype" and "celltype_rna".
library(Seurat)
library(Signac)
library(SeuratDisk)
library(SingleCellExperiment)
# Peak-by-cell count matrix; name each peak as a "chr-start-end" string
counts <- assay(sce)
rownames(counts) <- GRangesToString(rowRanges(sce))
# colData() returns a DataFrame; convert it to a data.frame for Seurat
scregat_ATAC <- CreateSeuratObject(counts, meta.data = as.data.frame(colData(sce)))
scregat_ATAC@assays$RNA@meta.features$name <- rownames(scregat_ATAC@assays$RNA@meta.features)
SaveH5Seurat(scregat_ATAC, filename = "ATAC.h5Seurat", overwrite = TRUE)
Convert("ATAC.h5Seurat", dest = "h5ad", overwrite = TRUE)
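GRangesToString joins each range as chromosome, start and end separated by "-" by default, so the ATAC feature names should look like "chr1-10109-10357". The sketch below parses such names. This can serve as a quick check of the var names in ATAC.h5ad. Whether scReGAT strictly requires this format is an assumption here, and parse_peak is a hypothetical helper.

```python
from typing import Tuple


def parse_peak(name: str) -> Tuple[str, int, int]:
    """Split a "chr-start-end" peak name into (chrom, start, end).

    rsplit from the right keeps contig names that themselves contain "-" intact.
    Raises ValueError for names that do not have this shape.
    """
    chrom, start, end = name.rsplit("-", 2)
    start_i, end_i = int(start), int(end)
    if start_i > end_i:
        raise ValueError(f"start > end in peak name {name!r}")
    return chrom, start_i, end_i


print(parse_peak("chr1-10109-10357"))
```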
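Optionally, randomly select a subset of cells (run this before saving ATAC.h5Seurat)
set.seed(123)
# Keep at most 20,000 cells; sample() fails if asked for more cells than exist
scregat_ATAC <- subset(scregat_ATAC, cells = sample(colnames(scregat_ATAC), min(20000, ncol(scregat_ATAC))))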
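If you prefer to subsample on the Python side after reading ATAC.h5ad, the same idea can be sketched with the standard library. Note that R's sample() and Python's random module produce different draws even with the same seed, so the selected cells will not match the R code. subsample_cells is a hypothetical helper. With anndata, the result could be applied as adata[subset].copy().

```python
import random
from typing import List, Sequence


def subsample_cells(cell_ids: Sequence[str], n: int, seed: int = 123) -> List[str]:
    """Draw up to n cell IDs without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)  # local RNG; leaves the global random state untouched
    k = min(n, len(cell_ids))  # avoid ValueError when fewer than n cells exist
    return rng.sample(list(cell_ids), k)


cells = [f"cell_{i}" for i in range(50_000)]
subset = subsample_cells(cells, 20_000)
print(len(subset), len(set(subset)))
```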
2. Run scReGAT
Import packages
import os
import pandas as pd
import torch
import anndata as ad
from torch_geometric.loader import DataLoader
import scanpy as sc
from scregat.data_process import prepare_model_input, sum_counts, plot_edge
from scregat.model import train_scregat, explain_model_ig
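Set file paths (path_demo below is the authors' demo directory; point it to the folder that holds ATAC.h5ad and RNA.h5ad)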
SAMPLE = 'SF11857'
path_demo = '/home/txm/txmdata/scREGION/run/NatCan_GBM/' + SAMPLE + '/run_scReGAT/'
dir_scReGAT = 'scReGAT'
atac_file = 'ATAC.h5ad'
Read in scRNA-seq data
RNA_h5ad_file = os.path.join(path_demo, 'RNA.h5ad')
adata_rna = sc.read_h5ad(RNA_h5ad_file)
# Convert the categorical cell-type column to object dtype
adata_rna.obs['celltype'] = adata_rna.obs['celltype'].astype('object')
# Aggregate RNA counts by cell type
df_rna = sum_counts(adata_rna, by='celltype', marker_gene_num=200)
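The name sum_counts and the argument by='celltype' suggest that RNA counts are aggregated into one pseudo-bulk profile per cell type. The sketch below shows only that summation on a tiny cells x genes matrix. It is not the scReGAT implementation: sum_by_label is a hypothetical helper, and the marker_gene_num selection, along with any normalization the real function may apply, is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def sum_by_label(counts: Sequence[Sequence[float]], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Sum a cells x genes count matrix within each label (pseudo-bulk).

    counts[i][j] is the count of gene j in cell i; labels[i] is the cell type of cell i.
    """
    if len(counts) != len(labels):
        raise ValueError("counts and labels must describe the same number of cells")
    n_genes = len(counts[0]) if counts else 0
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0] * n_genes)
    for row, label in zip(counts, labels):
        acc = totals[label]
        for j, value in enumerate(row):
            acc[j] += value
    return dict(totals)


counts = [[1, 0, 2], [3, 1, 0], [0, 5, 1]]
labels = ["T", "B", "T"]
print(sum_by_label(counts, labels))
```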
Prepare model input
dataset_atac, dataset_graph = prepare_model_input(
    path_data_root=path_demo,
    file_atac=atac_file,
    df_rna_celltype=df_rna,
    path_eqtl='/home/txm/txmdata/scREGION/scReGAT/v0.0.0/data/all_tissue_SNP_Gene.txt',
    hg19tohg38=False, min_percent=0.01)
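min_percent=0.01 reads like a minimum detection rate. One plausible interpretation, which is an assumption that this README does not confirm, is that peaks with nonzero counts in fewer than 1% of cells are dropped before the graph is built. The sketch below illustrates that kind of filter. filter_peaks_by_detection is hypothetical, and both the ">=" boundary and the per-peak input layout are assumptions.

```python
from typing import Dict, Iterable


def filter_peaks_by_detection(
    peak_cells: Dict[str, Iterable[int]], n_cells: int, min_percent: float = 0.01
) -> Dict[str, float]:
    """Keep peaks whose fraction of cells with nonzero counts is >= min_percent.

    peak_cells maps a peak name to the indices of cells where it has nonzero counts.
    Returns the detection fraction of each kept peak.
    """
    kept: Dict[str, float] = {}
    for peak, cells in peak_cells.items():
        frac = len(set(cells)) / n_cells
        if frac >= min_percent:
            kept[peak] = frac
    return kept


peaks = {"chr1-100-600": range(50), "chr2-300-800": [1, 2, 3, 4, 5]}
print(filter_peaks_by_detection(peaks, n_cells=1000))
```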
Train the gene expression prediction model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_scReGAT, test_dataset = train_scregat(
    path_demo, dataset_atac, dir_scReGAT, device,
    hidwidth=16, numhead=8, num_epoch=20, learning_rate=1e-3,
    split_prop=0.6, print_process=True)
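The checkpoint loaded in the RIS step has a file name that appears to encode the training hyperparameters. The sketch below rebuilds that name from its parts, so you can locate the file after changing hidwidth, numhead, learning_rate, num_epoch or split_prop. The pattern is inferred from the single file name in this README, and the package's own naming logic may differ. Batch size 16 does not appear in the train_scregat call, so it is presumably a default. checkpoint_name is a hypothetical helper. Python's float formatting is also an assumption here: for example, 1e-5 prints as "1e-05", which may not match the package's own formatting.

```python
def checkpoint_name(batch_size: int, hidwidth: int, numhead: int,
                    lr: float, num_epoch: int, split_prop: float) -> str:
    """Rebuild the checkpoint file name pattern seen in this README (inferred, hypothetical)."""
    return (f"Model_batch_size_{batch_size}_hidwidth_{hidwidth}_numhead_{numhead}"
            f"_lr_{lr}_numepoch_{num_epoch}_split_{split_prop}.pt")


print(checkpoint_name(16, 16, 8, 1e-3, 20, 0.6))
```

Usage: os.path.join(path_demo, dir_scReGAT, checkpoint_name(16, 16, 8, 1e-3, 20, 0.6)).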
Predict regulatory intensity scores (RISs) with an attribution method
- Load the trained model
path_model = os.path.join(path_demo, dir_scReGAT)
file_model = os.path.join(path_model, 'Model_batch_size_16_hidwidth_16_numhead_8_lr_0.001_numepoch_20_split_0.6.pt')
# map_location lets a checkpoint saved on GPU be loaded on a CPU-only machine
model_scReGAT = torch.load(file_model, map_location=device)
- Write the predicted RIS matrix to file_weight
file_weight = os.path.join(path_model, "mat_RIS.tsv")
df_mat_RIS = explain_model_ig(
    dataset_atac, model_scReGAT, device, test_dataset, file_weight, print_process=True)
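The name explain_model_ig suggests that RISs come from integrated gradients (IG). IG attributes a model output to its inputs by averaging gradients along a straight path from a baseline x' to the input x: IG_i = (x_i - x'_i) * mean over alpha in (0, 1) of df/dx_i(x' + alpha(x - x')). The sketch below applies the generic technique to a toy scalar function, using central-difference gradients and a midpoint Riemann sum. It is not scReGAT's implementation, which works on a graph attention network in PyTorch. A useful property to check is completeness: the attributions sum to f(x) - f(x').

```python
from typing import Callable, List

Vector = List[float]


def numerical_grad(f: Callable[[Vector], float], x: Vector, eps: float = 1e-5) -> Vector:
    """Central-difference gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f: Callable[[Vector], float], x: Vector,
                         baseline: Vector, steps: int = 200) -> Vector:
    """Approximate IG with a midpoint Riemann sum over `steps` points on the path."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(numerical_grad(f, point)):
            total[i] += g
    # Scale the averaged gradient by the input-minus-baseline difference
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def toy(x: Vector) -> float:
    """Toy model: f(x) = 2*x0*x1 + x2**2."""
    return 2 * x[0] * x[1] + x[2] ** 2


x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
attr = integrated_gradients(toy, x, baseline)
print([round(a, 4) for a in attr], round(sum(attr), 4), toy(x) - toy(baseline))
```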
Visualize the cells profiled by the RIS matrix
plot_edge(df_mat_RIS, dataset_atac)
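For a quick look at which regulatory edges carry the largest scores, you can rank the columns of the RIS matrix by their mean across cells. The sketch assumes a layout that this README does not document: one row per cell, a first column holding the cell ID, and one column per edge. The "peak~gene" column names in the demo string are made up. top_edges is a hypothetical helper. If df_mat_RIS is a pandas DataFrame with that layout, df_mat_RIS.mean().nlargest(5) would give a comparable ranking.

```python
import csv
import io
from typing import List, Tuple


def top_edges(tsv_text: str, k: int = 5) -> List[Tuple[str, float]]:
    """Rank edge columns of a cells x edges TSV by mean score across cells.

    Assumed (hypothetical) layout: header = [cell_id, edge_1, edge_2, ...],
    then one row per cell with numeric scores.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    edges = next(reader)[1:]
    sums = [0.0] * len(edges)
    n_rows = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        for j, value in enumerate(row[1:]):
            sums[j] += float(value)
        n_rows += 1
    means = [s / n_rows for s in sums] if n_rows else sums
    ranked = sorted(zip(edges, means), key=lambda item: item[1], reverse=True)
    return ranked[:k]


demo = "cell\tchr1-100-600~GENEA\tchr2-300-800~GENEB\nc1\t0.2\t0.9\nc2\t0.4\t0.7\n"
print(top_edges(demo, k=2))
```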
File details
Details for the file scReGAT-0.0.13-py2.py3-none-any.whl.
File metadata
- Download URL: scReGAT-0.0.13-py2.py3-none-any.whl
- Upload date:
- Size: 34.9 MB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ac62008c32108290ec0af74c7529e0df60fc23597ec2be6ebae55c47855b998 |
| MD5 | 3a3ee441a34081c42e0d8be2654b3215 |
| BLAKE2b-256 | 6663e4c6e6636de65a8516bef8cf2ad8d76ae151732cafa48a2584c4aed66f3d |