Gene knock-out inference from single-cell data with variational graph autoencoders

GenKI — Gene Knock-out Inference

License: MIT

A Variational Graph Auto-Encoder (VGAE) model for predicting gene perturbation effects from scRNA-seq data. GenKI performs in silico gene knock-out experiments on a gene regulatory network (GRN) without requiring real knock-out data.
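
As a rough illustration of what an in silico knock-out on a GRN can look like, the sketch below removes every edge that touches the target gene from a small adjacency matrix. This is an assumption about the general idea, not GenKI's own code: the function `knock_out_edges`, the toy matrix, and the gene names `GENE_A` and `GENE_B` are hypothetical.

```python
import numpy as np


def knock_out_edges(adj, genes, target):
    """Return a copy of `adj` with all edges touching `target` set to zero.

    Hypothetical helper for illustration; GenKI builds its KO graph internally.
    """
    idx = genes.index(target)
    ko = adj.copy()      # leave the wild-type matrix untouched
    ko[idx, :] = 0.0     # drop outgoing edges of the target gene
    ko[:, idx] = 0.0     # drop incoming edges of the target gene
    return ko


# Toy 3-gene network (weights are made up for the example)
genes = ["TUBG1", "GENE_A", "GENE_B"]
adj_wt = np.array([
    [0.0, 0.8, 0.0],
    [0.8, 0.0, 0.5],
    [0.0, 0.5, 0.0],
])
adj_ko = knock_out_edges(adj_wt, genes, "TUBG1")
print(adj_ko)
```

The wild-type and knock-out graphs differ only in the target gene's edges, so any shift in the other genes' latent representations can be attributed to the removed connections.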

Prerequisites

GenKI requires Python ≥ 3.10. PyTorch and PyTorch Geometric are installed automatically (CPU builds) with the package. For a GPU/CUDA build, install them first to match your CUDA version:

  1. Install PyTorch
  2. Install PyTorch Geometric
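
To confirm the prerequisites before installing, a small check like the following may help. It is a minimal sketch that uses only the standard library plus an optional `torch` import; the function name `check_environment` is hypothetical and not part of GenKI.

```python
import sys
from importlib.util import find_spec


def check_environment():
    """Report whether the Python version and PyTorch stack look usable."""
    report = {"python_ok": sys.version_info >= (3, 10)}
    # find_spec returns None when a top-level module is not installed
    for module in ("torch", "torch_geometric"):
        report[module] = find_spec(module) is not None
    report["cuda"] = False
    if report["torch"]:
        try:
            import torch
            report["cuda"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken install is reported as "no CUDA" rather than crashing
            report["cuda"] = False
    return report


print(check_environment())
```

If `cuda` is reported as `False` on a machine with a GPU, the installed PyTorch is likely a CPU build.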

Installation

pip install GenKI

Or install directly from source:

pip install git+https://github.com/yjgeno/GenKI.git

Or with conda (sets up the full environment):

conda env create -f environment.yml
conda activate ogenki

Quick Start

The high-level GenKI facade runs the whole workflow (load and preprocess the data, build the GRN, train the VGAE, and rank genes) in one call:

from GenKI import GenKI

ranked = GenKI.from_h5ad(
    "data/my_data.h5ad",
    target_gene=["TUBG1"],   # gene(s) to knock out (upper-cased by default)
).run(epochs=100, seed=8096, n_permutations=100)

print(ranked)   # genes ranked by perturbation effect

Separate the training and prediction steps when you want to inspect the model in between:

gk = GenKI.from_h5ad("data/my_data.h5ad", target_gene=["TUBG1"])
gk.fit(epochs=100, lr=7e-4, beta=1e-4, seed=8096)
ranked = gk.predict(n_permutations=100, by="KL")

print(gk.metrics)        # (epochs, loss, AUROC, AP)
gk.loader, gk.trainer    # escape hatch to the underlying objects
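
The `by="KL"` option above refers to the Kullback-Leibler divergence between latent distributions. For intuition, the sketch below computes the closed-form KL divergence between two diagonal Gaussians, one row per gene. It is a standalone NumPy illustration: the function `gaussian_kl` is hypothetical, and the argument order and direction of the divergence are assumptions rather than a description of `utils.get_distance`.

```python
import numpy as np


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(P || Q) for diagonal Gaussians, summed over latent dimensions.

    All inputs have shape (n_genes, n_latent); returns shape (n_genes,).
    """
    var_p, var_q = std_p ** 2, std_q ** 2
    per_dim = (
        np.log(std_q / std_p)
        + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q)
        - 0.5
    )
    return per_dim.sum(axis=1)


# Toy latent variables for 4 genes in a 2-dimensional latent space
rng = np.random.default_rng(0)
mu_wt = rng.normal(size=(4, 2))
std_wt = np.full((4, 2), 1.0)
mu_ko = mu_wt + rng.normal(scale=0.1, size=(4, 2))  # small KO-induced shift
std_ko = np.full((4, 2), 1.0)

print(gaussian_kl(mu_ko, std_ko, mu_wt, std_wt))
```

Genes whose latent distribution moves most between the two conditions receive the largest divergence.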

Start from an in-memory AnnData instead of a file (set preprocess=True to normalize/standardize it):

import scanpy as sc

adata = sc.read_h5ad("data/my_data.h5ad")
gk = GenKI.from_adata(adata, target_gene=["TUBG1"], preprocess=True)
ranked = gk.run(seed=8096)

Building the GRN in parallel needs the optional Ray extra (pip install "GenKI[ray]"); pass n_cpus and other GRN options as keyword arguments, e.g. GenKI.from_h5ad(..., rebuild_grn=True, n_cpus=8).

Lower-level API (fine-grained control over each step)

from GenKI.preprocessing import build_adata
from GenKI.dataLoader import DataLoader
from GenKI.train import VGAE_trainer
from GenKI import utils

# 1. Load and preprocess data
adata = build_adata("data/my_data.h5ad")

# 2. Build GRN and prepare WT / virtual-KO graph data
data_wrapper = DataLoader(
    adata,
    target_gene=["TUBG1"],   # gene to knock out
    target_cell=None,         # None = use all cells
    GRN_file_dir="GRNs",
    rebuild_GRN=True,
    pcNet_name="pcNet",
    verbose=True,
    n_cpus=8,
)
data_wt = data_wrapper.load_data()
data_ko = data_wrapper.load_kodata()

# 3. Train VGAE
sensei = VGAE_trainer(data_wt, epochs=100, lr=7e-4, beta=1e-4, seed=8096)
sensei.train()

# 4. Get latent distributions and compute KL divergence per gene
z_mu_wt, z_std_wt = sensei.get_latent_vars(data_wt)
z_mu_ko, z_std_ko = sensei.get_latent_vars(data_ko)
dis = utils.get_distance(z_mu_ko, z_std_ko, z_mu_wt, z_std_wt, by="KL")

# 5. Rank genes by perturbation effect (with permutation test)
null = sensei.pmt(data_ko, n=100, by="KL")
res = utils.get_generank(data_wt, dis, null)
print(res)
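
Step 5 above ranks genes against a permutation null. The sketch below shows one common way to do that: compute an empirical p-value per gene and keep the genes that pass a threshold, sorted by distance. It is an illustration only. The function `rank_by_distance`, the `(n_permutations, n_genes)` shape assumed for `null`, and the `alpha` cutoff are all hypothetical and may not match what `sensei.pmt` and `utils.get_generank` do.

```python
import numpy as np


def rank_by_distance(genes, dis, null, alpha=0.05):
    """Rank genes by distance, keeping those significant under a permutation null.

    dis:  observed distance per gene, shape (n_genes,)
    null: permuted distances, assumed shape (n_permutations, n_genes)
    """
    dis = np.asarray(dis, dtype=float)
    null = np.asarray(null, dtype=float)
    # Empirical p-value with the usual +1 correction to avoid p = 0
    exceed = (null >= dis).sum(axis=0)
    p = (1 + exceed) / (1 + null.shape[0])
    order = np.argsort(-dis)  # largest distance first
    return [(genes[i], float(dis[i]), float(p[i])) for i in order if p[i] <= alpha]


# Toy example with made-up gene labels and distances
genes = ["G1", "G2", "G3"]
dis = [5.0, 0.1, 3.0]
null = np.tile([1.0, 1.0, 1.0], (99, 1))  # 99 permutations, constant null
ranked = rank_by_distance(genes, dis, null)
for name, d, p in ranked:
    print(name, d, p)
```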

API

| Symbol | Description |
| --- | --- |
| GenKI.GenKI | High-level facade: from_h5ad / from_adata constructors and fit / predict / run methods covering the full workflow |
| GenKI.dataLoader.DataLoader | Wraps an AnnData object, builds/loads the GRN, and produces PyG Data objects for WT and virtual-KO conditions |
| GenKI.train.VGAE_trainer | Trains the VGAE, exposes latent variables, permutation testing, and model save/load |
| GenKI.utils.get_distance | Computes per-gene distribution distance (KL, EMD, t-test) between two latent spaces |
| GenKI.utils.get_generank | Ranks genes by perturbation score; optionally filters by permutation-test significance |
| GenKI.preprocessing.build_adata | Loads an .h5ad file and adds a log-normalised layer used by DataLoader |
| GenKI.pcNet.make_pcNet | Builds a principal-component-based GRN from expression data (optionally parallelised with Ray) |
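
For readers unfamiliar with principal-component-based GRNs, the sketch below shows the general idea using principal-component regression: each gene is regressed on the top principal components of all other genes, and the coefficients are projected back to gene space. It is a simplified NumPy illustration under stated assumptions, not the code behind `make_pcNet`; the function `pc_regression_network` and its `n_components` default are hypothetical.

```python
import numpy as np


def pc_regression_network(X, n_components=3):
    """Build a gene-by-gene weight matrix via principal-component regression.

    X has shape (n_cells, n_genes). W[j, i] is the weight of gene i when
    predicting gene j. Assumes no gene has constant expression.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each gene
    n_genes = X.shape[1]
    W = np.zeros((n_genes, n_genes))
    for j in range(n_genes):
        others = np.delete(np.arange(n_genes), j)
        Xo = X[:, others]
        # Principal directions of the predictor genes from the SVD
        _, _, vt = np.linalg.svd(Xo, full_matrices=False)
        V = vt[:n_components].T              # (n_genes - 1, k)
        scores = Xo @ V                      # (n_cells, k)
        coef, *_ = np.linalg.lstsq(scores, X[:, j], rcond=None)
        W[j, others] = V @ coef              # back-project to gene space
    return W


# Random toy expression matrix: 50 cells, 6 genes
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
W = pc_regression_network(X)
print(W.shape)
```

Because each gene needs its own decomposition, the loop over genes is independent per iteration, which is why this kind of construction parallelises well.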

Tutorial

Step-by-step virtual KO example: notebook/Example.ipynb

Citation

If you use GenKI in your research, please cite:

Yang Y, Wang M, Ni P, Zhong J. GenKI: Virtual gene knockout inference with variational graph autoencoder. Nucleic Acids Research, 2023. https://doi.org/10.1093/nar/gkad450
