Gene knock-out inference from single-cell data with variational graph autoencoders

Project description

GenKI — Gene Knock-out Inference

License: MIT

A Variational Graph Auto-Encoder (VGAE) model for predicting gene perturbation effects from scRNA-seq data. GenKI performs in silico gene knock-out experiments on a gene regulatory network (GRN) without requiring real knock-out data.
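The core idea of a virtual knock-out can be sketched on a toy network. This is a minimal illustration, not GenKI's code. It assumes the GRN is a plain dict of weighted edges and that knocking out a gene means deleting every edge incident to it while the expression data stays untouched. The gene names other than TUBG1 are hypothetical placeholders.

```python
# Hypothetical sketch: a GRN stored as {(source_gene, target_gene): weight}.
# Assumption: a virtual KO deletes every edge that touches the target gene.
def virtual_knockout(edges, target_gene):
    """Return a copy of `edges` without any edge incident to `target_gene`."""
    return {
        (src, dst): weight
        for (src, dst), weight in edges.items()
        if target_gene not in (src, dst)
    }


grn = {
    ("TUBG1", "GENE_A"): 0.8,   # placeholder gene names
    ("GENE_B", "TUBG1"): -0.3,
    ("GENE_A", "GENE_B"): 0.5,
}
ko = virtual_knockout(grn, "TUBG1")
print(ko)  # {('GENE_A', 'GENE_B'): 0.5}
```

The WT and KO graphs are then embedded by the same trained encoder, and genes whose latent distributions shift the most are the candidates for being affected by the knock-out.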

Prerequisites

GenKI requires Python ≥ 3.10. PyTorch and PyTorch Geometric are installed automatically with the package, as CPU builds. For GPU/CUDA support, install both yourself, matching your CUDA version, before installing GenKI:

  1. Install PyTorch
  2. Install PyTorch Geometric
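To see what is already installed before running pip, a small check like the following can help. It is a hypothetical helper, not part of GenKI; it only relies on the standard import names torch and torch_geometric and on torch.cuda.is_available().

```python
import importlib.util
import sys


def check_environment(min_python=(3, 10)):
    """Report whether the interpreter and GenKI's heavy dependencies look ready."""
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        "torch": importlib.util.find_spec("torch") is not None,
        "torch_geometric": importlib.util.find_spec("torch_geometric") is not None,
        "cuda": False,
    }
    if report["torch"]:
        import torch  # only imported when present

        report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```

If "torch" is True but "cuda" is False on a GPU machine, the installed PyTorch is likely a CPU build.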

Installation

pip install GenKI

Or install directly from source:

pip install git+https://github.com/yjgeno/GenKI.git

Or with conda, from a clone of the repository (sets up the full environment):

conda env create -f environment.yml
conda activate ogenki

Example Data

A real microglial (wild-type) scRNA-seq dataset is bundled with the repository at data/microglial_seurat_WT.h5ad, so you can run GenKI immediately without sourcing your own data. The Quick Start examples below use it through this relative path, so run them from the repository root.

Quick Start

The high-level GenKI facade runs the whole workflow — load & preprocess data, build the GRN, train the VGAE, and rank genes — in one call:

from GenKI import GenKI

# Uses the bundled example dataset — no extra downloads needed
ranked = GenKI.from_h5ad(
    "data/microglial_seurat_WT.h5ad",
    target_gene=["TUBG1"],   # gene(s) to knock out (upper-cased by default)
).run(epochs=100, seed=8096, n_permutations=100)

print(ranked)   # genes ranked by perturbation effect

Separate the training and prediction steps when you want to inspect the model in between:

gk = GenKI.from_h5ad("data/microglial_seurat_WT.h5ad", target_gene=["TUBG1"])
gk.fit(epochs=100, lr=7e-4, beta=1e-4, seed=8096)
ranked = gk.predict(n_permutations=100, by="KL")

print(gk.metrics)        # (epochs, loss, AUROC, AP)
loader, trainer = gk.loader, gk.trainer   # escape hatch to the underlying objects
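The AUROC and AP values in gk.metrics are link-prediction scores. As a rough sketch, assuming they follow the usual VGAE evaluation, where decoder scores for held-out true edges (positives) are compared with scores for sampled non-edges (negatives), both metrics can be computed in plain Python. GenKI itself may use a library implementation with different tie handling; the score lists below are made up.

```python
def auroc(pos_scores, neg_scores):
    """Probability that a random positive edge outscores a random negative one (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def average_precision(pos_scores, neg_scores):
    """Mean of precision@k taken at each rank k where a positive edge is retrieved."""
    labeled = [(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores]
    labeled.sort(key=lambda item: item[0], reverse=True)  # highest score first
    hits, precisions = 0, []
    for k, (_, label) in enumerate(labeled, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(pos_scores)


pos = [0.9, 0.8, 0.4]  # hypothetical scores for true edges
neg = [0.7, 0.3]       # hypothetical scores for non-edges
print(round(auroc(pos, neg), 4))              # 0.8333
print(round(average_precision(pos, neg), 4))  # 0.9167
```

Values near 1 mean the trained decoder separates real regulatory edges from random gene pairs well.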

Start from an in-memory AnnData instead of a file (set preprocess=True to normalize/standardize it):

import scanpy as sc

adata = sc.read_h5ad("data/microglial_seurat_WT.h5ad")
gk = GenKI.from_adata(adata, target_gene=["TUBG1"], preprocess=True)
ranked = gk.run(seed=8096)
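The exact steps behind preprocess=True (and the log-normalized layer added by build_adata) are not spelled out here. A common scRNA-seq recipe, shown as an assumption rather than GenKI's confirmed code, is library-size normalization followed by log1p, then per-gene z-scoring. A NumPy sketch on a dense cells × genes matrix, where target_sum=1e4 is a hypothetical default:

```python
import numpy as np


def log_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # keep empty cells at zero instead of dividing by 0
    return np.log1p(counts / totals * target_sum)


def standardize(x):
    """Z-score each gene (column); constant genes are mapped to 0."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0
    return (x - mean) / std


counts = np.array([[1, 3], [2, 2]])
print(standardize(log_normalize(counts, target_sum=4)))
```

With a real AnnData object, scanpy's sc.pp.normalize_total, sc.pp.log1p and sc.pp.scale perform the same kind of steps on sparse matrices.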

Building the GRN in parallel needs the optional Ray extra (pip install "GenKI[ray]"); pass n_cpus and other GRN options as keyword arguments, e.g. GenKI.from_h5ad(..., rebuild_grn=True, n_cpus=8).

Lower-level API (fine-grained control over each step)
from GenKI.preprocessing import build_adata
from GenKI.dataLoader import DataLoader
from GenKI.train import VGAE_trainer
from GenKI import utils

# 1. Load and preprocess data
adata = build_adata("data/microglial_seurat_WT.h5ad")

# 2. Build GRN and prepare WT / virtual-KO graph data
data_wrapper = DataLoader(
    adata,
    target_gene=["TUBG1"],   # gene to knock out
    target_cell=None,         # None = use all cells
    GRN_file_dir="GRNs",
    n_cpus=8,
)
data_wt = data_wrapper.load_data()
data_ko = data_wrapper.load_kodata()

# 3. Train VGAE
sensei = VGAE_trainer(data_wt, epochs=100, lr=7e-4, beta=1e-4, seed=8096)
sensei.train()

# 4. Get latent distributions and compute KL divergence per gene
z_mu_wt, z_std_wt = sensei.get_latent_vars(data_wt)
z_mu_ko, z_std_ko = sensei.get_latent_vars(data_ko)
dis = utils.get_distance(z_mu_ko, z_std_ko, z_mu_wt, z_std_wt, by="KL")

# 5. Rank genes by perturbation effect (with permutation test)
null = sensei.pmt(data_ko, n=100, by="KL")
res = utils.get_generank(data_wt, dis, null)
print(res)
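Steps 4 and 5 can be sketched without GenKI to show what the numbers mean. This assumes each gene's latent distribution is a diagonal Gaussian (means z_mu, standard deviations z_std), that the KL is taken as KL(KO || WT) to match the argument order of get_distance, and that the permutation null holds, for each gene, the distances recomputed on n permuted graphs. The helper names and the +1 p-value correction are illustrative choices, not GenKI's API.

```python
import math


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(P || Q) for diagonal Gaussians given as equal-length lists of means/stds."""
    return sum(
        math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
        for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q)
    )


def rank_genes(genes, dist, null, alpha=0.05):
    """Sort genes by distance, keeping those whose permutation p-value <= alpha."""
    rows = []
    for gene, d, draws in zip(genes, dist, null):
        # Share of null draws at least as extreme as the observed distance
        p = (1 + sum(x >= d for x in draws)) / (1 + len(draws))
        rows.append((gene, d, p))
    rows.sort(key=lambda row: row[1], reverse=True)
    return [row for row in rows if row[2] <= alpha]


print(gaussian_kl([1.0], [1.0], [0.0], [1.0]))  # 0.5
```

A gene whose KO latent distribution matches its WT distribution gets a KL of 0; the permutation test guards against ranking genes whose shift is no larger than chance.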

API

  • GenKI.GenKI: high-level facade; from_h5ad / from_adata constructors and fit / predict / run methods covering the full workflow
  • GenKI.dataLoader.DataLoader: wraps an AnnData object, builds or loads the GRN, and produces PyG Data objects for the WT and virtual-KO conditions
  • GenKI.train.VGAE_trainer: trains the VGAE and exposes latent variables, permutation testing, and model save/load
  • GenKI.utils.get_distance: computes a per-gene distance or test statistic (KL, EMD, t-test) between two latent spaces
  • GenKI.utils.get_generank: ranks genes by perturbation score; optionally filters by permutation-test significance
  • GenKI.preprocessing.build_adata: loads an .h5ad file and adds the log-normalized layer used by DataLoader
  • GenKI.pcNet.make_pcNet: builds a principal-component-based GRN from expression data (optionally parallelized with Ray)
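make_pcNet is described only as a principal-component-based GRN. The sketch below shows that idea as principal component regression, in the style of the pcNet method from scTenifoldNet, and is not GenKI's actual implementation: each gene is regressed on the top n_comp principal components of all other genes, and the fitted coefficients are mapped back to gene space as one row of the network. The choice n_comp=3 and the z-scoring step are assumptions; published pcNet variants also rescale or symmetrize the result.

```python
import numpy as np


def pc_net(x, n_comp=3):
    """Principal-component-regression GRN sketch on a cells x genes matrix."""
    x = np.asarray(x, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0)  # assumes no constant genes
    n_genes = x.shape[1]
    net = np.zeros((n_genes, n_genes))
    for k in range(n_genes):
        y = x[:, k]
        others = np.delete(x, k, axis=1)
        # Top principal directions of the remaining genes via SVD
        _, _, vt = np.linalg.svd(others, full_matrices=False)
        loadings = vt[:n_comp].T            # (n_genes - 1, n_comp)
        scores = others @ loadings          # (n_cells, n_comp)
        coef, *_ = np.linalg.lstsq(scores, y, rcond=None)
        # Map PC coefficients back to per-gene weights; the diagonal stays 0
        net[k, np.arange(n_genes) != k] = loadings @ coef
    return net


rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 5))  # toy data: 50 cells, 5 genes
print(pc_net(expr).shape)        # (5, 5)
```

Because every gene's regression is independent, the loop is what the optional Ray extra can spread across CPUs.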

Tutorial

Step-by-step virtual KO example: notebook/Example.ipynb

Citation

If you use GenKI in your research, please cite:

Yang Y, Wang M, Ni P, Zhong J. GenKI: Virtual gene knockout inference with variational graph autoencoder. Nucleic Acids Research, 2023. https://doi.org/10.1093/nar/gkad450

Download files

Source Distribution

genki-0.2.1.tar.gz (28.6 kB)

Uploaded Source

Built Distribution

genki-0.2.1-py3-none-any.whl (27.8 kB)

Uploaded Python 3

File details

Details for the file genki-0.2.1.tar.gz.

File metadata

  • Download URL: genki-0.2.1.tar.gz
  • Upload date:
  • Size: 28.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for genki-0.2.1.tar.gz
  • SHA256: f91282b8a928ef1aae3d4b6161ced6cae4b55b535079e00270d2c799cc2eebc6
  • MD5: 9b9ac4a0936b7cee1e9c6e9d7ad12d62
  • BLAKE2b-256: 536f9ea4e60458cd35166384d62577d4f51f5a34ce273b4eeca76812cbb9b67e

Provenance

The following attestation bundles were made for genki-0.2.1.tar.gz:

Publisher: publish.yml on yjgeno/GenKI

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file genki-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: genki-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 27.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for genki-0.2.1-py3-none-any.whl
  • SHA256: f1f060d18584c10f9b8b22947cc53caab5e7e378e2dc836b7a95b69034588fdb
  • MD5: 14f430c54b7bce436ffda5dc0d115581
  • BLAKE2b-256: 92b1cd94d49ead30cd7b011137eb08fd9d3e1136a5c895b73296aa605f6e8b21

Provenance

The following attestation bundles were made for genki-0.2.1-py3-none-any.whl:

Publisher: publish.yml on yjgeno/GenKI

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.