Gene knock-out inference from single-cell data with variational graph autoencoders
GenKI — Gene Knock-out Inference
A Variational Graph Auto-Encoder (VGAE) model for predicting gene perturbation effects from scRNA-seq data. GenKI performs in silico gene knock-out experiments on a gene regulatory network (GRN) without requiring real knock-out data.
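To make the idea of a knock-out on a GRN concrete, here is a minimal sketch. It assumes the virtual KO is modelled by removing every edge that touches the target gene in the network's adjacency matrix. The toy matrix, the gene names other than TUBG1, and the helper `knock_out` are hypothetical illustrations and are not part of the GenKI API.

```python
from copy import deepcopy


def knock_out(adjacency, genes, target):
    """Return a copy of `adjacency` with every edge touching `target` set to 0."""
    idx = genes.index(target)
    ko = deepcopy(adjacency)  # leave the wild-type network untouched
    for j in range(len(genes)):
        ko[idx][j] = 0.0  # edges leaving the target gene
        ko[j][idx] = 0.0  # edges entering the target gene
    return ko


# Hypothetical 3-gene wild-type network (symmetric edge weights).
genes = ["TUBG1", "GENE_A", "GENE_B"]
wt = [
    [0.0, 0.8, 0.3],
    [0.8, 0.0, 0.5],
    [0.3, 0.5, 0.0],
]

ko = knock_out(wt, genes, "TUBG1")
for name, row in zip(genes, ko):
    print(name, row)
```

The wild-type and knock-out networks share the same nodes and differ only in the target gene's edges, which is what lets a model trained on the wild-type graph be queried with the knock-out graph.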
Prerequisites
GenKI requires Python ≥ 3.10. PyTorch and PyTorch Geometric (CPU builds) are installed automatically with the package. For a GPU/CUDA build, install both yourself before installing GenKI, choosing the builds that match your CUDA version.
Installation
pip install GenKI
Or install directly from source:
pip install git+https://github.com/yjgeno/GenKI.git
Or with conda (sets up the full environment):
conda env create -f environment.yml
conda activate ogenki
Example Data
A real microglial (wild-type) scRNA-seq dataset is bundled in data/microglial_seurat_WT.h5ad so you can run GenKI immediately without sourcing your own data. The Quick Start examples below use it directly.
Quick Start
The high-level GenKI facade runs the whole workflow — load & preprocess data, build the GRN, train the VGAE, and rank genes — in one call:
from GenKI import GenKI
# Uses the bundled example dataset — no extra downloads needed
ranked = GenKI.from_h5ad(
    "data/microglial_seurat_WT.h5ad",
    target_gene=["TUBG1"],  # gene(s) to knock out (upper-cased by default)
).run(epochs=100, seed=8096, n_permutations=100)
print(ranked) # genes ranked by perturbation effect
Separate the training and prediction steps when you want to inspect the model in between:
gk = GenKI.from_h5ad("data/microglial_seurat_WT.h5ad", target_gene=["TUBG1"])
gk.fit(epochs=100, lr=7e-4, beta=1e-4, seed=8096)
ranked = gk.predict(n_permutations=100, by="KL")
print(gk.metrics) # (epochs, loss, AUROC, AP)
gk.loader, gk.trainer # escape hatch to the underlying objects
Start from an in-memory AnnData instead of a file (set preprocess=True to normalize/standardize it):
import scanpy as sc
adata = sc.read_h5ad("data/microglial_seurat_WT.h5ad")
gk = GenKI.from_adata(adata, target_gene=["TUBG1"], preprocess=True)
ranked = gk.run(seed=8096)
Building the GRN in parallel needs the optional Ray extra (pip install "GenKI[ray]"); pass n_cpus and other GRN options as keyword arguments, e.g. GenKI.from_h5ad(..., rebuild_grn=True, n_cpus=8).
Lower-level API (fine-grained control over each step)
from GenKI.preprocessing import build_adata
from GenKI.dataLoader import DataLoader
from GenKI.train import VGAE_trainer
from GenKI import utils
# 1. Load and preprocess data
adata = build_adata("data/microglial_seurat_WT.h5ad")
# 2. Build GRN and prepare WT / virtual-KO graph data
data_wrapper = DataLoader(
    adata,
    target_gene=["TUBG1"],  # gene to knock out
    target_cell=None,       # None = use all cells
    GRN_file_dir="GRNs",
    n_cpus=8,
)
data_wt = data_wrapper.load_data()
data_ko = data_wrapper.load_kodata()
# 3. Train VGAE
sensei = VGAE_trainer(data_wt, epochs=100, lr=7e-4, beta=1e-4, seed=8096)
sensei.train()
# 4. Get latent distributions and compute KL divergence per gene
z_mu_wt, z_std_wt = sensei.get_latent_vars(data_wt)
z_mu_ko, z_std_ko = sensei.get_latent_vars(data_ko)
dis = utils.get_distance(z_mu_ko, z_std_ko, z_mu_wt, z_std_wt, by="KL")
# 5. Rank genes by perturbation effect (with permutation test)
null = sensei.pmt(data_ko, n=100, by="KL")
res = utils.get_generank(data_wt, dis, null)
print(res)
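Step 4 above compares each gene's latent distribution under the wild-type and knock-out graphs. As a rough illustration of what a per-gene KL score looks like, the sketch below computes the closed-form KL divergence between two diagonal Gaussians and ranks genes by it. The helper `gaussian_kl`, the gene names, and the toy latent values are hypothetical, and the direction of the divergence is an assumption; this is not the body of utils.get_distance.

```python
import math


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(P || Q) for diagonal Gaussians, summed over latent dimensions."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        total += math.log(sq / sp) + (sp**2 + (mp - mq) ** 2) / (2 * sq**2) - 0.5
    return total


# Hypothetical 2-D latent means and standard deviations per gene.
latent_wt = {
    "GENE_A": ([0.0, 0.0], [1.0, 1.0]),
    "GENE_B": ([0.5, -0.5], [1.0, 1.0]),
    "GENE_C": ([0.0, 0.0], [1.0, 1.0]),
}
latent_ko = {
    "GENE_A": ([1.0, 0.0], [1.0, 1.0]),   # mean shifted by 1
    "GENE_B": ([0.5, -0.5], [1.0, 1.0]),  # unchanged by the knock-out
    "GENE_C": ([2.0, 0.0], [1.0, 1.0]),   # mean shifted by 2
}

# Assumed direction: KO distribution relative to WT.
scores = {g: gaussian_kl(*latent_ko[g], *latent_wt[g]) for g in latent_wt}
ranked_genes = sorted(scores, key=scores.get, reverse=True)
print(ranked_genes)
```

A gene whose latent distribution does not move gets a score of zero, so larger scores point to genes more affected by the virtual knock-out.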
API
| Symbol | Description |
|---|---|
| GenKI.GenKI | High-level facade: from_h5ad / from_adata constructors and fit / predict / run methods covering the full workflow |
| GenKI.dataLoader.DataLoader | Wraps an AnnData object, builds/loads the GRN, and produces PyG Data objects for WT and virtual-KO conditions |
| GenKI.train.VGAE_trainer | Trains the VGAE, exposes latent variables, permutation testing, and model save/load |
| GenKI.utils.get_distance | Computes per-gene distribution distance (KL, EMD, t-test) between two latent spaces |
| GenKI.utils.get_generank | Ranks genes by perturbation score; optionally filters by permutation-test significance |
| GenKI.preprocessing.build_adata | Loads an .h5ad file and adds a log-normalised layer used by DataLoader |
| GenKI.pcNet.make_pcNet | Builds a principal-component-based GRN from expression data (optionally parallelised with Ray) |
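The table mentions filtering ranked genes by permutation-test significance. One common way to do this, shown here only as a generic sketch, is to compare each observed score with a null distribution of scores and keep genes whose empirical p-value falls under a threshold. The helper `empirical_p_value`, the 0.05 threshold, and all numbers are hypothetical; GenKI's own filtering rule may differ.

```python
def empirical_p_value(observed, null_scores):
    """Fraction of null scores at least as large as `observed` (add-one corrected)."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (1 + exceed) / (1 + len(null_scores))


# Hypothetical null distribution: 19 scores from permuted data (0.1 ... 1.9).
null_scores = [0.1 * i for i in range(1, 20)]

# Hypothetical observed perturbation scores.
observed = {"GENE_A": 2.5, "GENE_B": 0.45}

p_values = {g: empirical_p_value(s, null_scores) for g, s in observed.items()}
significant = [g for g, p in p_values.items() if p <= 0.05]
print(p_values, significant)
```

The add-one correction keeps the p-value strictly positive, so the smallest attainable value is set by the number of permutations; more permutations give finer resolution at the cost of run time.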
Tutorial
Step-by-step virtual KO example: notebook/Example.ipynb
Citation
If you use GenKI in your research, please cite:
Yang Y, Wang M, Ni P, Zhong J. GenKI: Virtual gene knockout inference with variational graph autoencoder. Nucleic Acids Research, 2023. https://doi.org/10.1093/nar/gkad450