Command-line toolkit for GO enrichment analysis
gokit
Command-line toolkit for Gene Ontology enrichment analysis.
This README covers quick setup and core usage. For release process details, see docs/RELEASE.md.
Quick Start
# install
pip install -e ".[dev]"
# download default ontology files into current directory
gokit download
# optional but recommended input sanity check
gokit validate --study study.txt --population population.txt --assoc assoc.txt
# run enrichment
gokit enrich \
--study study.txt \
--population population.txt \
--assoc assoc.txt \
--out results/goea
# build a consolidated markdown report
gokit report --run results/goea
Defaults that reduce flags:
- --obo defaults to ./go-basic.obo
- --assoc-format defaults to auto
- --test-direction defaults to both
Input File Format
Minimal expected inputs:
- study.txt: one study gene ID per line.
- population.txt: one background gene ID per line.
- assoc.txt: one gene-to-GO mapping per line as <gene_id><space>GO:NNNNNNN; multiple GO terms on one line are supported using semicolons (geneA GO:0008150;GO:0003674). Tabs are also accepted.
Example:
# study.txt
geneA
geneB
# population.txt
geneA
geneB
geneC
geneD
# assoc.txt
geneA GO:0008150;GO:0003674
geneB GO:0008150
geneC GO:0005575
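To make the association format concrete, here is a minimal Python sketch that reads lines of the shape shown above and runs two basic sanity checks. It is not gokit's own parser: the function names `parse_id2gos` and `check_inputs` are hypothetical, and skipping lines that start with `#` is an assumption made so the labelled example can be pasted in directly.

```python
def parse_id2gos(lines):
    """Parse 'gene<space or tab>GO:1;GO:2' lines into {gene: set of GO IDs}."""
    assoc = {}
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and '#' lines carry no associations.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of spaces or tabs
        if len(parts) != 2:
            continue  # gene ID without any GO term
        gene, terms = parts
        go_ids = {t.strip() for t in terms.split(";") if t.strip()}
        assoc.setdefault(gene, set()).update(go_ids)
    return assoc


def check_inputs(study, population, assoc):
    """Report study genes missing from the population or lacking annotations."""
    population = set(population)
    return {
        "study_not_in_population": sorted(g for g in set(study) if g not in population),
        "study_without_annotation": sorted(g for g in set(study) if g not in assoc),
    }


example_assoc = [
    "geneA GO:0008150;GO:0003674",
    "geneB\tGO:0008150",
    "geneC GO:0005575",
]
assoc = parse_id2gos(example_assoc)
report = check_inputs(
    ["geneA", "geneB"], ["geneA", "geneB", "geneC", "geneD"], assoc
)
```

With the example files above, both lists in `report` come back empty, since every study gene is in the population and has at least one GO term.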
Installation
We recommend using a virtual environment.
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
To install from source:
git clone https://github.com/JLSteenwyk/gokit.git
cd gokit
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
Command Status
| Command | Status | What it does |
|---|---|---|
| gokit enrich | Supported | Runs GO enrichment (single or batch), writes deterministic outputs, semantic comparisons, optional auto-plot emission, and run manifest. |
| gokit validate | Supported | Validates required inputs before enrichment. |
| gokit plot | Supported | Generates figures from enrichment tables and semantic similarity matrices. |
| gokit download | Supported | Downloads go-basic.obo and goslim_generic.obo from GO endpoints. |
| gokit report | Supported | Generates a consolidated markdown run report. |
| gokit explain | Placeholder | Current scaffold only; detailed statistical/ancestor trace explanation is planned. |
Shorthand aliases:
- gk_enrich
- gk_validate
- gk_plot
- gk_download
- gk_report
- gk_explain
Common Workflows
Single-study enrichment:
gokit enrich \
--study study.txt \
--population population.txt \
--assoc assoc.txt \
--out results/goea
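This README does not say which statistical test gokit enrich applies. GO enrichment tools commonly use a one-sided hypergeometric (Fisher's exact) test per term, so the sketch below shows that calculation under that assumption. The function names are hypothetical and the code is not taken from gokit.

```python
from math import comb


def hypergeom_over_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in the study.

    N = population size, K = population genes annotated to the term,
    n = study size, k = study genes annotated to the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def hypergeom_under_p(k, n, K, N):
    """P(X <= k): chance of seeing at most k annotated genes in the study."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1))
    return hits / comb(N, n)


# Toy inputs from the Input File Format example: GO:0008150 annotates
# geneA and geneB (K=2) in a population of 4, and both are in the study.
p_over = hypergeom_over_p(k=2, n=2, K=2, N=4)
p_under = hypergeom_under_p(k=2, n=2, K=2, N=4)
```

Under this assumption, `over` and `under` in --test-direction would correspond to the two tails computed here. How `both` combines them is not specified in this README.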
Batch enrichment + semantic similarity:
gokit enrich \
--studies studies.tsv \
--population population.txt \
--assoc assoc.txt \
--assoc-format id2gos \
--out results_batch \
--out-formats tsv,jsonl \
--compare-semantic \
--semantic-metric wang \
--semantic-top-k 5 \
--semantic-namespace all \
--semantic-min-padjsig 0.05
studies.tsv accepts either:
- study_name<TAB>/path/to/study.txt
- /path/to/study.txt (name inferred from filename)
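The two accepted line shapes can be illustrated with a short Python sketch. The function `parse_studies_manifest` is hypothetical, and using the filename without its extension as the inferred name is an assumption; the README only says the name is inferred from the filename.

```python
from pathlib import Path


def parse_studies_manifest(lines):
    """Return a list of (study_name, study_path) pairs from studies.tsv lines."""
    studies = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if "\t" in line:
            # Form 1: explicit name, a tab, then the path.
            name, path = (part.strip() for part in line.split("\t", 1))
        else:
            # Form 2: path only. Assumed rule: name = filename minus extension.
            path = line.strip()
            name = Path(path).stem
        studies.append((name, path))
    return studies


manifest = ["study_a\t/data/a_genes.txt\n", "/data/study_b.txt\n"]
pairs = parse_studies_manifest(manifest)
```

Here `pairs` holds `("study_a", "/data/a_genes.txt")` followed by `("study_b", "/data/study_b.txt")`.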
Plotting Examples
Term-level and direction summary figures:
gokit plot \
--input results_batch/all_studies.tsv \
--study-id study_a \
--kind term-bar \
--direction both \
--top-n 20 \
--out figures/study_a_terms \
--format png
gokit plot \
--input results_batch/all_studies.tsv \
--study-id study_a \
--kind direction-summary \
--alpha 0.05 \
--out figures/study_a_direction_summary.png
Semantic network figure from batch similarity matrix:
gokit plot \
--input results_batch/semantic_similarity.tsv \
--kind semantic-network \
--min-similarity 0.25 \
--max-edges 40 \
--out figures/semantic_network.png
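As a rough illustration of what --min-similarity and --max-edges plausibly control, the Python sketch below picks network edges from a symmetric study-by-study similarity matrix. The selection rule (keep pairs at or above the threshold, then keep the strongest ones up to the cap) is an assumption, not documented gokit behavior. The function name and the matrix values are made up for the example.

```python
def select_edges(names, matrix, min_similarity=0.25, max_edges=40):
    """Pick the strongest pairwise links from a symmetric similarity matrix."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only, skip self-pairs
            score = matrix[i][j]
            if score >= min_similarity:
                edges.append((names[i], names[j], score))
    # Strongest links first, then cap the number of edges drawn.
    edges.sort(key=lambda edge: edge[2], reverse=True)
    return edges[:max_edges]


# Made-up similarity values for three studies.
names = ["study_a", "study_b", "study_c"]
matrix = [
    [1.0, 0.6, 0.1],
    [0.6, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
edges = select_edges(names, matrix, min_similarity=0.25, max_edges=40)
```

With these values the study_a/study_c pair (0.1) falls below the threshold, so only two edges remain.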
Optional auto-plot emission from enrich:
gokit enrich \
--studies studies.tsv \
--population population.txt \
--assoc assoc.txt \
--out results_batch \
--compare-semantic \
--emit-plots term-bar,direction-summary,semantic-network \
--plot-format png
Example Figures
The following figures were generated from larger multi-study example tables in
examples/data/realistic_plots/.
Term-bar plot (--kind term-bar, top 30 terms):
Direction summary plot (--kind direction-summary):
Semantic network plot (--kind semantic-network, 8-study matrix):
Supported Analysis Controls
- Association formats: id2gos, gaf, gpad, gene2go, auto
- Multiple-testing methods (--method): fdr_bh (default), fdr_by, bonferroni, holm, none
- Direction tests (--test-direction): both (default), over, under
- Semantic metrics (--semantic-metric): jaccard, resnik, lin, wang
- ID normalization (--id-type): auto, str, int
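For readers unfamiliar with the multiple-testing options, the Python sketch below shows what the names fdr_bh and bonferroni conventionally refer to: the Benjamini-Hochberg step-up adjustment and the Bonferroni correction. It is a textbook implementation under that assumption, not gokit's code, and the p-values are invented for the example.

```python
def fdr_bh(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def bonferroni(pvalues):
    """Bonferroni adjusted p-values: multiply by the test count, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Invented p-values for four GO terms.
pvals = [0.01, 0.04, 0.03, 0.20]
adj_bh = fdr_bh(pvals)
adj_bonf = bonferroni(pvals)
```

Benjamini-Hochberg controls the false discovery rate and is less conservative than Bonferroni, which controls the family-wise error rate. On the same inputs the Benjamini-Hochberg adjusted values are never larger than the Bonferroni ones.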
Download Command Equivalence
gokit download is equivalent to:
wget http://current.geneontology.org/ontology/go-basic.obo
wget http://current.geneontology.org/ontology/subsets/goslim_generic.obo
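Where wget is unavailable, the same two files can be fetched with the Python standard library. The sketch below is an illustrative equivalent of the commands above, not how gokit download is implemented. `download_ontologies` and its `fetch` parameter are hypothetical names, and `fetch` exists only so the function can be exercised without network access.

```python
import urllib.request
from pathlib import Path

# The two URLs listed in this section.
ONTOLOGY_URLS = (
    "http://current.geneontology.org/ontology/go-basic.obo",
    "http://current.geneontology.org/ontology/subsets/goslim_generic.obo",
)


def download_ontologies(dest_dir=".", fetch=urllib.request.urlretrieve):
    """Save each ontology file into dest_dir and return the target paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in ONTOLOGY_URLS:
        target = dest / url.rsplit("/", 1)[-1]  # keep the remote filename
        fetch(url, str(target))
        saved.append(target)
    return saved


# Usage (performs real network requests):
#     download_ontologies(".")
```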
Download files
File details
Details for the file gokit-0.1.3.tar.gz.
File metadata
- Download URL: gokit-0.1.3.tar.gz
- Upload date:
- Size: 21.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3ddc04e04bd6131e9d2f4958a1769ba5f153874d64656a2bf8f9a745e6697b4f |
| MD5 | 339f2c6e1da79009ea1e8bf211e5a9d1 |
| BLAKE2b-256 | ab422b65d8529a3c42e57fd4198c5bf3b44c978692b2530bfa6e568ceff63f18 |
File details
Details for the file gokit-0.1.3-py3-none-any.whl.
File metadata
- Download URL: gokit-0.1.3-py3-none-any.whl
- Upload date:
- Size: 33.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 30c335219248730642ad69b352c987e1ae5009e9d0b6a718559771691e584477 |
| MD5 | c7615c62088c23e1a6443669b11ad3f7 |
| BLAKE2b-256 | f03e8fdd27ad43a6dcb95c76b9cef4558aaadc8cf3e7c30f4de4c1cd1f6bb826 |