gokit

Command-line toolkit for Gene Ontology enrichment analysis.

This README covers quick setup and core usage. For release process details, see docs/RELEASE.md.

Quick Start

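# install (run from the root of a gokit source checkout; the quotes keep shells such as zsh from expanding the brackets)
pip install -e ".[dev]"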

# download default ontology files into current directory
gokit download

# optional but recommended input sanity check
gokit validate --study study.txt --population population.txt --assoc assoc.txt

# run enrichment
gokit enrich \
  --study study.txt \
  --population population.txt \
  --assoc assoc.txt \
  --out results/goea

# build a consolidated markdown report
gokit report --run results/goea

Defaults that let you omit common flags:

  • --obo defaults to ./go-basic.obo
  • --assoc-format defaults to auto
  • --test-direction defaults to both

Input File Format

Minimal expected inputs:

  • study.txt: one study gene ID per line.
  • population.txt: one background gene ID per line.
  • assoc.txt: one gene-to-GO mapping per line, written as <gene_id><space>GO:NNNNNNN. To list several GO terms for one gene, separate them with semicolons (for example, geneA GO:0008150;GO:0003674). Tabs are also accepted.

Example:

# study.txt
geneA
geneB

# population.txt
geneA
geneB
geneC
geneD

# assoc.txt
geneA GO:0008150;GO:0003674
geneB GO:0008150
geneC GO:0005575
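
To make the assoc.txt layout concrete, here is a minimal Python sketch of how such a file could be read and cross-checked against the study and population lists. It is an illustration only, not gokit's own parser: the function names parse_id2gos and check_inputs are hypothetical, and skipping lines that start with # is an assumption.

```python
from collections import defaultdict


def parse_id2gos(lines):
    """Parse '<gene_id><space or tab>GO:...;GO:...' lines into {gene: set of GO IDs}."""
    assoc = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are ignored
        parts = line.split(None, 1)  # split on the first run of spaces or tabs
        if len(parts) != 2:
            raise ValueError(f"expected '<gene_id> <GO terms>', got: {raw!r}")
        gene, terms = parts
        for term in terms.split(";"):
            term = term.strip()
            if term:
                assoc[gene].add(term)
    return dict(assoc)


def check_inputs(study, population, assoc):
    """Report study genes outside the population and population genes with no GO terms."""
    study, population = set(study), set(population)
    return {
        "study_not_in_population": sorted(study - population),
        "population_without_go": sorted(population - set(assoc)),
    }


assoc = parse_id2gos([
    "geneA GO:0008150;GO:0003674",
    "geneB\tGO:0008150",  # tab-separated variant
    "geneC GO:0005575",
])
report = check_inputs(
    ["geneA", "geneB"],
    ["geneA", "geneB", "geneC", "geneD"],
    assoc,
)
print(sorted(assoc["geneA"]))
print(report)
```

With the example files above, geneD is in the population but has no annotation, which is the kind of mismatch worth catching before running enrichment.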

Installation

We recommend installing into a virtual environment. From the root of a gokit source checkout:

python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

To clone the repository and install from source:

git clone https://github.com/JLSteenwyk/gokit.git
cd gokit
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

Command Status

| Command | Status | What it does |
| --- | --- | --- |
| gokit enrich | Supported | Runs GO enrichment (single or batch) and writes deterministic outputs, semantic comparisons, optional auto-generated plots, and a run manifest. |
| gokit validate | Supported | Validates required inputs before enrichment. |
| gokit plot | Supported | Generates figures from enrichment tables and semantic similarity matrices. |
| gokit download | Supported | Downloads go-basic.obo and goslim_generic.obo from GO endpoints. |
| gokit report | Supported | Generates a consolidated markdown run report. |
| gokit explain | Placeholder | Scaffold only; detailed statistical and ancestor-trace explanations are planned. |

Shorthand aliases:

  • gk_enrich
  • gk_validate
  • gk_plot
  • gk_download
  • gk_report
  • gk_explain

Common Workflows

Single-study enrichment:

gokit enrich \
  --study study.txt \
  --population population.txt \
  --assoc assoc.txt \
  --out results/goea

Batch enrichment + semantic similarity:

gokit enrich \
  --studies studies.tsv \
  --population population.txt \
  --assoc assoc.txt \
  --assoc-format id2gos \
  --out results_batch \
  --out-formats tsv,jsonl \
  --compare-semantic \
  --semantic-metric wang \
  --semantic-top-k 5 \
  --semantic-namespace all \
  --semantic-min-padjsig 0.05

Each line of studies.tsv takes one of two forms:

  • study_name<TAB>/path/to/study.txt
  • /path/to/study.txt (name inferred from filename)
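
The two line forms above can be sketched in Python as follows. This is a hedged illustration rather than gokit's implementation: parse_studies_manifest is a hypothetical helper, and taking the inferred name to be the file name without its extension is an assumption.

```python
from pathlib import Path


def parse_studies_manifest(lines):
    """Return (study_name, path) tuples from studies.tsv-style lines."""
    studies = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) == 2:
            # form 1: study_name<TAB>/path/to/study.txt
            name, path = fields[0].strip(), fields[1].strip()
        elif len(fields) == 1:
            # form 2: path only; assumption: name is the file name minus its extension
            path = fields[0].strip()
            name = Path(path).stem
        else:
            raise ValueError(f"expected one or two tab-separated fields, got: {raw!r}")
        studies.append((name, path))
    return studies


manifest = parse_studies_manifest([
    "study_a\t/data/a.txt",
    "/data/study_b.txt",
])
print(manifest)
```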

Plotting Examples

Term-level and direction summary figures:

gokit plot \
  --input results_batch/all_studies.tsv \
  --study-id study_a \
  --kind term-bar \
  --direction both \
  --top-n 20 \
  --out figures/study_a_terms \
  --format png

gokit plot \
  --input results_batch/all_studies.tsv \
  --study-id study_a \
  --kind direction-summary \
  --alpha 0.05 \
  --out figures/study_a_direction_summary.png

Semantic network figure from batch similarity matrix:

gokit plot \
  --input results_batch/semantic_similarity.tsv \
  --kind semantic-network \
  --min-similarity 0.25 \
  --max-edges 40 \
  --out figures/semantic_network.png
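
One plausible reading of --min-similarity and --max-edges is a two-step filter: drop study pairs below the similarity threshold, then keep only the strongest remaining edges. The Python sketch below shows that reading. It is an assumption about the behavior, not gokit's code, and select_edges and its in-memory input format are hypothetical.

```python
def select_edges(similarity, min_similarity=0.25, max_edges=40):
    """Pick network edges from {(study_a, study_b): score}, strongest first."""
    undirected = {}
    for (a, b), score in similarity.items():
        if a == b:
            continue  # self-similarity is not an edge
        key = tuple(sorted((a, b)))  # treat (a, b) and (b, a) as the same edge
        undirected[key] = max(score, undirected.get(key, score))
    edges = [(a, b, s) for (a, b), s in undirected.items() if s >= min_similarity]
    # sort by descending score, then by names so ties are deterministic
    edges.sort(key=lambda e: (-e[2], e[0], e[1]))
    return edges[:max_edges]


scores = {
    ("s1", "s1"): 1.0,
    ("s1", "s2"): 0.8,
    ("s2", "s1"): 0.8,
    ("s1", "s3"): 0.1,
    ("s2", "s3"): 0.4,
}
print(select_edges(scores, min_similarity=0.25, max_edges=40))
```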

Optional auto-plot emission from enrich:

gokit enrich \
  --studies studies.tsv \
  --population population.txt \
  --assoc assoc.txt \
  --out results_batch \
  --compare-semantic \
  --emit-plots term-bar,direction-summary,semantic-network \
  --plot-format png

Example Figures

The following figures were generated from larger multi-study example tables in examples/data/realistic_plots/.

Term-bar plot (--kind term-bar, top 30 terms):

Term bar plot

Direction summary plot (--kind direction-summary):

Direction summary plot

Semantic network plot (--kind semantic-network, 8-study matrix):

Semantic network plot

Supported Analysis Controls

  • Association formats: id2gos, gaf, gpad, gene2go, auto
  • Multiple-testing methods (--method):
    • fdr_bh (default)
    • fdr_by
    • bonferroni
    • holm
    • none
  • Direction tests (--test-direction): both (default), over, under
  • Semantic metrics (--semantic-metric): jaccard, resnik, lin, wang
  • ID normalization (--id-type): auto, str, int

Download Command Equivalence

gokit download is equivalent to:

  • wget http://current.geneontology.org/ontology/go-basic.obo
  • wget http://current.geneontology.org/ontology/subsets/goslim_generic.obo
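
On systems without wget, the same two files can be fetched with Python's standard library. The sketch below mirrors the wget commands above. The helper names target_path and download_all are hypothetical, and this is not how gokit download is necessarily implemented.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

ONTOLOGY_URLS = [
    "http://current.geneontology.org/ontology/go-basic.obo",
    "http://current.geneontology.org/ontology/subsets/goslim_generic.obo",
]


def target_path(url, dest_dir="."):
    """Local path for a URL: destination directory plus the URL's file name."""
    return Path(dest_dir) / Path(urlparse(url).path).name


def download_all(dest_dir="."):
    """Download every ontology file into dest_dir and return the saved paths."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    saved = []
    for url in ONTOLOGY_URLS:
        path = target_path(url, dest_dir)
        urlretrieve(url, str(path))  # performs the network request
        saved.append(path)
    return saved


# Usage (requires network access):
#   download_all(".")
print([target_path(url).name for url in ONTOLOGY_URLS])
```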
