gokit

Command-line toolkit for Gene Ontology enrichment analysis.

This README covers quick setup and core usage. For release process details, see docs/RELEASE.md.

Quick Start

# install (run from the root of a source checkout; see Installation)
pip install -e ".[dev]"

# download default ontology files into current directory
gokit download

# optional but recommended input sanity check
gokit validate --study study.txt --population population.txt --assoc assoc.txt

# run enrichment
gokit enrich \
  --study study.txt \
  --population population.txt \
  --assoc assoc.txt \
  --out results/goea

# build a consolidated markdown report
gokit report --run results/goea

Defaults that let you omit flags:

  • --obo defaults to ./go-basic.obo
  • --assoc-format defaults to auto
  • --test-direction defaults to both

Input File Format

Minimal expected inputs:

  • study.txt: one study gene ID per line.
  • population.txt: one background gene ID per line.
  • assoc.txt: one gene per line as <gene_id><space>GO:NNNNNNN. Separate multiple GO terms for the same gene with semicolons (geneA GO:0008150;GO:0003674). Tabs are also accepted.

Example:

# study.txt
geneA
geneB

# population.txt
geneA
geneB
geneC
geneD

# assoc.txt
geneA GO:0008150;GO:0003674
geneB GO:0008150
geneC GO:0005575
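
For readers who want to inspect an association file before running gokit, the format above can be read with a few lines of Python. This is a minimal sketch, not gokit's own parser: the function name parse_id2gos is hypothetical, and skipping blank lines and lines starting with "#" is an assumption of this example rather than documented behavior.

```python
from collections import defaultdict


def parse_id2gos(lines):
    """Parse '<gene_id> GO:...;GO:...' lines into {gene_id: set of GO IDs}."""
    assoc = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        # Assumption of this sketch: blank lines and '#' lines are skipped.
        if not line or line.startswith("#"):
            continue
        # split(None, 1) splits on the first run of whitespace (space or tab).
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # a gene ID with no GO terms is ignored here
        gene_id, terms = parts
        for term in terms.split(";"):
            term = term.strip()
            if term:
                assoc[gene_id].add(term)
    return dict(assoc)


example = [
    "geneA GO:0008150;GO:0003674",
    "geneB\tGO:0008150",
    "geneC GO:0005575",
]
assoc = parse_id2gos(example)
print(sorted(assoc["geneA"]))  # ['GO:0003674', 'GO:0008150']
```

Storing GO terms in a set per gene means repeated mappings for the same gene collapse naturally.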

Installation

We recommend using a virtual environment. The editable install below must be run from the root of a source checkout.

python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

To install from source:

git clone https://github.com/JLSteenwyk/gokit.git
cd gokit
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

Command Status

| Command | Status | What it does |
| --- | --- | --- |
| gokit enrich | Supported | Runs GO enrichment (single or batch) and writes deterministic outputs, semantic comparisons, optional auto-generated plots, and a run manifest. |
| gokit validate | Supported | Validates required inputs before enrichment. |
| gokit plot | Supported | Generates figures from enrichment tables and semantic similarity matrices. |
| gokit download | Supported | Downloads go-basic.obo and goslim_generic.obo from GO endpoints. |
| gokit report | Supported | Generates a consolidated markdown run report. |
| gokit explain | Placeholder | Scaffold only; detailed statistical and ancestor-trace explanations are planned. |

Shorthand aliases:

  • gk_enrich
  • gk_validate
  • gk_plot
  • gk_download
  • gk_report
  • gk_explain

Common Workflows

Single-study enrichment:

gokit enrich \
  --study study.txt \
  --population population.txt \
  --assoc assoc.txt \
  --out results/goea

Batch enrichment + semantic similarity:

gokit enrich \
  --studies studies.tsv \
  --population population.txt \
  --assoc assoc.txt \
  --assoc-format id2gos \
  --out results_batch \
  --out-formats tsv,jsonl \
  --compare-semantic \
  --semantic-metric wang \
  --semantic-top-k 5 \
  --semantic-namespace all \
  --semantic-min-padjsig 0.05

studies.tsv accepts either:

  • study_name<TAB>/path/to/study.txt
  • /path/to/study.txt (name inferred from filename)
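
The two accepted line forms can be illustrated with a small reader. This is a sketch under stated assumptions, not gokit's implementation: parse_studies_manifest is a hypothetical helper, and "name inferred from filename" is interpreted here as the filename without its extension.

```python
from pathlib import Path


def parse_studies_manifest(lines):
    """Return a list of (study_name, study_path) tuples."""
    studies = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if "\t" in line:
            # Form 1: study_name<TAB>/path/to/study.txt
            name, path = line.split("\t", 1)
            name, path = name.strip(), path.strip()
        else:
            # Form 2: path only. Assumption: the inferred name is the
            # filename without its extension.
            path = line.strip()
            name = Path(path).stem
        studies.append((name, path))
    return studies


manifest = ["study_a\t/data/a.txt", "/data/study_b.txt"]
print(parse_studies_manifest(manifest))
```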

Plotting Examples

Term-level and direction summary figures:

gokit plot \
  --input results_batch/all_studies.tsv \
  --study-id study_a \
  --kind term-bar \
  --direction both \
  --top-n 20 \
  --out figures/study_a_terms \
  --format png

gokit plot \
  --input results_batch/all_studies.tsv \
  --study-id study_a \
  --kind direction-summary \
  --alpha 0.05 \
  --out figures/study_a_direction_summary.png

Semantic network figure from batch similarity matrix:

gokit plot \
  --input results_batch/semantic_similarity.tsv \
  --kind semantic-network \
  --min-similarity 0.25 \
  --max-edges 40 \
  --out figures/semantic_network.png

Optional auto-plot emission from enrich:

gokit enrich \
  --studies studies.tsv \
  --population population.txt \
  --assoc assoc.txt \
  --out results_batch \
  --compare-semantic \
  --emit-plots term-bar,direction-summary,semantic-network \
  --plot-format png

Example Figures

The following figures were generated from larger multi-study example tables in examples/data/realistic_plots/.

Term-bar plot (--kind term-bar, top 30 terms):

[Figure: term bar plot]

Direction summary plot (--kind direction-summary):

[Figure: direction summary plot]

Semantic network plot (--kind semantic-network, 8-study matrix):

[Figure: semantic network plot]

Supported Analysis Controls

  • Association formats: id2gos, gaf, gpad, gene2go, auto
  • Multiple-testing methods (--method):
    • fdr_bh (default)
    • fdr_by
    • bonferroni
    • holm
    • none
  • Direction tests (--test-direction): both (default), over, under
  • Semantic metrics (--semantic-metric): jaccard, resnik, lin, wang
  • ID normalization (--id-type): auto, str, int

Download Command Equivalence

gokit download is equivalent to:

  • wget http://current.geneontology.org/ontology/go-basic.obo
  • wget http://current.geneontology.org/ontology/subsets/goslim_generic.obo
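
Where wget is unavailable, the same two files can be fetched with the Python standard library. This is a minimal sketch rather than gokit's download code: download_ontologies and its fetch parameter are hypothetical names introduced for illustration, and only the two URLs come from the list above.

```python
from pathlib import Path
from urllib.request import urlretrieve

GO_URLS = [
    "http://current.geneontology.org/ontology/go-basic.obo",
    "http://current.geneontology.org/ontology/subsets/goslim_generic.obo",
]


def download_ontologies(dest_dir=".", fetch=urlretrieve):
    """Save each ontology file into dest_dir and return the local paths.

    The fetch argument is a hypothetical hook so the network call can be
    replaced, for example in tests; it defaults to urllib's urlretrieve.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in GO_URLS:
        # Use the last URL segment as the local filename.
        target = dest / url.rsplit("/", 1)[-1]
        fetch(url, str(target))
        paths.append(target)
    return paths
```

Calling download_ontologies() with no arguments writes go-basic.obo and goslim_generic.obo into the current directory, matching the two wget commands.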
