
Command-line toolkit for GO enrichment analysis


gokit

Command-line toolkit for Gene Ontology enrichment analysis.



This README covers quick setup and core usage. For release process details, see docs/RELEASE.md.

Quick Start

# install from PyPI
pip install gokit

# download default ontology files into current directory
gokit download

# optional but recommended input sanity check
gokit validate --study study.txt --population population.txt --assoc assoc.txt

# run enrichment
gokit enrich \
  --study study.txt \
  --population population.txt \
  --assoc assoc.txt \
  --out results/goea

# build a consolidated markdown report
gokit report --run results/goea

Defaults (so these flags can usually be omitted):

  • --obo defaults to ./go-basic.obo
  • --assoc-format defaults to auto
  • --test-direction defaults to both

Input File Format

Minimal expected inputs:

  • study.txt: one study gene ID per line.
  • population.txt: one background gene ID per line.
  • assoc.txt: one gene-to-GO mapping per line, written as <gene_id>, a space or tab, then a GO ID (GO:NNNNNNN). Several GO IDs for the same gene can share one line, separated by semicolons (geneA GO:0008150;GO:0003674).
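To catch layout mistakes in assoc.txt early, the awk sketch below checks each non-blank line against the shape described above: a gene ID, whitespace, then one or more GO:NNNNNNN IDs joined by semicolons. It is a hypothetical helper (check_assoc is our name, not part of gokit). It only covers this plain two-column layout, which we assume corresponds to --assoc-format id2gos, not GAF, GPAD, or gene2go, and it may be stricter than gokit (a trailing semicolon, for instance, is reported).

```shell
#!/bin/sh
# Hypothetical pre-check for the plain assoc.txt layout:
#   <gene_id><space or tab>GO:NNNNNNN[;GO:NNNNNNN...]
# Not gokit's parser; it only reports lines that do not fit that shape.
check_assoc() {
  awk '
    NF == 0 { next }                          # ignore blank lines
    {
      ok = (NF == 2)                          # default FS splits on spaces and tabs
      n = split($2, terms, ";")
      for (i = 1; ok && i <= n; i++)          # every term must be GO: + 7 digits
        ok = (terms[i] ~ /^GO:[0-9]+$/ && length(terms[i]) == 10)
      if (!ok) { printf "line %d: %s\n", NR, $0; bad = 1 }
    }
    END { exit bad }                          # non-zero if any line was reported
  ' "$1"
}
```

Run it as check_assoc assoc.txt; it prints each offending line with its line number and exits non-zero if any were found.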

Example:

# study.txt
geneA
geneB

# population.txt
geneA
geneB
geneC
geneD

# assoc.txt
geneA GO:0008150;GO:0003674
geneB GO:0008150
geneC GO:0005575
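The example can be reproduced and cross-checked with standard tools. The sketch below writes the three files into a scratch directory and uses comm to list (1) study genes missing from the population, which should not happen in a standard enrichment set-up because the study set is drawn from the background, and (2) population genes with no GO annotation, here geneD. It is a rough pre-check under those assumptions; gokit validate is the supported check, and we have not confirmed exactly what it reports.

```shell
#!/bin/sh
# Recreate the example inputs in a scratch directory, then run two quick
# consistency checks. A rough pre-check only, not a replacement for gokit validate.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' geneA geneB > study.txt
printf '%s\n' geneA geneB geneC geneD > population.txt
printf '%s\n' 'geneA GO:0008150;GO:0003674' 'geneB GO:0008150' 'geneC GO:0005575' > assoc.txt

# comm needs sorted, de-duplicated inputs.
sort -u study.txt > study.sorted
sort -u population.txt > population.sorted
awk 'NF > 0 { print $1 }' assoc.txt | sort -u > annotated.sorted

# 1) Study genes absent from the background (expected: none).
missing=$(comm -23 study.sorted population.sorted)
# 2) Background genes that have no GO annotation at all.
unannotated=$(comm -23 population.sorted annotated.sorted)

echo "study genes missing from population: ${missing:-none}"
echo "population genes without annotations: ${unannotated:-none}"
```

For the example files, no study genes are missing and geneD is listed as unannotated.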

Installation

Install from PyPI:

pip install gokit

To install from source:

git clone https://github.com/JLSteenwyk/gokit.git
cd gokit
pip install -e .[dev]

Command Status

  • gokit enrich (Supported): runs GO enrichment for a single study or a batch of studies; writes deterministic output tables, optional semantic-similarity comparisons, optional auto-generated plots, and a run manifest.
  • gokit validate (Supported): validates required inputs before enrichment.
  • gokit plot (Supported): generates figures from enrichment tables and semantic-similarity matrices.
  • gokit download (Supported): downloads go-basic.obo and goslim_generic.obo from the GO endpoints.
  • gokit report (Supported): generates a consolidated markdown run report.
  • gokit explain (Placeholder): scaffold only; detailed explanations (statistical details and ancestor-term traces) are planned.

Shorthand aliases for the commands above:

  • gk_enrich
  • gk_validate
  • gk_plot
  • gk_download
  • gk_report
  • gk_explain

Common Workflows

Single-study enrichment:

gokit enrich \
  --study study.txt \
  --population population.txt \
  --assoc assoc.txt \
  --out results/goea

Batch enrichment with semantic similarity comparison:

gokit enrich \
  --studies studies.tsv \
  --population population.txt \
  --assoc assoc.txt \
  --assoc-format id2gos \
  --out results_batch \
  --out-formats tsv,jsonl \
  --compare-semantic \
  --semantic-metric wang \
  --semantic-top-k 5 \
  --semantic-namespace all \
  --semantic-min-padjsig 0.05

Each line of studies.tsv can take either form:

  • study_name<TAB>/path/to/study.txt
  • /path/to/study.txt (study name inferred from the file name)
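With many studies, studies.tsv can be generated rather than written by hand. This sketch emits the explicit study_name<TAB>path form, so the study names do not depend on how gokit infers names from file names. The directory layout (one .txt gene list per study) and the helper name make_studies_tsv are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build studies.tsv lines in the explicit
# "study_name<TAB>path" form from a directory of per-study *.txt gene lists.
make_studies_tsv() {
  dir=$1
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue          # no matches: the glob stays literal, skip it
    name=$(basename "$f" .txt)       # study name = file name without .txt
    printf '%s\t%s\n' "$name" "$f"
  done
}
```

For example: make_studies_tsv studies > studies.tsv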

Plotting Examples

Term-level and direction summary figures:

gokit plot \
  --input results_batch/all_studies.tsv \
  --study-id study_a \
  --kind term-bar \
  --direction both \
  --top-n 20 \
  --out figures/study_a_terms \
  --format png

gokit plot \
  --input results_batch/all_studies.tsv \
  --study-id study_a \
  --kind direction-summary \
  --alpha 0.05 \
  --out figures/study_a_direction_summary.png

Semantic network figure from batch similarity matrix:

gokit plot \
  --input results_batch/semantic_similarity.tsv \
  --kind semantic-network \
  --min-similarity 0.25 \
  --max-edges 40 \
  --out figures/semantic_network.png
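The similarity matrix holds one score per pair of studies. Of the metrics accepted by --semantic-metric, jaccard is the simplest to reason about: the size of the overlap of two sets divided by the size of their union. The sketch below computes it for two plain lists of GO IDs. We assume, without having checked gokit's source, that the compared sets are something like each study's significant terms (as --semantic-min-padjsig suggests); resnik, lin, and wang instead draw on the GO graph (resnik and lin via term information content) and are not shown. The helper name jaccard is hypothetical.

```shell
#!/bin/sh
# Set-level Jaccard similarity |A ∩ B| / |A ∪ B| between two files of GO IDs
# (one per line). Which term sets gokit actually compares is an assumption.
jaccard() {
  a=$(mktemp); b=$(mktemp)
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  shared=$(comm -12 "$a" "$b" | wc -l)      # |A ∩ B|
  total=$(sort -u "$a" "$b" | wc -l)        # |A ∪ B|
  rm -f "$a" "$b"
  # Two empty sets get similarity 0 rather than a division by zero.
  awk -v s="$shared" -v t="$total" 'BEGIN { printf "%.4g\n", (t > 0 ? s / t : 0) }'
}
```

Two lists sharing two of four distinct terms score 0.5; identical lists score 1.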

Optionally, have gokit enrich emit plots automatically:

gokit enrich \
  --studies studies.tsv \
  --population population.txt \
  --assoc assoc.txt \
  --out results_batch \
  --compare-semantic \
  --emit-plots term-bar,direction-summary,semantic-network \
  --plot-format png

Example Figures

The following figures were generated from larger multi-study example tables in examples/data/realistic_plots/.

Term-bar plot (--kind term-bar, top 30 terms):

[Figure: term bar plot]

Direction summary plot (--kind direction-summary):

[Figure: direction summary plot]

Semantic network plot (--kind semantic-network, 8-study matrix):

[Figure: semantic network plot]

Supported Analysis Controls

  • Association formats (--assoc-format): id2gos, gaf, gpad, gene2go, auto (default)
  • Multiple-testing methods (--method):
    • fdr_bh (default)
    • fdr_by
    • bonferroni
    • holm
    • none
  • Direction tests (--test-direction): both (default), over, under
  • Semantic metrics (--semantic-metric): jaccard, resnik, lin, wang
  • ID normalization (--id-type): auto, str, int
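The default --method fdr_bh refers to the Benjamini-Hochberg procedure. The awk sketch below applies the textbook version to a list of raw p-values to show how adjusted values arise: the i-th smallest of m p-values is multiplied by m/i, then a running minimum taken from the largest p-value downward keeps the adjusted values monotone and at most 1. It is an illustration, not gokit's code; tie handling and output order in gokit may differ. bh_adjust is a hypothetical name, and sort -g (GNU and BSD sort) is used so values like 1e-5 sort correctly.

```shell
#!/bin/sh
# Textbook Benjamini-Hochberg adjustment: one raw p-value per line on stdin,
# prints "raw<TAB>adjusted" in ascending order of raw p. Not gokit's code.
bh_adjust() {
  sort -g | awk '
    { p[NR] = $1 }
    END {
      m = NR
      prev = 1                            # running minimum; starting at 1 caps values at 1
      for (i = m; i >= 1; i--) {          # walk from the largest p to the smallest
        q = p[i] * m / i                  # p_(i) * m / i
        if (q < prev) prev = q            # enforce monotone adjusted values
        adj[i] = prev
      }
      for (i = 1; i <= m; i++) printf "%s\t%.4g\n", p[i], adj[i]
    }'
}
```

For example, printf '0.01\n0.04\n0.03\n0.2\n' | bh_adjust gives adjusted values 0.04, 0.05333, 0.05333, and 0.2.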

Download Command Equivalence

gokit download is equivalent to:

  • wget http://current.geneontology.org/ontology/go-basic.obo
  • wget http://current.geneontology.org/ontology/subsets/goslim_generic.obo
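Where wget is unavailable, the same two files can be fetched with curl. This is a minimal sketch: fetch_if_missing and looks_like_obo are hypothetical helper names, and the header check relies on our understanding that OBO 1.2 flat files open with a format-version: line, which this README does not state.

```shell
#!/bin/sh
# Fetch the ontology files that gokit download retrieves, using curl, and keep
# any non-empty file that is already present. Helper names are hypothetical.
GO_BASE=http://current.geneontology.org/ontology

fetch_if_missing() {   # $1 = URL, $2 = local file name
  if [ -s "$2" ]; then
    echo "keeping existing $2"
  else
    curl -fsSL -o "$2" "$1"   # -f: fail on HTTP errors, -L: follow redirects
  fi
}

looks_like_obo() {     # assumed: OBO flat files start with a "format-version:" line
  head -n 1 "$1" | grep -q '^format-version:'
}
```

For example: fetch_if_missing "$GO_BASE/go-basic.obo" go-basic.obo && looks_like_obo go-basic.obo, and likewise with "$GO_BASE/subsets/goslim_generic.obo" and goslim_generic.obo.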



Download files


Source Distribution

gokit-0.1.7.tar.gz (24.5 kB)

Uploaded Source

Built Distribution


gokit-0.1.7-py3-none-any.whl (34.1 kB)

Uploaded Python 3

File details

Details for the file gokit-0.1.7.tar.gz.

File metadata

  • Download URL: gokit-0.1.7.tar.gz
  • Upload date:
  • Size: 24.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.10

File hashes

Hashes for gokit-0.1.7.tar.gz
  • SHA256: 6b3af30b3aa1aaf3ef09903ebf45a02f7d4bc7786c3b4c087adaec7db13078e6
  • MD5: c1119e79012997895fd336620c8de81a
  • BLAKE2b-256: 583caf6992bf0dca9570029ffb08e0b2a7fe82b46286d6a429048c3b238c8b04


File details

Details for the file gokit-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: gokit-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 34.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.10

File hashes

Hashes for gokit-0.1.7-py3-none-any.whl
  • SHA256: db4a756217a3488737731051f68a0101fb03bfa95c16259561231fa8fd93034a
  • MD5: 066da53f5303a8c717481d1344d574a8
  • BLAKE2b-256: 4bed514e4b8e6ee04215f59f8201754891916d02d2c8cf0a3b90693d934ca97f
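A downloaded distribution file can be checked against the SHA256 digests listed for gokit 0.1.7. verify_sha256 is a hypothetical helper; it uses sha256sum where available and falls back to shasum -a 256 (the usual tool on macOS).

```shell
#!/bin/sh
# Compare a file's SHA256 digest with an expected hex string.
# Exit status 0 means the digests match.
verify_sha256() {   # $1 = file, $2 = expected hex digest
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$1" | cut -d' ' -f1)
  else
    actual=$(shasum -a 256 "$1" | cut -d' ' -f1)
  fi
  [ "$actual" = "$2" ]
}
```

For example: verify_sha256 gokit-0.1.7.tar.gz 6b3af30b3aa1aaf3ef09903ebf45a02f7d4bc7786c3b4c087adaec7db13078e6 && echo OK. pip can enforce the same check itself in hash-checking mode (--require-hashes, with a --hash=sha256:... entry in a requirements file).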

