gokit

Command-line toolkit for Gene Ontology enrichment analysis.


This README covers quick setup and core usage. For release process details, see docs/RELEASE.md.

Quick Start

# install (run from the repository root; quotes keep shells such as zsh from expanding the brackets)
pip install -e ".[dev,plot]"

# download default ontology files into current directory
gokit download

# optional but recommended input sanity check
gokit validate --study study.txt --population population.txt --assoc assoc.txt

# run enrichment
gokit enrich \
  --study study.txt \
  --population population.txt \
  --assoc assoc.txt \
  --out results/goea

# build a consolidated markdown report
gokit report --run results/goea

Defaults that let you omit flags:

  • --obo defaults to ./go-basic.obo
  • --assoc-format defaults to auto
  • --test-direction defaults to both

Input File Format

Minimal expected inputs:

  • study.txt: one study gene ID per line.
  • population.txt: one background gene ID per line.
  • assoc.txt: one gene-to-GO mapping per line, written as <gene_id><space>GO:NNNNNNN. Separate multiple GO terms for the same gene with semicolons (geneA GO:0008150;GO:0003674). A tab is also accepted as the delimiter.

Example:

# study.txt
geneA
geneB

# population.txt
geneA
geneB
geneC
geneD

# assoc.txt
geneA GO:0008150;GO:0003674
geneB GO:0008150
geneC GO:0005575
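
The layout above can be checked by hand before running gokit validate. The following is a minimal sketch, not gokit's own parser: parse_assoc is a hypothetical helper, and it assumes whitespace-delimited lines, semicolon-separated GO IDs, and that blank lines and lines starting with # carry no data.

```python
from collections import defaultdict


def parse_assoc(text):
    """Parse lines shaped like <gene_id><whitespace>GO:NNNNNNN[;GO:NNNNNNN...]."""
    gene_to_gos = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' lines are not data.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of spaces or tabs
        if len(parts) != 2:
            raise ValueError(f"expected '<gene_id> <GO terms>', got: {line!r}")
        gene_id, go_field = parts
        terms = (term.strip() for term in go_field.split(";"))
        gene_to_gos[gene_id].update(term for term in terms if term)
    return dict(gene_to_gos)


# The example files from this section, inlined as strings and sets.
assoc = parse_assoc(
    "geneA GO:0008150;GO:0003674\n"
    "geneB\tGO:0008150\n"
    "geneC GO:0005575\n"
)
study = {"geneA", "geneB"}
population = {"geneA", "geneB", "geneC", "geneD"}

# Study genes absent from the background set.
missing_from_population = study - population
# Background genes that have no GO annotation at all.
unannotated = population - set(assoc)
```

With the example inputs, every study gene is present in the population, and geneD is the only background gene without an annotation.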

Installation

We recommend using a virtual environment.

python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

For plotting support (term plots and semantic networks):

pip install -e ".[plot]"

To install from source:

git clone https://github.com/JLSteenwyk/gokit.git
cd gokit
python -m venv venv
source venv/bin/activate
pip install -e ".[dev,plot]"

Command Status

Command | Status | What it does
--- | --- | ---
gokit enrich | Supported | Runs GO enrichment (single or batch); writes deterministic outputs, optional semantic comparisons and auto-generated plots, and a run manifest.
gokit validate | Supported | Validates required inputs before enrichment.
gokit plot | Supported | Generates figures from enrichment tables and semantic similarity matrices.
gokit download | Supported | Downloads go-basic.obo and goslim_generic.obo from GO endpoints.
gokit report | Supported | Generates a consolidated markdown run report.
gokit explain | Placeholder | Scaffold only; a detailed statistical and ancestor-trace explanation is planned.

Shorthand aliases:

  • gk_enrich
  • gk_validate
  • gk_plot
  • gk_download
  • gk_report
  • gk_explain

Common Workflows

Single-study enrichment:

gokit enrich \
  --study study.txt \
  --population population.txt \
  --assoc assoc.txt \
  --out results/goea

Batch enrichment + semantic similarity:

gokit enrich \
  --studies studies.tsv \
  --population population.txt \
  --assoc assoc.txt \
  --assoc-format id2gos \
  --out results_batch \
  --out-formats tsv,jsonl \
  --compare-semantic \
  --semantic-metric wang \
  --semantic-top-k 5 \
  --semantic-namespace all \
  --semantic-min-padjsig 0.05

studies.tsv accepts either:

  • study_name<TAB>/path/to/study.txt
  • /path/to/study.txt (name inferred from filename)
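
The two accepted line shapes can be told apart by counting tab-separated fields. The sketch below is illustrative only: parse_studies_line is a hypothetical helper, and it assumes the inferred name is the filename without its extension, which this README does not spell out.

```python
from pathlib import Path


def parse_studies_line(line):
    """Return (study_name, study_path) for one studies.tsv line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 2 and all(fields):
        # Shape 1: study_name<TAB>/path/to/study.txt
        name, path = fields
        return name, Path(path)
    if len(fields) == 1 and fields[0]:
        # Shape 2: path only.
        # Assumption: the name is the filename stem (no extension).
        path = Path(fields[0])
        return path.stem, path
    raise ValueError(f"unrecognized studies.tsv line: {line!r}")


named = parse_studies_line("study_a\t/data/study_a_genes.txt\n")
inferred = parse_studies_line("/data/heat_shock.txt\n")
```

Here named carries the explicit name study_a, while inferred falls back to heat_shock from the file path.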

Plotting Examples

Term-level and direction summary figures:

gokit plot \
  --input results_batch/all_studies.tsv \
  --study-id study_a \
  --kind term-bar \
  --direction both \
  --top-n 20 \
  --out figures/study_a_terms \
  --format png

gokit plot \
  --input results_batch/all_studies.tsv \
  --study-id study_a \
  --kind direction-summary \
  --alpha 0.05 \
  --out figures/study_a_direction_summary.png

Semantic network figure from batch similarity matrix:

gokit plot \
  --input results_batch/semantic_similarity.tsv \
  --kind semantic-network \
  --min-similarity 0.25 \
  --max-edges 40 \
  --out figures/semantic_network.png

Optional auto-plot emission from enrich:

gokit enrich \
  --studies studies.tsv \
  --population population.txt \
  --assoc assoc.txt \
  --out results_batch \
  --compare-semantic \
  --emit-plots term-bar,direction-summary,semantic-network \
  --plot-format png

Supported Analysis Controls

  • Association formats: id2gos, gaf, gpad, gene2go, auto
  • Multiple-testing methods (--method):
    • fdr_bh (default)
    • fdr_by
    • bonferroni
    • holm
    • none
  • Direction tests (--test-direction): both (default), over, under
  • Semantic metrics (--semantic-metric): jaccard, resnik, lin, wang
  • ID normalization (--id-type): auto, str, int
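
To make the --method choices concrete, here is a rough sketch of what the fdr_bh and bonferroni adjustments compute, using the standard textbook definitions. It is not gokit's implementation; both function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is
    # capped by the ones ranked above it (keeps the result monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(pvalues):
    """Return Bonferroni adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


raw = [0.01, 0.04, 0.03, 0.005]
bh = benjamini_hochberg(raw)
bonf = bonferroni(raw)
```

Bonferroni multiplies every p-value by the number of tests, so it is never less strict than Benjamini-Hochberg on the same input.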

Download Command Equivalence

gokit download is equivalent to:

  • wget http://current.geneontology.org/ontology/go-basic.obo
  • wget http://current.geneontology.org/ontology/subsets/goslim_generic.obo
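
Where wget is unavailable, the same two files can be fetched with the Python standard library. This is a sketch of the equivalence described above, not gokit's download code; target_path and download_ontologies are hypothetical names, and the destination directory argument is an assumption.

```python
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlretrieve

# The two URLs listed above.
ONTOLOGY_URLS = [
    "http://current.geneontology.org/ontology/go-basic.obo",
    "http://current.geneontology.org/ontology/subsets/goslim_generic.obo",
]


def target_path(url, dest_dir="."):
    """Local path for a URL: its last path component inside dest_dir."""
    return Path(dest_dir) / Path(urlsplit(url).path).name


def download_ontologies(dest_dir="."):
    """Fetch both ontology files into dest_dir (needs network access)."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    for url in ONTOLOGY_URLS:
        urlretrieve(url, str(target_path(url, dest_dir)))
```

Calling download_ontologies() with no arguments writes go-basic.obo and goslim_generic.obo into the current directory, matching the default location that --obo expects.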
