gokit

Command-line toolkit for Gene Ontology enrichment analysis.

This README covers quick setup and core usage. For release process details, see docs/RELEASE.md.

Quick Start

# install from PyPI
pip install gokit

# download default ontology files into current directory
gokit download

# optional but recommended input sanity check
gokit validate --study study.txt --population population.txt --assoc assoc.txt

# run enrichment
gokit enrich \
  --study study.txt \
  --population population.txt \
  --assoc assoc.txt \
  --out results/goea

# build a consolidated markdown report
gokit report --run results/goea

Defaults that let you omit flags:

  • --obo defaults to ./go-basic.obo
  • --assoc-format defaults to auto
  • --test-direction defaults to both

Input File Format

Minimal expected inputs:

  • study.txt: one study gene ID per line.
  • population.txt: one background gene ID per line.
  • assoc.txt: one gene-to-GO mapping per line, written as <gene_id><space>GO:NNNNNNN. Multiple GO terms on one line are separated by semicolons (geneA GO:0008150;GO:0003674). A tab is also accepted as the separator.

Example:

# study.txt
geneA
geneB

# population.txt
geneA
geneB
geneC
geneD

# assoc.txt
geneA GO:0008150;GO:0003674
geneB GO:0008150
geneC GO:0005575
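
The layout above can be read with a few lines of Python. The sketch below is illustrative only: parse_assoc and read_ids are hypothetical helpers, not part of gokit, and the two checks at the end are assumptions about what a basic sanity check might look for, not a description of what gokit validate does.

```python
def read_ids(text):
    """Return the set of non-empty gene IDs, one per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def parse_assoc(text):
    """Parse '<gene_id><space or tab>GO:...;GO:...' lines into {gene: set of GO IDs}."""
    gene_to_terms = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split(None, 1)  # split once on the first run of whitespace
        if len(parts) != 2:
            raise ValueError(f"expected '<gene_id> <GO terms>', got: {raw!r}")
        gene, terms = parts
        go_ids = {t.strip() for t in terms.split(";") if t.strip()}
        gene_to_terms.setdefault(gene, set()).update(go_ids)
    return gene_to_terms


# Same toy data as the example files above (one line uses a tab separator).
study = read_ids("geneA\ngeneB\n")
population = read_ids("geneA\ngeneB\ngeneC\ngeneD\n")
assoc = parse_assoc(
    "geneA GO:0008150;GO:0003674\n"
    "geneB\tGO:0008150\n"
    "geneC GO:0005575\n"
)

# Hypothetical sanity checks (assumed, not taken from gokit):
missing_from_population = study - population  # study genes absent from the background
unannotated = population - set(assoc)         # background genes with no GO terms

print(sorted(missing_from_population))
print(sorted(unannotated))
```

In this toy data every study gene is in the population, and geneD is the only background gene without an annotation.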

Installation

Install from PyPI:

pip install gokit

To install from source:

git clone https://github.com/JLSteenwyk/gokit.git
cd gokit
# quote the extras so shells such as zsh do not expand the brackets
pip install -e ".[dev]"

Command Status

Command | Status | What it does
gokit enrich | Supported | Runs GO enrichment (single or batch) and writes deterministic outputs, semantic comparisons, optional auto-generated plots, and a run manifest.
gokit validate | Supported | Validates required inputs before enrichment.
gokit plot | Supported | Generates figures from enrichment tables and semantic similarity matrices.
gokit download | Supported | Downloads go-basic.obo and goslim_generic.obo from GO endpoints.
gokit report | Supported | Generates a consolidated markdown run report.
gokit explain | Placeholder | Scaffold only; detailed statistical and ancestor-trace explanations are planned.

Shorthand aliases:

  • gk_enrich
  • gk_validate
  • gk_plot
  • gk_download
  • gk_report
  • gk_explain

Common Workflows

Single-study enrichment:

gokit enrich \
  --study study.txt \
  --population population.txt \
  --assoc assoc.txt \
  --out results/goea
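
For readers new to GO enrichment, the sketch below shows one common way a per-term p-value is computed: a one-sided hypergeometric test. This is an assumption for illustration only. This README does not state which statistical test gokit uses, and the function names here are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n genes from N, of which K carry the term."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_pvalues(k, N, K, n):
    """One-sided tail probabilities for observing k annotated genes in the study.

    k: study genes annotated to the term
    N: population size
    K: population genes annotated to the term
    n: study size
    """
    lowest, highest = max(0, n - (N - K)), min(K, n)
    over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))
    under = sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))
    return {"over": over, "under": under}


# Toy numbers from the example input files: GO:0008150 annotates geneA and geneB
# (K = 2) in a population of 4 genes, and both study genes (n = 2) carry it (k = 2).
result = enrichment_pvalues(k=2, N=4, K=2, n=2)
print(result)
```

With only four background genes the over-representation p-value cannot drop below 1/6, which is one reason a realistic background population matters.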

Batch enrichment + semantic similarity:

gokit enrich \
  --studies studies.tsv \
  --population population.txt \
  --assoc assoc.txt \
  --assoc-format id2gos \
  --out results_batch \
  --out-formats tsv,jsonl \
  --compare-semantic \
  --semantic-metric wang \
  --semantic-top-k 5 \
  --semantic-namespace all \
  --semantic-min-padjsig 0.05

studies.tsv accepts either:

  • study_name<TAB>/path/to/study.txt
  • /path/to/study.txt (name inferred from filename)
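
A minimal sketch of reading both line shapes is shown below. parse_studies is a hypothetical helper, not gokit's implementation, and taking the filename without its extension as the inferred name is an assumption; the README only says the name is inferred from the filename.

```python
from pathlib import Path


def parse_studies(text):
    """Return a list of (study_name, study_path) pairs from studies.tsv content."""
    studies = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if "\t" in line:
            # Form 1: study_name<TAB>/path/to/study.txt
            name, path = (part.strip() for part in line.split("\t", 1))
        else:
            # Form 2: path only; name assumed to be the filename without extension
            path = line
            name = Path(path).stem
        studies.append((name, path))
    return studies


example = "study_a\t/data/a.txt\n/data/study_b.txt\n"
print(parse_studies(example))
```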

Plotting Examples

Term-level and direction summary figures:

gokit plot \
  --input results_batch/all_studies.tsv \
  --study-id study_a \
  --kind term-bar \
  --direction both \
  --top-n 20 \
  --out figures/study_a_terms \
  --format png

gokit plot \
  --input results_batch/all_studies.tsv \
  --study-id study_a \
  --kind direction-summary \
  --alpha 0.05 \
  --out figures/study_a_direction_summary.png

Semantic network figure from batch similarity matrix:

gokit plot \
  --input results_batch/semantic_similarity.tsv \
  --kind semantic-network \
  --min-similarity 0.25 \
  --max-edges 40 \
  --out figures/semantic_network.png
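
To build intuition for --min-similarity and --max-edges, the sketch below filters a small pairwise similarity matrix into a list of network edges. It is an illustration under stated assumptions, not gokit's code: the matrix is taken to be symmetric, the threshold is treated as inclusive, and when more edges pass than --max-edges allows, the strongest are kept. The study labels and similarity values are made up.

```python
def select_edges(labels, matrix, min_similarity, max_edges):
    """Keep the strongest pairwise similarities at or above the threshold."""
    edges = []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: each pair once, no self-loops
            similarity = matrix[i][j]
            if similarity >= min_similarity:
                edges.append((labels[i], labels[j], similarity))
    edges.sort(key=lambda edge: edge[2], reverse=True)  # strongest first
    return edges[:max_edges]


labels = ["study_a", "study_b", "study_c"]
matrix = [
    [1.0, 0.6, 0.1],
    [0.6, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
edges = select_edges(labels, matrix, min_similarity=0.25, max_edges=40)
print(edges)
```

Here the study_a/study_c pair (0.1) falls below the 0.25 threshold and is dropped, leaving two edges.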

Optional auto-plot emission from enrich:

gokit enrich \
  --studies studies.tsv \
  --population population.txt \
  --assoc assoc.txt \
  --out results_batch \
  --compare-semantic \
  --emit-plots term-bar,direction-summary,semantic-network \
  --plot-format png

Example Figures

The following figures were generated from larger multi-study example tables in examples/data/realistic_plots/.

Term-bar plot (--kind term-bar, top 30 terms):

[Figure: term-bar plot]

Direction summary plot (--kind direction-summary):

[Figure: direction summary plot]

Semantic network plot (--kind semantic-network, 8-study matrix):

[Figure: semantic network plot]

Supported Analysis Controls

  • Association formats (--assoc-format): id2gos, gaf, gpad, gene2go, auto (default)
  • Multiple-testing methods (--method):
    • fdr_bh (default)
    • fdr_by
    • bonferroni
    • holm
    • none
  • Direction tests (--test-direction): both (default), over, under
  • Semantic metrics (--semantic-metric): jaccard, resnik, lin, wang
  • ID normalization (--id-type): auto, str, int

Download Command Equivalence

gokit download is equivalent to:

  • wget http://current.geneontology.org/ontology/go-basic.obo
  • wget http://current.geneontology.org/ontology/subsets/goslim_generic.obo
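
Where wget is not available, the same two files can be fetched with the Python standard library. The sketch below is a stand-in for the wget commands above, not gokit's own download code; target_path and download_all are hypothetical names, and the download call is left commented out because it needs network access.

```python
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlretrieve

# URLs copied from the wget commands above.
URLS = [
    "http://current.geneontology.org/ontology/go-basic.obo",
    "http://current.geneontology.org/ontology/subsets/goslim_generic.obo",
]


def target_path(url, out_dir="."):
    """Local path for a URL: the output directory plus the URL's filename."""
    return Path(out_dir) / Path(urlsplit(url).path).name


def download_all(out_dir="."):
    """Fetch every ontology file into out_dir (created if missing)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for url in URLS:
        urlretrieve(url, target_path(url, out_dir))


for url in URLS:
    print(url, "->", target_path(url))

# download_all()  # uncomment to fetch the files into the current directory
```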
