GenomeHubs

About

GenomeHubs comprises a set of tools to parse, index, search and display genomic metadata, assembly features and sequencing status for projects under the Earth BioGenome Project umbrella, which aim to sequence all described eukaryotic species over a period of 10 years.

GenomeHubs builds on legacy code that supported taxon-oriented databases of butterflies & moths (lepbase.org), molluscs (molluscdb.org), mealybugs (mealybug.org) and more. GenomeHubs is now search-oriented and positioned to scale to the challenges of mining data across almost 2 million species.

The first output from the new search-oriented GenomeHubs is Genomes on a Tree (GoaT, goat.genomehubs.org), which has been published in: Challis et al. 2023, Genomes on a Tree (GoaT): A versatile, scalable search engine for genomic and sequencing project metadata across the eukaryotic tree of life. Wellcome Open Research, 8:24 doi:10.12688/wellcomeopenres.18658.1

The goat.genomehubs.org website is freely available with no logins or restrictions. It is widely used by the academic community, and especially by the Earth BioGenome Project, to plan and coordinate efforts to sequence all described eukaryotic species.

The core GoaT/GenomeHubs components are available as a set of Docker containers:

GoaT UI Docker image

A bundled web server to run a GoaT-specific instance of the GenomeHubs UI, as used at goat.genomehubs.org.

Usage

docker pull genomehubs/goat:latest

docker run -d --restart always \
    --net net-es -p 8880:8880 \
    --user $UID:$GROUPS \
    -e GH_CLIENT_PORT=8880 \
    -e GH_API_URL=https://goat.genomehubs.org/api/v2 \
    -e GH_SUGGESTED_TERM=Canidae \
    --name goat-ui \
    genomehubs/goat:latest
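
The `--net net-es` flag in the command above assumes that a user-defined Docker network named `net-es` already exists. A minimal sketch for creating it only when it is missing follows; the helper name `ensure_network` is hypothetical and not part of GenomeHubs.

```shell
# Hypothetical helper (not part of GenomeHubs): create a user-defined Docker
# network only if it does not already exist.
ensure_network() {
    name="$1"
    if docker network inspect "$name" >/dev/null 2>&1; then
        echo "network $name already exists"
    else
        docker network create "$name" >/dev/null && echo "created network $name"
    fi
}
```

Calling `ensure_network net-es` before `docker run` makes the setup safe to repeat.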

GenomeHubs UI Docker image

A bundled web server to run an instance of the GenomeHubs UI, such as goat.genomehubs.org.

Usage

docker pull genomehubs/genomehubs-ui:latest

docker run -d --restart always \
    --net net-es -p 8880:8880 \
    --user $UID:$GROUPS \
    -e GH_CLIENT_PORT=8880 \
    -e GH_API_URL=https://goat.genomehubs.org/api/v2 \
    -e GH_SUGGESTED_TERM=Canidae \
    --name gh-ui \
    genomehubs/genomehubs-ui:latest
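
The `--user $UID:$GROUPS` flag relies on bash-specific variables: `UID`, and the `GROUPS` array, whose unsubscripted expansion yields its first element. In shells that do not define them, one alternative is to take the numeric user and group IDs from `id`. This is a minimal sketch; the variable name `USER_SPEC` is illustrative.

```shell
# Build a "uid:gid" string for docker's --user flag using POSIX id(1),
# which also works in shells that do not define $UID or $GROUPS.
USER_SPEC="$(id -u):$(id -g)"
echo "$USER_SPEC"
```

The value can then be passed as `--user "$USER_SPEC"` in place of `--user $UID:$GROUPS`.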

GenomeHubs API Docker image

A bundled web server to run an instance of the GenomeHubs API. The GenomeHubs API underpins all search functionality for Genomes on a Tree (GoaT) goat.genomehubs.org. OpenAPI documentation for the GenomeHubs API instance used by GoaT is available at goat.genomehubs.org/api-docs.

Usage

docker pull genomehubs/genomehubs-api:latest

docker run -d \
    --restart always \
    --net net-es -p 3000:3000 \
    --user $UID:$GROUPS \
    -e GH_ORIGINS="https://goat.genomehubs.org null" \
    -e GH_HUBNAME=goat \
    -e GH_HUBPATH="/genomehubs/resources/" \
    -e GH_NODE="http://es1:9200" \
    -e GH_API_URL=https://goat.genomehubs.org/api/v2 \
    -e GH_RELEASE=$RELEASE \
    -e GH_SOURCE=https://github.com/genomehubs/goat-data \
    -e GH_ACCESS_LOG=/genomehubs/logs/access.log \
    -e GH_ERROR_LOG=/genomehubs/logs/error.log \
    -v /volumes/docker/logs/$RELEASE:/genomehubs/logs \
    -v /volumes/docker/resources:/genomehubs/resources \
    --name goat-api \
    genomehubs/genomehubs-api:latest
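
The command above expects `$RELEASE` to be set and bind-mounts two host directories. When a `-v` source path is missing, Docker creates it owned by root, which the non-root `--user` may be unable to write to. A sketch for creating the directories beforehand follows; the helper name `prepare_api_dirs` is hypothetical.

```shell
# Hypothetical helper (not part of GenomeHubs): create the host directories
# that the API container bind-mounts for logs and resources.
prepare_api_dirs() {
    base="$1"      # host base directory, e.g. /volumes/docker
    release="$2"   # release label, i.e. the value of $RELEASE
    if [ -z "$base" ] || [ -z "$release" ]; then
        echo "usage: prepare_api_dirs BASE_DIR RELEASE" >&2
        return 1
    fi
    mkdir -p "$base/logs/$release" "$base/resources"
}
```

For the paths used above, this would be called as `prepare_api_dirs /volumes/docker "$RELEASE"` before `docker run`.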

GenomeHubs CLI Docker image

A command line tool to process and index genomic metadata for GenomeHubs. It is used to build and update GenomeHubs instances such as Genomes on a Tree (GoaT, goat.genomehubs.org).

Usage

docker pull genomehubs/genomehubs:latest

Parse [NCBI datasets](https://www.ncbi.nlm.nih.gov/datasets/) genome assembly metadata:

docker run --rm --network=host \
    -v `pwd`/sources:/genomehubs/sources \
     genomehubs/genomehubs:latest bash -c \
        "genomehubs parse \
            --ncbi-datasets-genome sources/assembly-data \
            --outfile sources/assembly-data/ncbi_datasets_eukaryota.tsv.gz"
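
Because `sources` is bind-mounted, the gzipped file named by `--outfile` can be inspected from the host to sanity-check the parse step. This sketch assumes standard `gzip` and `head` are available; the helper name `peek_gz` is illustrative.

```shell
# Hypothetical helper: print the first few lines of a gzipped text file.
peek_gz() {
    file="$1"
    lines="${2:-5}"   # number of lines to show (default 5)
    gzip -cd "$file" | head -n "$lines"
}
```

For example, `peek_gz sources/assembly-data/ncbi_datasets_eukaryota.tsv.gz` shows the start of the parsed output.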

Initialise a set of Elasticsearch indexes with [NCBI taxonomy](https://www.ncbi.nlm.nih.gov/taxonomy/) data for all eukaryotes:

docker run --rm --network=host \
    -v `pwd`/sources:/genomehubs/sources \
     genomehubs/genomehubs:latest bash -c \
        "genomehubs init \
            --es-host http://es1:9200 \
            --taxonomy-source ncbi \
            --config-file sources/goat.yaml \
            --taxonomy-jsonl sources/ena-taxonomy/ena-taxonomy.extra.jsonl.gz \
            --taxonomy-ncbi-root 2759 \
            --taxon-preload"
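
The `init` command above needs the Elasticsearch node given in `--es-host` to be accepting connections. A sketch of a wait loop that polls the standard Elasticsearch `_cluster/health` endpoint with `curl` follows; the helper name, retry count and delay are assumptions.

```shell
# Hypothetical helper: wait until an Elasticsearch node answers HTTP requests.
# Returns 0 once the health endpoint responds, 1 after all attempts fail.
wait_for_es() {
    url="$1"            # e.g. http://es1:9200
    tries="${2:-30}"    # maximum number of attempts
    i=0
    while [ "$i" -lt "$tries" ]; do
        if curl -fsS "$url/_cluster/health" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    return 1
}
```

Running `wait_for_es http://es1:9200 && ...` before `genomehubs init` avoids starting the import against a node that is still booting.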

Index assembly metadata:

docker run --rm --network=host \
    -v `pwd`/sources:/genomehubs/sources \
     genomehubs/genomehubs:latest bash -c \
        "genomehubs index \
            --es-host http://es1:9200 \
            --taxonomy-source ncbi \
            --config-file sources/goat.yaml \
            --assembly-dir sources/assembly-data"

Fill taxon attribute values across the tree of life:

docker run --rm --network=host \
    -v `pwd`/sources:/genomehubs/sources \
     genomehubs/genomehubs:latest bash -c \
        "genomehubs fill \
            --es-host http://es1:9200 \
            --taxonomy-source ncbi \
            --config-file sources/goat.yaml \
            --traverse-root 2759 \
            --traverse-infer-both"
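
The four CLI steps above share the same `docker run` boilerplate. A sketch of a wrapper that factors it out, so that the steps can be chained in a script, is shown below. The name `gh_run` is hypothetical, and arguments containing spaces would need extra quoting.

```shell
# Hypothetical wrapper: run a genomehubs subcommand inside the CLI container,
# with ./sources mounted at /genomehubs/sources as in the commands above.
gh_run() {
    docker run --rm --network=host \
        -v "$(pwd)/sources:/genomehubs/sources" \
        genomehubs/genomehubs:latest bash -c "genomehubs $*"
}

# Example (not run here):
#   gh_run index --es-host http://es1:9200 --taxonomy-source ncbi \
#       --config-file sources/goat.yaml --assembly-dir sources/assembly-data
```

Adding `set -e` at the top of a script that calls `gh_run` for each step stops the pipeline at the first failing command.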

Changelog

2.0.0 (2020-07-02)

  • First release on PyPI.


