Skip to main content

Compare gene ranking methods.

Project description

CI Pypi Package Version Pypi Python Version

Gene Ranking Shootout

A benchmark for methods that rank genes according to their relevance for a given phenotype (list of HPO terms).

Methods

The following methods are currently included in the benchmark:

  • AMELIE (via web service)
  • CADA (via custom Docker/Podman image)
  • Phen2Gene (via official Docker/Podman image)
  • Phenix algorithm (as implemented in VarFish)
  • Exomiser Algorithms:
    • Phenix
    • Phive
    • HiPhive (variants: human only, human-mouse only, all human-mouse-fish-ppi)

Installation

Simply install with pip (probably inside a conda environment or virtualenv):

$ git clone https://github.com/bihealth/gene-ranking-shootout.git
$ cd gene-ranking-shootout
$ pip install -e .
$ gene-ranking-shootout --help

Usage

List datasets in the benchmark:

$ gene-ranking-shootout dataset list
cada_clinvar_cases
cada_cases_test
cada_cases_validate
cada_all_cases
cada_collaborator_cases
cada_cases_train

Show the first entry in a dataset:

$ gene-ranking-shootout dataset head cada_cases_test --count 2
{"name": "Patient:SCV000281758", "disease_omim_id": "OMIM:617360", "disease_gene_id": "Entrez:8621", "hpo_terms": ["HP:0001508"], "candidate_gene_ids": null}
{"name": "Patient:SCV000864231", "disease_omim_id": "OMIM:132900", "disease_gene_id": "Entrez:4629", "hpo_terms": ["HP:0011499", "HP:0000021"], "candidate_gene_ids": null}

Simulate cases based on one or more datasets. This will pick a number of cases from the datasets. Further, it will pick another number of random genes based on the number of rare variants (freq below 0.1% in gnomAD genomes). The results are written into a JSON file with the cases. The simulation is randomized with a fixed seed that can be adjusted on the command line if necessary.

$ gene-ranking-shootout dataset simulate \
    /tmp/cases.json \
    $(gene-ranking-shootout dataset list) \
    --case-count 4714 \
    --candidate-genes-count 100
2023-05-05 11:16:35 | INFO   | Loading data
2023-05-05 11:16:35 | INFO   | ... 4714 cases overall (9428 duplicates)
2023-05-05 11:16:35 | INFO   | Simulating cases
2023-05-05 11:16:37 | INFO   | Wrote 4714 cases

You can then run the benchmark on the cases with the different methods:

$ gene-ranking-shootout benchmark
Usage: gene-ranking-shootout benchmark [OPTIONS] COMMAND [ARGS]...

  Group for benchmark sub commands.

Options:
  --help  Show this message and exit.

Commands:
  amelie          Benchmark the AMELIE web server.
  phen2gene       Benchmark the Phen2Gene container.
  summarize       Summarize the results.
  varfish-phenix  Benchmark the VarFish implementation of the Phenix...
$ gene-ranking-shootout benchmark amelie /tmp/cases.json /tmp/result-amelie.json
$ gene-ranking-shootout benchmark phen2gene /tmp/cases.json /tmp/result-phen2gene.json
$ gene-ranking-shootout benchmark varfish-phenix http://127.0.0.1:8081/hpo/sim/term-gene /tmp/cases.json /tmp/result-varfish-phenix.json
$ gene-ranking-shootout benchmark cada /tmp/cases.json /tmp/result-cada.json
$ gene-ranking-shootout benchmark exomiser http://localhost:8081/ phenix /tmp/cases.json /tmp/result-exomiser-phenix.json
$ gene-ranking-shootout benchmark exomiser http://localhost:8081/ phive /tmp/cases.json /tmp/result-exomiser-phive.json
$ gene-ranking-shootout benchmark exomiser http://localhost:8081/ hiphive /tmp/cases.json /tmp/result-exomiser-hiphive.json
$ gene-ranking-shootout benchmark exomiser http://localhost:8081/ hiphive-mouse /tmp/cases.json /tmp/result-exomiser-hiphive-mouse.json
$ gene-ranking-shootout benchmark exomiser http://localhost:8081/ hiphive-human /tmp/cases.json /tmp/result-exomiser-hiphive-human.json

You can also visualize the details of the benchmark results for each result file (below for 100 cases). This visualization displays the number of true disease genes (from case set definitions) at TOP10 and following positions in the ranked gene list of the respective method.

$ gene-ranking-shootout benchmark summarize /tmp/result-amelie.json
    1:   48  ################################
    2:   17  ###########
    3:    7  ####
    4:    3  ##
    5:    6  ####
    6:    3  ##
    7:    2  #
    8:    3  ##
    9:    1  .
   10:    0  

11-..:    9  ######
mssng:    0  

Building CADA Podman Image

There is no public REST API or docker image for CADA (yet). Here is how to build the needed CADA Podman image:

# cd docker/cada
# bash build.sh
...
# podman run -it --rm localhost/cada-for-shootout:latest --help

Running Phenix in VarFish

Send an email to the author to get a copy of the necessary data. Then, run the following in the background.

$ varfish-server-worker server pheno --path-hpo-dir path/to/varfish-server-worker-db/hpo

Running Exomiser

The following are more rough notes than a full manual. This uses the current Exomiser RES API version 13.2.0 (current at: 2023-05-05). You will need approximately 75GB of storage for download and extraction and afterwards 49GB. Probably one could get rid of a lot of the variant-specific data but I did not go into detail here.

$ wget https://github.com/exomiser/Exomiser/releases/download/13.2.0/exomiser-rest-prioritiser-13.2.0.jar
$ wget https://data.monarchinitiative.org/exomiser/latest/2302_phenotype.zip
$ wget https://data.monarchinitiative.org/exomiser/latest/2302_hg19.zip
$ unzip 2302_phenotype.zip
$ unzip 2302_hg19.zip
$ cat <<EOF > application.properties
exomiser.data-directory=$PWD
exomiser.hg19.data-version=1909
exomiser.phenotype.data-version=2302
exomiser.phenotype.random-walk-file-name=rw_string_10.mv
EOF
$ java -Xmx6G -Xms2G -Dserver.address=0.0.0.0 -Dserver.port=8081 -jar exomiser-rest-prioritiser-13.2.0.jar

Datasets

The following datasets are included at the moment:

  • cada_cases_test.json - converte from CADA's cases_test.tsv
  • cada_cases_train.json - converte from CADA's cases_train.tsv
  • cada_cases_validate.json - converte from CADA's cases_validate.tsv
  • cada_clinvar_cases.json - converte from CADA's clinvar_cases.tsv
  • cada_collaborator_cases.json - converte from CADA's collaborator_cases.tsv

You can conver TSV files with the following structure with gene-ranking-shootout dataset convert-tsv.

  • Column 1: name for the case; must start with Patient: or is ignored.
  • Column 2: disease_omim_id; as OMIM:123456 or unknown
  • Column 3: disease_gene_id; as Entrez:123456
  • Column 4: hpo_terms; as semicolon-separated list, e.g., HP:0001234;HP:0005678

If a row has less than 4 columns, we assume that column 2 is missing. All further columns are ignored. The file should not have a header. You can find some files in the CADA repository here:

The call to gene-ranking-shootout dataset convert-tsv should be as follows.

$ gene-ranking-shootout dataset convert-tsv input.tsv output.json

Some Preliminary Results

The following was generated on 2023/05/05 with all 4714 cases.

$ for f in /tmp/result-*.json; do (set -x; gene-ranking-shootout benchmark summarize --bars-top-n 20 $f); echo; done
+ gene-ranking-shootout benchmark summarize --bars-top-n 20 result-cada.json
    1: 3462  ################################################
    2:  536  #######
    3:  205  ##
    4:  133  #
    5:   71  .
    6:   57  .
    7:   39  .
    8:   47  .
    9:   16  .
   10:   24  .
   11:   14  .
   12:   22  .
   13:   14  .
   14:   11  .
   15:    7  .
   16:    9  .
   17:    8  .
   18:    4  .
   19:    3  .
   20:    3  .

21-..:   29  .
mssng:    0  

+ gene-ranking-shootout benchmark summarize --bars-top-n 20 result-exomiser-hiphive-human.json
    1: 2593  ####################################
    2:  637  ########
    3:  375  #####
    4:  203  ##
    5:  132  #
    6:  104  #
    7:   99  #
    8:   80  #
    9:   61  .
   10:   46  .
   11:   44  .
   12:   30  .
   13:   30  .
   14:   30  .
   15:   15  .
   16:   22  .
   17:   12  .
   18:   11  .
   19:   13  .
   20:   10  .

21-..:  149  ##
mssng:    0  

+ gene-ranking-shootout benchmark summarize --bars-top-n 20 result-exomiser-hiphive.json
    1: 2418  #################################
    2:  686  #########
    3:  355  ####
    4:  226  ###
    5:  155  ##
    6:  122  #
    7:   86  #
    8:   95  #
    9:   54  .
   10:   54  .
   11:   52  .
   12:   28  .
   13:   29  .
   14:   27  .
   15:   16  .
   16:   11  .
   17:   23  .
   18:    9  .
   19:   10  .
   20:   14  .

21-..:  226  ###
mssng:    0  

+ gene-ranking-shootout benchmark summarize --bars-top-n 20 result-exomiser-hiphive-mouse.json
    1: 2418  #################################
    2:  685  #########
    3:  357  #####
    4:  227  ###
    5:  160  ##
    6:  121  #
    7:   90  #
    8:   96  #
    9:   55  .
   10:   56  .
   11:   58  .
   12:   37  .
   13:   31  .
   14:   26  .
   15:   22  .
   16:   18  .
   17:   26  .
   18:   19  .
   19:   14  .
   20:   15  .

21-..:  165  ##
mssng:    0  

+ gene-ranking-shootout benchmark summarize --bars-top-n 20 result-exomiser-phenix.json
    1: 2566  ####################################
    2:  619  ########
    3:  306  ####
    4:  208  ##
    5:  152  ##
    6:  119  #
    7:   90  #
    8:   80  #
    9:   73  #
   10:   64  .
   11:   49  .
   12:   41  .
   13:   40  .
   14:   26  .
   15:   29  .
   16:   27  .
   17:   15  .
   18:   17  .
   19:   10  .
   20:   16  .

21-..:  149  ##
mssng:    0  

+ gene-ranking-shootout benchmark summarize --bars-top-n 20 result-exomiser-phive.json
    1:  934  #############
    2:  298  ####
    3:  163  ##
    4:  101  #
    5:   50  .
    6:   44  .
    7:   30  .
    8:   33  .
    9:   16  .
   10:   16  .
   11:   10  .
   12:    9  .
   13:   17  .
   14:   13  .
   15:   12  .
   16:   16  .
   17:   14  .
   18:   23  .
   19:   28  .
   20:   33  .

21-..: 2836  #######################################
mssng:    0  

+ gene-ranking-shootout benchmark summarize --bars-top-n 20 result-phen2gene.json
    1: 2426  ##################################
    2:  470  ######
    3:  209  ##
    4:  125  #
    5:  101  #
    6:   67  .
    7:   51  .
    8:   62  .
    9:   53  .
   10:   41  .
   11:   33  .
   12:   37  .
   13:   42  .
   14:   33  .
   15:   34  .
   16:   28  .
   17:   18  .
   18:   29  .
   19:   17  .
   20:   19  .

21-..:  763  ##########
mssng:    0  

+ gene-ranking-shootout benchmark summarize --bars-top-n 20 result-varfish-phenix.json
    1: 1709  #######################
    2:  616  ########
    3:  357  ####
    4:  277  ###
    5:  184  ##
    6:  152  ##
    7:  131  #
    8:  118  #
    9:  105  #
   10:   78  #
   11:   71  .
   12:   57  .
   13:   67  .
   14:   64  .
   15:   67  .
   16:   71  .
   17:   56  .
   18:   48  .
   19:   34  .
   20:   48  .

21-..:  403  #####
mssng:    0  

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gene-ranking-shootout-0.1.3.tar.gz (810.5 kB view details)

Uploaded Source

File details

Details for the file gene-ranking-shootout-0.1.3.tar.gz.

File metadata

  • Download URL: gene-ranking-shootout-0.1.3.tar.gz
  • Upload date:
  • Size: 810.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for gene-ranking-shootout-0.1.3.tar.gz
Algorithm Hash digest
SHA256 103929e83cdee001d3589e9cc259417fa536ce181f87bedc5e0ffd305934fa41
MD5 402a0a74adb998b6fdbe3366365d275a
BLAKE2b-256 c4b8a7700d4e76fa8101718c57dba7ab9043ca08705ab5949fa92966d761b084

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page