Skip to main content

Genomics England Bioinformatics team model definitions

Project description


This project contains several models used by Genomics England systems. This is a development guide, the models documentation can be found at

These models are defined using Avro Interface Design Language (IDL) which is agnostic of any implementation language. The models are then used to generate source code employed to store the information. The source code is either Python or Java so far, but this can be easily extended.

From the Avro models you can generate:

  • Java source code
  • Python source code
  • JSON schemas
  • AVPR schemas
  • HTML documentation

Maven is employed for Java dependency management. Particularly, OpenCB models are imported through a maven dependency and then extracted into the local folder for schemas. This is not required unless you upgrade OpenCB models version, as the OpenCB models are commited in the repository under schemas/IDLs/org.opencb.biodata.models.

To import the OpenCB dependency and extract the models in your local environment run:

% mvn clean initialize


Models are organised in packages:

  • Internal
    • org.gel.models.participant.avro
    • org.gel.models.metrics.avro
    • org.gel.models.cva.avro
    • org.gel.models.system.avro
  • External
    • org.ga4gh.models
    • org.opencb.biodata.models

A package is formed by a set of schema files having .avdl extension.

There are dependencies between packages that require that we build packages together.

Each of those packages support independent versioning. Also there are build versions that determine a set of specific packages that are built together. These information is contained within builds.json in an array of build descriptions.

The following represents the build version 4.3.0-SNAPSHOT having package org.ga4gh.models version 3.0.0, package org.gel.models.cva.avro version 0.4.0-SNAPSHOT and so on.

  "version": "4.3.0-SNAPSHOT",
  "packages": [
      "package": "org.ga4gh.models",
      "python_package": "ga4gh",
      "version": "3.0.0",
      "dependencies": []
      "package": "org.gel.models.cva.avro",
      "python_package": "cva",
      "version": "0.4.0-SNAPSHOT",
      "dependencies": [
      "package": "org.gel.models.metrics.avro",
      "python_package": "metrics",
      "version": "1.1.0-SNAPSHOT",
      "dependencies": []
      "package": "org.gel.models.participant.avro",
      "python_package": "participant",
      "version": "1.0.4-SNAPSHOT",
      "dependencies": []
      "package": "",
      "python_package": "reports",
      "version": "4.2.0-SNAPSHOT",
      "dependencies": [
      "package": "org.gel.models.system.avro",
      "python_package": "system",
      "version": "0.1.0-SNAPSHOT",
      "dependencies": []
      "package": "org.opencb.biodata.models",
      "python_package": "opencb",
      "version": "1.3.0-SNAPSHOT",
      "dependencies": []

Every package in a build is built in a sandbox folder under schemas/IDLs/build together with those packages in the list of dependencies for each package. This introduces two strong constraints in the models:

  • The same package cannot contain two records named equally in different schema files
  • Schema files in different packages cannot be named equally

The build sandbox is deleted after every build.

Getting started

Install requirements

Install sphynx:

sudo apt-get install python-sphinx

Install avrodoc:

sudo apt-get install nodejs nodejs-legacy
sudo apt-get install npm
sudo npm install avrodoc -g

Install python dependencies:

pip install -r requirements.txt

Build the models

To build all builds described in builds.json run:

% python

This will create the following:

  • Python source code representing the Avro records in the folder ./protocols/models
  • Java source code representing the Avro records in the folder ./target/generated-sources/avro
  • The models HTML documentation under ./docs/html_schemas

It may be handy to skip the documentation generation by using the flag --skip-docs.

Building legacy versions of the models

See builds.json for the information on all legacy versions and the specific package versions and dependencies in each of those.

To build a specific version run:

% python --version 3.0.0

Using custom tools to build the models

To facilitate using custom tools to build the models you can prepare the sandbox for a specific version running:

% python --version 4.0.0 --only-prepare-sandbox

This will copy all required schemas for that build under the folder schemas/IDLs/build.

Other build options

Use --skip-docs to avoid generating documentation which affects build time greatly.

Use --skip-java to avoid generating Java source code.

Use --update-docs-index to update the documentation landing page with the latest documentation generated.

Java Packaging

To pack the Java source code representing these models in a jar file use:

% mvn package

To install it in your system so it is accessible as a maven dependency to other Java applications run:

% mvn install

Unit tests

To run some unit tests implemented in Python run:

% ./

Mock data

Generate a mocked object with custom fields as follows:

from protocols.util.dependency_manager import VERSION_500
from protocols.util.factories.avro_factory import GenericFactoryAvro

interpretation_request_factory = GenericFactoryAvro.get_factory_avro(
    version = VERSION_500
instance = interpretation_request_factory(analysisReturnUri = "myURI")
self.assertTrue(instance.analysisReturnUri == "myURI")



Building Resources From a Container

From your starting directory, eg. ~/gel/:

Clone this repo

Then run the following (you may need sudo depending on your system configuration):


Once the build is successful, check the resources are there:

root@e444d27c16b9:/gel# ls GelReportModels/protocols/
GelProtocols.pyc      migration                         protocol.pyc

Also check that Java resources are there:

root@4dabae77118d:/gel# ls -l GelReportModels/target/
total 53620
drwxr-xr-x  2 root root     4096 Aug 18 08:35 antrun
drwxr-xr-x  3 root root     4096 Aug 18 08:45 classes
drwxr-xr-x  2 root root     4096 Aug 18 08:35 dependency-maven-plugin-markers
drwxr-xr-x 17 root root     4096 Aug 18 08:55 gel-models-4.3.0-SNAPSHOT
-rw-r--r--  1 root root  1819234 Aug 18 08:45 gel-models-4.3.0-SNAPSHOT.jar
-rw-r--r--  1 root root 53054906 Aug 18 08:55 gel-models-docs-4.3.0-SNAPSHOT.war
drwxr-xr-x  4 root root     4096 Aug 18 08:45 generated-sources
drwxr-xr-x  2 root root     4096 Aug 18 08:45 maven-archiver
drwxr-xr-x  3 root root     4096 Aug 18 08:45 maven-status

then in a separate tab/window, from the GelReportModels directory:

$ sudo docker ps -alq

and use this container ID to copy the python files from GelReportModels/protocols:

$ sudo docker cp containerID:/GelReportModels/protocols .

then check you have them present:

$ ls ./protocols/
catalog_variable_set  GelProtocols.pyc  migration         protocol.pyc                __init__.pyc             

Additional tools

The conversion between the different Avro schema formats, source code and documentation are available through the following utility:

$ cd resources/GelModelsTools/
$ python --help
usage: <command> [<args>]

GEL models toolbox

positional arguments:
  command     Subcommand to run

optional arguments:
  -h, --help  show this help message and exit


To deploy to GELs internal pypi instance, run the GEL-models/Deploy GelReportModels into Pypi bio jenkins job and this will automate the deploy.

To deploy to public PyPi you can use one of the Dockerfiles in this repo. Create an image and run it as follows:

docker build -f Dockerfile-python3 .
docker run -it <hashname> /bin/bash

Once inside the container you need to create a file called ~/.pypirc with contents as follows:

index-servers =

username: <your username>
password: <your password>

Once you have this file, you can run the following commands:

python3 --skip-java --skip-docs

pip3 install --upgrade twine wheel setuptools keyrings.alt
python3 sdist bdist_wheel

twine upload dist/GelReportModels-7.3.6.tar.gz


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for GelReportModels, version 7.8.0
Filename, size File type Python version Upload date Hashes
Filename, size GelReportModels-7.8.0-py3-none-any.whl (2.4 MB) File type Wheel Python version py3 Upload date Hashes View
Filename, size GelReportModels-7.8.0.tar.gz (2.2 MB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page