Unofficial Python3 client API (SDK) for Genomics England (GEL) PanelApp

PanelApp_Python3_client_API

A preliminary unofficial Python3 client API (SDK) for PanelApp.

PanelApp exposes an OpenAPI specification that also includes JSON definitions.

The specification is Swagger 2.0, which means the code generators that would produce a Python3 client API (an SDK) from it risk obsolescence. In addition, the definitions are incomplete, which I am guessing GEL did deliberately to avoid a circular reference problem. That, or I have misunderstood what is going on! Namely, a Panel object returned from a panel request has the keys strs, genes and regions, which are arrays of objects that follow the Str, Gene and Region definitions.

Basics of the API

There are two forms of successful (200) response. For a single entry, the object returned is a Panel, Gene etc. For a list of entries, the response is a typical Django (or PHP) style paginated one, with the keys count, previous, next and results. The page parameter controls which subset is returned and appears in the previous/next URLs (which are None if absent).

In panel_app_query/basic.py there is a bare-bones retriever that returns a dict or a list of dicts.
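As a rough illustration of that paginated shape, the sketch below follows the next links until they run out. It does not touch the network: iter_results, FAKE_PAGES and the URLs in it are made-up stand-ins, and the key names are assumed to match the ones described above.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_results(fetch: Callable[[str], dict], url: Optional[str]) -> Iterator[dict]:
    """Yield every entry of a paginated listing by following the 'next' links."""
    while url is not None:
        page = fetch(url)
        yield from page['results']
        url = page.get('next')  # None on the last page, which ends the loop


# Made-up two-page listing standing in for real HTTP responses.
FAKE_PAGES: Dict[str, dict] = {
    '/panels/?page=1': {'count': 3, 'previous': None, 'next': '/panels/?page=2',
                        'results': [{'id': 1}, {'id': 2}]},
    '/panels/?page=2': {'count': 3, 'previous': '/panels/?page=1', 'next': None,
                        'results': [{'id': 3}]},
}

all_panels = list(iter_results(FAKE_PAGES.__getitem__, '/panels/?page=1'))
print([panel['id'] for panel in all_panels])
```

Passing the fetch function in as an argument keeps the paging logic separate from however the request is actually made.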

from panel_app_query import PanelAppQueryBasic

pa = PanelAppQueryBasic()
panels = pa.get_data('/panels/')
panel = pa.get_data('/panels/234/')
genes = pa.get_data('/genes/')

A panel from the list contains the keys: ['id', 'hash_id', 'name', 'disease_group', 'disease_sub_group', 'status', 'version', 'version_created', 'relevant_disorders', 'stats', 'types'] while from a single query there are additionally genes, strs, regions.

For a gene from a list the keys are: ['gene_data', 'entity_type', 'entity_name', 'confidence_level', 'penetrance', 'mode_of_pathogenicity', 'publications', 'evidence', 'phenotypes', 'mode_of_inheritance', 'tags', 'panel', 'transcript'] while the gene_data dictionary contains the keys: ['alias', 'biotype', 'hgnc_id', 'gene_name', 'omim_gene', 'alias_name', 'gene_symbol', 'hgnc_symbol', 'hgnc_release', 'ensembl_genes', 'hgnc_date_symbol_changed']

Note that confidence_level for a gene is a string, not an integer. It works like a star rating: it goes from 0 (no support) to 4, and potentially 5 (not implemented, as far as I can tell).

Note also that each occurrence of a gene in a panel is a separate gene instance (with the same gene data).
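Because the value is a string, it needs converting before any numeric comparison. A minimal sketch with made-up entries (is_green and the sample gene names are hypothetical, and the threshold of 3 is the one used for green genes elsewhere in this README):

```python
def is_green(entry: dict, threshold: int = 3) -> bool:
    """True if the entry's confidence_level string is at or above the threshold."""
    return int(entry['confidence_level']) >= threshold


# Made-up entries carrying only the two keys needed here.
sample = [
    {'entity_name': 'GENE_A', 'confidence_level': '3'},
    {'entity_name': 'GENE_B', 'confidence_level': '1'},
    {'entity_name': 'GENE_C', 'confidence_level': '0'},
]

green = [entry['entity_name'] for entry in sample if is_green(entry)]
print(green)
```

Comparing the raw string with an integer (for example '3' >= 3) raises a TypeError in Python 3, hence the explicit int().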

Dataclasses

If something more advanced is required, there is a retriever in panel_app_query/basic.py that returns a list of dataclass instances.

from panel_app_query import PanelAppQuery
pa = PanelAppQuery()
panels = pa.get_data('/panels/234/', formatted=True)  # returns a list of types.Panel
# equivalent to .get_formatted_data
first_panel_gene = panels[0].genes[0]
print(first_panel_gene.entity_name)  # dot notation!
assert isinstance(first_panel_gene, pa.dataclasses['Gene'])
genes = pa.get_data('/genes/')
assert isinstance(genes[0], pa.dataclasses['Gene'])

The dataclasses are held in the attribute .dataclasses, a dictionary keyed by name.

The attribute swagger contains the dictionary of definitions. The attribute schemata is derived from it and contains the schema for each path.

The class attribute extra_fields (Dict[str, List[Tuple]], as accepted by the dataclasses.make_dataclass factory) can be (and is) used to add custom fields, on top of the OpenAPI-defined ones, for a given dataclass name. The class attribute extra_namespaces (Dict[str, Dict[str, Callable]]) assigns methods to a given dataclass, so it can be used to add extra functionality. See the Python documentation for dataclasses for more. Do note that __post_init__ is not used. Instead, the PanelAppQueryParsed method _post_init_results is called after all the results are initialised, because the lists of dataclass instances aren't handled within the dataclass definitions (sloppy coding).
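To show what those two shapes look like when handed to dataclasses.make_dataclass, here is a standalone sketch. It does not use this package: the base_fields list, the note field and the is_green method are made up for illustration, and how the package itself combines these attributes is not shown here.

```python
from dataclasses import field, fields, make_dataclass

# Hypothetical stand-ins shaped like the class attributes described above.
extra_fields = {'Gene': [('note', str, field(default=''))]}
extra_namespaces = {'Gene': {'is_green': lambda self: int(self.confidence_level) >= 3}}

# Fields that would normally come from the OpenAPI definitions (made up here).
base_fields = [('entity_name', str), ('confidence_level', str)]

Gene = make_dataclass(
    'Gene',
    base_fields + extra_fields.get('Gene', []),   # extra fields appended to the defined ones
    namespace=extra_namespaces.get('Gene', {}),   # callables become methods of the class
)

gene = Gene(entity_name='GENE_A', confidence_level='3')
print([f.name for f in fields(Gene)])
print(gene.is_green())
```

Extra fields given a default have to come after the fields without one, which is why they are appended at the end.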

Pandas

from panel_app_query import PanelAppQuery
pa = PanelAppQuery()
genes = pa.get_dataframe('/genes/')
subset = genes.loc[(genes.panel_id == 234) & (genes.confidence_level >= 3)]
# in a Jupyter notebook:
subset

Uptodateness

The data one can download from the browser for a panel may differ from that from the API. The gene list for the panel (len(subset)) above contained 54 green genes while the website listed 57! To get the web version:

from panel_app_query import PanelAppQuery
web = PanelAppQuery.retrieve_web_panel(234, '34')
print(len(web))  # web is a pd.DataFrame; prints 57
print(len(web['Entity Name'].unique()))  # 57

However, on further investigation the next day, the count for genes was 57, but that is deceptive!

Whereas querying the panel itself finds 56:

from panel_app_query import PanelAppQuery
import pandas as pd

pa = PanelAppQuery()
panels = pa.get_dataframe('/panels/234/')
confidence_levels = pd.Series(panels.genes_confidence_level[0]).astype(int)
print(sum(confidence_levels >=3))

returns 56.

However... as mentioned a gene is not a single entity.

from panel_app_query import PanelAppQuery
pa = PanelAppQuery()
genes = pa.get_dataframe('/genes/')
subset = genes.loc[(genes.panel_id == 234) & (genes.confidence_level >= 3)]
len(subset.entity_name.unique())

returns 52 unique genes (not 57).
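The gap between the number of rows and the number of unique names comes from a gene having several instances. A small sketch with made-up records (no real PanelApp data) shows the effect:

```python
from collections import Counter

# Made-up rows: GENE_A has two instances within panel 234 and one in another panel.
rows = [
    {'entity_name': 'GENE_A', 'panel_id': 234},
    {'entity_name': 'GENE_A', 'panel_id': 234},
    {'entity_name': 'GENE_B', 'panel_id': 234},
    {'entity_name': 'GENE_A', 'panel_id': 126},
]

in_panel = [row['entity_name'] for row in rows if row['panel_id'] == 234]
per_gene = Counter(in_panel)  # number of instances per gene name

print(len(in_panel), len(per_gene))  # rows in the panel vs unique gene names
repeated = [name for name, n in per_gene.items() if n > 1]
print(repeated)
```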

Whereas

from panel_app_query import PanelAppQuery
import pandas as pd

pa = PanelAppQuery()
panels = pa.get_dataframe('/panels/234/')
entity_names = pd.Series(panels.genes_entity_name[0])
confidence_levels = pd.Series(panels.genes_confidence_level[0]).astype(int)
len(entity_names[confidence_levels >=3].unique())

returns 56 (all).

The odd one out in web is 'ISCA-37432-Loss', which is a region not a gene.

So the /panels/ route is up to date, while the /genes/ route is not and also returns redundant entries.

I cannot explain why these genes are absent.

absentees = set(web['Entity Name'].unique()) - set(subset.entity_name.unique())
web.loc[web['Entity Name'].isin(absentees)]\
    [['Entity Name', 'Entity type', 'ready', 'Flagged', 'GEL_Status', 'UserRatings_Green_amber_red' ]]\
    .sort_values('Entity Name').to_markdown()

	Entity Name	Entity type	ready	Flagged	GEL_Status	UserRatings_Green_amber_red
12	EYA1	gene	True	False	3	100;0;0
14	FRAS1	gene	True	False	3	100;0;0
15	FREM1	gene	True	False	3	100;0;0
56	ISCA-37432-Loss	region	False	False	3	0;0;0
32	LRIG2	gene	True	False	3	100;0;0

These genes do exist in the gene list, but under other panels:

# select every row of the absent genes, whichever panel it belongs to
absentee_subset = genes.loc[(genes.entity_name.isin(absentees))]
print(absentee_subset[['entity_name', 'panel_name', 'panel_id']].sort_values('entity_name').to_markdown())

	entity_name	panel_name	panel_id
32444	EYA1	Severe Paediatric Disorders	921
21614	EYA1	Hearing loss	126
21981	EYA1	Hearing loss	126
17354	EYA1	Fetal anomalies	478
23902	EYA1	Intellectual disability	285
10395	EYA1	Unexplained kidney failure in young people	156
24413	EYA1	Intellectual disability	285
25726	EYA1	Intellectual disability	285
19728	EYA1	DDG2P	484
19804	EYA1	DDG2P	484
7552	EYA1	Ductal plate malformation	209
27371	EYA1	Structural eye disease	509
5319	EYA1	Deafness and congenital structural abnormalities	251
28071	EYA1	Groopman et al 2019 - Genes with diagnostic variants	720
30178	EYA1	Severe Paediatric Disorders	921
10274	EYA1	Unexplained kidney failure in young people	156
....	....	....	....

This version: 0.1 (Jul 6, 2021)
