unofficial Python3 client API (SDK) for Genomics England (GEL) PanelApp
Project description
PanelApp_Python3_client_API
A preliminary unofficial Python3 client API (SDK) for PanelApp.
PanelApp has an OpenAPI whose specs also have json definitions.
It is a Swagger 2.0 app, which means that there would be obsolescence problems
with the codegens to make a Python3 client API (an SDK).
However, there is an incomplete definitions issue,
done so from GEL's point of view I am guessing to avoid a circular reference problem.
That or I have misunderstood what is going on!
Namely in a returned Panel object (from a panel request) the keys strs
, genes
, regions
are returned,
which are arrays of objects that follow the Str
, Gene
and Region
definitions.
Basics of the API
There are two forms of successful (200) responses.
For a single entry response, the object returned is a Panel
, Gene
etc. while for a list of entries,
the response is a typical Django API or PHP generated one, with counts
, previous
, next
, results
,
where the parameter page
controls which subset and is in the URL in previous
/next
(or None if absent).
In panel_app_query/basic.py is a barebone retriever that returns a dict or a list of dicts.
from panel_app_query import PanelAppQueryBasic
pa = PanelAppQueryBasic()
panels = pa.get_data('/panels/')
panel = pa.get_data('/panels/234/')
genes = pa.get_data('/genes/')
A panel from the list contains the keys:
['id', 'hash_id', 'name', 'disease_group', 'disease_sub_group', 'status', 'version', 'version_created', 'relevant_disorders', 'stats', 'types']
while from a single query there are additionally genes
, strs
, regions
.
For a gene from a list the keys are:
['gene_data', 'entity_type', 'entity_name', 'confidence_level', 'penetrance', 'mode_of_pathogenicity', 'publications', 'evidence', 'phenotypes', 'mode_of_inheritance', 'tags', 'panel', 'transcript']
while gene_data
dictionary contains the
keys ['alias', 'biotype', 'hgnc_id', 'gene_name', 'omim_gene', 'alias_name', 'gene_symbol', 'hgnc_symbol', 'hgnc_release', 'ensembl_genes', 'hgnc_date_symbol_changed']
Note that confidence_level
for a gene is a string as opposed to an integer and
works like star-ratings, that is it goes from 0 (no support) to 4, and potentially 5 (not implemented as far as I can say).
Note also that for each instance of a gene in a panel there is a new gene instance (which will have the same gene data).
Dataclasses
If something more advanced is required In panel_app_query/basic.py is a retriever that returns a list of dataclass instances.
from panel_app_query import PanelAppQuery
pa = PanelAppQuery()
panels = pa.get_data('/panels/234/', formatted=True) # returns a list of types.Panel
# equivalent to .get_formatted_data
first_panel_gene = panels[0].genes[0]
print(first_panel_gene.entity_name) # dot notation!
assert isinstance(first_panel_gene, pa.dataclasses['Gene'])
genes = pa.get_data('/genes/')
assert isinstance(genes[0], pa.dataclasses['Gene'])
The list of dataclasses are in the attribute .dataclasses
.
The attribute swagger
contains the dictionary of definitions.
Derived from which is schemata
, which contains the schema for each path.
The class attribute extra_fields
(Dict[str, List[Tuple]]
as accepted by the dataclasses.make_dataclass
factory)
can be (and is) used to add custom fields (in addition to the openAPI defined one) for a given dataclass name.
The class attribute extra_namespaces
(Dict[str, Dict[str, Callable]]
) is used to assign methods to a given dataclass.
See Python documentation for dataclasses for more.
The latter can be used therefore to add methods to the dataclasses for extra functionality.
Do note __post_init__
is not used. And the PanelAppQueryParsed method _post_init_results
is called after
all the results are initialised —the lists of dataclass instances aren't handed
within the dataclass definitions (sloppy coding).
Pandas
from panel_app_query import PanelAppQuery
pa = PanelAppQuery()
genes = pa.get_dataframe('/genes/')
subset = genes.loc[(genes.panel_id == 234) & (genes.confidence_level >= 3)]
# in a Jupyter notebook:
subset
Uptodateness
The data one can download from the browser for a panel may differ from that from the API.
The gene list for the panel (len(subset)
) above contained 54 green genes while the website listed 57!
To get the web version:
from panel_app_query import PanelAppQuery
web = PanelAppQuery.retrieve_web_panel(234, '34')
print( len(web) ) # pd.DataFrame # 57
print( len(web['Entity Name'].unique()) ) # 57
However, on further investigation the next day it was 57 for gene, but that is deceiving!
Whereas querying a panel 56 were found:
from panel_app_query import PanelAppQuery
import pandas as pd
pa = PanelAppQuery()
panels = pa.get_dataframe('/panels/234/')
confidence_levels = pd.Series(panels.genes_confidence_level[0]).astype(int)
print(sum(confidence_levels >=3))
returns 56.
However... as mentioned a gene is not a single entity.
from panel_app_query import PanelAppQuery
pa = PanelAppQuery()
genes = pa.get_dataframe('/genes/')
subset = genes.loc[(genes.panel_id == 234) & (genes.confidence_level >= 3)]
len(subset.entity_name.unique())
returns 52 unique genes (not 57).
Whereas
from panel_app_query import PanelAppQuery
import pandas as pd
pa = PanelAppQuery()
panels = pa.get_dataframe('/panels/234/')
entity_names = pd.Series(panels.genes_entity_name[0])
confidence_levels = pd.Series(panels.genes_confidence_level[0]).astype(int)
len(entity_names[confidence_levels >=3].unique())
returns 56 (all).
The odd one out in web is 'ISCA-37432-Loss', which is a region not a gene.
So the /panels/
route is up-to-date, while /genes/
is not, but returns redundancies.
The genes that are absent cannot be explained by me.
absentees = set(web['Entity Name'].unique()) - set(subset.entity_name.unique())
web.loc[web['Entity Name'].isin(absentees)]\
[['Entity Name', 'Entity type', 'ready', 'Flagged', 'GEL_Status', 'UserRatings_Green_amber_red' ]]\
.sort_values('Entity Name').to_markdown()
Entity Name | Entity type | ready | Flagged | GEL_Status | UserRatings_Green_amber_red | |
---|---|---|---|---|---|---|
12 | EYA1 | gene | True | False | 3 | 100;0;0 |
14 | FRAS1 | gene | True | False | 3 | 100;0;0 |
15 | FREM1 | gene | True | False | 3 | 100;0;0 |
56 | ISCA-37432-Loss | region | False | False | 3 | 0;0;0 |
32 | LRIG2 | gene | True | False | 3 | 100;0;0 |
These genes do exist, but for other panels in the gene list:
absentee_subset = genes.loc[(genes.entity_name.isin(absentees))]
print(subset[['entity_name', 'panel_name', 'panel_id']].sort_values('entity_name').to_markdown())
entity_name | panel_name | panel_id | |
---|---|---|---|
32444 | EYA1 | Severe Paediatric Disorders | 921 |
21614 | EYA1 | Hearing loss | 126 |
21981 | EYA1 | Hearing loss | 126 |
17354 | EYA1 | Fetal anomalies | 478 |
23902 | EYA1 | Intellectual disability | 285 |
10395 | EYA1 | Unexplained kidney failure in young people | 156 |
24413 | EYA1 | Intellectual disability | 285 |
25726 | EYA1 | Intellectual disability | 285 |
19728 | EYA1 | DDG2P | 484 |
19804 | EYA1 | DDG2P | 484 |
7552 | EYA1 | Ductal plate malformation | 209 |
27371 | EYA1 | Structural eye disease | 509 |
5319 | EYA1 | Deafness and congenital structural abnormalities | 251 |
28071 | EYA1 | Groopman et al 2019 - Genes with diagnostic variants | 720 |
30178 | EYA1 | Severe Paediatric Disorders | 921 |
10274 | EYA1 | Unexplained kidney failure in young people | 156 |
.... | .... | .... | .... |
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file PanelApp_client_API-0.1.tar.gz
.
File metadata
- Download URL: PanelApp_client_API-0.1.tar.gz
- Upload date:
- Size: 7.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0.post20200714 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.7.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 422d61097678e5b74e4c0b7bcb6a20fcd088bbc97b02f8b322320f163c70151b |
|
MD5 | 5f7985aed626bc6749887e13a9dabc70 |
|
BLAKE2b-256 | c0cf65d66499acc11dae8b0f09e0b659a05dca3032b2a7cff702ad626325201f |