A Python wrapper for the UniProt Mapping RESTful API.
UniProtMapper
A (unofficial) Python wrapper for the UniProt Retrieve/ID Mapping RESTful API. This package supports the following functionalities:
- Map UniProt IDs to other identifiers (handled by UniProtIDMapper);
- Retrieve any of the supported return fields (handled by UniProtRetriever);
- Parse JSON UniProt-SwissProt responses (handled by SwissProtParser).
Installation
From PyPI:
python -m pip install uniprot-id-mapper
Directly from GitHub:
python -m pip install git+https://github.com/David-Araripe/UniProtMapper.git
From source:
git clone https://github.com/David-Araripe/UniProtMapper
cd UniProtMapper
python -m pip install .
Usage
UniProtIDMapper
Supported databases and their respective types are stored under the attribute self.supported_dbs_with_types. These are also available as a list under self._supported_fields.
from UniProtMapper import UniProtIDMapper
mapper = UniProtIDMapper()
print(mapper.supported_dbs_with_types)
To map a list of UniProt IDs to Ensembl IDs, the user can either use the mapIDs method or call the object directly.
result, failed = mapper.mapIDs(
ids=["P30542", "Q16678", "Q02880"], from_db="UniProtKB_AC-ID", to_db="Ensembl"
)
>>> Retrying in 3s
>>> Fetched: 3 / 3
result, failed = mapper(
ids=["P30542", "Q16678", "Q02880"], from_db="UniProtKB_AC-ID", to_db="Ensembl"
)
>>> Retrying in 3s
>>> Fetched: 3 / 3
Where result is the following pandas DataFrame:
| | UniProtKB_AC-ID | Ensembl |
|---|---|---|
| 0 | P30542 | ENSG00000163485.17 |
| 1 | Q16678 | ENSG00000138061.12 |
| 2 | Q02880 | ENSG00000077097.17 |
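The Ensembl identifiers in the table above carry a version suffix (.17, .12). When joining against resources that use unversioned gene IDs, that suffix may need to be dropped. A minimal sketch in plain Python, using the values shown above; strip_ensembl_version is a hypothetical helper and not part of UniProtMapper:

```python
def strip_ensembl_version(ensembl_id: str) -> str:
    """Return the identifier without a trailing '.<version>' suffix."""
    base, sep, version = ensembl_id.rpartition(".")
    # Only strip when the part after the last dot is purely numeric.
    return base if sep and version.isdigit() else ensembl_id


# Values copied from the example table above.
versioned = ["ENSG00000163485.17", "ENSG00000138061.12", "ENSG00000077097.17"]
unversioned = [strip_ensembl_version(i) for i in versioned]
print(unversioned)
```

Identifiers without a numeric suffix are returned unchanged, so the helper can be applied to a mixed column.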
UniProtRetriever
This class supports retrieving any of the UniProt return fields. The user can access these directly from the object, under the attribute self.fields_table, e.g.:
from UniProtMapper import UniProtRetriever
field_retriever = UniProtRetriever()
df = field_retriever.fields_table
df.head()
| | Label | Legacy Returned Field | Returned Field | Field Type |
|---|---|---|---|---|
| 0 | Entry | id | accession | Names & Taxonomy |
| 1 | Entry Name | entry name | id | Names & Taxonomy |
| 2 | Gene Names | genes | gene_names | Names & Taxonomy |
| 3 | Gene Names (primary) | genes(PREFERRED) | gene_primary | Names & Taxonomy |
| 4 | Gene Names (synonym) | genes(ALTERNATIVE) | gene_synonym | Names & Taxonomy |
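The table pairs each legacy field name with its current counterpart, which can help when porting older queries. The sketch below builds that lookup from the five preview rows only; in practice the full fields_table would be used. Both legacy_to_current and translate_legacy_fields are hypothetical names, not part of the package:

```python
# Rows copied from the fields_table preview above:
# (Label, Legacy Returned Field, Returned Field)
preview_rows = [
    ("Entry", "id", "accession"),
    ("Entry Name", "entry name", "id"),
    ("Gene Names", "genes", "gene_names"),
    ("Gene Names (primary)", "genes(PREFERRED)", "gene_primary"),
    ("Gene Names (synonym)", "genes(ALTERNATIVE)", "gene_synonym"),
]

# Lookup from legacy field name to current field name.
legacy_to_current = {legacy: current for _, legacy, current in preview_rows}


def translate_legacy_fields(fields):
    """Translate legacy names; names without a known legacy entry pass through."""
    return [legacy_to_current.get(name, name) for name in fields]


print(translate_legacy_fields(["id", "genes(PREFERRED)"]))
```

Note that id is both a legacy name (for Entry) and a current name (for Entry Name), so this translation should only be applied to lists known to use the legacy naming.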
Similar to UniProtIDMapper, the user can either call the object directly or use the retrieveFields method to obtain the response.
result, failed = field_retriever.retrieveFields(["Q02880"])
>>> Fetched: 1 / 1
result, failed = field_retriever(["Q02880"])
>>> Fetched: 1 / 1
Custom returned fields can be retrieved by passing a list of fields to the fields parameter. These fields must be values from UniProtRetriever.fields_table["Returned Field"]; the resulting columns are named after their respective Label.
The object already has a list of default fields under self.default_fields, but these are ignored if the fields parameter is passed.
fields = ["accession", "organism_name", "structure_3d"]
result, failed = field_retriever.retrieveFields(["Q02880"], fields=fields)
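Because requested fields have to appear in the "Returned Field" column, it can be useful to check a list before sending a request. The following is a sketch under stated assumptions: split_fields is a hypothetical helper, and the hard-coded allowed_fields list stands in for field_retriever.fields_table["Returned Field"] so that the snippet runs offline:

```python
def split_fields(requested, allowed):
    """Partition requested names into (valid, unknown), preserving order."""
    allowed = set(allowed)
    valid = [name for name in requested if name in allowed]
    unknown = [name for name in requested if name not in allowed]
    return valid, unknown


# Stand-in for field_retriever.fields_table["Returned Field"]; only names
# that appear in this README are listed here.
allowed_fields = [
    "accession", "id", "gene_names", "gene_primary", "gene_synonym",
    "organism_name", "structure_3d",
]

valid, unknown = split_fields(["accession", "organism_name", "genes"], allowed_fields)
print(valid)
print(unknown)
```

Here genes ends up in unknown because it is a legacy name rather than a current returned field.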
SwissProtParser
Querying data from UniProt-SwissProt
JSON UniProt-SwissProt (reviewed) responses can also be retrieved by setting to_db to UniProtKB-Swiss-Prot, as in the following:
result, failed = mapper(
ids=["P30542", "Q16678", "Q02880"], from_db="UniProtKB_AC-ID", to_db="UniProtKB-Swiss-Prot"
)
print(result.loc[0, 'to'])
>>> {'from': 'P30542',
>>> 'to': {'entryType': 'UniProtKB reviewed (Swiss-Prot)',
>>> 'primaryAccession': 'P30542',
>>> ...
>>> 'Beta strand': 2,
>>> 'Turn': 1},
>>> 'uniParcId': 'UPI00000503E1'}}}
SwissProt responses from UniProtIDMapper can be parsed using the SwissProtParser class. The fields that can be extracted from UniProt (parameter toquery) are stored under self._supported_fields, and the supported cross-referenced databases (parameter crossrefs) are stored under self._crossref_dbs.
from UniProtMapper import SwissProtParser
parser = SwissProtParser(
toquery=["organism", "tissueExpression", "cellLocation"], crossrefs=["GO"]
)
parser(result.loc[0, 'to'])
>>> {'organism': 'Homo sapiens',
>>> 'tissueExpression': '',
>>> 'cellLocation': 'Cell membrane',
>>> 'GO_crossref': ['GO:0030673~GoTerm~C:axolemma',
>>> 'GO:0030673~GoEvidenceType~IEA:Ensembl',
>>> ...
>>> 'GO:0007165~GoEvidenceType~TAS:ProtInc',
>>> 'GO:0001659~GoTerm~P:temperature homeostasis',
>>> 'GO:0001659~GoEvidenceType~IEA:Ensembl',
>>> 'GO:0070328~GoTerm~P:triglyceride homeostasis',
>>> 'GO:0070328~GoEvidenceType~IEA:Ensembl']}
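Each GO_crossref entry in the output above appears to follow an ID~key~value layout. That layout is inferred from this example output only and is an assumption, not a documented format. Under that assumption, the flat list can be regrouped per GO term; group_crossrefs is a hypothetical helper:

```python
from collections import defaultdict


def group_crossrefs(entries):
    """Group 'ID~key~value' strings into {ID: {key: value}}."""
    grouped = defaultdict(dict)
    for entry in entries:
        # Split on the first two '~' only, so a '~' inside the value survives.
        ref_id, key, value = entry.split("~", 2)
        grouped[ref_id][key] = value
    return dict(grouped)


# Strings copied from the example output above.
go_crossref = [
    "GO:0030673~GoTerm~C:axolemma",
    "GO:0030673~GoEvidenceType~IEA:Ensembl",
    "GO:0001659~GoTerm~P:temperature homeostasis",
    "GO:0001659~GoEvidenceType~IEA:Ensembl",
]

grouped = group_crossrefs(go_crossref)
print(grouped["GO:0001659"]["GoTerm"])
```

An entry with fewer than two ~ separators raises a ValueError, which makes deviations from the assumed layout visible instead of silently dropping data.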
Both the UniProtIDMapper.mapIDs and UniProtIDMapper.__call__ methods accept a SwissProtParser through the parser parameter, as in:
result, failed = mapper(
ids=["P30542", "Q16678", "Q02880"],
from_db="UniProtKB_AC-ID",
to_db="UniProtKB-Swiss-Prot",
parser=parser,
)
Mapping identifiers to orthologs
This package also allows mapping UniProt IDs to orthologs. The method UniProtIDMapper.uniprotIDsToOrthologs does that by mapping UniProt IDs to OrthoDB and then re-mapping these results to UniProt-SwissProt. The fields to retrieve using SwissProtParser can be specified with the parameters uniprot_info and crossref_dbs.
The queried IDs are in the column original_id and their OrthoDB identifiers are in the column orthodb_id.
from UniProtMapper import UniProtIDMapper
mapper = UniProtIDMapper()
result, failed = mapper.uniprotIDsToOrthologs(
ids=["P30542", "Q16678", "Q02880"], organism="Mus musculus"
)
# Fetched results contain all retrieved species.
# Filtering by organism is done on the full response.
>>> Fetched: 3 / 3
>>> Fetched: 246 / 246
Alternatively, OrthoDB IDs can be obtained using UniProtIDMapper, and used to retrieve any of the desired UniProt return fields using UniProtRetriever.
from UniProtMapper import UniProtIDMapper, UniProtRetriever
mapper = UniProtIDMapper()
result, failed = mapper(
ids=["P30542", "Q16678", "Q02880"],
from_db="UniProtKB_AC-ID",
to_db="OrthoDB",
)
field_retriever = UniProtRetriever()
ortho_results, failed = field_retriever.retrieveFields(
ids=result["to"].tolist(), from_db="OrthoDB"
)
>>> Retrying in 3s
>>> Fetched: 3 / 3
>>> Retrying in 3s
>>> Fetched: 246 / 246
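In the two-step workflow above, the ID list is passed straight from one result column to the next request. If several accessions share an OrthoDB group, that list may contain repeats. Whether the retriever de-duplicates internally is not stated here, so the sketch below is a precaution based on that assumption; unique_in_order is a hypothetical helper:

```python
def unique_in_order(ids):
    """Drop repeated identifiers while keeping first-seen order."""
    # dict preserves insertion order (Python 3.7+), so keys act as an ordered set.
    return list(dict.fromkeys(ids))


# Placeholder identifiers for illustration only; not real OrthoDB group IDs.
ortho_ids = ["group_a", "group_b", "group_a", "group_c", "group_b"]
print(unique_in_order(ortho_ids))
```

With the example above, this would be applied as unique_in_order(result["to"].tolist()) before calling retrieveFields.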