A Python wrapper for the UniProt Mapping RESTful API.

UniProtMapper

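An (unofficial) Python wrapper for the UniProt Retrieve/ID Mapping RESTful API. This package supports mapping identifiers between databases (UniProtIDMapper), retrieving UniProt return fields (UniProtRetriever), parsing UniProt-SwissProt responses (SwissProtParser), and mapping UniProt IDs to orthologs.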

Installation

From PyPI:

python -m pip install uniprot-id-mapper

Directly from GitHub:

python -m pip install git+https://github.com/David-Araripe/UniProtMapper.git

From source:

git clone https://github.com/David-Araripe/UniProtMapper
cd UniProtMapper
python -m pip install .
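After installing, a quick way to confirm that the package landed in the active environment is to query the standard library's importlib.metadata. Note that the distribution name used by pip (uniprot-id-mapper) differs from the import name (UniProtMapper). The helper name installed_version is illustrative, not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "uniprot-id-mapper") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# The pip distribution name is looked up here, not the import name UniProtMapper.
print(installed_version() or "uniprot-id-mapper is not installed")
```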

Usage

UniProtIDMapper

Supported databases and their respective types are stored in the attribute self.supported_dbs_with_types. They are also available as a list under self._supported_fields.

from UniProtMapper import UniProtIDMapper

mapper = UniProtIDMapper()
print(mapper.supported_dbs_with_types)

To map a list of UniProt IDs to Ensembl IDs, the user can either call the object directly or use the mapIDs method.

result, failed = mapper.mapIDs(
    ids=["P30542", "Q16678", "Q02880"], from_db="UniProtKB_AC-ID", to_db="Ensembl"
)
>>> Retrying in 3s
>>> Fetched: 3 / 3

result, failed = mapper(
    ids=["P30542", "Q16678", "Q02880"], from_db="UniProtKB_AC-ID", to_db="Ensembl"
)
>>> Retrying in 3s
>>> Fetched: 3 / 3
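The "Retrying in 3s" message reflects that UniProt ID mapping is an asynchronous job: it is submitted, its status is polled, and the results are then fetched. The sketch below walks through that public REST protocol using only the standard library. It illustrates the UniProt endpoints (https://rest.uniprot.org/idmapping/run, /status/{jobId} and /details/{jobId}); it is not UniProtMapper's actual implementation, the function names are hypothetical, and it fetches only the first page of results (pagination is not handled).

```python
import json
import time
import urllib.parse
import urllib.request

# Public UniProt REST base URL (assumed here, not taken from UniProtMapper's source).
API_URL = "https://rest.uniprot.org"
POLLING_INTERVAL = 3  # seconds between status checks


def build_run_request(ids, from_db, to_db):
    """Build (but do not send) the POST request that submits a mapping job."""
    payload = urllib.parse.urlencode(
        {"from": from_db, "to": to_db, "ids": ",".join(ids)}
    ).encode("ascii")
    return urllib.request.Request(
        f"{API_URL}/idmapping/run", data=payload, method="POST"
    )


def _get_json(url):
    """GET a URL and decode its JSON body (network call)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def submit_job(ids, from_db, to_db):
    """Submit the job and return the jobId assigned by UniProt (network call)."""
    with urllib.request.urlopen(build_run_request(ids, from_db, to_db)) as resp:
        return json.load(resp)["jobId"]


def wait_for_job(job_id):
    """Poll the status endpoint while the job is NEW or RUNNING."""
    while True:
        status = _get_json(f"{API_URL}/idmapping/status/{job_id}")
        job_status = status.get("jobStatus")
        if job_status in ("NEW", "RUNNING"):
            print(f"Retrying in {POLLING_INTERVAL}s")
            time.sleep(POLLING_INTERVAL)
        elif job_status not in (None, "FINISHED"):
            raise RuntimeError(f"Job {job_id} ended with status {job_status}")
        else:
            # No jobStatus key means urllib already followed the redirect to results.
            return


def first_results_page(job_id):
    """Fetch the first page of results via the redirectURL in the job details."""
    details = _get_json(f"{API_URL}/idmapping/details/{job_id}")
    return _get_json(details["redirectURL"])
```

A full run would chain submit_job, wait_for_job and first_results_page; the returned page is expected to hold a results list of from/to pairs and, when some IDs cannot be mapped, a failedIds list.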

In both cases, result is the following pandas DataFrame:

  UniProtKB_AC-ID             Ensembl
0          P30542  ENSG00000163485.17
1          Q16678  ENSG00000138061.12
2          Q02880  ENSG00000077097.17
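Two details can matter when post-processing this table: the Ensembl IDs carry a version suffix (.17, .12, and so on), and a single UniProt accession may map to more than one Ensembl gene, giving several rows per accession. The hypothetical helpers below (not part of UniProtMapper) strip the version and group the rows. Rows can be fed in as plain (from, to) tuples, for example with result.itertuples(index=False, name=None).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def strip_version(ensembl_id: str) -> str:
    """Drop the '.N' version suffix from an Ensembl stable ID, if present."""
    stable_id, _, _ = ensembl_id.partition(".")
    return stable_id


def group_pairs(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect (from_id, to_id) rows into {from_id: [to_id, ...]}, without repeats."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for source, target in pairs:
        if target not in grouped[source]:
            grouped[source].append(target)
    return dict(grouped)


# Hypothetical usage with the DataFrame above:
# rows = result.itertuples(index=False, name=None)
# genes = group_pairs((src, strip_version(dst)) for src, dst in rows)
```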

UniProtRetriever

This class supports retrieving any of the UniProt return fields. The available fields are listed in the attribute self.fields_table, e.g.:

from UniProtMapper import UniProtRetriever

field_retriever = UniProtRetriever()
df = field_retriever.fields_table
df.head()
   Label                 Legacy Returned Field  Returned Field  Field Type
0  Entry                 id                     accession       Names & Taxonomy
1  Entry Name            entry name             id              Names & Taxonomy
2  Gene Names            genes                  gene_names      Names & Taxonomy
3  Gene Names (primary)  genes(PREFERRED)       gene_primary    Names & Taxonomy
4  Gene Names (synonym)  genes(ALTERNATIVE)     gene_synonym    Names & Taxonomy

Similar to UniProtIDMapper, the user can either call the object directly or use the retrieveFields method to obtain the response.

result, failed = field_retriever.retrieveFields(["Q02880"])
>>> Fetched: 1 / 1

result, failed = field_retriever(["Q02880"])
>>> Fetched: 1 / 1

Custom return fields can be retrieved by passing a list of field names to the fields parameter. Each field must be a value in UniProtRetriever.fields_table["Returned Field"], and the resulting columns are named after the corresponding Label.

The object also defines a list of default fields under self.default_fields; these are ignored when the fields parameter is passed.

fields = ["accession", "organism_name", "structure_3d"]
result, failed = field_retriever.retrieveFields(["Q02880"],
                                                fields=fields)
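Because every requested field must appear in fields_table["Returned Field"], a request can be checked before it is sent, and the expected output column names (the Labels) can be looked up in advance. The helpers check_fields and labels_for are hypothetical, not part of UniProtMapper; they accept any iterable, so the DataFrame columns can be passed directly.

```python
from typing import Iterable, List


def check_fields(requested: Iterable[str], allowed: Iterable[str]) -> List[str]:
    """Return the requested field names that are not in `allowed`."""
    allowed_set = set(allowed)
    return [field for field in requested if field not in allowed_set]


def labels_for(
    requested: Iterable[str],
    returned_fields: Iterable[str],
    labels: Iterable[str],
) -> List[str]:
    """Map each requested returned field to its Label (the output column name)."""
    lookup = dict(zip(returned_fields, labels))
    return [lookup[field] for field in requested]


# Hypothetical usage with the table shown above:
# table = field_retriever.fields_table
# unknown = check_fields(fields, table["Returned Field"])
# if unknown:
#     raise ValueError(f"Unsupported fields: {unknown}")
# columns = labels_for(fields, table["Returned Field"], table["Label"])
```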

SwissProtParser

Querying data from UniProt-SwissProt

UniProt-SwissProt (reviewed) entries can also be retrieved as JSON, for example:

result, failed = mapper(
    ids=["P30542", "Q16678", "Q02880"], from_db="UniProtKB_AC-ID", to_db="UniProtKB-Swiss-Prot"
)

print(result.loc[0, 'to'])
>>> {'from': 'P30542',
>>>  'to': {'entryType': 'UniProtKB reviewed (Swiss-Prot)',
>>>   'primaryAccession': 'P30542',
>>> ...
>>>     'Beta strand': 2,
>>>     'Turn': 1},
>>>    'uniParcId': 'UPI00000503E1'}}}
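Individual values can also be read straight from this nested dictionary without a parser. The get_path helper below is a hypothetical convenience (not part of UniProtMapper) that follows a list of keys and returns a default when one is missing; the example keys are only those visible in the printout above.

```python
from typing import Any, Iterable


def get_path(record: Any, keys: Iterable[str], default: Any = None) -> Any:
    """Follow `keys` through nested dicts, returning `default` if any key is absent."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical usage on the response printed above:
# entry = result.loc[0, "to"]
# get_path(entry, ["to", "primaryAccession"])  # accession of the reviewed entry
# get_path(entry, ["to", "entryType"])         # reviewed (Swiss-Prot) or not
```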

SwissProt responses from UniProtIDMapper can be parsed with the SwissProtParser class. The UniProt fields it can extract (selected with the toquery parameter) are listed under self._supported_fields, and the cross-referenced databases it supports (selected with the crossrefs parameter) are listed under self._crossref_dbs.

from UniProtMapper import SwissProtParser

parser = SwissProtParser(
    toquery=["organism", "tissueExpression", "cellLocation"], crossrefs=["GO"]
)
parser(result.loc[0, 'to'])

>>> {'organism': 'Homo sapiens',
>>>  'tissueExpression': '',
>>>  'cellLocation': 'Cell membrane',
>>>  'GO_crossref': ['GO:0030673~GoTerm~C:axolemma',
>>>   'GO:0030673~GoEvidenceType~IEA:Ensembl',
>>> ...
>>>   'GO:0007165~GoEvidenceType~TAS:ProtInc',
>>>   'GO:0001659~GoTerm~P:temperature homeostasis',
>>>   'GO:0001659~GoEvidenceType~IEA:Ensembl',
>>>   'GO:0070328~GoTerm~P:triglyceride homeostasis',
>>>   'GO:0070328~GoEvidenceType~IEA:Ensembl']}
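Each GO_crossref entry above packs three parts into one string, ID~Property~Value, so the term and its evidence type for one GO ID are spread over separate entries. The hypothetical helpers below (not part of UniProtMapper) regroup them per ID and split a GoTerm value into its aspect letter and name. Entries that lack the two tildes are kept as IDs with no properties; that case is an assumption, since the output above always shows three parts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def parse_crossrefs(entries: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group 'ID~Property~Value' strings into {ID: {Property: Value}}."""
    parsed: Dict[str, Dict[str, str]] = defaultdict(dict)
    for entry in entries:
        parts = entry.split("~", 2)  # split at most twice; the value stays whole
        if len(parts) < 3:
            parsed.setdefault(parts[0], {})  # assumed: ID without properties
            continue
        xref_id, prop, value = parts
        parsed[xref_id][prop] = value
    return dict(parsed)


def split_go_term(term: str) -> Tuple[str, str]:
    """Split a GoTerm value such as 'C:axolemma' into (aspect, name).

    Aspect letters: C = cellular component, P = biological process,
    F = molecular function.
    """
    aspect, _, name = term.partition(":")
    return aspect, name


# Hypothetical usage with the parser output above:
# go_terms = parse_crossrefs(parser(result.loc[0, "to"])["GO_crossref"])
```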

Both UniProtIDMapper.mapIDs and UniProtIDMapper.__call__ accept a SwissProtParser through the parser parameter, for example:

result, failed = mapper(
    ids=["P30542", "Q16678", "Q02880"],
    from_db="UniProtKB_AC-ID",
    to_db="UniProtKB-Swiss-Prot",
    parser=parser,
)

Mapping identifiers to orthologs

This package also allows mapping UniProt IDs to orthologs. The method UniProtIDMapper.uniprotIDsToOrthologs does this by first mapping the UniProt IDs to OrthoDB and then mapping these results back to UniProt-SwissProt. The fields to retrieve with SwissProtParser can be specified with the parameters uniprot_info and crossref_dbs.

The queried IDs are in the column original_id, and their OrthoDB identifiers are in the column orthodb_id.

from UniProtMapper import UniProtIDMapper

mapper = UniProtIDMapper()
result, failed = mapper.uniprotIDsToOrthologs(
    ids=["P30542", "Q16678", "Q02880"], organism="Mus musculus"
)

# Fetched results contain all retrieved species.
# Filtering by organism is done on the full response.
>>> Fetched: 3 / 3
>>> Fetched: 246 / 246
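The two-step strategy described above (UniProt to OrthoDB, then OrthoDB back to UniProt, then filtering on organism) can be illustrated on plain dictionaries. This is a sketch of the idea only, not the code behind uniprotIDsToOrthologs: compose_mappings and filter_by_organism are hypothetical helpers, and the organism lookup is assumed to come from the retrieved Swiss-Prot entries.

```python
from typing import Dict, List, Mapping, Sequence


def compose_mappings(
    first: Mapping[str, Sequence[str]],
    second: Mapping[str, Sequence[str]],
) -> Dict[str, List[str]]:
    """Chain source -> intermediate -> target mappings, dropping repeated targets."""
    composed: Dict[str, List[str]] = {}
    for source, intermediates in first.items():
        targets: List[str] = []
        for mid in intermediates:
            for target in second.get(mid, []):
                if target not in targets:
                    targets.append(target)
        composed[source] = targets
    return composed


def filter_by_organism(
    composed: Mapping[str, Sequence[str]],
    organism_of: Mapping[str, str],
    organism: str,
) -> Dict[str, List[str]]:
    """Keep only targets whose organism matches; all species are fetched first."""
    return {
        source: [t for t in targets if organism_of.get(t) == organism]
        for source, targets in composed.items()
    }
```

Filtering after composition mirrors the comment in the example above: the full multi-species response is fetched, and the organism filter is applied afterwards.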

Alternatively, OrthoDB IDs can be obtained with UniProtIDMapper and then passed to UniProtRetriever to retrieve any of the desired UniProt return fields.

from UniProtMapper import UniProtIDMapper, UniProtRetriever

mapper = UniProtIDMapper()
result, failed = mapper(
    ids=["P30542", "Q16678", "Q02880"],
    from_db="UniProtKB_AC-ID",
    to_db="OrthoDB",
)
field_retriever = UniProtRetriever()
ortho_results, failed = field_retriever.retrieveFields(
    ids=result["to"].tolist(), from_db="OrthoDB"
)

>>> Retrying in 3s
>>> Fetched: 3 / 3
>>> Retrying in 3s
>>> Fetched: 246 / 246
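Since several input proteins can belong to the same OrthoDB group, result["to"].tolist() may contain repeated IDs. A hypothetical pre-processing step (not part of UniProtMapper) removes duplicates while keeping the original order and, optionally, splits a long list into batches of a size chosen by the user; the batch size in the usage comment is arbitrary.

```python
from typing import Iterable, Iterator, List, Sequence


def unique_in_order(ids: Iterable[str]) -> List[str]:
    """Drop repeated IDs, keeping the first occurrence of each."""
    return list(dict.fromkeys(ids))  # dicts preserve insertion order (Python 3.7+)


def chunked(ids: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Hypothetical usage with the objects from the example above:
# orthodb_ids = unique_in_order(result["to"].tolist())
# for batch in chunked(orthodb_ids, 500):  # 500 is an arbitrary example size
#     batch_results, batch_failed = field_retriever.retrieveFields(
#         ids=list(batch), from_db="OrthoDB"
#     )
```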

Download files

Source distribution: uniprot-id-mapper-1.0.4.tar.gz (40.5 kB)

Built distribution: uniprot_id_mapper-1.0.4-py3-none-any.whl (39.9 kB, Python 3)
