A Python wrapper for the UniProt Mapping RESTful API.

UniProtMapper

Easily retrieve UniProt data and map protein identifiers using this Python package for UniProt's Retrieve & ID Mapping RESTful APIs. Read the full documentation.

⛏️ Features

UniProtMapper is a tool for bioinformatics and proteomics research that supports:

  1. Mapping any UniProt cross-referenced IDs to other identifiers & vice-versa;
  2. Programmatically retrieving any of the supported return and cross-reference fields from both UniProt-SwissProt and UniProt-TrEMBL (unreviewed) databases. For a full table containing all the supported resources, refer to the supported fields in the docs;
  3. Querying UniProtKB entries using complex field-based queries with boolean operators ~ (NOT), | (OR), & (AND).

For the first two, see the Mapping IDs and Retrieving Information examples below. For the third, see Field-based Querying.

The ID mapping API can also be accessed through the CLI. For more information, check CLI.

📦 Installation

From PyPI (recommended):

python -m pip install uniprot-id-mapper

Directly from GitHub:

python -m pip install git+https://github.com/David-Araripe/UniProtMapper.git

From source:

git clone https://github.com/David-Araripe/UniProtMapper
cd UniProtMapper
python -m pip install .
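
After installing, it can be useful to confirm which version ended up in the active environment. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and the distribution name is the one used in the PyPI command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "uniprot-id-mapper" is the distribution name from the pip command above.
version = installed_version("uniprot-id-mapper")
print(version if version else "uniprot-id-mapper is not installed")
```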

🛠️ Usage

Mapping IDs

Use UniProtMapper to easily map between different protein identifiers:

from UniProtMapper import ProtMapper

mapper = ProtMapper()

result, failed = mapper.get(
    ids=["P30542", "Q16678", "Q02880"], from_db="UniProtKB_AC-ID", to_db="Ensembl"
)

The result is a pandas DataFrame containing the mapped IDs (see below), while failed is a list of identifiers that couldn't be mapped.

  UniProtKB_AC-ID             Ensembl
0          P30542  ENSG00000163485.17
1          Q16678  ENSG00000138061.12
2          Q02880  ENSG00000077097.17
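
The Ensembl identifiers in this output carry a version suffix (for example `.17`). If downstream tools expect unversioned gene IDs, the suffix can be stripped. The sketch below is one possible approach, not part of UniProtMapper; `strip_ensembl_version` is a hypothetical helper, and the dictionary is a stand-in that copies the values shown in the table so the example runs without network access.

```python
def strip_ensembl_version(ensembl_id: str) -> str:
    """Drop a trailing '.<digits>' version suffix, if one is present."""
    base, sep, version = ensembl_id.rpartition(".")
    if sep and version.isdigit():
        return base
    return ensembl_id


# Stand-in for the mapping result, copied from the table above.
mapped = {
    "P30542": "ENSG00000163485.17",
    "Q16678": "ENSG00000138061.12",
    "Q02880": "ENSG00000077097.17",
}

unversioned = {acc: strip_ensembl_version(eid) for acc, eid in mapped.items()}
print(unversioned)
```

With a real result, and assuming the DataFrame column is named `Ensembl` as in the table, the same function could be applied with `result["Ensembl"].map(strip_ensembl_version)`.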

Retrieving Information

A DataFrame with the supported return fields is accessible through the attribute ProtMapper.fields_table:

from UniProtMapper import ProtMapper

mapper = ProtMapper()
df = mapper.fields_table
df.head()

                  label returned_field        field_type has_full_version           type
0                 Entry      accession  Names & Taxonomy                -  uniprot_field
1            Entry Name             id  Names & Taxonomy                -  uniprot_field
2            Gene Names     gene_names  Names & Taxonomy                -  uniprot_field
3  Gene Names (primary)   gene_primary  Names & Taxonomy                -  uniprot_field
4  Gene Names (synonym)   gene_synonym  Names & Taxonomy                -  uniprot_field

Any value in the returned_field column of this DataFrame can be passed to mapper.get through the fields argument to retrieve the corresponding UniProt data programmatically:

# To retrieve the default fields:
result, failed = mapper.get(["Q02880"])
# Output: Fetched: 1 / 1

# Retrieve custom fields:
fields = ["accession", "organism_name", "structure_3d"]
result, failed = mapper.get(["Q02880"], fields=fields)
# Output: Fetched: 1 / 1

Further, for the cross-referenced fields that have has_full_version set to yes, returning the same field with extra information is supported by passing <field_name>_full, such as xref_pdb_full.

All available return fields are also accessible through the attribute ProtMapper.supported_return_fields:

from UniProtMapper import ProtMapper
mapper = ProtMapper()
print(mapper.supported_return_fields)

# Output:
# ['accession',
#  'id',
#  'gene_names',
#  ...
#  'xref_smart_full',
#  'xref_supfam_full']
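
Because the supported fields are available as a plain list, requested field names can be checked before a request is sent. The following is a minimal sketch of that idea; `split_fields` is a hypothetical helper, not part of UniProtMapper, and the `supported` list is a short stand-in built from the names shown above rather than the full list.

```python
from typing import Iterable, List, Tuple


def split_fields(
    requested: Iterable[str], supported: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split requested field names into (valid, unknown), preserving order."""
    supported_set = set(supported)
    valid: List[str] = []
    unknown: List[str] = []
    for field in requested:
        (valid if field in supported_set else unknown).append(field)
    return valid, unknown


# Stand-in for mapper.supported_return_fields (only the names shown above).
supported = ["accession", "id", "gene_names", "xref_smart_full", "xref_supfam_full"]

# "gene_name" is a deliberate typo for "gene_names".
valid, unknown = split_fields(["accession", "gene_name", "xref_smart_full"], supported)
print(valid)
print(unknown)
```

In real use, `supported` would be replaced by `mapper.supported_return_fields`, and only `valid` would be passed as `fields`.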

Field-based Querying

UniProtMapper supports complex field-based protein queries using boolean operators (AND, OR, NOT) through the uniprotkb_fields module. This allows you to create sophisticated searches combining multiple criteria. For example:

from UniProtMapper import ProtKB
from UniProtMapper.uniprotkb_fields import (
    organism_name, 
    length, 
    reviewed, 
    date_modified
)

# Find reviewed human proteins with length between 100-200 amino acids
# that were modified after January 1st, 2024
query = (
    organism_name("human") & 
    reviewed(True) & 
    length(100, 200) & 
    date_modified("2024-01-01", "*")
)

protkb = ProtKB()
result = protkb.get(query)

For a list of all fields and their descriptions, check the API reference for the uniprotkb_fields module.
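
To illustrate how `&`, `|`, and `~` can combine field objects into a single query, here is a toy sketch based on Python operator overloading. It is not UniProtMapper's implementation: the classes `Expr`, `Term`, `BoolOp`, and `Not` are hypothetical, and the rendered string is only loosely modeled on UniProt's `field:value` query syntax.

```python
from dataclasses import dataclass


class Expr:
    """Base class: overloads &, | and ~ to build a boolean expression tree."""

    def __and__(self, other: "Expr") -> "Expr":
        return BoolOp("AND", self, other)

    def __or__(self, other: "Expr") -> "Expr":
        return BoolOp("OR", self, other)

    def __invert__(self) -> "Expr":
        return Not(self)

    def render(self) -> str:
        raise NotImplementedError


@dataclass(frozen=True)
class Term(Expr):
    """A single field:value criterion (hypothetical stand-in for a field class)."""

    name: str
    value: str

    def render(self) -> str:
        return f"({self.name}:{self.value})"


@dataclass(frozen=True)
class BoolOp(Expr):
    """Two sub-expressions joined by AND or OR."""

    op: str
    left: Expr
    right: Expr

    def render(self) -> str:
        return f"({self.left.render()} {self.op} {self.right.render()})"


@dataclass(frozen=True)
class Not(Expr):
    """Negation of a sub-expression."""

    operand: Expr

    def render(self) -> str:
        return f"(NOT {self.operand.render()})"


human = Term("organism_name", "human")
is_reviewed = Term("reviewed", "true")

both = human & is_reviewed
not_human = ~human
print(both.render())
print(not_human.render())
```

Note that Python's operator precedence applies to such expressions: `~` binds tighter than `&`, which binds tighter than `|`, so parentheses are advisable when mixing operators.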

💻 Command Line Interface (CLI)

UniProtMapper provides a CLI for the ID Mapping class, ProtMapper, for easy access to lookups and data retrieval. Here is a list of the available arguments, shown by protmap -h:

usage: UniProtMapper [-h] -i [IDS ...] [-r [RETURN_FIELDS ...]] [--default-fields] [-o OUTPUT]
                     [-from FROM_DB] [-to TO_DB] [-over] [-pf]

Retrieve data from UniProt using UniProt's RESTful API. For a list of all available fields, see: https://www.uniprot.org/help/return_fields 

Alternatively, use the --print-fields argument to print the available fields and exit the program.

optional arguments:
  -h, --help            show this help message and exit
  -i [IDS ...], --ids [IDS ...]
                        List of UniProt IDs to retrieve information from. Values must be
                        separated by spaces.
  -r [RETURN_FIELDS ...], --return-fields [RETURN_FIELDS ...]
                        If not defined, will pass `None`, returning all available fields.
                        Else, values should be fields to be returned separated by spaces. See
                        --print-fields for available options.
  --default-fields, -def
                        This option will override the --return-fields option. Returns only the
                        default fields stored in: <pkg_path>/resources/cli_return_fields.txt
  -o OUTPUT, --output OUTPUT
                        Path to the output file to write the returned fields. If not provided,
                        will write to stdout.
  -from FROM_DB, --from-db FROM_DB
                        The database from which the IDs are. For the available cross
                        references, see: <pkg_path>/resources/uniprot_mapping_dbs.json
  -to TO_DB, --to-db TO_DB
                        The database to which the IDs will be mapped. For the available cross
                        references, see: <pkg_path>/resources/uniprot_mapping_dbs.json
  -over, --overwrite    If desired to overwrite an existing file when using -o/--output
  -pf, --print-fields   Prints the available return fields and exits the program.
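
When calling the CLI from a script, the argument list can be assembled programmatically. The sketch below builds a command from the long-form flags listed in the help text above; `build_protmap_command` is a hypothetical helper, and the output file name `mapped.tsv` is an arbitrary example. It only constructs the argument list and does not run anything.

```python
from typing import List, Optional, Sequence


def build_protmap_command(
    ids: Sequence[str],
    return_fields: Optional[Sequence[str]] = None,
    from_db: Optional[str] = None,
    to_db: Optional[str] = None,
    output: Optional[str] = None,
    overwrite: bool = False,
) -> List[str]:
    """Assemble a protmap argument list using flags from the help text."""
    if not ids:
        raise ValueError("at least one ID is required")
    cmd = ["protmap", "--ids", *ids]
    if return_fields:
        cmd += ["--return-fields", *return_fields]
    if from_db:
        cmd += ["--from-db", from_db]
    if to_db:
        cmd += ["--to-db", to_db]
    if output:
        cmd += ["--output", output]
        # --overwrite only matters when an output file is given.
        if overwrite:
            cmd.append("--overwrite")
    return cmd


cmd = build_protmap_command(
    ["P30542", "Q16678"],
    from_db="UniProtKB_AC-ID",
    to_db="Ensembl",
    output="mapped.tsv",  # arbitrary example path
)
print(" ".join(cmd))
```

With the package installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.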

Usage example, retrieving default fields from <pkg_path>/resources/cli_return_fields.txt:

[Image: output of UniProtMapper's CLI, protmap]

For issues, feature requests, or questions, please open an issue on the GitHub repository.
