A Python package for working with string-db.org aliases (gene and protein ID mapping).

This package is designed for offline use with files downloaded from string-db.org. To access the STRING API instead, see, for example, the stringdb package.

Usage

Mapping HGNC symbols

First, download the aliases and info files from string-db.org:

$ wget https://stringdb-static.org/download/protein.info.v11.5/9606.protein.info.v11.5.txt.gz
$ wget https://stringdb-static.org/download/protein.aliases.v11.5/9606.protein.aliases.v11.5.txt.gz

Then initialize the mapper object with the downloaded files and map a list of IDs:

from stringdb_alias import HGNCMapper

mapper = HGNCMapper('9606.protein.info.v11.5.txt.gz', '9606.protein.aliases.v11.5.txt.gz')

print(mapper.get_string_ids(['ADCK2', 'TOMM7', 'PRODH']))

The mapper always returns a pandas Series, which makes it convenient to map a DataFrame column directly. If the input is itself a pandas Series, its index is preserved in the output.
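As an illustration of mapping a DataFrame column, here is a minimal sketch. The helper `add_string_ids` and the column names `symbol` and `string_id` are hypothetical and not part of stringdb_alias. The sketch relies only on the behaviour described above: `get_string_ids` returns a pandas Series and keeps the index of a Series input.

```python
import pandas as pd


def add_string_ids(df, mapper, column="symbol", out_column="string_id"):
    """Return a copy of df with an extra column holding the mapped STRING IDs.

    Hypothetical helper: `mapper` is any object exposing get_string_ids(),
    such as the HGNCMapper created in the example above.
    """
    # Passing the column as a Series means the result carries the same index.
    mapped = mapper.get_string_ids(df[column])
    # assign() aligns on the index, so rows stay matched even when the
    # DataFrame index is not the default 0..n-1 range.
    return df.assign(**{out_column: mapped})
```

With the mapper from the example above and a DataFrame `genes` that has a `symbol` column, `add_string_ids(genes, mapper)` would return a new DataFrame and leave `genes` unchanged. How symbols with no match are reported is not stated here, so check the output for missing values before relying on it.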
