A Python package for working with string-db.org aliases (gene and protein ID mapping).
This package is designed for working offline with downloaded files. To access the STRINGdb API instead, see, for example, the stringdb package.
Usage
Mapping HGNC symbols
First, download the aliases and info files from string-db.org:
$ wget https://stringdb-static.org/download/protein.info.v11.5/9606.protein.info.v11.5.txt.gz
$ wget https://stringdb-static.org/download/protein.aliases.v11.5/9606.protein.aliases.v11.5.txt.gz
Then initialize the mapper object with the downloaded files and map a list of IDs:
from stringdb_alias import HGNCMapper

mapper = HGNCMapper('9606.protein.info.v11.5.txt.gz', '9606.protein.aliases.v11.5.txt.gz')
print(mapper.get_string_ids(['ADCK2', 'TOMM7', 'PRODH']))
The mapper always returns a pandas Series, which makes it convenient to map a column of a DataFrame directly. If the input is itself a pandas Series, its index is preserved in the output.
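To illustrate why index preservation matters, here is a minimal sketch that runs without the downloaded files. It does not use the real HGNCMapper: `toy_get_string_ids` and `TOY_LOOKUP` are hypothetical stand-ins, and the IDs are invented placeholders rather than real STRING identifiers. The real mapper may behave differently in details such as how unmapped symbols are reported.

```python
import pandas as pd

# Placeholder lookup table. These IDs are invented for illustration and are
# NOT real STRING identifiers.
TOY_LOOKUP = {
    "ADCK2": "9606.PLACEHOLDER_A",
    "TOMM7": "9606.PLACEHOLDER_B",
    "PRODH": "9606.PLACEHOLDER_C",
}


def toy_get_string_ids(symbols):
    """Hypothetical stand-in for mapper.get_string_ids; always returns a Series."""
    # pd.Series(...) keeps the index of an existing Series and assigns a
    # default integer index to a plain list.
    return pd.Series(symbols, dtype="object").map(TOY_LOOKUP)


df = pd.DataFrame(
    {"symbol": ["PRODH", "ADCK2", "TOMM7"]},
    index=["row_a", "row_b", "row_c"],
)

# The result carries the same index as df["symbol"], so it aligns row by row
# when assigned back as a new column.
df["string_id"] = toy_get_string_ids(df["symbol"])
print(df)
```

With the real package, the equivalent call would presumably be `df["string_id"] = mapper.get_string_ids(df["symbol"])`, using the method shown in the usage example above.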