biodblinker

A library for linking entities of biological knowledge bases.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Installing

pip install biodblinker

Installing from source

python setup.py install

Usage

from biodblinker import UniprotLinker

uniprot_linker = UniprotLinker()

# Get list of all included uniprot accessions
uniprot_accessions = uniprot_linker.uniprot_ids

select_accessions = ['P31946', 'P62258', 'Q04917']

# Get the list of names for each accession in select_accessions
select_names = uniprot_linker.convert_uniprot_to_names(select_accessions)

# Get the list of kegg gene ids for each accession in select_accessions
select_genes = uniprot_linker.convert_uniprot_to_kegg(select_accessions)
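
The use case further down indexes each result by position (uniprot_protein[0]), which suggests that the convert methods return one list of matches per input identifier, in input order. Assuming that holds, a small helper can pair each accession with its matches. The helper name and the sample values below are hypothetical placeholders, not part of biodblinker and not real output.

```python
from typing import Dict, List, Sequence


def pair_with_inputs(ids: Sequence[str], results: Sequence[List[str]]) -> Dict[str, List[str]]:
    """Map each input identifier to its list of matches.

    Assumes `results` holds one list per entry of `ids`, in the same order.
    """
    if len(ids) != len(results):
        raise ValueError('expected one result list per input identifier')
    return {identifier: list(matches) for identifier, matches in zip(ids, results)}


# Placeholder data standing in for the return value of a convert_* call.
select_accessions = ['P31946', 'P62258', 'Q04917']
placeholder_results = [['match_a'], [], ['match_b', 'match_c']]

by_accession = pair_with_inputs(select_accessions, placeholder_results)

# Accessions for which no match was found
unmapped = [acc for acc, matches in by_accession.items() if not matches]
print(unmapped)
```

Keeping the result keyed by accession makes it easy to spot identifiers that have no mapping before they are used downstream.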

Use Case - Linking uniprot proteins and mesh diseases via KEGG

import requests
from biodblinker import KEGGLinker

linker = KEGGLinker()
unique_pairs = set()

url = 'http://rest.kegg.jp/link/hsa/disease'
resp = requests.get(url)

if resp.ok:
    for line in resp.iter_lines(decode_unicode=True):
        kegg_disease, kegg_gene = line.strip().split('\t')
        # strip the prefix from the disease
        kegg_disease = kegg_disease.split(':')[1]

        # prefix is retained for genes as the ids are numeric
        uniprot_protein = linker.convert_geneid_to_uniprot([kegg_gene])
        mesh_disease = linker.convert_disease_to_mesh([kegg_disease])
        if len(uniprot_protein[0]) == 0:
            continue
        if len(mesh_disease[0]) == 0:
            continue
        for protein in uniprot_protein[0]:
            for disease in mesh_disease[0]:
                unique_pairs.add((protein, disease))

for protein, disease in unique_pairs:
    print(f'{protein}\tRELATED_DISEASE\t{disease}')
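
The line-parsing step in the loop above can be isolated into a small function, which makes it testable without a network request. This is a minimal sketch: the function name is hypothetical, and the sample line only illustrates the assumed tab-separated format with a prefixed disease id and a prefixed gene id.

```python
from typing import Optional, Tuple


def parse_link_line(line: str) -> Optional[Tuple[str, str]]:
    """Split one tab-separated line into (disease_id, gene_id).

    The disease prefix is stripped; the gene prefix is kept, as in the loop above.
    Returns None for blank or malformed lines instead of raising.
    """
    fields = line.strip().split('\t')
    if len(fields) != 2:
        return None
    disease, gene = fields
    # 'prefix:ID' -> 'ID'; a value without a prefix is returned unchanged
    disease_id = disease.split(':', 1)[-1]
    return disease_id, gene


# Illustrative input only; the real lines come from the KEGG response.
print(parse_link_line('ds:H00001\thsa:25'))
print(parse_link_line(''))
```

Returning None for lines that do not have exactly two fields lets the caller skip them rather than fail on tuple unpacking.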

Downloading mappings

When a biodblinker linker (for example UniprotLinker) is initialized, it verifies that all necessary mapping files are present and, if any are missing, downloads the precompiled mappings.
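
The check-then-download behaviour described above follows a common pattern, sketched here in generic form. This is not biodblinker's implementation: the function names, the file names, and the stand-in fetch function are all assumptions made for illustration, and no real download takes place.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def missing_files(data_dir: Path, required: Iterable[str]) -> List[str]:
    """Return the names in `required` that do not exist under `data_dir`."""
    return [name for name in required if not (data_dir / name).is_file()]


def ensure_files(data_dir: Path, required: Iterable[str],
                 fetch: Callable[[Path, str], None]) -> List[str]:
    """Fetch every missing file and return the names that were fetched."""
    data_dir.mkdir(parents=True, exist_ok=True)
    to_fetch = missing_files(data_dir, required)
    for name in to_fetch:
        fetch(data_dir, name)
    return to_fetch


def fake_fetch(data_dir: Path, name: str) -> None:
    # Stand-in for a real download: creates an empty file.
    (data_dir / name).touch()


with tempfile.TemporaryDirectory() as tmp:
    names = ['example_a.txt', 'example_b.txt']  # hypothetical file names
    first_run = ensure_files(Path(tmp), names, fake_fetch)   # fetches both files
    second_run = ensure_files(Path(tmp), names, fake_fetch)  # nothing left to fetch
```

Passing the fetch step in as a callable keeps the presence check independent of how the files are actually obtained.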

Building the mapping files

The mappings can also be generated from their sources.

  • Note: this process takes several hours and requires a large amount of disk space because of the size of the source files. The source files are removed once the mappings are generated.

import biodblinker

gen = biodblinker.MappingGenerator()
# replace the placeholders with your DrugBank credentials
gen.generate_mappings('<drugbank_username>', '<drugbank_password>')
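
One way to avoid writing the DrugBank credentials into a script is to read them from environment variables. The sketch below assumes two variable names, DRUGBANK_USERNAME and DRUGBANK_PASSWORD, which are hypothetical and not defined by biodblinker; the helper is likewise illustrative.

```python
import os
from typing import Mapping, Tuple


def read_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (username, password) from the environment, or raise if either is unset."""
    # DRUGBANK_USERNAME / DRUGBANK_PASSWORD are hypothetical variable names
    try:
        return env['DRUGBANK_USERNAME'], env['DRUGBANK_PASSWORD']
    except KeyError as missing:
        raise RuntimeError(f'environment variable {missing} is not set') from None


# Demonstration with an explicit mapping instead of the real environment.
username, password = read_credentials(
    {'DRUGBANK_USERNAME': 'user@example.org', 'DRUGBANK_PASSWORD': 'secret'}
)
```

The resulting pair could then be passed on as gen.generate_mappings(username, password).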

Mapping sources and licenses

BioDBLinker uses multiple sources to generate the mappings. It must be used in compliance with the licenses and, where applicable, the citation policies of these sources.

Source Database   License Type   URL
UniProt           CC BY 4.0      https://www.uniprot.org/help/license
Drugbank          CC BY-NC 4.0   https://www.drugbank.ca/legal/terms_of_use
KEGG              Custom         https://www.kegg.jp/kegg/legal.html
Sider             CC BY-NC-SA    http://sideeffects.embl.de/about/
Stitch            CC BY 4.0      http://stitch.embl.de/cgi/download.pl
HPA               CC BY-SA 3.0   https://www.proteinatlas.org/about/licence
Cellosaurus       CC BY 4.0      https://web.expasy.org/cgi-bin/cellosaurus/faq#Q22

Funding

The development of this module has been fully supported by the CLARIFY project that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 875160.
