A library for linking entities of biological knowledge bases.
biodblinker
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Installing
pip install biodblinker
Installing from source
python setup.py install
Usage
from biodblinker import UniprotLinker
uniprot_linker = UniprotLinker()
# Get list of all included uniprot accessions
uniprot_accessions = uniprot_linker.uniprot_ids
select_accessions = ['P31946', 'P62258', 'Q04917']
# Get the list of names for each accession in select_accessions
select_names = uniprot_linker.convert_uniprot_to_names(select_accessions)
# Get the list of kegg gene ids for each accession in select_accessions
select_genes = uniprot_linker.convert_uniprot_to_kegg(select_accessions)
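The conversion methods appear to return one list per input accession: the use case below indexes the result with [0] and iterates over it. Under that assumption, the output can be zipped back onto the input to get a lookup table. A minimal sketch; pair_results is a hypothetical helper, not part of biodblinker:

```python
from typing import Dict, List, Sequence


def pair_results(ids: Sequence[str], results: Sequence[List[str]]) -> Dict[str, List[str]]:
    """Map each input id to its converted ids, skipping ids with no match.

    Assumes `results` is parallel to `ids`: results[i] holds the matches
    for ids[i], possibly as an empty list.
    """
    if len(ids) != len(results):
        raise ValueError("expected one result list per input id")
    # Keep only ids that produced at least one match.
    return {i: list(r) for i, r in zip(ids, results) if r}
```

For example, pair_results(select_accessions, select_names) would give a dictionary from each accession to its names, leaving out accessions with no known name.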
Use Case - Linking UniProt proteins and MeSH diseases via KEGG
import requests
from biodblinker import KEGGLinker
linker = KEGGLinker()
unique_pairs = set()
url = 'http://rest.kegg.jp/link/hsa/disease'
resp = requests.get(url)
if resp.ok:
    for line in resp.iter_lines(decode_unicode=True):
        kegg_disease, kegg_gene = line.strip().split('\t')
        # strip the prefix from the disease
        kegg_disease = kegg_disease.split(':')[1]
        # prefix is retained for genes as the ids are numeric
        uniprot_protein = linker.convert_geneid_to_uniprot([kegg_gene])
        mesh_disease = linker.convert_disease_to_mesh([kegg_disease])
        # skip pairs where either side has no mapping
        if len(uniprot_protein[0]) == 0:
            continue
        if len(mesh_disease[0]) == 0:
            continue
        for protein in uniprot_protein[0]:
            for disease in mesh_disease[0]:
                unique_pairs.add((protein, disease))
for protein, disease in unique_pairs:
    print(f'{protein}\tRELATED_DISEASE\t{disease}')
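Because the converters accept lists, the per-line calls in the use case can be replaced by one batched call per database. A sketch, assuming (as the loop above does) that each converter returns one list of matches per input id, in input order. parse_kegg_links and link_proteins_to_diseases are hypothetical helpers, and the parser assumes the tab-separated "prefix:disease<TAB>gene" line format handled above:

```python
from typing import Iterable, List, Set, Tuple


def parse_kegg_links(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse KEGG link lines into (disease_id, gene_id) pairs.

    The disease prefix (e.g. 'ds:') is stripped; the gene prefix is kept
    because KEGG gene ids are numeric without it.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        disease, gene = line.split('\t')
        pairs.append((disease.split(':', 1)[1], gene))
    return pairs


def link_proteins_to_diseases(pairs: List[Tuple[str, str]], linker) -> Set[Tuple[str, str]]:
    """Turn KEGG (disease, gene) pairs into unique (protein, disease) pairs.

    `linker` is expected to provide convert_geneid_to_uniprot and
    convert_disease_to_mesh; each is called once with all unique ids.
    """
    genes = sorted({gene for _, gene in pairs})
    diseases = sorted({disease for disease, _ in pairs})
    # Assumed: result[i] is the list of matches for input[i].
    gene_to_uniprot = dict(zip(genes, linker.convert_geneid_to_uniprot(genes)))
    disease_to_mesh = dict(zip(diseases, linker.convert_disease_to_mesh(diseases)))
    unique_pairs = set()
    for disease, gene in pairs:
        # Empty match lists simply contribute no pairs.
        for protein in gene_to_uniprot[gene]:
            for mesh in disease_to_mesh[disease]:
                unique_pairs.add((protein, mesh))
    return unique_pairs
```

With the objects from the use case, this would be called as link_proteins_to_diseases(parse_kegg_links(resp.iter_lines(decode_unicode=True)), linker). Batching trades one large lookup per database for thousands of single-id calls.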
Downloading mappings
When a biodblinker linker is initialized, it checks that all required mapping files are present and, if any are missing, downloads the precompiled mappings.
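The check-then-download step follows a common pattern. A generic sketch of that pattern, not biodblinker's own implementation: the directory, the file names, and the fetch callable are hypothetical placeholders supplied by the caller:

```python
import os
from typing import Callable, Iterable, List


def ensure_mappings(directory: str, filenames: Iterable[str],
                    fetch: Callable[[str, str], None]) -> List[str]:
    """Fetch every mapping file that is missing from `directory`.

    `fetch(name, dest_path)` is a caller-supplied function that writes the
    file `name` to `dest_path` (for example via an HTTP download).
    Returns the names that had to be fetched.
    """
    os.makedirs(directory, exist_ok=True)
    fetched = []
    for name in filenames:
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            fetch(name, path)
            fetched.append(name)
    return fetched
```

A more robust variant would download to a temporary name and rename the file once it is complete, so an interrupted download is not mistaken for a valid mapping on the next run.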
Building the mapping files
It is also possible to generate the mappings from their original sources.
- Note: this process takes several hours and requires a large amount of disk space because the source files are large. The source files are removed once the mappings have been generated.
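import biodblinker
# DrugBank credentials are needed to download the DrugBank source files
drugbank_username = 'your_drugbank_username'  # replace with your DrugBank username
drugbank_password = 'your_drugbank_password'  # replace with your DrugBank password
gen = biodblinker.MappingGenerator()
gen.generate_mappings(drugbank_username, drugbank_password)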
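To keep DrugBank credentials out of scripts, they can be read from environment variables before calling generate_mappings. A minimal sketch; the variable names DRUGBANK_USERNAME and DRUGBANK_PASSWORD are hypothetical choices, not something biodblinker reads on its own:

```python
import os
from typing import Tuple


def drugbank_credentials() -> Tuple[str, str]:
    """Return (username, password) from hypothetical environment variables."""
    username = os.environ.get("DRUGBANK_USERNAME")
    password = os.environ.get("DRUGBANK_PASSWORD")
    if not username or not password:
        # Fail early instead of starting a multi-hour build without access.
        raise RuntimeError("set DRUGBANK_USERNAME and DRUGBANK_PASSWORD first")
    return username, password
```

The generator could then be invoked as gen.generate_mappings(*drugbank_credentials()).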
Mapping sources and licenses
BioDBLinker uses multiple sources to generate its mappings, and it must be used in compliance with the licenses and citation policies of those sources where applicable.
Source Database | License Type | URL |
---|---|---|
UniProt | CC BY 4.0 | https://www.uniprot.org/help/license |
DrugBank | CC BY-NC 4.0 | https://www.drugbank.ca/legal/terms_of_use |
KEGG | Custom | https://www.kegg.jp/kegg/legal.html |
SIDER | CC BY-NC-SA | http://sideeffects.embl.de/about/ |
STITCH | CC BY 4.0 | http://stitch.embl.de/cgi/download.pl |
HPA (Human Protein Atlas) | CC BY-SA 3.0 | https://www.proteinatlas.org/about/licence |
Cellosaurus | CC BY 4.0 | https://web.expasy.org/cgi-bin/cellosaurus/faq#Q22 |
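When redistributing data derived from these mappings, it can help to keep the license table in machine-readable form and flag the sources whose terms need a closer look. A sketch only: the dictionary transcribes the table, and the rule (flag NonCommercial, ShareAlike, or custom licenses) is a hypothetical screening heuristic chosen for illustration:

```python
from typing import Dict, List

# Transcription of the "Mapping sources and licenses" table.
SOURCE_LICENSES: Dict[str, str] = {
    "UniProt": "CC BY 4.0",
    "DrugBank": "CC BY-NC 4.0",
    "KEGG": "Custom",
    "SIDER": "CC BY-NC-SA",
    "STITCH": "CC BY 4.0",
    "HPA": "CC BY-SA 3.0",
    "Cellosaurus": "CC BY 4.0",
}


def sources_needing_review(licenses: Dict[str, str]) -> List[str]:
    """Return sources whose license adds NC/SA terms or is a custom license."""
    flagged = []
    for source, license_name in licenses.items():
        # "CC BY-NC 4.0" -> ["CC", "BY", "NC", "4.0"]
        parts = license_name.replace(" ", "-").split("-")
        if "NC" in parts or "SA" in parts or license_name == "Custom":
            flagged.append(source)
    return sorted(flagged)
```

With the table above, this flags DrugBank, HPA, KEGG, and SIDER; UniProt, STITCH, and Cellosaurus carry plain CC BY terms.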
Funding
The development of this module has been fully supported by the CLARIFY project that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 875160.