Utils derived from the O-GlcNAc Database source code
Utilsovs - 0.9
Please report any bugs or incompatibilities.
If you use utilsovs in your academic work, please cite:
Malard F, Wulff-Fuentes E, Berendt R, Didier G and Olivier-Van Stichelen S. Automatization and self-maintenance of the O-GlcNAcome catalogue: A Smart Scientific Database. Database, Volume 2021 (2021).
Install
pip3 install utilsovs-pkg
Test the installation by running pytest from the package root directory.
Content
The package utilsovs contains:
- API wrappers - Proteins from UniProtKB ID (UniProtKB, GlyGen, The O-GlcNAc Database)
- API wrappers - Literature from PMID (MedLine/PubMed, Semantic Scholar, ProteomeXchange)
- Protein digestion tool: full and partial digestion and MW calculation (monoisotopic, average mass)
- Calculation of log2(odds) from an alignment file and generation of a sequence logo
- Matching of residuePosition labels against the sequence fetched from UniProtKB to validate datasets
- PDF-to-text conversion using wrappers, followed by text repair and cleaning
- Miscellaneous functions
API wrappers - Proteins from UniProtKB ID
from utilsovs import *
# Fetch UniProtKB Proteins REST API (@data.url)
data = fetch_one_UniProtKB('P08047',filepath='out.json',pprint=False)
# Fetch The O-GlcNAc Database Proteins REST API (@data.url)
data = fetch_one_oglcnacDB('P08047',filepath='out.json',pprint=False)
# Fetch RESTful Glygen webservice-based APIs (@data.url)
data = fetch_one_GlyGen('P08047',filepath='out.json',pprint=False)
# data is a class instance. To print the data of interest:
print(data.data)
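The wrappers return an object whose `data` attribute holds the parsed response. For orientation only, the sketch below shows the kind of request a UniProtKB wrapper makes, using just the standard library. It assumes the public UniProtKB REST endpoint `https://rest.uniprot.org/uniprotkb/{accession}.json` and the `sequence.value` field of its JSON entries; the helper names are hypothetical, and utilsovs may query a different URL (check `data.url`).

```python
import json
import urllib.request

# Assumed endpoint: the public UniProtKB REST API (utilsovs may use another URL).
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb/{accession}.json"


def uniprot_url(accession: str) -> str:
    """Build the JSON entry URL for a UniProtKB accession (hypothetical helper)."""
    return UNIPROT_REST.format(accession=accession.strip().upper())


def fetch_entry(accession: str, timeout: float = 30.0) -> dict:
    """Download and decode one UniProtKB entry as JSON (requires network access)."""
    with urllib.request.urlopen(uniprot_url(accession), timeout=timeout) as response:
        return json.load(response)


def sequence_from_entry(entry: dict) -> str:
    """Return the canonical amino-acid sequence stored under entry['sequence']['value']."""
    return entry["sequence"]["value"]
```

For example, `sequence_from_entry(fetch_entry("P08047"))` would return the canonical sequence of that entry, provided network access is available.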
API wrappers - Literature from PubMed IDentifier (PMID)
from utilsovs import *
# Fetch MedLine/PubMed API using Entrez.efetch (@data.url)
data = fetch_one_PubMed('33479245',db="pubmed",filepath='out.json',pprint=False)
# Fetch Semantic Scholar API (@data.url)
data = fetch_one_SemanticScholar('33479245',filepath='out.json',pprint=False)
# Fetch proteomeXchange using GET search request (@data.url)
data = fetch_one_proteomeXchange('29351928',filepath='out.json',pprint=False)
# data is a class instance. To print the data of interest:
print(data.data)
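Biopython's `Entrez.efetch`, named in the comment above, queries NCBI's E-utilities `efetch` endpoint. The sketch below only builds request URLs for one PMID, assuming that endpoint and the Semantic Scholar Graph API paper lookup with a `PMID:` prefix; the helper names are hypothetical, and utilsovs may add other parameters (the URL it actually used is kept in `data.url`).

```python
from urllib.parse import urlencode

# Assumed public endpoints; utilsovs may build its requests differently.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
S2_PAPER = "https://api.semanticscholar.org/graph/v1/paper/PMID:{pmid}"


def _check_pmid(pmid) -> str:
    """PMIDs are plain integers; reject anything else early."""
    pmid = str(pmid).strip()
    if not pmid.isdigit():
        raise ValueError(f"PMID must be numeric, got {pmid!r}")
    return pmid


def pubmed_efetch_url(pmid, db: str = "pubmed", retmode: str = "xml") -> str:
    """Build an E-utilities efetch URL for one PMID (hypothetical helper)."""
    query = urlencode({"db": db, "id": _check_pmid(pmid), "retmode": retmode})
    return f"{EFETCH}?{query}"


def semantic_scholar_url(pmid) -> str:
    """Build a Semantic Scholar Graph API lookup URL keyed by PMID (hypothetical helper)."""
    return S2_PAPER.format(pmid=_check_pmid(pmid))
```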
Compute - Digest protein, match residuePosition on sequence or calculate log2(odds) from alignment file and draw consensus sequence logo
from utilsovs import *
# Full digestion of a UniProtKB ID protein sequence: [ ['PEPTIDE',(start,end),mw_monoisotopic,mw_average], ... ]
data = compute_one_fullDigest('P13693','Trypsin',filepath='out.json')
# Partial digestion of a UniProtKB ID protein sequence: [ ['PEPTIDE',(start,end),mw_monoisotopic,mw_average], ... ]
# All possible combinations of adjacent fragments are generated
data = compute_one_partialDigest('P13693','Trypsin',filepath='out.json')
# Match residuePosition with UniProtKB ID protein sequence
data = compute_match_aaSeq('P13693','D6',filepath='out.json')
# Compute log2odds from alignment file - Input for draw_one_seqLogo()
data = compute_aln_log2odds('align.aln',organism='HUMAN',filepath='out.json')
# Draw sequence logo from compute_aln_log2odds output file
# See https://logomaker.readthedocs.io/en/latest/implementation.html
# Edit the logomaker configuration in src/utilsovs_draw.py
draw_one_seqLogo('compute_aln_log2odds.json',filepath='out.png',showplot=False,center_values=False)
# data is a class instance. To print the data of interest:
print(data.data)
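The digestion functions are described above only by their output format. As an illustration of the idea, the sketch below applies the common trypsin rule (cleave after K or R unless the next residue is P) for a full digest, then builds the partial digest by joining every run of adjacent fragments. It returns `[peptide, (start, end)]` with 1-based inclusive coordinates and omits the masses. The cleavage rule, coordinate convention and function names are assumptions; utilsovs may define its proteases differently.

```python
import re

# Common trypsin rule (assumption): a cut site follows K or R, unless P comes next.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def full_digest(sequence: str) -> list:
    """Cut at every trypsin site; return [peptide, (start, end)], 1-based and inclusive."""
    fragments = []
    start = 0
    for match in TRYPSIN_SITE.finditer(sequence):
        cut = match.start()
        if cut < len(sequence):  # a K/R at the very end does not create an empty fragment
            fragments.append([sequence[start:cut], (start + 1, cut)])
            start = cut
    if start < len(sequence):
        fragments.append([sequence[start:], (start + 1, len(sequence))])
    return fragments


def partial_digest(sequence: str) -> list:
    """Join every run of adjacent full-digest fragments (all missed-cleavage products)."""
    fragments = full_digest(sequence)
    products = []
    for i in range(len(fragments)):
        for j in range(i, len(fragments)):
            start, end = fragments[i][1][0], fragments[j][1][1]
            products.append([sequence[start - 1:end], (start, end)])
    return products
```

With n fragments the partial digest yields n(n+1)/2 products, including the full-length sequence.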
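compute_match_aaSeq takes a residue label such as 'D6'. A minimal sketch of the check it implies, assuming the label is a one-letter amino-acid code followed by a 1-based position (UniProtKB numbering, initiator methionine = 1). `match_residue` is a hypothetical name, and the utilsovs output format is not reproduced here.

```python
import re


def match_residue(sequence: str, residue_position: str) -> bool:
    """Check that a label such as 'D6' names the residue found at that 1-based position."""
    parsed = re.fullmatch(r"([A-Z])(\d+)", residue_position.strip().upper())
    if parsed is None:
        raise ValueError(f"Unrecognised residue label: {residue_position!r}")
    residue, position = parsed.group(1), int(parsed.group(2))
    # Positions outside the sequence simply do not match.
    return 1 <= position <= len(sequence) and sequence[position - 1] == residue
```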
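A log2(odds) score compares how often a residue occurs in an alignment column with its frequency in the background proteome: log2(p_observed / p_background). The sketch below assumes equal-length aligned sequences and a background dictionary (for instance, frequencies like those returned by get_one_freqAAdict). It ignores characters absent from the background, such as gap symbols, and reports -inf for residues never observed unless a pseudocount is given. These choices are assumptions, not necessarily those of compute_aln_log2odds, and `column_log2odds` is a hypothetical name.

```python
import math
from collections import Counter


def column_log2odds(aligned, background, pseudocount=0.0):
    """Per-column log2(observed / background) for equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    alphabet = sorted(background)
    matrix = []
    for column in zip(*aligned):
        # Only residues present in the background are counted (gaps are skipped).
        counts = Counter(residue for residue in column if residue in background)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        row = {}
        for aa in alphabet:
            p_obs = (counts[aa] + pseudocount) / total if total else 0.0
            row[aa] = math.log2(p_obs / background[aa]) if p_obs > 0 else float("-inf")
        matrix.append(row)
    return matrix
```

Each row of the result corresponds to one alignment position, which is the layout a logo drawing step such as draw_one_seqLogo needs.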
Text Processing
from utilsovs import *
# PDF-to-text conversion using pdftotext (Linux/Mac) or Apache Tika (Windows), followed by text repair and cleaning.
data = pdf_one_pdf2text('test.pdf',filepath='out.dat',clean=True)
# data is a class instance. To print the data of interest:
print(data.data)
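The README does not list which repairs `clean=True` performs. As an illustration only, the sketch below shows typical post-processing of extracted PDF text: expanding typographic ligatures, re-joining words hyphenated across lines, unwrapping single line breaks and collapsing whitespace. It is not the utilsovs cleaning routine, and `repair_text` is a hypothetical name. Note that re-joining every line-final hyphen also merges genuine hyphenated terms, such as O-GlcNAc, when they break across lines.

```python
import re

# Typographic ligatures often left behind by PDF extraction (illustrative subset).
LIGATURES = {"\ufb00": "ff", "\ufb01": "fi", "\ufb02": "fl", "\ufb03": "ffi", "\ufb04": "ffl"}


def repair_text(raw: str) -> str:
    """Apply a few common clean-up steps to pdftotext-style output."""
    for ligature, letters in LIGATURES.items():
        raw = raw.replace(ligature, letters)
    raw = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)   # re-join words hyphenated across lines
    raw = re.sub(r"(?<!\n)\n(?!\n)", " ", raw)    # unwrap single line breaks inside paragraphs
    raw = re.sub(r"[ \t]+", " ", raw)             # collapse runs of spaces and tabs
    raw = re.sub(r"\n{3,}", "\n\n", raw)          # keep at most one blank line between paragraphs
    return raw.strip()
```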
Miscellaneous standalone functions
The functions below return plain Python objects rather than class instances.
from utilsovs import *
# Show list of proteases for digest utils
show_proteases()
# Return protein sequence from UniProtKB ID
get_one_sequence('P13693',filepath='out.dat')
# Compute MW of a peptide and return [string,mw_monoisotopic,mw_average]
compute_one_MW('EWENMR',filepath='out.json')
# Compute the amino-acid frequency table for a given organism from uniprot_sprot.fasta.gz
get_one_freqAAdict(organism='HUMAN',filepath='out.json')
# Clear all data in the utilsovs cache
clearCache()
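compute_one_MW reports a monoisotopic and an average mass. A peptide mass is the sum of its residue masses plus one water molecule for the free termini. The sketch below uses standard residue mass values for the 20 unmodified amino acids (no O-GlcNAc or other modifications); the tables and `peptide_mw` are illustrative and may differ slightly in precision from the values utilsovs uses.

```python
# Standard residue (anhydro) masses in Da; illustrative values, not taken from utilsovs.
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
        "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
        "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
        "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
AVG = {"G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
       "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
       "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
       "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132}
WATER_MONO = 18.010565  # H2O added once per peptide for the N- and C-termini
WATER_AVG = 18.01528


def peptide_mw(peptide: str) -> list:
    """Return [peptide, mw_monoisotopic, mw_average] for an unmodified peptide."""
    peptide = peptide.strip().upper()
    unknown = set(peptide) - set(MONO)
    if unknown:
        raise ValueError(f"Unsupported residues: {sorted(unknown)}")
    mono = sum(MONO[aa] for aa in peptide) + WATER_MONO
    avg = sum(AVG[aa] for aa in peptide) + WATER_AVG
    return [peptide, mono, avg]
```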
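get_one_freqAAdict derives amino-acid frequencies from uniprot_sprot.fasta.gz. A minimal sketch, assuming the organism is matched on the mnemonic suffix of the Swiss-Prot entry name (for example `_HUMAN` in a `>sp|ACCESSION|NAME_HUMAN ...` header) and that each frequency is a residue count divided by the total number of residues. Both are assumptions about how utilsovs works, and the function names are hypothetical.

```python
import gzip
from collections import Counter


def aa_frequencies(fasta_lines, organism="HUMAN"):
    """Relative amino-acid frequencies over entries whose name ends in _<organism>."""
    suffix = f"_{organism.upper()}"
    counts = Counter()
    keep = False
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # Swiss-Prot header layout: >sp|ACCESSION|ENTRYNAME_ORGANISM Description ...
            fields = line[1:].split("|")
            entry_name = fields[2].split()[0] if len(fields) > 2 and fields[2] else ""
            keep = entry_name.endswith(suffix)
        elif keep:
            counts.update(line)  # counts every residue letter on the sequence line
    total = sum(counts.values())
    return {aa: n / total for aa, n in sorted(counts.items())} if total else {}


def aa_frequencies_from_file(path, organism="HUMAN"):
    """Stream a gzipped FASTA file such as uniprot_sprot.fasta.gz without unpacking it."""
    with gzip.open(path, "rt") as handle:
        return aa_frequencies(handle, organism)
```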
Download files
Source distribution: utilsovs-pkg-0.9.5.tar.gz
Built distribution: utilsovs_pkg-0.9.5-py3-none-any.whl
File details
Details for the file utilsovs-pkg-0.9.5.tar.gz.
File metadata
- Download URL: utilsovs-pkg-0.9.5.tar.gz
- Upload date:
- Size: 24.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | dfa36a7a90495eaf1d4eb07c40743b161d28b59f9ecb691129b7ef8a5fbe4128
MD5 | 98c397b7c5384edc1dbb3901d52a0bca
BLAKE2b-256 | d25098bb8cd1789e55238dc186e68d42799eb5c7612046b679af679b8a8ed54a
File details
Details for the file utilsovs_pkg-0.9.5-py3-none-any.whl.
File metadata
- Download URL: utilsovs_pkg-0.9.5-py3-none-any.whl
- Upload date:
- Size: 25.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | dc0531c216283ec60616c77e4804b7039d94a045d8bf55e9edaba4a6c8936a44
MD5 | 2fd808316cc7db5509ff48e177bf1e67
BLAKE2b-256 | bfa92dc69ac55770d232cb9e60cf9ae4f6381c10d6ba87f04dd734d9193d9a10
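To check a downloaded file against the hashes listed above, all three digests can be recomputed with Python's hashlib; BLAKE2b-256 corresponds to `hashlib.blake2b` with a 32-byte digest. The helper names are hypothetical.

```python
import hashlib


def digests_of_chunks(chunks):
    """Feed byte chunks to SHA256, MD5 and BLAKE2b-256 at once; return hex digests."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    for chunk in chunks:
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def file_digests(path, chunk_size=1 << 16):
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    with open(path, "rb") as handle:
        return digests_of_chunks(iter(lambda: handle.read(chunk_size), b""))
```

For example, `file_digests("utilsovs-pkg-0.9.5.tar.gz")["SHA256"]` should equal the SHA256 value listed for the source distribution.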