Utilsovs - 0.9

Utils derived from the O-GlcNAc Database source code.

Please report any bugs or incompatibilities.

If you use utilsovs in your academic work, please cite:

Malard F, Wulff-Fuentes E, Berendt R, Didier G and Olivier-Van Stichelen S. Automatization and self-maintenance of the O-GlcNAcome catalogue: A Smart Scientific Database. Database, Volume 2021 (2021).

Install

pip3 install utilsovs-pkg

To test the installation, run pytest from the package root directory.

Content

The package utilsovs contains:

  • API wrappers - Proteins from UniProtKB ID (UniProtKB, GlyGen, The O-GlcNAc Database)
  • API wrappers - Literature from PMID (MedLine/PubMed, Semantic Scholar, ProteomeXchange)
  • Protein digestion tool: full and partial digestion and MW calculation (monoisotopic, average mass)
  • Calculation of log2(odds) from alignment file and generation of sequence logo
  • Match residuePosition on sequence fetched from UniProtKB to validate datasets
  • Convert PDF to Text using wrappers and repair/clean
  • Miscellaneous functions

API wrappers - Proteins from UniProtKB ID

from utilsovs import *

# Fetch UniProtKB Proteins REST API (@data.url)
data = fetch_one_UniProtKB('P08047',filepath='out.json',pprint=False)

# Fetch The O-GlcNAc Database Proteins REST API (@data.url)
data = fetch_one_oglcnacDB('P08047',filepath='out.json',pprint=False)

# Fetch RESTful GlyGen webservice-based APIs (@data.url)
data = fetch_one_GlyGen('P08047',filepath='out.json',pprint=False)

# data is a class instance. To print the data of interest:
print(data.data)

API wrappers - Literature from PubMed IDentifier (PMID)

from utilsovs import *

# Fetch MedLine/PubMed API using Entrez.efetch (@data.url)
data = fetch_one_PubMed('33479245',db="pubmed",filepath='out.json',pprint=False)

# Fetch Semantic Scholar API (@data.url)
data = fetch_one_SemanticScholar('33479245',filepath='out.json',pprint=False)

# Fetch ProteomeXchange using GET search request (@data.url)
data = fetch_one_proteomeXchange('29351928',filepath='out.json',pprint=False)

# data is a class instance. To print the data of interest:
print(data.data)

Compute - Digest protein, match residuePosition on sequence or calculate log2(odds) from alignment file and draw consensus sequence logo

from utilsovs import *

# Full digestion of a UniProtKB ID protein sequence: [ ['PEPTIDE',(start,end),mw_monoisotopic,mw_average], ... ]
data = compute_one_fullDigest('P13693','Trypsin',filepath='out.json')

# Partial digestion of a UniProtKB ID protein sequence: [ ['PEPTIDE',(start,end),mw_monoisotopic,mw_average], ... ]
# All possible combinations of adjacent fragments are generated
data = compute_one_partialDigest('P13693','Trypsin',filepath='out.json')

# Match residuePosition with UniProtKB ID protein sequence
data = compute_match_aaSeq('P13693','D6',filepath='out.json')

# Compute log2odds from alignment file - Input for draw_one_seqLogo()
data = compute_aln_log2odds('align.aln',organism='HUMAN',filepath='out.json')

# Draw sequence logo from compute_aln_log2odds output file
# See https://logomaker.readthedocs.io/en/latest/implementation.html
# Edit logomaker config in src/utilsovs_draw.py
draw_one_seqLogo('compute_aln_log2odds.json',filepath='out.png',showplot=False,center_values=False)

# data is a class instance. To print the data of interest:
print(data.data)
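
As a rough illustration of the full and partial digestion described above, here is a minimal standalone sketch. It is not the utilsovs implementation. It assumes the common trypsin rule (cleave after K or R unless the next residue is P) and 1-based inclusive positions, and it omits the molecular weights. The helper names `full_digest` and `partial_digest` are hypothetical.

```python
def full_digest(sequence):
    """Split a sequence after every K or R that is not followed by P.

    Returns [['PEPTIDE', (start, end)], ...] with 1-based inclusive positions
    (an assumption; the package may number positions differently).
    """
    peptides = []
    start = 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append([sequence[start:i + 1], (start + 1, i + 1)])
            start = i + 1
    # Keep the C-terminal fragment if the sequence does not end on a cleavage site.
    if start < len(sequence):
        peptides.append([sequence[start:], (start + 1, len(sequence))])
    return peptides


def partial_digest(sequence):
    """Join every run of adjacent full-digest fragments (missed cleavages).

    The single fragments themselves are included in the result.
    """
    fragments = full_digest(sequence)
    peptides = []
    for i in range(len(fragments)):
        for j in range(i, len(fragments)):
            start = fragments[i][1][0]
            end = fragments[j][1][1]
            peptides.append([sequence[start - 1:end], (start, end)])
    return peptides


# Toy sequence chosen for illustration only.
for peptide in partial_digest("AKRPGKEW"):
    print(peptide)
```

With n full-digest fragments, the partial digest in this sketch yields n(n+1)/2 peptides, so the output grows quadratically with the number of cleavage sites.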

Text Processing

from utilsovs import *

# PDF to Text conversion using GNU pdftotext (Linux/Mac) or Tika (Windows) and text repair + cleaning.
data = pdf_one_pdf2text('test.pdf',filepath='out.dat',clean=True)

# data is a class instance. To print the data of interest:
print(data.data)

Miscellaneous standalone functions

The functions below return Python objects or values.

from utilsovs import *

# Show list of proteases for digest utils
show_proteases()

# Return protein sequence from UniProtKB ID
get_one_sequence('P13693',filepath='out.dat')

# Compute MW of a peptide and return [string,mw_monoisotopic,mw_average]
compute_one_MW('EWENMR',filepath='out.json')

# Compute amino acid frequency table for a given organism from uniprot_sprot.fasta.gz
get_one_freqAAdict(organism='HUMAN',filepath='out.json')

# Clear all data in utilsovs cache
clearCache()
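
To make the peptide MW calculation concrete, the sketch below sums standard amino acid residue masses and adds one water molecule, returning a list shaped like [string,mw_monoisotopic,mw_average]. It is an assumption that utilsovs uses comparable mass tables and rounding. The function name `peptide_mw` is hypothetical, and only the 20 standard residues are handled.

```python
# Standard residue masses in Da as (monoisotopic, average).
RESIDUE_MASS = {
    "G": (57.02146, 57.0519),   "A": (71.03711, 71.0788),
    "S": (87.03203, 87.0782),   "P": (97.05276, 97.1167),
    "V": (99.06841, 99.1326),   "T": (101.04768, 101.1051),
    "C": (103.00919, 103.1388), "L": (113.08406, 113.1594),
    "I": (113.08406, 113.1594), "N": (114.04293, 114.1038),
    "D": (115.02694, 115.0886), "Q": (128.05858, 128.1307),
    "K": (128.09496, 128.1741), "E": (129.04259, 129.1155),
    "M": (131.04049, 131.1926), "H": (137.05891, 137.1411),
    "F": (147.06841, 147.1766), "R": (156.10111, 156.1875),
    "Y": (163.06333, 163.1760), "W": (186.07931, 186.2132),
}
# One water is added for the free N- and C-termini.
WATER = (18.01056, 18.01528)


def peptide_mw(peptide):
    """Return [peptide, monoisotopic mass, average mass] for an unmodified peptide.

    Raises KeyError for any residue outside the 20 standard amino acids.
    """
    mono = WATER[0] + sum(RESIDUE_MASS[aa][0] for aa in peptide)
    avg = WATER[1] + sum(RESIDUE_MASS[aa][1] for aa in peptide)
    return [peptide, round(mono, 4), round(avg, 4)]


print(peptide_mw("EWENMR"))
```

Modifications such as O-GlcNAc are deliberately left out here; they would add a fixed mass per modified residue.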
