SAGEpy

A Python interface to the core SAGE search engine for mass spectrometry proteomics.

Installation

sagepy is now available via pip:

pip install sagepy

Build from source

  1. Clone our fork of the SAGE repository:
git clone git@github.com:theGreatHerrLebert/sage.git
  2. Install the sage-core bindings using maturin, optionally in a virtual environment:
cd sage/crates/sagepy-connector

# Install maturin
pip install maturin

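# Build the bindings
maturin build --release

# Install the built wheel (the exact file name depends on the package version,
# your Python version, and your platform); add --force-reinstall to overwrite
# an existing installation
pip install target/wheels/sagepy_connector-0.1.0-cp38-cp38-manylinux2014_x86_64.whl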

This provides a Python-exposed version of the core SAGE library.

  3. Install the sagepy Python package with poetry:
git clone git@github.com:theGreatHerrLebert/sagepy.git

cd sagepy

# Install poetry
pip install poetry

# Install sagepy
poetry install
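
To confirm that both installation steps succeeded, a quick import check can help. The following is a minimal sketch using only the standard library; the module name `sagepy_connector` is an assumption derived from the wheel file name above, and `is_installed` is a hypothetical helper, not part of sagepy.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# `sagepy_connector` is assumed from the wheel name; adjust if your build differs.
for name in ("sagepy", "sagepy_connector"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either module is reported as missing, check that the wheel and the poetry install went into the same virtual environment.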

Usage

sagepy is a Python interface to the core SAGE search engine. It exposes SAGE's core functionality in a Pythonic way, so you can integrate it directly into a Python-based proteomics workflow. So far, it mainly mirrors the structs available in the core SAGE library.

Example generation of a sage database

import numpy as np
from sagepy.core import EnzymeBuilder, SageSearchConfiguration

# configure a trypsin-like digestor of fasta files
enzyme_builder = EnzymeBuilder(
    missed_cleavages=2, 
    min_len=5, 
    max_len=50, 
    cleave_at='KR', 
    restrict='P', 
    c_terminal=True,
)

# UPDATE: modification handling is simplified and uses canonical UNIMOD notation
static_mods = {"C": "[UNIMOD:4]"}  # static cysteine carbamidomethylation
variable_mods = {"M": ["[UNIMOD:35]"]}  # variable methionine oxidation

with open('path/to/reference.fasta', 'r') as infile:
    fasta = infile.read()

# set-up a config for a sage-database
sage_config = SageSearchConfiguration(
    fasta=fasta,
    static_mods=static_mods,
    variable_mods=variable_mods,
    enzyme_builder=enzyme_builder,
    generate_decoys=True,
    bucket_size=int(np.power(2, 14))
)

# generate the database for searching against
indexed_db = sage_config.generate_indexed_database()
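
To make the `EnzymeBuilder` parameters above more concrete, here is a rough pure-Python sketch of a trypsin-like in-silico digestion. It is an illustration under simplifying assumptions, not SAGE's implementation: the `digest` function is hypothetical, it handles C-terminal cleavage only, and it ignores decoys and modifications.

```python
from typing import List


def digest(sequence: str, cleave_at: str = "KR", restrict: str = "P",
           missed_cleavages: int = 2, min_len: int = 5,
           max_len: int = 50) -> List[str]:
    """Cut after any residue in `cleave_at` unless the next residue is in `restrict`."""
    # Collect cut positions (index of the first residue after each cut).
    cuts = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in cleave_at and sequence[i + 1] not in restrict:
            cuts.append(i + 1)
    cuts.append(len(sequence))

    peptides = []
    n_fragments = len(cuts) - 1
    for start in range(n_fragments):
        # Join up to `missed_cleavages` additional adjacent fragments.
        for missed in range(missed_cleavages + 1):
            end = start + missed + 1
            if end > n_fragments:
                break
            peptide = sequence[cuts[start]:cuts[end]]
            # Apply the length filter given by min_len and max_len.
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


# The R followed by P is not cleaved, so "GGGGGRPLLLLLK" stays in one piece.
print(digest("AAAAAKGGGGGRPLLLLLKVVVVV", missed_cleavages=0))
```

With `missed_cleavages=2`, as configured above, peptides spanning up to two uncleaved sites are generated as well, which enlarges the search space.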

Generate a query

import numpy as np
from sagepy.core import Precursor, RawSpectrum, SpectrumProcessor

# define the precursor and raw spectrum that will be searched
precursor = Precursor(
    charge=2,
    mz=506.77,
)

intensity = np.array([ 202.,  170.,  205.,  152., 1069.,  595.,  198.,  805.,  187.,
        194.,  197.,  169.,  196.,  209.,  638.,  372.,  235.,  399.,
        194.,  185.,  181.,  170.,  407.,  150.,  157.,  175.,  273.,
       1135.,  881.,  337.,  311.,  243.,  310.,  153.,  162.,  210.,
        277.,  206.,  189.,  259.,  658.,  383.,  166.,  169.,  219.,
        186.,  221.,  193.,  367.,  283.,  237.,  157.,  372., 1276.,
       1618., 1102.,  404.,  232.,  456.,  765.,  507.,  223.,  258.,
        402.,  187.,  158.,  153.,  304.,  218.,  223.,  156., 1605.,
       1165., 1062.,  434.,  208.,  155.,  197.,  221.,  697.,  397.,
        180.,  195.,  512.,  252.,  367.,  305.,  335.,  175.,  174.,
        296.,  212.], dtype=np.float32)

mz = np.array([272.16873692, 356.16844797, 406.71079396, 406.71396814,
       406.71714233, 406.72031653, 407.21246768, 407.21564382,
       407.21881996, 407.22199612, 407.7144506 , 407.71762869,
       488.27537883, 488.28581266, 499.29228981, 499.29580676,
       499.29932372, 499.30284069, 506.75478369, 507.26157767,
       541.26272227, 553.29188809, 577.30432041, 577.30810217,
       595.32672633, 597.2907525 , 603.27568881, 614.32036769,
       614.32426881, 614.32816995, 615.3272682 , 615.33117252,
       616.33108578, 617.33572156, 636.30924838, 637.30619081,
       637.31016425, 665.36284673, 666.36197292, 674.35335834,
       674.35744565, 674.36153297, 675.35511968, 675.36330039,
       679.3531909 , 680.35044702, 680.35455247, 687.36822726,
       687.37648041, 688.37547678, 697.3616813 , 700.3617026 ,
       715.36157366, 715.36578342, 715.36999319, 715.37420297,
       715.37841277, 715.38262258, 716.36384605, 716.37227148,
       716.38069696, 717.37103577, 725.35228543, 749.39291293,
       749.39722166, 750.38424802, 786.44692356, 786.45575152,
       787.4492132 , 787.45804678, 795.39284711, 812.41777208,
       812.42225834, 812.42674462, 812.4312309 , 812.44020351,
       813.40504794, 813.41851494, 813.42300396, 813.427493  ,
       813.43198205, 813.44544927, 814.43784098, 828.42202737,
       828.4265576 , 851.43464868, 899.45327427, 899.46271517,
       912.45278821, 913.44673363, 915.45053417, 915.46482091], dtype=np.float32)

raw_spectrum = RawSpectrum(
    file_id=1,
    spec_id='DEMO-SPEC',
    total_ion_current=12667.0,
    precursors=[precursor],
    mz=mz,
    intensity=intensity
)

spec_processor = SpectrumProcessor(take_top_n=75)
query = spec_processor.process(raw_spectrum)
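
The example spectrum has 92 peaks, and `take_top_n=75` suggests that only the most intense ones are kept. The sketch below shows what such a filter could look like in NumPy. It is an assumption about the general idea, not the actual `SpectrumProcessor` logic, and `take_top_n` here is a hypothetical standalone function.

```python
import numpy as np


def take_top_n(mz: np.ndarray, intensity: np.ndarray, n: int):
    """Keep the n most intense peaks, returned in their original order."""
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")
    # Indices of the n largest intensities (all peaks if fewer than n exist).
    keep = np.argsort(intensity)[::-1][:n]
    # Sort the indices so the surviving peaks keep their original ordering.
    keep = np.sort(keep)
    return mz[keep], intensity[keep]


demo_mz = np.array([100.0, 200.0, 300.0, 400.0])
demo_intensity = np.array([5.0, 50.0, 1.0, 20.0])
top_mz, top_intensity = take_top_n(demo_mz, demo_intensity, n=2)
print(top_mz, top_intensity)
```

Because the input arrays are sorted by m/z, the filtered arrays stay sorted by m/z as well.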

Search a database

from sagepy.core import Scorer

# UPDATE: pass modifications to the scorer, necessary for PTM handling
scorer = Scorer(report_psms=2, min_matched_peaks=5, variable_mods=variable_mods, static_mods=static_mods)
results = scorer.score(db=indexed_db, spectrum=query)

Possible output:

[Feature(idx: PeptideIx(1009105), peptide_len: 9, spec_id: DEMO-SPEC, file_id: 1, rank: 1, label: 1, exp. mass: 1011.5254516601562, cal. mass: 1011.5347900390625, charge: 2, retention time: 0.0, aligned rt: 0.0, predicted rt: 0.0, delta rt model: 0.9990000128746033, delta mass: 2989.41943359375, isotope error: 3.010050058364868, average ppm: 5.889466285705566, hyperscore: 15.020833459653923, delta_next: 0.0, delta_best: 0.0, matched peaks: 5, longest b: 0,longest y: 4, longest y pct: 0.4444444477558136, missed cleavages: 0, matched intensity pct: 14.81151294708252, scored candidates: 9340, poisson: -2.177020383746938, discriminant score: 0.0, posterior error: 1.0, spectrum q: 1.0, peptide q: 1.0, protein q: 1.0, ms2 intensity: 4652.0, ms1 intensity: 0.0), Feature(idx: PeptideIx(1009105), peptide_len: 9, spec_id: DEMO-SPEC, file_id: 1, rank: 2, label: 1, exp. mass: 1011.5254516601562, cal. mass: 1011.5347900390625, charge: 2, retention time: 0.0, aligned rt: 0.0, predicted rt: 0.0, delta rt model: 0.9990000128746033, delta mass: 1001.641845703125, isotope error: 1.003350019454956, average ppm: 5.889466285705566, hyperscore: 15.020833459653923, delta_next: 0.0, delta_best: 0.0, matched peaks: 5, longest b: 0,longest y: 4, longest y pct: 0.4444444477558136, missed cleavages: 0, matched intensity pct: 14.81151294708252, scored candidates: 9340, poisson: -2.177020383746938, discriminant score: 0.0, posterior error: 1.0, spectrum q: 1.0, peptide q: 1.0, protein q: 1.0, ms2 intensity: 4652.0, ms1 intensity: 0.0)]
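
The `exp. mass` value in the output above is consistent with the neutral mass derived from the precursor m/z and charge used in the query. As a worked check, the usual conversion is sketched below; `neutral_mass` is a hypothetical helper, and the proton mass constant is the commonly used value rather than one taken from sagepy.

```python
PROTON_MASS = 1.007276466  # mass of a proton in Da


def neutral_mass(mz: float, charge: int) -> float:
    """Convert a precursor m/z and positive charge state to a neutral mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Remove one proton per charge, then scale by the charge state.
    return (mz - PROTON_MASS) * charge


# Precursor from the query above: m/z 506.77 at charge 2.
print(round(neutral_mass(506.77, 2), 4))  # 1011.5254
```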

Hot TODOs

  • Documentation and in-detail examples
