SAGEpy
A Python interface to the core SAGE search engine for mass spectrometry proteomics.
Installation
sagepy is now available via pip:
pip install sagepy
Build from source
- Clone our fork of the SAGE repository:
git clone git@github.com:theGreatHerrLebert/sage.git
- Install the sage-core bindings using maturin, optionally in a virtual environment:
cd sage/crates/sagepy-connector
# Install maturin
pip install maturin
# Build and install the bindings
maturin build --release
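# Install the bindings (the wheel filename depends on the package version, Python version and platform;
# add --force-reinstall to replace an existing installation)
pip install target/wheels/sagepy_connector-0.1.0-cp38-cp38-manylinux2014_x86_64.whl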
This provides a Python-exposed version of the core SAGE library.
- Install the sagepy Python package with poetry:
git clone git@github.com:theGreatHerrLebert/sagepy.git
cd sagepy
# Install poetry
pip install poetry
# Install sagepy
poetry install
Usage
sagepy is a Python interface to the core SAGE search engine. It exposes the core
functionality of SAGE in a Pythonic way, allowing direct integration into
Python-based proteomics workflows. So far, it mainly mirrors the structs that are
available in the core SAGE library.
Example: generating a SAGE database
import numpy as np
from sagepy.core import EnzymeBuilder, SageSearchConfiguration, validate_mods, validate_var_mods, SAGE_KNOWN_MODS
# configure a trypsin-like digestor of fasta files
enzyme_builder = EnzymeBuilder(
    missed_cleavages=2,
    min_len=5,
    max_len=50,
    cleave_at='KR',
    restrict='P',
    c_terminal=True,
)
# generate static cysteine modification
static_mods = { k: v for k, v in [SAGE_KNOWN_MODS.cysteine_static()] }
# generate variable methionine modification
variable_mods = { k: v for k, v in [SAGE_KNOWN_MODS.methionine_variable()] }
# generate SAGE compatible mod representations
static = validate_mods(static_mods)
variab = validate_var_mods(variable_mods)
# read the protein sequences to digest (the block body must be indented)
with open('path/to/reference.fasta', 'r') as infile:
    fasta = infile.read()
# set-up a config for a sage-database
sage_config = SageSearchConfiguration(
    fasta=fasta,
    static_mods=static,
    variable_mods=variab,
    enzyme_builder=enzyme_builder,
    generate_decoys=True,
    bucket_size=int(np.power(2, 14)),
)
# generate the database for searching against
indexed_db = sage_config.generate_indexed_database()
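To make the EnzymeBuilder settings and the resulting database more concrete, here is a hypothetical pure-Python sketch of what they describe: cleave C-terminally after K or R but not before P, allow up to two missed cleavages, keep peptides of 5 to 50 residues, and compute the singly charged b and y fragment ions that a search would match against. The functions `digest` and `fragment_mz` and the mass table are illustrative assumptions, not part of sagepy; SAGE's own digestion and fragment indexing differ in detail, and this sketch ignores the variable methionine oxidation configured above.

```python
from typing import List

import numpy as np

# Monoisotopic residue masses (Da); C includes static carbamidomethylation (+57.02146)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 160.03065, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def digest(sequence: str, cleave_at: str = "KR", restrict: str = "P",
           missed_cleavages: int = 2, min_len: int = 5, max_len: int = 50) -> List[str]:
    """Hypothetical C-terminal digester mirroring the EnzymeBuilder settings above."""
    # Cleave after residues in `cleave_at` unless the next residue is in `restrict`
    sites = [0] + [i + 1 for i in range(len(sequence) - 1)
                   if sequence[i] in cleave_at and sequence[i + 1] not in restrict]
    sites.append(len(sequence))
    peptides = []
    for start in range(len(sites) - 1):
        # Join up to `missed_cleavages` + 1 consecutive cleavage products
        for end in range(start + 1, min(start + 2 + missed_cleavages, len(sites))):
            peptide = sequence[sites[start]:sites[end]]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


def fragment_mz(peptide: str) -> np.ndarray:
    """Singly charged b1..b(n-1) and y1..y(n-1) m/z values (static mods only)."""
    masses = np.array([RESIDUE_MASS[aa] for aa in peptide])
    b_ions = np.cumsum(masses)[:-1] + PROTON                 # N-terminal fragments
    y_ions = np.cumsum(masses[::-1])[:-1] + WATER + PROTON   # C-terminal fragments
    return np.concatenate([b_ions, y_ions])
```

For example, `digest("PEPTIDEKAAAARPGGGGKLLLLL", missed_cleavages=0)` returns `["PEPTIDEK", "AAAARPGGGGK", "LLLLL"]`: the R followed by P is not cleaved.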
Generate a query
import numpy as np
from sagepy.core import Precursor, RawSpectrum, ProcessedSpectrum, SpectrumProcessor, Tolerance, Scorer, Representation
# Build a raw spectrum (with its precursor) to use as a query
precursor = Precursor(
    charge=2,
    mz=506.77,
)
intensity = np.array([ 202., 170., 205., 152., 1069., 595., 198., 805., 187.,
194., 197., 169., 196., 209., 638., 372., 235., 399.,
194., 185., 181., 170., 407., 150., 157., 175., 273.,
1135., 881., 337., 311., 243., 310., 153., 162., 210.,
277., 206., 189., 259., 658., 383., 166., 169., 219.,
186., 221., 193., 367., 283., 237., 157., 372., 1276.,
1618., 1102., 404., 232., 456., 765., 507., 223., 258.,
402., 187., 158., 153., 304., 218., 223., 156., 1605.,
1165., 1062., 434., 208., 155., 197., 221., 697., 397.,
180., 195., 512., 252., 367., 305., 335., 175., 174.,
296., 212.], dtype=np.float32)
mz = np.array([272.16873692, 356.16844797, 406.71079396, 406.71396814,
406.71714233, 406.72031653, 407.21246768, 407.21564382,
407.21881996, 407.22199612, 407.7144506 , 407.71762869,
488.27537883, 488.28581266, 499.29228981, 499.29580676,
499.29932372, 499.30284069, 506.75478369, 507.26157767,
541.26272227, 553.29188809, 577.30432041, 577.30810217,
595.32672633, 597.2907525 , 603.27568881, 614.32036769,
614.32426881, 614.32816995, 615.3272682 , 615.33117252,
616.33108578, 617.33572156, 636.30924838, 637.30619081,
637.31016425, 665.36284673, 666.36197292, 674.35335834,
674.35744565, 674.36153297, 675.35511968, 675.36330039,
679.3531909 , 680.35044702, 680.35455247, 687.36822726,
687.37648041, 688.37547678, 697.3616813 , 700.3617026 ,
715.36157366, 715.36578342, 715.36999319, 715.37420297,
715.37841277, 715.38262258, 716.36384605, 716.37227148,
716.38069696, 717.37103577, 725.35228543, 749.39291293,
749.39722166, 750.38424802, 786.44692356, 786.45575152,
787.4492132 , 787.45804678, 795.39284711, 812.41777208,
812.42225834, 812.42674462, 812.4312309 , 812.44020351,
813.40504794, 813.41851494, 813.42300396, 813.427493 ,
813.43198205, 813.44544927, 814.43784098, 828.42202737,
828.4265576 , 851.43464868, 899.45327427, 899.46271517,
912.45278821, 913.44673363, 915.45053417, 915.46482091], dtype=np.float32)
raw_spectrum = RawSpectrum(
    file_id=1,
    spec_id='DEMO-SPEC',
    total_ion_current=12667.0,
    precursors=[precursor],
    mz=mz,
    intensity=intensity,
)
spec_processor = SpectrumProcessor(take_top_n=75)
query = spec_processor.process(raw_spectrum)
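Two quantities behind this step can be sketched without sagepy: the neutral precursor mass implied by `Precursor(mz=506.77, charge=2)`, and a top-N intensity filter of the kind `take_top_n=75` suggests. Both functions below are hypothetical illustrations, assuming that `take_top_n` keeps the N most intense peaks; SpectrumProcessor may also deisotope or normalize peaks. With the example values the neutral mass comes out at about 1011.5254 Da, consistent with the `exp. mass` in the example output.

```python
import numpy as np

PROTON_MASS = 1.007276467  # mass of a proton in Da


def precursor_neutral_mass(mz: float, charge: int) -> float:
    # An ion [M + zH]^z+ has m/z = (M + z * proton) / z, so solve for M
    return (mz - PROTON_MASS) * charge


def keep_top_n(mz: np.ndarray, intensity: np.ndarray, n: int):
    # Indices of the n most intense peaks (all peaks if there are fewer than n)
    order = np.argsort(intensity)[::-1][:n]
    # Restore ascending m/z order for the retained peaks
    order = order[np.argsort(mz[order])]
    return mz[order], intensity[order]
```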
Search a database
from sagepy.core import Scorer
scorer = Scorer(report_psms=2, min_matched_peaks=5)
results = scorer.score(db=indexed_db, spectrum=query)
Example output:
[Feature(idx: PeptideIx(1009105), peptide_len: 9, spec_id: DEMO-SPEC, file_id: 1, rank: 1, label: 1, exp. mass: 1011.5254516601562, cal. mass: 1011.5347900390625, charge: 2, retention time: 0.0, aligned rt: 0.0, predicted rt: 0.0, delta rt model: 0.9990000128746033, delta mass: 2989.41943359375, isotope error: 3.010050058364868, average ppm: 5.889466285705566, hyperscore: 15.020833459653923, delta_next: 0.0, delta_best: 0.0, matched peaks: 5, longest b: 0,longest y: 4, longest y pct: 0.4444444477558136, missed cleavages: 0, matched intensity pct: 14.81151294708252, scored candidates: 9340, poisson: -2.177020383746938, discriminant score: 0.0, posterior error: 1.0, spectrum q: 1.0, peptide q: 1.0, protein q: 1.0, ms2 intensity: 4652.0, ms1 intensity: 0.0), Feature(idx: PeptideIx(1009105), peptide_len: 9, spec_id: DEMO-SPEC, file_id: 1, rank: 2, label: 1, exp. mass: 1011.5254516601562, cal. mass: 1011.5347900390625, charge: 2, retention time: 0.0, aligned rt: 0.0, predicted rt: 0.0, delta rt model: 0.9990000128746033, delta mass: 1001.641845703125, isotope error: 1.003350019454956, average ppm: 5.889466285705566, hyperscore: 15.020833459653923, delta_next: 0.0, delta_best: 0.0, matched peaks: 5, longest b: 0,longest y: 4, longest y pct: 0.4444444477558136, missed cleavages: 0, matched intensity pct: 14.81151294708252, scored candidates: 9340, poisson: -2.177020383746938, discriminant score: 0.0, posterior error: 1.0, spectrum q: 1.0, peptide q: 1.0, protein q: 1.0, ms2 intensity: 4652.0, ms1 intensity: 0.0)]
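Among other fields, the output reports `matched peaks` and q-values such as `spectrum q`. The sketch below illustrates both ideas in plain NumPy under simplifying assumptions: `count_matched_peaks` counts theoretical fragment m/z values that have an observed peak within a ppm tolerance (presumably the quantity that `min_matched_peaks=5` thresholds), and `q_values` implements a standard target-decoy estimate (decoys divided by targets above each score, made monotone). Both functions are hypothetical and are not sagepy's implementation; SAGE uses a fragment index and its own scoring. To use `q_values` on real results you would collect one score and one decoy flag per PSM across many spectra; how decoys are encoded (for example in the `label` field) is an assumption to verify against the SAGE documentation.

```python
import numpy as np


def count_matched_peaks(theoretical_mz: np.ndarray, observed_mz: np.ndarray,
                        ppm: float = 10.0) -> int:
    """Number of theoretical ions with at least one observed peak within +/- ppm."""
    observed = np.sort(observed_mz)
    tol = theoretical_mz * ppm * 1e-6
    lo = np.searchsorted(observed, theoretical_mz - tol, side="left")
    hi = np.searchsorted(observed, theoretical_mz + tol, side="right")
    return int(np.count_nonzero(hi > lo))


def q_values(scores: np.ndarray, is_decoy: np.ndarray) -> np.ndarray:
    """Target-decoy q-values; the FDR at each threshold is #decoys / #targets."""
    order = np.argsort(-scores, kind="stable")  # best score first
    decoy = is_decoy[order].astype(float)
    fdr = np.cumsum(decoy) / np.maximum(np.cumsum(1.0 - decoy), 1.0)
    # q-value: the lowest FDR at which this PSM (or any worse-scoring one) is accepted
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]
    q = np.empty_like(q_sorted)
    q[order] = q_sorted  # map back to the input order
    return q
```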
Source distribution: sagepy-0.2.18.tar.gz (27.1 kB)
Built distribution: sagepy-0.2.18-py3-none-any.whl (30.3 kB)