Annotate peptide spectrum matches (PSMs) from Sage with ambiguity information

SagePeptideAmbiguityAnnotator

A tool for annotating peptide ambiguity in Sage search engine results based on fragment ion coverage.

Description

The SagePeptideAmbiguityAnnotator processes peptide spectrum matches (PSMs) from Sage search engine output and annotates each peptide with ambiguity information based on its fragment ion coverage. The annotation shows which parts of a peptide sequence are strongly supported by fragment ions and which parts are less certain. For open searches, it can also place the observed mass shift as an internal modification, or as a labile modification if complete fragment ion coverage is observed.
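To make the idea of coverage-based ambiguity concrete, here is a minimal sketch, not the package's actual implementation. It assumes that each matched fragment ion confirms one cleavage site (a position between two residues), and that residues lying between two consecutive confirmed sites cannot be ordered with confidence. The function names and the parenthesis notation are illustrative assumptions.

```python
from typing import List, Set, Tuple


def ambiguous_segments(length: int, cleavage_sites: Set[int]) -> List[Tuple[int, int]]:
    """Split residues 0..length into half-open segments bounded by confirmed sites.

    A site i (0 < i < length) means the bond before residue i (0-based) is
    supported by at least one matched fragment ion. This model is an assumption.
    """
    bounds = [0] + sorted(s for s in cleavage_sites if 0 < s < length) + [length]
    return list(zip(bounds, bounds[1:]))


def annotate(sequence: str, cleavage_sites: Set[int]) -> str:
    """Wrap every multi-residue segment in parentheses (illustrative notation)."""
    parts = []
    for start, end in ambiguous_segments(len(sequence), cleavage_sites):
        chunk = sequence[start:end]
        # A single residue between two confirmed sites is unambiguous.
        parts.append(f"({chunk})" if len(chunk) > 1 else chunk)
    return "".join(parts)


print(annotate("PEPTIDE", {2, 3, 5}))  # (PE)P(TI)(DE)
```

With no confirmed sites the whole sequence comes back as a single ambiguous group, and with every site confirmed the sequence is returned unchanged.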

Installation

From PyPI

pip install sage-peptide-ambiguity-annotator

From Source

git clone https://github.com/pgarrett-scripps/SagePeptideAmbiguityAnnotator.git
cd SagePeptideAmbiguityAnnotator
pip install -e .

Usage

Command Line Interface

sage-annotate --results results.sage.parquet \
              --fragments matched_fragments.sage.parquet \
              --output annotated_results.sage.parquet \
              --mass_error_type ppm \
              --mass_error_value 50.0 \
              --mass_shift
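For a sense of scale, the sketch below converts a ppm tolerance into an absolute window in daltons. It is generic ppm arithmetic, not code from this package: the helper names are hypothetical, and the "da" option is an assumption, since "ppm" is the only value of --mass_error_type shown here.

```python
def tolerance_in_da(mass: float, error_type: str = "ppm", error_value: float = 50.0) -> float:
    """Return the absolute tolerance in daltons for a given mass."""
    if error_type == "ppm":
        # Parts per million: the window grows linearly with the mass.
        return abs(mass) * error_value / 1_000_000
    if error_type == "da":  # assumed alternative, not confirmed by this README
        return error_value
    raise ValueError(f"unsupported error type: {error_type!r}")


def within_tolerance(observed: float, theoretical: float,
                     error_type: str = "ppm", error_value: float = 50.0) -> bool:
    """Check whether an observed mass lies inside the window around a theoretical mass."""
    window = tolerance_in_da(theoretical, error_type, error_value)
    return abs(observed - theoretical) <= window


print(tolerance_in_da(1000.0))  # 0.05
```

At 50 ppm, a mass of 1000 Da therefore has a window of 0.05 Da on either side.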

Streamlit Web Application

streamlit run streamlit_app.py

Then open your browser at http://localhost:8501

Python API

from sage_peptide_ambiguity_annotator.main import (
    read_input_files, 
    process_psm_data, 
    save_output
)

# Read input files
results_df, fragments_df = read_input_files(
    "results.sage.parquet", 
    "matched_fragments.sage.parquet"
)

# Process the data
output_df = process_psm_data(
    results_df, 
    fragments_df,
    mass_error_type="ppm",
    mass_error_value=50.0,
    use_mass_shift=True
)

# Save the output
save_output(output_df, "annotated_results.sage.parquet")

Input File Requirements

Sage Results File

The Sage results file must have the following columns:

  • psm_id: Unique identifier for each PSM
  • peptide: The peptide sequence with modifications
  • stripped_peptide: The peptide sequence without modifications
  • expmass: Experimental mass
  • calcmass: Calculated mass
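Before running the annotator, it can help to confirm that the required columns listed above are present. The helper below is a hypothetical convenience, not part of the package; it only compares column names and works on any iterable of strings.

```python
from typing import Iterable, List

# Column names taken from the list of required columns above.
REQUIRED_COLUMNS = ("psm_id", "peptide", "stripped_peptide", "expmass", "calcmass")


def missing_required_columns(columns: Iterable[str]) -> List[str]:
    """Return the required column names that are absent, in the listed order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


missing = missing_required_columns(["psm_id", "peptide", "expmass"])
if missing:
    print("Missing columns:", ", ".join(missing))
```

With pandas, the same check can be applied to a loaded results table by passing results_df.columns.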

Output

The output file contains all columns from the input results file plus:

  • ambiguity_sequence: Annotated peptide sequence with ambiguity information
  • mass_shift: The mass shift between the experimental and calculated precursor masses (only applicable to open searches)
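As a worked example of the mass shift, the sketch below subtracts the calculated mass from the experimental mass. The sign convention (experimental minus calculated), the helper names, and the bracketed formatting are assumptions for illustration; the package's own output format is not shown here. The masses are made-up example values.

```python
def mass_shift(expmass: float, calcmass: float) -> float:
    """Difference between experimental and calculated precursor mass (assumed sign)."""
    return expmass - calcmass


def format_shift(shift: float, digits: int = 4) -> str:
    """Render a shift as a signed, bracketed mass delta (illustrative formatting)."""
    return f"[{shift:+.{digits}f}]"


# Hypothetical PSM whose precursor is about 79.97 Da heavier than the peptide.
shift = mass_shift(1079.4952, 999.5289)
print(format_shift(shift))  # [+79.9663]
```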

License

This project is licensed under the MIT License - see the LICENSE file for details.

Dependencies

  • pandas
  • fastparquet
  • peptacular
  • streamlit (for web app)
