Python Package for running custom protein inference algorithms on tab-formatted tandem MS/MS search results.

Py Protein Inference

PyProteinInference is a Python package for running various protein inference algorithms on tandem mass spectrometry search results and generating protein-to-peptide mappings with protein-level false discovery rates.

Key Features

  • Protein Inference and Scoring:

    • Maps peptides to proteins.
    • Generates protein scores from provided PSMs.
    • Calculates set-based protein-level false discovery rates for MS data filtering.
  • Supported Input Formats:

    • Search Result File Types: idXML, mzIdentML, or pepXML.
    • PSM files from Percolator.
    • Custom tab-delimited files.
  • Output:

    • User-friendly CSV file containing proteins, peptides, q-values, and protein scores.
  • Supported Inference Procedures:

    • Parsimony - Returns the minimal set of proteins based on the input peptides.
    • Exclusion - Removes all non-distinguishing peptides on the protein level.
    • Inclusion - Returns all possible proteins.
    • Peptide Centric - Returns protein groups based on peptide assignments.
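
To make the difference between the procedures concrete, here is a minimal sketch on hypothetical toy data. It is not the package's implementation: the function names and the protein and peptide identifiers are invented for illustration, and parsimony is approximated with a greedy set cover, which is not guaranteed to find the true minimal set.

```python
from collections import defaultdict

# Hypothetical toy data: protein -> set of identified peptides.
protein_to_peptides = {
    "P1": {"pepA", "pepB", "pepC"},
    "P2": {"pepB", "pepC"},
    "P3": {"pepC", "pepD"},
    "P4": {"pepD"},
}


def inclusion(mapping):
    """Keep every protein that has at least one identified peptide."""
    return {prot: set(peps) for prot, peps in mapping.items() if peps}


def exclusion(mapping):
    """Keep only peptides that map to exactly one protein."""
    owners = defaultdict(set)
    for prot, peps in mapping.items():
        for pep in peps:
            owners[pep].add(prot)
    result = {}
    for prot, peps in mapping.items():
        unique = {pep for pep in peps if len(owners[pep]) == 1}
        if unique:  # proteins left with no peptides are dropped
            result[prot] = unique
    return result


def greedy_parsimony(mapping):
    """Greedy set cover: approximates a minimal protein set covering all peptides."""
    uncovered = set().union(*mapping.values())
    chosen = []
    while uncovered:
        # Pick the protein covering the most uncovered peptides.
        # Iterating in sorted order makes ties resolve to the first name.
        best = max(sorted(mapping), key=lambda p: len(mapping[p] & uncovered))
        chosen.append(best)
        uncovered -= mapping[best]
    return chosen


print(inclusion(protein_to_peptides))
print(exclusion(protein_to_peptides))
print(greedy_parsimony(protein_to_peptides))
```

On this toy input, inclusion keeps all four proteins, exclusion keeps only P1 (through pepA, its one unshared peptide), and the greedy cover selects P1 and P3.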

Requirements

  1. Python 3.9 or greater.
  2. Python packages: numpy, pyteomics, pulp, PyYAML, matplotlib, pyopenms, lxml, tqdm, pywebview, nicegui. These should be installed automatically when the package is installed.

Quick Start Guide

  1. Install the package using pip:
pip install pyproteininference
  1. Run the standard command-line tool on an idXML file:
protein_inference_cli.py \
-f /path/to/target/file.idXML \
-db /path/to/database/file.fasta \
-y /path/to/params.yaml
  1. Run the standard command-line tool on an mzIdentML file:
protein_inference_cli.py \
-f /path/to/target/file.mzid \
-db /path/to/database/file.fasta \
-y /path/to/params.yaml
  1. Run the standard command-line tool on a pepXML file:
protein_inference_cli.py \
-f /path/to/target/file.pep.xml \
-db /path/to/database/file.fasta \
-y /path/to/params.yaml
  1. Run the standard command-line tool on tab-delimited results taken directly from Percolator. If no parameter file is specified, peptide-centric inference is used by default:
protein_inference_cli.py \
-t /path/to/target/file.txt \
-d /path/to/decoy/file.txt \
-db /path/to/database/file.fasta 
  1. Specify parameters. The two parameters most commonly changed are the inference type and the decoy symbol (used to distinguish decoy proteins from target proteins). To change them, create a file called params.yaml as follows:
parameters:
  inference:
    inference_type: parsimony
  identifiers:
    decoy_symbol: "decoy_"

The inference type can be one of: parsimony, peptide_centric, inclusion, exclusion, or first_protein. All parameters are optional, so you only need to define the ones you want to alter. Parameters that are not defined are set to default values. See the package documentation for the default parameters.
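
The decoy symbol matters because protein-level q-values are estimated from the mix of target and decoy proteins. The sketch below shows a generic target-decoy q-value calculation and is only an illustration: the package's set-based FDR procedure may differ, and the function names, the substring test for decoys, and the assumption that higher scores are better are all hypothetical.

```python
def is_decoy(identifier, decoy_symbol="decoy_"):
    """Assumption: a protein is a decoy when its identifier contains the symbol."""
    return decoy_symbol in identifier


def protein_q_values(scored_proteins, decoy_symbol="decoy_"):
    """Return (identifier, q_value) pairs for target proteins.

    scored_proteins is a list of (identifier, score) pairs; higher is assumed better.
    """
    ranked = sorted(scored_proteins, key=lambda item: item[1], reverse=True)

    # Estimated FDR at each rank: decoys seen so far / targets seen so far.
    targets = decoys = 0
    fdrs = []
    for identifier, _score in ranked:
        if is_decoy(identifier, decoy_symbol):
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR found at this rank or any lower-scoring rank.
    q_values = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()

    return [
        (identifier, q)
        for (identifier, _score), q in zip(ranked, q_values)
        if not is_decoy(identifier, decoy_symbol)
    ]


# Hypothetical scored proteins, for illustration only.
scored = [
    ("P1", 10.0), ("P2", 8.0), ("decoy_P9", 7.0),
    ("P3", 5.0), ("decoy_P7", 4.0), ("P4", 3.0),
]
for identifier, q in protein_q_values(scored):
    print(identifier, round(q, 3))
```

If the decoy symbol does not match the prefix used in the database, no proteins are counted as decoys and every q-value in a calculation like this collapses to zero, which is why it is one of the first parameters to check.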

  1. Run the standard command-line tool again, this time passing the parameter file created above:
protein_inference_cli.py \
-t /path/to/target/file.txt \
-d /path/to/decoy/file.txt \
-db /path/to/database/file.fasta \
-y /path/to/params.yaml
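
When several runs need different settings, the parameter file and the command can be generated from a script. The following is a minimal sketch using only the standard library. The helper names write_params and build_command are hypothetical and not part of the package; the flags (-t, -d, -db, -y) and the YAML keys are the ones shown above.

```python
from pathlib import Path

# Inference types listed in this guide.
VALID_INFERENCE_TYPES = {
    "parsimony", "peptide_centric", "inclusion", "exclusion", "first_protein",
}


def write_params(path, inference_type="parsimony", decoy_symbol="decoy_"):
    """Write a params.yaml that overrides only the two settings shown above."""
    if inference_type not in VALID_INFERENCE_TYPES:
        raise ValueError(f"unknown inference type: {inference_type!r}")
    text = (
        "parameters:\n"
        "  inference:\n"
        f"    inference_type: {inference_type}\n"
        "  identifiers:\n"
        f'    decoy_symbol: "{decoy_symbol}"\n'
    )
    path = Path(path)
    path.write_text(text)
    return path


def build_command(target, decoy, database, params):
    """Build the argument list for the Percolator tab-delimited workflow."""
    return [
        "protein_inference_cli.py",
        "-t", str(target),
        "-d", str(decoy),
        "-db", str(database),
        "-y", str(params),
    ]


# Example with placeholder paths. To run it for real, pass the list to
# subprocess.run(command, check=True) once the package is installed.
command = build_command(
    "/path/to/target/file.txt",
    "/path/to/decoy/file.txt",
    "/path/to/database/file.fasta",
    "/path/to/params.yaml",
)
print(" ".join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.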
  1. Run with Docker:
    • Either pull the image from Docker Hub:
      • docker pull hinklet/pyproteininference:latest
    • Or clone the repository and build the image with the following commands:
      • git clone REPOSITORY_URL
      • cd pyproteininference
      • docker build -t pyproteininference:latest .
    • Run the tool, making sure to volume-mount the directory that contains your input data and parameters. In the example below, the local directory is /path/to/local/directory and it is mounted in the container at /data.
        	docker run -v /path/to/local/directory/:/data \
        	-it hinklet/pyproteininference:latest \
        	python /usr/local/bin/protein_inference_cli.py \
        	-f /data/input_file.txt \
        	-db /data/database.fasta \
        	-y /data/parameters.yaml \
        	-o /data/
      

Documentation

For more information please see the full package documentation (https://thinkle12.github.io/pyproteininference/).
