
A Python command-line tool for the quantification of peptidoforms/proteoforms


Proteoformquant

Proteoformquant is a Python tool for quantitative analysis of proteoforms from mass spectrometry data.

Setup/Installation

Via PyPI repository (Recommended)

1. Install Proteoformquant package

The simplest way to use Proteoformquant is to install it as a package from the PyPI repository using pip.

pip install proteoformquant

2. Run Proteoformquant

You should then be able to run Proteoformquant by entering the following line in a terminal:

proteoformquant

Access the help message by running:

proteoformquant -h

More information on how to use Proteoformquant is available in the 'Usage' section of this document.
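If you plan to call the tool from a script, it can help to first confirm that the 'proteoformquant' command is reachable. The snippet below is a minimal sketch using only the Python standard library; the helper names 'find_cli' and 'show_help' are hypothetical and not part of Proteoformquant.

```python
import shutil
import subprocess


def find_cli(name="proteoformquant"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


def show_help(name="proteoformquant"):
    """Run '<name> -h' and return its text output, or None if the CLI is missing."""
    exe = find_cli(name)
    if exe is None:
        return None
    # '-h' is the help flag documented above.
    result = subprocess.run([exe, "-h"], capture_output=True, text=True)
    return result.stdout or result.stderr
```

Calling 'show_help()' returns None when the command cannot be found, which usually means the package was installed into a different Python environment than the one currently active.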

Via GitHub (Alternative)

You can also clone the repository from GitHub and install the dependencies manually. This requires a Conda environment.

1. Install Conda/Mamba

If you have not already done so, install Conda (https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html#regular-installation).

If you wish to use Mamba to create the Conda environment (faster), you can install Mamba by running the following command in a terminal:

conda install mamba -n base -c conda-forge

2. Clone the Proteoformquant repository

git clone https://github.com/arthur-grimaud/Proteoformquant.git

3. Create and activate the environment

Next, create the environment using either Conda or Mamba by running one of the following commands in the folder where 'environment.yml' is located:

# With Conda
conda env create --file environment.yml
# With Mamba
mamba env create --file environment.yml

You should now be able to activate the environment with the following command (replace 'mamba' with 'conda' if you created the environment with Conda):

mamba activate pfq-env

4. Run Proteoformquant

Run Proteoformquant by executing the 'proteoformquant.py' script in 'src/proteoformquant':

python3 src/proteoformquant/proteoformquant.py

Usage

(N.B. The command lines listed here assume that Proteoformquant was installed as a package. You will need to adapt the commands if you use the second installation method.)

Proteoformquant requires three input files:

  • spectra file (.mgf or .mzml)
  • identification file (.mzid) (Recommended: MSAmanda output)
  • a parameter file (.json)

A parameter file can be generated by running:

proteoformquant -cp
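Because the parameter file is plain JSON, you can inspect it from Python before editing it. The following is a minimal sketch: the individual parameter names are not documented in this section, so it only lists whatever entries the file contains, and the helper names 'load_parameters' and 'summarize' are hypothetical.

```python
import json
from pathlib import Path


def load_parameters(path):
    """Read a JSON parameter file and return its content as a Python object."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def summarize(params, indent=0):
    """Return one text line per entry, descending into nested dictionaries."""
    lines = []
    for key, value in params.items():
        if isinstance(value, dict):
            lines.append(" " * indent + f"{key}:")
            lines.extend(summarize(value, indent + 2))
        else:
            lines.append(" " * indent + f"{key}: {value!r}")
    return lines
```

For example, 'print("\n".join(summarize(load_parameters("path/to/parameter/file.json"))))' prints an indented overview, where the path is a placeholder for the file generated above.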

If you do not change the name or location of the parameter file, you can run Proteoformquant as follows:

proteoformquant -i path/to/identification/file.mzid -s path/to/spectra/file.mgf

By default, this creates an output folder 'output/' in the current directory. To change it, use the -d parameter:

proteoformquant -i path/to/identification/file.mzid -s path/to/spectra/file.mgf -d path/to/my_output_folder

Similarly, you can change the output file name with the -o parameter:

proteoformquant -i path/to/identification/file.mzid -s path/to/spectra/file.mgf -d path/to/my_output_folder -o output_file_1
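When processing several runs, the commands above can be assembled from a script. This is a minimal sketch that uses only the -i, -s, -d and -o options shown above. The function names are hypothetical, the extension check is an extra safeguard added here (not necessarily something the tool does itself), and 'run' assumes that the tool signals failure with a non-zero exit status.

```python
import subprocess
from pathlib import Path

# Spectra formats listed in the input file requirements above.
SPECTRA_EXTENSIONS = {".mgf", ".mzml"}


def build_command(ident_file, spectra_file, out_dir=None, out_name=None):
    """Assemble the argument list for one proteoformquant run."""
    ident, spectra = Path(ident_file), Path(spectra_file)
    if ident.suffix.lower() != ".mzid":
        raise ValueError(f"expected an .mzid identification file, got {ident.name}")
    if spectra.suffix.lower() not in SPECTRA_EXTENSIONS:
        raise ValueError(f"expected an .mgf or .mzml spectra file, got {spectra.name}")
    cmd = ["proteoformquant", "-i", str(ident), "-s", str(spectra)]
    if out_dir is not None:
        cmd += ["-d", str(out_dir)]   # output folder
    if out_name is not None:
        cmd += ["-o", str(out_name)]  # output file name
    return cmd


def run(cmd):
    """Execute the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution lets you print and review each command before launching a long analysis.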

Output Format

Quantification File

Below is a description of each column in the quantification table ("quant_XXX.csv"):

  • proforma: Peptidoform in ProForma notation.
  • sequence: Peptidoform amino acid sequence.
  • brno: Modification notation indicating the type and location of post-translational modifications on the amino acid sequence.
  • protein: Accession numbers of the proteins the peptidoform is associated with, delimited by a semicolon if multiple.
  • intensity: Absolute intensity value of peptidoforms after quantification in chimeric spectra.
  • intensity_r1: Absolute intensity value of peptidoforms using only Rank 1 PSMs.
  • linked_psm: The total number of PSMs corresponding to a peptidoform.
  • linked_psm_validated: The number of PSMs validated after quantification in chimeric spectra.
  • rt_peak: The retention time value in seconds at the apex of the elution profile.
  • auc: The area under the curve, which can be used for quantification but is not recommended.
  • ambiguity: The number of spectra in which the peptidoform is identified but site-determining ions were missing, so that not all peptidoforms could be confidently validated.
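A common follow-up is to express each peptidoform's intensity as a fraction of the summed intensity of all peptidoforms sharing the same sequence. The sketch below is one possible way to do this with the standard library; it assumes the file is comma-delimited with a header row using the column names listed above, and the function name 'relative_abundance' is hypothetical.

```python
import csv
import math
from collections import defaultdict


def relative_abundance(quant_csv):
    """Return {sequence: {proforma: fraction of that sequence's summed intensity}}."""
    by_sequence = defaultdict(lambda: defaultdict(float))
    # Assumption: comma-delimited file with 'proforma', 'sequence', 'intensity' columns.
    with open(quant_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            try:
                intensity = float(row["intensity"])
            except (TypeError, ValueError):
                continue  # skip rows without a numeric intensity
            if math.isnan(intensity):
                continue  # skip missing values written as NaN
            by_sequence[row["sequence"]][row["proforma"]] += intensity

    fractions = {}
    for sequence, forms in by_sequence.items():
        total = sum(forms.values())
        if total > 0:
            fractions[sequence] = {p: v / total for p, v in forms.items()}
    return fractions
```

The same approach works with the 'intensity_r1' column if you prefer values based only on rank 1 PSMs.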

Additional output files

PSM file ("psm_XXX.csv"):

  • spec: Index or identifier of each spectrum.
  • rank: Rank of the PSM, with a lower number indicating higher confidence.
  • sequence: Amino acid sequence of the peptide/protein.
  • brno: Modifications in Brno nomenclature.
  • proforma: Peptidoform in ProForma notation.
  • score: Match score of the peptide spectrum match (from the identification file provided).
  • validated: Boolean value indicating whether the PSM has been validated.
  • frag_cov: Proportion of the theoretical fragments observed.
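To see how many PSMs were validated for each peptidoform, the PSM file can be tallied directly. This is a minimal sketch under two assumptions: the file is comma-delimited with the column names listed above, and the 'validated' column is written as text such as "True" or "False". The function name 'validated_counts' is hypothetical.

```python
import csv
from collections import Counter

# Assumption: text values that should be read as "validated".
TRUE_VALUES = {"true", "1", "yes"}


def validated_counts(psm_csv, max_rank=None):
    """Count validated PSMs per peptidoform, optionally keeping ranks <= max_rank."""
    counts = Counter()
    with open(psm_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if (row.get("validated") or "").strip().lower() not in TRUE_VALUES:
                continue  # not validated
            # Ranks may be stored as "1" or "1.0", so parse through float first.
            if max_rank is not None and int(float(row["rank"])) > max_rank:
                continue
            counts[row["proforma"]] += 1
    return counts
```

Passing 'max_rank=1' restricts the count to rank 1 PSMs, which can then be compared with the count over all ranks.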

Log file ("log_XXX.csv"):

General information about the number of PSMs and peptidoforms validated/unvalidated at each step of the processing.

Obj file ("obj_XXX.pkl"):

Pickled Python ms_run object, intended for visualization (work in progress).
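The pickled object can be reloaded with Python's standard 'pickle' module. The sketch below makes no assumption about the attributes of the stored object and only lists what it finds; the helper names are hypothetical. Unpickling requires the Proteoformquant package to be importable in the same environment, and, as with any pickle, you should only load files that you created yourself.

```python
import pickle


def load_run(path):
    """Unpickle the object stored in an 'obj_XXX.pkl' file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def public_attributes(obj):
    """List attribute names that do not start with an underscore."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))
```

Listing the public attributes of the loaded object is a quick way to explore its content while the visualization features are still in progress.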

Contributing

To update

License

To update

