A Python command-line tool for the quantification of peptidoforms/proteoforms

Proteoformquant

Proteoformquant is a Python tool for quantitative analysis of proteoforms from mass spectrometry data.

Setup/Installation

Via the PyPI repository (Recommended)

1. Install Proteoformquant package

The simplest way to use Proteoformquant is to install it as a package from the PyPI repository using pip.

pip install proteoformquant

2. Run Proteoformquant

You should then be able to run Proteoformquant by entering the following command in a terminal:

proteoformquant

Access the help message by running:

proteoformquant -h
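To confirm from Python that the installation worked, a minimal sketch along the following lines may help. It assumes the installed distribution is named 'proteoformquant' (as in the pip command above) and that the console script has the same name; the helper name check_install is made up for illustration.

```python
import shutil
from importlib import metadata


def check_install(dist_name="proteoformquant"):
    """Return (version, executable_path); either is None if not found."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # Assumption: the console script shares the distribution's name.
    executable = shutil.which(dist_name)
    return version, executable


if __name__ == "__main__":
    version, executable = check_install()
    print("installed version:", version)
    print("executable on PATH:", executable)
```

If the version is reported but the executable is None, the directory where pip places scripts is probably missing from PATH.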

More information on how to use Proteoformquant is available in the 'Usage' section of this document.

Via GitHub (Alternative)

You can also clone the repository from GitHub and install the dependencies manually. This requires creating a Conda environment.

1. Install Conda/Mamba

If not already done, install Conda (https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html#regular-installation)

If you wish to use Mamba to create the Conda environment (faster) you can install mamba by running the following command in a terminal:

conda install mamba -n base -c conda-forge

2. Clone proteoformquant repository

git clone https://github.com/arthur-grimaud/Proteoformquant.git

3. Create and activate the environment

Next, create the environment using either Conda or Mamba by running one of the following commands in the folder where 'environment.yml' is located:

# With Conda
conda env create --file environment.yml
# With Mamba
mamba env create --file environment.yml

You should now be able to activate the environment with the following command (use 'conda activate' instead if you created the environment with Conda):

mamba activate pfq-env

4. Run Proteoformquant

Run Proteoformquant by executing the 'proteoformquant.py' script in 'src/proteoformquant':

python3 src/proteoformquant/proteoformquant.py

Usage

(N.B. The commands listed here assume that Proteoformquant was installed as a package. You will need to adapt them if you used the second installation method.)

Proteoformquant requires three input files:

  • a spectra file (.mgf or .mzml)
  • an identification file (.mzid) (Recommended: MSAmanda output)
  • a parameter file (.json)

A parameter file can be generated by running:

proteoformquant -cp

If you do not change the name or location of the parameter file, you can run Proteoformquant as follows:

proteoformquant -i path/to/identification/file.mzid -s path/to/spectra/file.mgf

By default, this creates an output folder 'output/' in the current directory. To write the output elsewhere, use the -d parameter:

proteoformquant -i path/to/identification/file.mzid -s path/to/spectra/file.mgf -d path/to/my_output_folder

Similarly, you can change the output file name with the -o parameter:

proteoformquant -i path/to/identification/file.mzid -s path/to/spectra/file.mgf -d path/to/my_output_folder -o output_file_1
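When several runs need to be processed, the same command can be assembled from a script. The sketch below is one possible way to do this; it uses only the -i, -s, -d and -o options described above. The function names build_command and run_batch, and the idea of passing runs as tuples, are illustrative assumptions and not part of Proteoformquant.

```python
import subprocess


def build_command(ident_file, spectra_file, out_dir=None, out_name=None):
    """Assemble the proteoformquant argument list for one run."""
    cmd = ["proteoformquant", "-i", str(ident_file), "-s", str(spectra_file)]
    if out_dir is not None:
        cmd += ["-d", str(out_dir)]   # output folder
    if out_name is not None:
        cmd += ["-o", str(out_name)]  # output file name
    return cmd


def run_batch(runs, out_dir):
    """Run proteoformquant once per (ident_file, spectra_file, name) tuple."""
    for ident_file, spectra_file, name in runs:
        cmd = build_command(ident_file, spectra_file, out_dir, name)
        # check=True raises CalledProcessError if a run exits with an error.
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.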

Output Format

Quantification File

Below is the description of each column present in the quantification table ("quant_XXX.csv")

  • proforma: Peptidoform in ProForma notation.
  • sequence: Peptidoform amino acid sequence.
  • brno: Modification notation indicating the type and location of post-translational modifications on the amino acid sequence.
  • protein: Accession numbers of the proteins the peptidoform is associated with, delimited by a semicolon if multiple.
  • intensity: Absolute intensity value of peptidoforms after quantification in chimeric spectra.
  • intensity_r1: Absolute intensity value of peptidoforms using only Rank 1 PSMs.
  • linked_psm: The total number of PSMs corresponding to a peptidoform.
  • linked_psm_validated: The number of PSMs validated after quantification in chimeric spectra.
  • rt_peak: The retention time value in seconds at the apex of the elution profile.
  • auc: The area under the curve, which can be used for quantification but is not recommended.
  • ambiguity: The number of spectra in which the peptidoform is identified but site-determining ions were missing, so that not all peptidoforms could be confidently validated.
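As an illustration of how the quantification table might be used downstream, the sketch below computes the relative abundance of each peptidoform among all peptidoforms sharing the same sequence. This is a common post-processing step, not something Proteoformquant itself is stated to do. It relies only on the 'proforma', 'sequence' and 'intensity' columns described above, assumes every 'intensity' value is numeric, and uses made-up sample rows in place of a real "quant_XXX.csv" file.

```python
import csv
import io

# Made-up rows for illustration only; a real file contains more columns.
SAMPLE_QUANT = """proforma,sequence,intensity
PEPK[Acetyl]TIDE,PEPKTIDE,300.0
PEPKTIDE,PEPKTIDE,100.0
OTHERK,OTHERK,50.0
"""


def relative_abundance(handle):
    """Map each proforma string to its share of its sequence's total intensity."""
    rows = list(csv.DictReader(handle))
    totals = {}
    for row in rows:
        seq = row["sequence"]
        totals[seq] = totals.get(seq, 0.0) + float(row["intensity"])
    shares = {}
    for row in rows:
        total = totals[row["sequence"]]
        # Guard against sequences whose summed intensity is zero.
        shares[row["proforma"]] = float(row["intensity"]) / total if total > 0 else 0.0
    return shares


if __name__ == "__main__":
    for proforma, share in relative_abundance(io.StringIO(SAMPLE_QUANT)).items():
        print(proforma, round(share, 3))
```

To apply this to an actual output file, pass an open file handle (for example, open(path, newline="")) instead of the in-memory sample.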

Additional output files

PSM file ("psm_XXX.csv"):

  • spec: Index or identifier of each spectrum.
  • rank: Rank of the PSM, with a lower number indicating higher confidence.
  • sequence: Amino acid sequence of the peptide/protein.
  • brno: Modifications in brno nomenclature.
  • proforma: Peptidoform in ProForma notation.
  • score: Match score of the peptide spectrum match (from the identification file provided).
  • validated: Boolean value indicating whether the PSM has been validated.
  • frag_cov: Proportion of the theoretical fragments observed.
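The PSM table can be filtered in a similar way. The sketch below keeps only validated PSMs, optionally restricted to a maximum rank. It assumes that the 'validated' column is written as the text True/False (how this Boolean is actually serialized in "psm_XXX.csv" is not stated here), and the sample rows are made up for illustration.

```python
import csv
import io

# Made-up rows for illustration only.
SAMPLE_PSM = """spec,rank,proforma,score,validated,frag_cov
0,1,PEPK[Acetyl]TIDE,120.5,True,0.8
0,2,PEPKTIDE,90.1,False,0.4
1,1,OTHERK,60.0,False,0.3
1,2,OTHERK[Methyl],55.0,True,0.6
"""


def validated_psms(handle, max_rank=None):
    """Return rows whose 'validated' flag is true, optionally up to max_rank."""
    kept = []
    for row in csv.DictReader(handle):
        # Assumption: Booleans are serialized as "True"/"False" (or "1"/"0").
        is_valid = row["validated"].strip().lower() in ("true", "1")
        if not is_valid:
            continue
        if max_rank is not None and int(row["rank"]) > max_rank:
            continue
        kept.append(row)
    return kept


if __name__ == "__main__":
    for row in validated_psms(io.StringIO(SAMPLE_PSM), max_rank=1):
        print(row["spec"], row["proforma"], row["score"])
```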

Log file ("log_XXX.csv"):

General information about the number of PSMs and peptidoforms validated/unvalidated at each step of the processing.

Obj file ("obj_XXX.pkl"):

Pickled Python ms_run object, intended for visualization (WIP)

Contributing

To update

License

To update
