A Python command-line tool for the quantification of peptidoforms/proteoforms
Proteoformquant
Proteoformquant is a Python tool for quantitative analysis of proteoforms from mass spectrometry data.
Setup/Installation
Via PyPi repository (Recommended)
1. Install Proteoformquant package
The simplest way to use Proteoformquant is to install it as a package from the PyPI repository using pip.
pip install proteoformquant
2. Run Proteoformquant
You should then be able to run Proteoformquant with the following command in a terminal:
proteoformquant
Access the help message by running:
proteoformquant -h
More information on how to use Proteoformquant is available in the 'Usage' section of this document.
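To confirm that the pip installation worked, a check along the following lines can help. It is a minimal sketch that assumes the distribution and the command are both named `proteoformquant`, as in the commands above; the helper names are illustrative and not part of Proteoformquant.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="proteoformquant"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def command_on_path(command="proteoformquant"):
    """Return True if the command can be found on the current PATH."""
    return shutil.which(command) is not None


# Report what this Python environment can see.
print("package version:", installed_version())
print("command on PATH:", command_on_path())
```

If the version is reported but the command is not found, the environment's script directory is probably missing from PATH.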
Via GitHub (Alternative)
Alternatively, you can clone the repository from GitHub and install the dependencies manually. This requires creating a Conda environment.
1. Install Conda/Mamba
If not already done, install Conda (https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html#regular-installation).
If you wish to use Mamba to create the Conda environment (faster), you can install Mamba by running the following command in a terminal:
conda install mamba -n base -c conda-forge
2. Clone the Proteoformquant repository
git clone https://github.com/arthur-grimaud/Proteoformquant.git
3. Create and activate the environment
Next, create the environment using either Conda or Mamba by running one of the following commands in the folder where 'environment.yml' is located:
# With Conda
conda env create --file environment.yml
# With Mamba
mamba env create --file environment.yml
You should now be able to activate the environment with:
mamba activate pfq-env
4. Run Proteoformquant
Run Proteoformquant by running the 'proteoformquant.py' script in 'src/proteoformquant':
python3 src/proteoformquant/proteoformquant.py
Usage
(N.B. The commands listed here assume that Proteoformquant was installed as a package. You will need to adapt them if you used the second installation method.)
Proteoformquant requires three input files:
- spectra file (.mgf or .mzml)
- identification file (.mzid) (Recommended: MSAmanda output)
- a parameter file (.json)
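Before launching a run, it can be convenient to check that the three inputs carry the expected extensions. The sketch below only inspects file names. The function name and the case-insensitive comparison are assumptions made for illustration; this is not how Proteoformquant itself validates its inputs.

```python
from pathlib import Path

# Extensions taken from the list of required input files above.
SPECTRA_EXTENSIONS = {".mgf", ".mzml"}
IDENTIFICATION_EXTENSIONS = {".mzid"}
PARAMETER_EXTENSIONS = {".json"}


def check_inputs(spectra, identification, parameters):
    """Return a list of problems with the input file names (empty if none)."""
    checks = (
        ("spectra", spectra, SPECTRA_EXTENSIONS),
        ("identification", identification, IDENTIFICATION_EXTENSIONS),
        ("parameter", parameters, PARAMETER_EXTENSIONS),
    )
    problems = []
    for label, path, allowed in checks:
        # Assumption: extensions are compared case-insensitively.
        suffix = Path(path).suffix.lower()
        if suffix not in allowed:
            problems.append(
                f"{label} file {str(path)!r} should end in one of {sorted(allowed)}"
            )
    return problems


# Hypothetical file names, used only to show the call.
print(check_inputs("run1.mgf", "run1.mzid", "params.json"))
```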
A parameter file can be generated by running:
proteoformquant -cp
If you do not change the name or location of the parameter file, you can run Proteoformquant as follows:
proteoformquant -i path/to/identification/file.mzid -s path/to/spectra/file.mgf
By default, this creates an output folder 'output/' in the current directory. To write the output elsewhere, use the -d parameter:
proteoformquant -i path/to/identification/file.mzid -s path/to/spectra/file.mgf -d path/to/my_output_folder
Similarly, you can change the output file name with the -o parameter:
proteoformquant -i path/to/identification/file.mzid -s path/to/spectra/file.mgf -d path/to/my_output_folder -o output_file_1
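When Proteoformquant is run on many files, the commands above can be assembled from a script. The following is a minimal sketch that uses only the -i, -s, -d and -o options shown above. The function names are illustrative, and it assumes the `proteoformquant` command is on the PATH.

```python
import subprocess


def build_command(identification, spectra, output_dir=None, output_name=None):
    """Assemble the proteoformquant command line as a list of arguments."""
    cmd = ["proteoformquant", "-i", str(identification), "-s", str(spectra)]
    if output_dir is not None:
        cmd += ["-d", str(output_dir)]
    if output_name is not None:
        cmd += ["-o", str(output_name)]
    return cmd


def run_proteoformquant(identification, spectra, output_dir=None, output_name=None):
    """Run the tool; raises CalledProcessError if it exits with an error."""
    cmd = build_command(identification, spectra, output_dir, output_name)
    return subprocess.run(cmd, check=True)


# Print the command that would be executed, without running it.
print(" ".join(build_command(
    "path/to/identification/file.mzid",
    "path/to/spectra/file.mgf",
    output_dir="path/to/my_output_folder",
    output_name="output_file_1",
)))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.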
Output Format
Quantification File
Each column of the quantification table ("quant_XXX.csv") is described below:
- proforma: Peptidoform in ProForma nomenclature.
- sequence: Peptidoform amino acid sequence.
- brno: Modification notation indicating the type and location of post-translational modifications on the amino acid sequence.
- protein: Accession numbers of the proteins the peptidoform is associated with, delimited by a semicolon if multiple.
- intensity: Absolute intensity value of peptidoforms after quantification in chimeric spectra.
- intensity_r1: Absolute intensity value of peptidoforms using only Rank 1 PSMs.
- linked_psm: The total number of PSMs corresponding to a peptidoform.
- linked_psm_validated: The number of PSMs validated after quantification in chimeric spectra.
- rt_peak: The retention time value in seconds at the apex of the elution profile.
- auc: The area under the curve, which can be used for quantification but is not recommended.
- ambiguity: The number of spectra in which the peptidoform is identified but site-determining ions were missing, so that not all peptidoforms could be confidently validated.
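As an illustration of how the quantification table might be used, the sketch below computes the share of each peptidoform within the total intensity of its amino acid sequence. It assumes the file is comma-separated with a header row named after the columns above. The sample rows are invented for the example and are not real output.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; a real file would be opened with open("quant_XXX.csv").
SAMPLE_QUANT = """proforma,sequence,intensity
PEPT[Phospho]IDEK,PEPTIDEK,300.0
PEPTIDEK,PEPTIDEK,100.0
AAAK[Acetyl]R,AAAKR,50.0
"""


def relative_abundance(handle):
    """Map each proforma string to its share of its sequence's total intensity."""
    rows = list(csv.DictReader(handle))
    totals = defaultdict(float)
    for row in rows:
        totals[row["sequence"]] += float(row["intensity"])
    shares = {}
    for row in rows:
        total = totals[row["sequence"]]
        # Guard against sequences whose total intensity is zero.
        shares[row["proforma"]] = float(row["intensity"]) / total if total > 0 else 0.0
    return shares


for proforma, share in relative_abundance(io.StringIO(SAMPLE_QUANT)).items():
    print(f"{proforma}: {share:.2f}")
```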
Additional output files
PSM file ("psm_XXX.csv"):
- spec: Index or identifier of each spectrum.
- rank: Rank of the PSM, with a lower number indicating higher confidence.
- sequence: Amino acid sequence of the peptide/protein.
- brno: Modifications in brno nomenclature.
- proforma: Peptidoform in ProForma nomenclature.
- score: Match score of the peptide spectrum match (from the identification file provided).
- validated: Boolean value indicating whether the PSM has been validated.
- frag_cov: Proportion of the theoretical fragments observed.
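The PSM file can be filtered in the same way. This sketch keeps only validated rank 1 PSMs. It assumes a comma-separated file with a header row and that the validated column is written as text such as "True" or "False"; check both assumptions against an actual file. The sample rows are invented.

```python
import csv
import io

# Invented sample rows; a real file would be opened with open("psm_XXX.csv").
SAMPLE_PSM = """spec,rank,sequence,score,validated,frag_cov
0,1,PEPTIDEK,120.5,True,0.62
0,2,PEPTIDEK,80.1,False,0.40
1,1,AAAKR,95.0,True,0.55
"""


def validated_rank1(handle):
    """Return the rows that are validated and have rank 1."""
    truthy = {"true", "1"}  # assumption about how booleans are serialized
    return [
        row
        for row in csv.DictReader(handle)
        if row["validated"].strip().lower() in truthy and int(row["rank"]) == 1
    ]


for row in validated_rank1(io.StringIO(SAMPLE_PSM)):
    print(row["spec"], row["sequence"], row["score"])
```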
Log file ("log_XXX.csv"):
General information about the number of PSMs and peptidoforms validated/unvalidated at each step of the processing.
Obj file ("obj_XXX.pkl"):
Pickled Python ms_run object, intended for visualization (WIP).
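The pickled object can be loaded back with Python's standard pickle module. This is a minimal sketch: the helper name is illustrative, and the demonstration pickles a plain dictionary as a stand-in. For a real "obj_XXX.pkl" file, Proteoformquant must be importable in the same environment so that pickle can resolve the class, and only files from a trusted source should be unpickled.

```python
import os
import pickle
import tempfile


def load_run(path):
    """Load a pickled object from disk. Only use this on files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round-trip demonstration with a stand-in object (not a real ms_run object).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "obj_demo.pkl")
    with open(demo_path, "wb") as handle:
        pickle.dump({"stand_in": True}, handle)
    print(load_run(demo_path))
```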
Contributing
To update
License
To update