Python utility for TMT-based proteomics

TMTCrunch is an open-source Python utility for tandem mass tag proteomics.

Overview

TMTCrunch is designed primarily to analyze products of alternative splicing in TMT (tandem mass tag) proteomics and phospho-proteomics data. TMTCrunch performs:

  • per-channel normalization;
  • normalization across channels using inherent or virtual (simulated) global internal standard (GIS) channels as a reference;
  • optional grouping of PSMs according to user-defined rules;
  • global or per-group FDR filtering;
  • calculation of abundance at any level: unmodified peptide, peptide with modifications, protein, or gene.
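The two normalization steps can be pictured with a small sketch. It assumes log2-transformed reporter intensities, median centring within each channel, and a reference taken as the mean of the GIS channels of each PSM. These are illustrative choices, not a description of TMTCrunch's internal algorithm, and the function names are hypothetical. Passing specimen channels as `gis` loosely mirrors the `simulate_gis` option of the configuration file.

```python
# Hypothetical sketch of two-step TMT normalization; not TMTCrunch's actual code.
import math
from statistics import median


def normalize_channels(rows: list[list[float]]) -> list[list[float]]:
    """Per-channel step: log2-transform, then subtract each channel's median.

    `rows` holds one list of reporter-ion intensities per PSM (all values > 0).
    """
    logged = [[math.log2(v) for v in row] for row in rows]
    n_channels = len(logged[0])
    medians = [median(row[c] for row in logged) for c in range(n_channels)]
    return [[row[c] - medians[c] for c in range(n_channels)] for row in logged]


def reference_to_gis(rows: list[list[float]], gis: list[int]) -> list[list[float]]:
    """Across-channel step: express each PSM relative to the mean of its GIS channels.

    The input is on a log2 scale, so subtraction corresponds to a ratio to the reference.
    """
    normalized = []
    for row in rows:
        ref = sum(row[c] for c in gis) / len(gis)
        normalized.append([value - ref for value in row])
    return normalized


if __name__ == "__main__":
    psms = [[100.0, 200.0, 150.0], [50.0, 100.0, 75.0], [80.0, 160.0, 640.0]]
    per_channel = normalize_channels(psms)
    print(reference_to_gis(per_channel, gis=[0]))
```

Working in log2 space keeps both steps additive, which is why the GIS step is a plain subtraction.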

TMTCrunch can be used with the Sage search engine or with IdentiPy/Scavager.

[Figure: TMTCrunch workflow]

Installation

Installing from PyPI

The latest released version can be installed from the Python Package Index:

pip install tmtcrunch

Installing from source

The cutting-edge version can be installed directly from the source repository:

pip install git+https://codeberg.org/makc/tmtcrunch.git

Alternatively, clone the repo and install the package in development mode:

git clone https://codeberg.org/makc/tmtcrunch.git
pip install --editable tmtcrunch

Dependencies

TMTCrunch relies on the following Python packages:

and it uses statistical functions from the astropy package if that package is available.
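Since astropy is optional, it can be useful to check whether it is importable in the environment where TMTCrunch runs. This standard-library check says nothing about which astropy functions TMTCrunch actually calls.

```python
# Check whether the optional astropy dependency is importable in this environment.
import importlib.util

HAS_ASTROPY = importlib.util.find_spec("astropy") is not None
print("astropy available:", HAS_ASTROPY)
```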

Command line options

usage: tmtcrunch [-h] [--cfg CFG] [--fasta FASTA] [--input-format {auto,scavager,sage}]
                 [--output-dir OUTPUT_DIR] [--output-prefix OUTPUT_PREFIX] [--phospho]
                 [--verbose {0,1,2}] [--show-config] [--version]
                 [fractions ...]

positional arguments:
  fractions             Scavager *_PSMs_full.tsv files or directories with Sage search results.

options:
  -h, --help            show this help message and exit
  --cfg CFG             Path to configuration file. Can be specified multiple times.
  --fasta FASTA         Path to protein fasta file for mapping protein to gene symbol.
  --input-format {auto,scavager,sage}
                        Format of input data. Supported: 'auto', 'scavager', 'sage'. Default is
                        'auto'
  --output-dir OUTPUT_DIR, --odir OUTPUT_DIR
                        Existing output directory. Default is current directory.
  --output-prefix OUTPUT_PREFIX, --oprefix OUTPUT_PREFIX
                        Prefix for output files. Default is 'tmtcrunch_'.
  --phospho             Enable common modifications for phospho-proteomics.
  --verbose {0,1,2}     Logging verbosity. Default is 1.
  --show-config         Show configuration and exit.
  --version             Output version information and exit.
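When TMTCrunch is driven from a Python pipeline, the options above can be assembled programmatically. The helper below is hypothetical and not part of TMTCrunch; it emits only flags listed in the help text, and the file names are placeholders. Keep in mind that `--output-dir` must point to an existing directory.

```python
# Hypothetical helper for building a tmtcrunch command line; not part of TMTCrunch.
import subprocess

INPUT_FORMATS = {"auto", "scavager", "sage"}


def build_tmtcrunch_cmd(fractions, cfg_files=(), fasta=None, input_format="auto",
                        output_dir=None, output_prefix=None, phospho=False, verbose=1):
    """Return an argument list that uses only the options shown in `tmtcrunch --help`."""
    if input_format not in INPUT_FORMATS:
        raise ValueError(f"unsupported input format: {input_format!r}")
    if verbose not in (0, 1, 2):
        raise ValueError("verbose must be 0, 1 or 2")
    cmd = ["tmtcrunch"]
    for cfg in cfg_files:  # --cfg can be specified multiple times
        cmd += ["--cfg", str(cfg)]
    if fasta is not None:
        cmd += ["--fasta", str(fasta)]
    cmd += ["--input-format", input_format, "--verbose", str(verbose)]
    if output_dir is not None:  # must already exist
        cmd += ["--output-dir", str(output_dir)]
    if output_prefix is not None:
        cmd += ["--output-prefix", output_prefix]
    if phospho:
        cmd.append("--phospho")
    cmd += [str(f) for f in fractions]  # positional: Scavager TSVs or Sage result dirs
    return cmd


def run_tmtcrunch(cmd):
    """Run the command, raising CalledProcessError on a non-zero exit status."""
    return subprocess.run(cmd, check=True)
```

For example, `build_tmtcrunch_cmd(["sage_run1"], cfg_files=["experiment.toml"], input_format="sage")` returns `["tmtcrunch", "--cfg", "experiment.toml", "--input-format", "sage", "--verbose", "1", "sage_run1"]`, which can then be passed to `run_tmtcrunch`.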

Configuration files

TMTCrunch stores its configuration in TOML format.
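Because the configuration is plain TOML, dotted table headers such as `[psm_group.isoform]` become nested tables. The sketch below parses a small, hypothetical user configuration with Python's standard `tomllib` module (Python 3.11+) and applies the fallback rule stated in the comments of the default configuration: a PSM group without its own `fdr` uses `global_fdr`. The column names are placeholders, and `group_fdr` is a hypothetical helper, not TMTCrunch's API.

```python
# Minimal sketch: parsing a TMTCrunch-style TOML configuration with the standard library.
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # older Pythons: the API-compatible third-party 'tomli'
    import tomli as tomllib

# Hypothetical user configuration; the column names are placeholders.
USER_CONFIG = """
specimen_columns = ['tmt_126', 'tmt_127N', 'tmt_127C']
gis_columns = ['tmt_131']
global_fdr = 0.01

[psm_group.isoform]
descr = 'Isoform PSMs'
prefixes = [['alt_']]
fdr = 0.05

[psm_group.canon]
descr = 'Canonical PSMs'
prefixes = [['canon_']]
"""

config = tomllib.loads(USER_CONFIG)


def group_fdr(cfg: dict, group: str) -> float:
    """Groupwise FDR, falling back to global_fdr when the group sets no `fdr` key."""
    return cfg["psm_group"][group].get("fdr", cfg.get("global_fdr", 0.01))


print(config["psm_group"]["isoform"]["prefixes"])
print(group_fdr(config, "isoform"), group_fdr(config, "canon"))
```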

Default TMTCrunch configuration:

# Specimen columns.
specimen_columns = []
# Global internal standard (GIS) columns (for multi-batch experiments).
gis_columns = []
# Simulate GIS via selected specimen columns.
# Intended for single-batch experiments only!
simulate_gis = []

# Prefix of decoy proteins.
decoy_prefix = 'DECOY_'

# Path to protein fasta file for mapping protein to gene symbol.
fasta_file = ''

# List of column names from input files to save in the output.
keep_columns = []

# If true, perform PSM groupwise analysis.
groupwise = true

# Global false discovery rate. Can be overwritten per PSM group.
global_fdr = 0.01

# If true, respect peptide modifications and terminate analysis at peptide level.
with_modifications = false

# No modifications by default. Run TMTCrunch with --phospho argument
# to enable common modifications for phospho-proteomics.
[modification.universal]
[modification.selective]

# Keys below are only applicable if groupwise analysis is requested.
# Prefixes of target proteins. If not set, `target_prefixes` will be deduced
# from the prefixes of PSM groups.
# target_prefixes = ['alt_', 'canon_']

# Each PSM group is named after its subkey and defined by three keys:
# `descr` - group description
# `prefixes` - prefixes of target proteins
# `fdr` - groupwise false discovery rate. If not set, global FDR will be used.

# Isoform PSMs: protein group of each PSM consists of target proteins
# with 'alt_' prefix only and any decoy proteins.
[psm_group.isoform]
descr = 'Isoform PSMs'
prefixes = [['alt_']]
fdr = 0.05

# Canonical PSMs: protein group of each PSM consists of target proteins
# with 'canon_' prefix only and any decoy proteins.
[psm_group.canon]
descr = 'Canonical PSMs'
prefixes = [['canon_']]
fdr = 0.01

# Shared PSMs: protein group of each PSM consists of both
# 'canon_' and 'alt_' target proteins and any decoy proteins.
[psm_group.shared]
descr = 'Shared PSMs'
prefixes = [['canon_', 'alt_']]
fdr = 0.01
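One reading of the `prefixes` rule is: collect the prefixes of a PSM's target proteins, ignore decoys, and compare that set with each inner list. The sketch below implements this reading together with a plain target-decoy q-value filter per group. Its assumptions are hypothetical: higher scores are better, a decoy-only PSM is grouped by the protein name left after removing `decoy_prefix` (e.g. `DECOY_alt_P3`), and score ties get no special treatment. TMTCrunch's own grouping and FDR code, and Scavager's q-values, may differ.

```python
# Hypothetical sketch of prefix-based PSM grouping with groupwise target-decoy FDR.
from __future__ import annotations

from dataclasses import dataclass

DECOY_PREFIX = "DECOY_"
TARGET_PREFIXES = ("alt_", "canon_")
GLOBAL_FDR = 0.01
PSM_GROUPS = {  # mirrors the [psm_group.*] tables of the default configuration
    "isoform": {"prefixes": [["alt_"]], "fdr": 0.05},
    "canon": {"prefixes": [["canon_"]], "fdr": 0.01},
    "shared": {"prefixes": [["canon_", "alt_"]], "fdr": 0.01},
}


@dataclass
class PSM:
    proteins: list[str]
    score: float  # assumption: higher is better

    @property
    def is_decoy(self) -> bool:
        return all(p.startswith(DECOY_PREFIX) for p in self.proteins)


def prefix_set(psm: PSM) -> frozenset[str]:
    """Prefixes of the PSM's target proteins; decoy proteins are ignored."""
    names = [p for p in psm.proteins if not p.startswith(DECOY_PREFIX)]
    if not names:  # decoy-only PSM: use names after the decoy prefix (assumption)
        names = [p[len(DECOY_PREFIX):] for p in psm.proteins]
    return frozenset(t for p in names for t in TARGET_PREFIXES if p.startswith(t))


def assign_group(psm: PSM) -> str | None:
    found = prefix_set(psm)
    for name, group in PSM_GROUPS.items():
        if any(found == frozenset(rule) for rule in group["prefixes"]):
            return name
    return None  # PSM matches no group and is not analysed groupwise


def fdr_filter(psms: list[PSM], fdr: float) -> list[PSM]:
    """Keep target PSMs whose target-decoy q-value is at most `fdr`."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    qvals, targets, decoys = [], 0, 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        qvals.append(decoys / max(targets, 1))  # FDR estimate at this score cut-off
    for i in range(len(qvals) - 2, -1, -1):  # q-value: best FDR at this or any looser cut-off
        qvals[i] = min(qvals[i], qvals[i + 1])
    return [p for p, q in zip(ranked, qvals) if q <= fdr and not p.is_decoy]


def groupwise_filter(psms: list[PSM]) -> dict[str, list[PSM]]:
    groups: dict[str, list[PSM]] = {}
    for psm in psms:
        name = assign_group(psm)
        if name is not None:
            groups.setdefault(name, []).append(psm)
    return {name: fdr_filter(members, PSM_GROUPS[name].get("fdr", GLOBAL_FDR))
            for name, members in groups.items()}
```

Matching the whole prefix set, rather than any single prefix, is what keeps a PSM shared between `alt_` and `canon_` proteins out of the isoform and canonical groups.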

Additional configuration for phospho-proteomics (use --phospho argument to enable):

with_modifications = true

# Modifications can be either universal or selective. PSMs for modified
# peptides with any universal modifications and the same pattern of selective
# modifications are treated together; PSMs for peptides with different patterns
# of selective modifications are treated separately.

[modification.universal.0]
name = "TMTpro"
# TMTpro 16plex
mass_delta = 304.207146
modX = "t"
# n-term, K
site = "^K"
variable = false

[modification.universal.1]
name = "TMTplex"
# TMT 6plex, 10plex, 11plex
mass_delta = 229.162932
modX = "t"
site = "^K"
variable = false

[modification.universal.2]
name = "Carboxyamidomethylation"
mass_delta = 57.021464
modX = "cam"
site = "C"
variable = false

[modification.universal.3]
name = "Oxidation"
mass_delta = 15.994915
modX = "ox"
site = "M"
variable = true

[modification.universal.4]
name = "Deamidation"
mass_delta = 0.984016
modX = "d"
site = "NQ"
variable = true

[modification.selective.1]
name = "Phosphorylation"
mass_delta = 79.966331
modX = "p"
site = "STY"
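The universal/selective distinction amounts to a grouping key: strip universal modifications and keep selective ones. The sketch below assumes peptides written in a modX-like notation (a lowercase modification label before the residue, as the `modX` keys suggest, e.g. `tKApSTR`) and uses the labels from the phospho configuration above. `peptide_key` and `group_by_selective_pattern` are hypothetical helpers, and dropping terminal groups entirely is a simplification.

```python
# Hypothetical sketch: group peptides by their pattern of selective modifications.
import re

UNIVERSAL = {"t", "cam", "ox", "d"}  # modX labels of [modification.universal.*]
SELECTIVE = {"p"}  # modX labels of [modification.selective.*]
TOKEN = re.compile(r"([a-z]*)([A-Z])")  # optional lowercase label + one residue


def peptide_key(modx: str) -> str:
    """Drop universal modifications and keep selective ones in place."""
    parts = modx.split("-")
    if len(parts) == 3:  # "Nterm-SEQUENCE-Cterm": termini dropped (simplification)
        core = parts[1]
    elif len(parts) == 1:
        core = modx
    else:
        raise ValueError(f"unsupported terminal notation: {modx!r}")
    tokens = TOKEN.findall(core)
    if "".join(label + residue for label, residue in tokens) != core:
        raise ValueError(f"cannot parse peptide: {modx!r}")
    key = []
    for label, residue in tokens:
        if label in SELECTIVE:
            key.append(label + residue)
        elif label == "" or label in UNIVERSAL:
            key.append(residue)
        else:
            raise ValueError(f"unknown modification label {label!r} in {modx!r}")
    return "".join(key)


def group_by_selective_pattern(peptides: list[str]) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = {}
    for pep in peptides:
        groups.setdefault(peptide_key(pep), []).append(pep)
    return groups
```

With this key, `tKAEpSLoxMR` and `tKAEpSLMR` share the key `KAEpSLMR`, while a peptide phosphorylated at a different site gets a key of its own.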

License

TMTCrunch is distributed under the three-clause BSD License.

Related software

  • AA_stat - utility for amino acid residue modification analysis.
  • Pyteomics - Python framework for proteomics data analysis.
  • IdentiPy - search engine for bottom-up proteomics.
  • Sage - proteomics search engine & quantification tool.
  • Scavager - proteomics post-search validation tool.


Download files

Source Distribution

tmtcrunch-25.10.tar.gz (25.0 kB)

Uploaded Source

Built Distribution

tmtcrunch-25.10-py3-none-any.whl (25.2 kB)

Uploaded Python 3

File details

Details for the file tmtcrunch-25.10.tar.gz.

File metadata

  • Download URL: tmtcrunch-25.10.tar.gz
  • Upload date:
  • Size: 25.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for tmtcrunch-25.10.tar.gz
Algorithm Hash digest
SHA256 aec1806127cde0e76a05323bfa5fcd4e8e67a14619e84ceb889572270265de78
MD5 05c9f213f830842735d42953eef5b5d2
BLAKE2b-256 2b761d44b80d30a9185bcb7c3ab1185278989640a7f72c6d4905d8009051c119


File details

Details for the file tmtcrunch-25.10-py3-none-any.whl.

File metadata

  • Download URL: tmtcrunch-25.10-py3-none-any.whl
  • Upload date:
  • Size: 25.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for tmtcrunch-25.10-py3-none-any.whl
Algorithm Hash digest
SHA256 43582aa0d9abeb4b0ca977a7b2d0fc8fee82d49438fce1d63c7574b5d05799c8
MD5 ce5241a81aa13fab8333f276b4f24c2d
BLAKE2b-256 9a7d770d71ac88261d0594ef33aca113d95459884bddb45f80c976d3bd2a8658
