Python utility for TMT-based proteomics

TMTCrunch

TMTCrunch is an open-source Python utility for tandem mass tag proteomics.

Overview

TMTCrunch is designed primarily to analyze products of alternative splicing in TMT (tandem mass tag) proteomics and phospho-proteomics data. TMTCrunch performs:

  • per-channel normalization;
  • normalization across channels using inherent or virtual GIS channels as a reference;
  • optional grouping of PSMs according to user-defined rules;
  • global or per-group FDR filtration;
  • calculation of abundance at any level: unmodified peptide, peptide with modifications, protein, gene.

TMTCrunch can be used with the Sage search engine or with IdentiPy/Scavager.

Installation

Installing from PyPI

The latest released version can be installed from the Python Package Index:

pip install tmtcrunch

Installing from source

The cutting-edge version can be installed directly from the source repository:

pip install git+https://codeberg.org/makc/tmtcrunch.git

Alternatively, clone the repo and install the package in development mode:

git clone https://codeberg.org/makc/tmtcrunch.git
pip install --editable tmtcrunch

Dependencies

TMTCrunch depends on several Python packages and, if the astropy package is available, uses its statistics functions.

Command line options

usage: tmtcrunch [-h] [--cfg CFG] [--fasta FASTA] [--input-format {auto,scavager,sage}]
                 [--output-dir OUTPUT_DIR] [--output-prefix OUTPUT_PREFIX] [--phospho]
                 [--verbose {0,1,2}] [--show-config] [--version]
                 [fractions ...]

positional arguments:
  fractions             Scavager *_PSMs_full.tsv files or directories with Sage search results.

options:
  -h, --help            show this help message and exit
  --cfg CFG             Path to configuration file. Can be specified multiple times.
  --fasta FASTA         Path to protein fasta file for mapping protein to gene symbol.
  --input-format {auto,scavager,sage}
                        Format of input data. Supported: 'auto', 'scavager', 'sage'. Default is
                        'auto'
  --output-dir OUTPUT_DIR, --odir OUTPUT_DIR
                        Existing output directory. Default is current directory.
  --output-prefix OUTPUT_PREFIX, --oprefix OUTPUT_PREFIX
                        Prefix for output files. Default is 'tmtcrunch_'.
  --phospho             Enable common modifications for phospho-proteomics.
  --verbose {0,1,2}     Logging verbosity. Default is 1.
  --show-config         Show configuration and exit.
  --version             Output version information and exit.
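
As an illustration of how the options above combine, here is a minimal Python sketch that assembles a tmtcrunch command line. The helper `build_tmtcrunch_command` and all file names are hypothetical and not part of TMTCrunch; only the flags listed in the usage text are used.

```python
import shlex


def build_tmtcrunch_command(fractions, cfg_files=(), fasta=None,
                            input_format="auto", output_dir=None,
                            output_prefix=None, phospho=False, verbose=1):
    """Assemble an argument list for the tmtcrunch CLI (hypothetical helper)."""
    cmd = ["tmtcrunch"]
    # --cfg can be specified multiple times, once per configuration file.
    for cfg in cfg_files:
        cmd += ["--cfg", str(cfg)]
    if fasta is not None:
        cmd += ["--fasta", str(fasta)]
    cmd += ["--input-format", input_format]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if output_prefix is not None:
        cmd += ["--output-prefix", output_prefix]
    if phospho:
        cmd.append("--phospho")
    cmd += ["--verbose", str(verbose)]
    # Positional arguments (fractions) go last.
    cmd += [str(fraction) for fraction in fractions]
    return cmd


# File names below are placeholders for illustration only.
command = build_tmtcrunch_command(
    ["fraction1_PSMs_full.tsv", "fraction2_PSMs_full.tsv"],
    cfg_files=["common.toml", "experiment.toml"],
    input_format="scavager",
    output_dir="results",
    phospho=True,
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once tmtcrunch is installed and the output directory exists.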

Configuration files

TMTCrunch stores its configuration in TOML format.

Default TMTCrunch configuration:

# Specimen columns.
specimen_columns = []
# Global internal standard (GIS) columns (for multi-batch experiments).
gis_columns = []
# Simulate GIS via selected specimen columns.
# Intended for single-batch experiments only!
simulate_gis = []

# Prefix of decoy proteins.
decoy_prefix = 'DECOY_'

# Path to protein fasta file for mapping protein to gene symbol.
fasta_file = ''

# List of column names from input files to save in the output.
keep_columns = []

# If true, perform PSM groupwise analysis.
groupwise = true

# Global false discovery rate. Can be overwritten per PSM group.
global_fdr = 0.01

# If true, respect peptide modifications and terminate analysis at peptide level.
with_modifications = false

# No modifications by default. Run TMTCrunch with --phospho argument
# to enable common modifications for phospho-proteomics.
[modification.universal]
[modification.selective]

# Keys below are only applicable if groupwise analysis is requested.
# Prefixes of target proteins. If not set, `target_prefixes` will be deduced
# from the prefixes of PSM groups.
# target_prefixes = ['alt_', 'canon_']

# Each PSM group is named after its subkey and defined by three keys:
# `descr` - group description
# `prefixes` - prefixes of target proteins
# `fdr` - groupwise false discovery rate. If not set, global FDR will be used.

# Isoform PSMs: protein group of each PSM consists of target proteins
# with 'alt_' prefix only and any decoy proteins.
[psm_group.isoform]
descr = 'Isoform PSMs'
prefixes = [['alt_']]
fdr = 0.05

# Canonical PSMs: protein group of each PSM consists of target proteins
# with 'canon_' prefix only and any decoy proteins.
[psm_group.canon]
descr = 'Canonical PSMs'
prefixes = [['canon_']]
fdr = 0.01

# Shared PSMs: protein group of each PSM consists of both
# 'canon_' and 'alt_' target proteins and any decoy proteins.
[psm_group.shared]
descr = 'Shared PSMs'
prefixes = [['canon_', 'alt_']]
fdr = 0.01

Additional configuration for phospho-proteomics (use --phospho argument to enable):

with_modifications = true

# Modifications can be either universal or selective. PSMs for modified
# peptides with any universal modifications and the same pattern of selective
# modifications are treated together; PSMs for peptides with different patterns
# of selective modifications are treated separately.

[modification.universal.1]
name = "Carboxyamidomethylation"
mass = "160.031"
modX = "camC"

[modification.universal.2]
name = "TMTplex at K"
mass = "357.258"
modX = "tK"

[modification.universal.3]
name = "TMTplex n-term"
mass = "230.171"
modX = "t-"

[modification.universal.4]
name = "Oxidation"
mass = "147.035"
modX = "oxM"

[modification.selective.5]
name = "Phosphorylation S"
mass = "166.998"
modX = "pS"

[modification.selective.6]
name = "Phosphorylation T"
mass = "181.014"
modX = "pT"

[modification.selective.7]
name = "Phosphorylation Y"
mass = "243.030"
modX = "pY"
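
The distinction between universal and selective modifications can be sketched as a grouping key: drop every universal label from a modX sequence and keep only the selective ones. The Python below is an illustration of that idea under stated assumptions, not TMTCrunch's code. The function `selective_pattern`, the regular expressions, and the example peptides are hypothetical, and labels are assumed to be lowercase letters or digits placed before the residue.

```python
import re
from collections import defaultdict

# Selective modX labels from the phospho configuration above.
SELECTIVE = {"pS", "pT", "pY"}

_NTERM = re.compile(r"^[a-z0-9]+-")        # N-terminal group such as "t-"
_CTERM = re.compile(r"-[A-Za-z0-9]+$")     # optional C-terminal group
_RESIDUE = re.compile(r"([a-z0-9]*)([A-Z])")


def selective_pattern(modx_sequence, selective=SELECTIVE):
    """Return the sequence with universal labels removed and selective ones kept."""
    body = _CTERM.sub("", _NTERM.sub("", modx_sequence))
    parts = []
    for label, residue in _RESIDUE.findall(body):
        token = label + residue
        parts.append(token if token in selective else residue)
    return "".join(parts)


# Hypothetical peptides: the first two differ only in universal modifications.
psms = ["t-AApSoxMTtK", "t-AApSMTtK", "t-AASoxMpTtK"]
by_pattern = defaultdict(list)
for sequence in psms:
    by_pattern[selective_pattern(sequence)].append(sequence)

for pattern, members in by_pattern.items():
    print(pattern, members)
```

In this sketch the first two peptides share a key because oxidation is universal, while the third forms its own group because the phosphorylation site differs.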

License

TMTCrunch is distributed under the three-clause BSD License.

Related software

  • Pyteomics - Python framework for proteomics data analysis.
  • IdentiPy - search engine for bottom-up proteomics.
  • Sage - proteomics search engine & quantification tool.
  • Scavager - proteomics post-search validation tool.
