Python utility for TMT-based proteomics

TMTCrunch

TMTCrunch is an open-source Python utility for tandem mass tag proteomics.

Overview

TMTCrunch is designed primarily to analyze products of alternative splicing in TMT (tandem mass tag) proteomics and phospho-proteomics data. TMTCrunch performs:

  • per-channel normalization;
  • normalization across channels using inherent or virtual GIS channels as a reference;
  • optional grouping of PSMs according to user-defined rules;
  • global or per-group FDR filtering;
  • calculation of abundance at any level: unmodified peptide, peptide with modifications, protein, gene.
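
The two normalization steps listed above can be illustrated with a minimal sketch. It assumes median scaling per channel and a row-wise ratio to the mean of the reference channels; the source does not state which estimators TMTCrunch actually uses, and the function names and the intensity values below are hypothetical.

```python
from statistics import median


def normalize_channels(rows):
    """Scale each channel (column) so that all channels share the same median.

    Assumption: median scaling; TMTCrunch may use a different estimator.
    """
    columns = list(zip(*rows))
    medians = [median(col) for col in columns]
    target = sum(medians) / len(medians)
    return [[value * target / m for value, m in zip(row, medians)] for row in rows]


def ratios_to_reference(rows, reference_idx):
    """Divide every value by the mean of the reference channels in the same row.

    With real GIS channels, reference_idx points at them; for a virtual GIS
    it would list the specimen channels chosen to simulate it.
    """
    result = []
    for row in rows:
        ref = sum(row[i] for i in reference_idx) / len(reference_idx)
        result.append([value / ref for value in row])
    return result


# Hypothetical data: three PSMs (rows) by three channels (columns).
# The last channel plays the role of the GIS channel.
intensities = [
    [100.0, 200.0, 50.0],
    [200.0, 400.0, 100.0],
    [300.0, 600.0, 150.0],
]
normalized = normalize_channels(intensities)
relative = ratios_to_reference(normalized, reference_idx=[2])
```

In this toy data the channels differ only by a constant loading factor, so after both steps every relative abundance is close to 1.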

TMTCrunch can be used with the Sage search engine or with IdentiPy/Scavager.

Installation

Installing from PyPI

The latest released version can be installed from the Python Package Index:

pip install tmtcrunch

Installing from source

The cutting-edge version can be installed directly from the source repository:

pip install git+https://codeberg.org/makc/tmtcrunch.git

Alternatively, clone the repo and install the package in development mode:

git clone https://codeberg.org/makc/tmtcrunch.git
pip install --editable tmtcrunch

Dependencies

TMTCrunch relies on a set of Python packages and, if available, uses statistical functions from the astropy package.

Command line options

usage: tmtcrunch [-h] [--cfg CFG] [--fasta FASTA] [--input-format {auto,scavager,sage}]
                 [--output-dir OUTPUT_DIR] [--output-prefix OUTPUT_PREFIX] [--phospho]
                 [--verbose {0,1,2}] [--show-config] [--version]
                 [fractions ...]

positional arguments:
  fractions             Scavager *_PSMs_full.tsv files or directories with Sage search results.

options:
  -h, --help            show this help message and exit
  --cfg CFG             Path to configuration file. Can be specified multiple times.
  --fasta FASTA         Path to protein fasta file for mapping protein to gene symbol.
  --input-format {auto,scavager,sage}
                        Format of input data. Supported: 'auto', 'scavager', 'sage'. Default is
                        'auto'
  --output-dir OUTPUT_DIR, --odir OUTPUT_DIR
                        Existing output directory. Default is current directory.
  --output-prefix OUTPUT_PREFIX, --oprefix OUTPUT_PREFIX
                        Prefix for output files. Default is 'tmtcrunch_'.
  --phospho             Enable common modifications for phospho-proteomics.
  --verbose {0,1,2}     Logging verbosity. Default is 1.
  --show-config         Show configuration and exit.
  --version             Output version information and exit.

Configuration files

TMTCrunch stores its configuration in TOML format.

Default TMTCrunch configuration:

# Specimen columns.
specimen_columns = []
# Global internal standard (GIS) columns (for multi-batch experiments).
gis_columns = []
# Simulate GIS via selected specimen columns.
# Intended for single-batch experiments only!
simulate_gis = []

# Prefix of decoy proteins.
decoy_prefix = 'DECOY_'

# Path to protein fasta file for mapping protein to gene symbol.
fasta_file = ''

# List of column names from input files to save in the output.
keep_columns = []

# If true, perform PSM groupwise analysis.
groupwise = true

# Global false discovery rate. Can be overwritten per PSM group.
global_fdr = 0.01

# If true, respect peptide modifications and terminate analysis at peptide level.
with_modifications = false

# No modifications by default. Run TMTCrunch with --phospho argument
# to enable common modifications for phospho-proteomics.
[modification.universal]
[modification.selective]

# Keys below are only applicable if groupwise analysis is requested.
# Prefixes of target proteins. If not set, `target_prefixes` will be deduced
# from the prefixes of PSM groups.
# target_prefixes = ['alt_', 'canon_']

# Each PSM group is named after its subkey and defined by three keys:
# `descr` - group description
# `prefixes` - prefixes of target proteins
# `fdr` - groupwise false discovery rate. If not set, global FDR will be used.

# Isoform PSMs: protein group of each PSM consists of target proteins
# with 'alt_' prefix only and any decoy proteins.
[psm_group.isoform]
descr = 'Isoform PSMs'
prefixes = [['alt_']]
fdr = 0.05

# Canonical PSMs: protein group of each PSM consists of target proteins
# with 'canon_' prefix only and any decoy proteins.
[psm_group.canon]
descr = 'Canonical PSMs'
prefixes = [['canon_']]
fdr = 0.01

# Shared PSMs: protein group of each PSM consists of both
# 'canon_' and 'alt_' target proteins and any decoy proteins.
[psm_group.shared]
descr = 'Shared PSMs'
prefixes = [['canon_', 'alt_']]
fdr = 0.01
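
The group definitions above can be read as a small matching rule. The sketch below is one interpretation of the comments, not the TMTCrunch implementation: a PSM belongs to a group when the set of target prefixes found among its proteins equals one of the prefix lists of that group, with decoy proteins ignored. The `assign_group` function and the protein identifiers are hypothetical.

```python
# Group definitions mirroring the default configuration.
PSM_GROUPS = {
    "isoform": {"prefixes": [["alt_"]], "fdr": 0.05},
    "canon": {"prefixes": [["canon_"]], "fdr": 0.01},
    "shared": {"prefixes": [["canon_", "alt_"]], "fdr": 0.01},
}
TARGET_PREFIXES = ["alt_", "canon_"]
DECOY_PREFIX = "DECOY_"


def assign_group(proteins):
    """Return the name of the PSM group matching a protein group, or None."""
    seen = set()
    for protein in proteins:
        if protein.startswith(DECOY_PREFIX):
            continue  # decoy proteins do not affect group membership
        for prefix in TARGET_PREFIXES:
            if protein.startswith(prefix):
                seen.add(prefix)
    for name, group in PSM_GROUPS.items():
        # Each inner list is one allowed combination of target prefixes.
        if any(seen == set(combo) for combo in group["prefixes"]):
            return name
    return None


# Hypothetical protein groups of four PSMs.
isoform_psm = assign_group(["alt_P1", "DECOY_canon_P2"])
canon_psm = assign_group(["canon_P1"])
shared_psm = assign_group(["canon_P1", "alt_P1"])
decoy_only_psm = assign_group(["DECOY_alt_P3"])
```

Under this reading, the per-group threshold for a PSM would be `PSM_GROUPS[name]["fdr"]`, falling back to `global_fdr` when a group does not set `fdr`.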

Additional configuration for phospho-proteomics (use --phospho argument to enable):

with_modifications = true

# Modifications can be either universal or selective. PSMs for modified
# peptides with any universal modification and the same pattern of selective
# modifications are treated together, PSMs for peptides with different pattern
# of selective modifications are treated separately.

[modification.universal.1]
name = "Carboxyamidomethylation"
mass = "160.031"
modX = "camC"

[modification.universal.2]
name = "TMTplex at K"
mass = "357.258"
modX = "tK"

[modification.universal.3]
name = "TMTplex n-term"
mass = "230.171"
modX = "t-"

[modification.universal.4]
name = "Oxidation"
mass = "147.035"
modX = "oxM"

[modification.selective.5]
name = "Phosphorylation S"
mass = "166.998"
modX = "pS"

[modification.selective.6]
name = "Phosphorylation T"
mass = "181.014"
modX = "pT"

[modification.selective.7]
name = "Phosphorylation Y"
mass = "243.030"
modX = "pY"
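
The distinction between universal and selective modifications can be sketched as a grouping key. The example assumes that peptides are written as modX strings with the labels from the configuration above and that any N-terminal label is universal; the `selective_key` function and the peptide sequences are hypothetical, and the real TMTCrunch logic may differ.

```python
import re

# Labels of selective modifications from the phospho configuration.
SELECTIVE = {"pS", "pT", "pY"}


def selective_key(modx_peptide):
    """Strip universal labels and keep only selective ones.

    PSMs whose peptides yield the same key would be treated together.
    """
    # Drop an N-terminal label such as "t-" if present.
    body = re.sub(r"^[a-z]+-", "", modx_peptide)
    residues = []
    # Each residue is an uppercase letter with an optional lowercase label.
    for mod, aa in re.findall(r"([a-z]*)([A-Z])", body):
        label = mod + aa
        residues.append(label if label in SELECTIVE else aa)
    return "".join(residues)


# Hypothetical peptides: the first two differ only in universal modifications.
key_a = selective_key("t-AcamCpSoxMtK")
key_b = selective_key("t-ACpSMtK")
key_c = selective_key("t-ACSMtK")
```

Here `key_a` and `key_b` are identical because the peptides carry the same phosphorylation pattern, while `key_c` differs because the serine is not phosphorylated.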

License

TMTCrunch is distributed under the three-clause BSD License.

Related software

  • Pyteomics - Python framework for proteomics data analysis.
  • IdentiPy - search engine for bottom-up proteomics.
  • Sage - proteomics search engine & quantification tool.
  • Scavager - proteomics post-search validation tool.
