Python package for deTEL, a pipeline that detects translation errors in mass-spectrometry data

Development Status
- 1 - Planning
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Operating System
- POSIX :: Linux
Programming Language
- Python :: 3
- Python :: 3.7

Project description

detecting Translation Error Landscape: deTEL
- deTEL Python package
empirical Translation Error Landscape: eTEL
- Command line options
- Example
multinomial Translation Error Landscape: mTEL
- Command line options
- Examples
  - Normal run
  - Bootstrapping datasets

detecting Translation Error Landscape: deTEL

deTEL is a simple pipeline that allows for the exploration of translation errors in mass-spectrometry data. deTEL consists of two components:

eTEL detects translation errors in mass-spectrometry data and explores the empirical translation error landscape.
mTEL is a model fitted to the translation errors detected by eTEL; it describes the multinomial translation error landscape and extends the empirical translation error landscape.

deTELpy packages these components into an easy-to-use Python package.

deTEL Python package

pip install deTELpy

python -m deTEL # To run eTEL/mTEL with GUI

# To run eTEL with command line
python -m deTEL eTEL -f tests/resources/s228c_orf_cds.fasta -psm tests/resources/psm.tsv -o tests/output -decoy rev_ -p substitutions -tol 0.005

# To run mTEL with command line
python -m deTEL mTEL -f tests/resources/results_ionquant2 -r tests/resources/tRNA_count/yeast_tRNA_count.csv -o tests/output -s 250 -p 100 -c 4.2e-17 -t 10 -b 100 -nb -1 -a n

empirical Translation Error Landscape: eTEL

eTEL detects the empirical translation error landscape in two steps. First, an open search is performed using MSFragger (see: Perform open search). Second, translation errors are extracted using custom Python scripts included in the package (see: detect_substitutions). The output of eTEL can be used directly to fit the mTEL model (see: multinomial Translation Error Landscape: mTEL).

Command line options

-f: FASTA file of coding sequences, matching the amino acid sequences used for the open search.
-psm: Path to the psm.tsv file created by the open search (by Philosopher).
-o: Folder to which output files are written.
-p: Prefix used for output files.
-decoy: Identification prefix of decoy sequences (default: rev_).
-tol: m/z tolerance, used to filter DP–BP couples that resemble substitutions and to exclude pairs that resemble known PTMs (default: 0.005).
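
To make the role of the tolerance concrete, the following is a minimal sketch and not eTEL's actual implementation. It assumes the tolerance is applied to the mass shift (in Da) between a pair of peptides, and it uses small, hand-picked tables of residue masses and modification shifts. The names RESIDUE_MASS, KNOWN_PTM_SHIFT and candidate_substitutions are hypothetical.

```python
# Monoisotopic residue masses in Da for a small, illustrative subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "D": 115.02694, "E": 129.04259,
}

# Mass shifts in Da of two common modifications (illustrative, not exhaustive).
KNOWN_PTM_SHIFT = {"methylation": 14.01565, "oxidation": 15.99491}


def candidate_substitutions(delta_mass, tol=0.005):
    """Return (original, substituted) residue pairs matching delta_mass within tol.

    An empty list is returned when the shift also matches a known PTM,
    because the two explanations cannot be told apart in this sketch.
    """
    if any(abs(delta_mass - shift) <= tol for shift in KNOWN_PTM_SHIFT.values()):
        return []
    return [
        (a, b)
        for a, mass_a in RESIDUE_MASS.items()
        for b, mass_b in RESIDUE_MASS.items()
        if a != b and abs((mass_b - mass_a) - delta_mass) <= tol
    ]


print(candidate_substitutions(15.95853))  # shift consistent with V->D and L->E
print(candidate_substitutions(15.99492))  # A->S shift, excluded: resembles oxidation
```

A tighter tolerance keeps fewer ambiguous pairs, while a looser one risks counting modifications as substitutions.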

Example

The command below detects substitutions in the specified psm.tsv created by the open search. Since we may have more than one experiment and want to collect all files in a common folder, we specify a prefix (Experiment1) to identify the output belonging to each individual experiment.

python -m deTEL eTEL -f project/fasta/s228c_orf_cds.fasta -psm project/open_search_experiment1/psm.tsv -o project/results -p Experiment1
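
When several experiments share one output folder, the command can be assembled programmatically. The sketch below only builds the argument lists, using the flags shown in the example above. The helper build_etel_command and the folder layout project/open_search_<name> are assumptions made for illustration.

```python
def build_etel_command(fasta, psm, out_dir, prefix, decoy="rev_", tol=0.005):
    """Return the argument list for one eTEL run."""
    return [
        "python", "-m", "deTEL", "eTEL",
        "-f", fasta,
        "-psm", psm,
        "-o", out_dir,
        "-p", prefix,
        "-decoy", decoy,
        "-tol", str(tol),
    ]


# Hypothetical layout: one open-search folder per experiment.
experiments = ["experiment1", "experiment2"]
commands = [
    build_etel_command(
        fasta="project/fasta/s228c_orf_cds.fasta",
        psm=f"project/open_search_{name}/psm.tsv",
        out_dir="project/results",
        prefix=name.capitalize(),  # "experiment1" -> "Experiment1"
    )
    for name in experiments
]

for cmd in commands:
    # To execute a run, pass the list to subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```

Each run writes to the same folder, and the prefix keeps the output files of the experiments apart.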

multinomial Translation Error Landscape: mTEL

mTEL uses observed translation errors to estimate a multinomial translation error landscape. mTEL is based on the competition of tRNAs and estimates affinity parameters between codon/anticodon pairs.

Command line options

-f: Folder with codon_count and error files.
-r: tRNA count file.
-o: Output folder.
-s: Number of samples of the chain.
-p: Number of posterior samples.
-c: Assumed cell volume (in cubic meters). Default: 4.2e-17 (approximate size of a yeast cell).
-t: Number to thin out chain by.
-b: Number of burn-in steps.
-nb: Number of sub-samplings performed.
-os: Suffix added to output files (default: date).
-a: Aggregate all datasets by summation (y, n). Default: n (no).

Examples

Normal run

We assume that the folder ecoli contains all needed pairs of *_codon_counts.csv and *_substitution_errors.csv files. A cell volume of 0.6e-18 is assumed for E. coli. We will collect 1000 samples after discarding 1000 burn-in samples. In total, this run will perform (1000 + 1000) * 10 = 20000 steps. The last 200 samples will be used to estimate the posterior distributions of the individual parameters.

$ python -m deTEL mTEL -f ecoli -r tRNA_count/ecoli_tRNA_count.csv -c 0.6e-18 -o output/ecoli/ -s 1000 -p 200 -t 10 -b 1000
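
The step count quoted above follows from the -s, -b and -t options. A small sketch of that arithmetic follows. The function chain_plan is a hypothetical helper, not part of deTEL, and it assumes the posterior samples are taken from the end of the retained samples, as described in the text.

```python
def chain_plan(samples, burn_in, thin, posterior):
    """Summarize a run given -s (samples), -b (burn-in), -t (thin) and -p (posterior)."""
    if posterior > samples:
        raise ValueError("posterior samples cannot exceed collected samples")
    return {
        "total_steps": (samples + burn_in) * thin,  # every kept sample costs `thin` steps
        "discarded_samples": burn_in,
        "retained_samples": samples,
        "posterior_samples": posterior,  # taken from the end of the retained samples
    }


plan = chain_plan(samples=1000, burn_in=1000, thin=10, posterior=200)
print(plan["total_steps"])  # (1000 + 1000) * 10
```

Increasing -t lengthens the run proportionally without changing the number of samples that are kept.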

Bootstrapping datasets

We can bootstrap datasets, as they can show high variability. This allows us to explore parameter sensitivity and robustness. This run performs 20 resamplings with replacement of the datasets found in the folder yeast, keeping the number of datasets constant. For each resampling, the model will collect 250 samples after 10 burn-in steps and perform a total of (250 + 10) * 20 = 5200 steps. The last 100 samples of each run will be used to estimate the posterior mean.

$ python -m deTEL mTEL -f yeast -r tRNA_count/yeast_tRNA_count.csv -c 4.2e-17 -o output/yeast/ -s 250 -p 100 -t 20 -b 10 -nb 20
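
Resampling with replacement while keeping the number of datasets constant can be sketched as follows. This illustrates the general bootstrap idea only and is not mTEL's internal code. The function bootstrap_datasets and the dataset names are hypothetical.

```python
import random


def bootstrap_datasets(datasets, n_resamples, seed=None):
    """Draw n_resamples bootstrap sets, each the same size as the input."""
    rng = random.Random(seed)
    return [rng.choices(datasets, k=len(datasets)) for _ in range(n_resamples)]


datasets = ["rep1", "rep2", "rep3", "rep4"]  # hypothetical dataset names
resamples = bootstrap_datasets(datasets, n_resamples=20, seed=0)

for resample in resamples[:3]:
    print(resample)  # a dataset may appear more than once or not at all
```

Because some datasets are drawn repeatedly and others are left out, the spread of the estimates across resamplings indicates how sensitive the parameters are to individual datasets.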

Release history

0.1.13

Mar 20, 2024

0.1.12

Mar 15, 2024

0.1.11

Mar 13, 2024

0.1.10

Mar 13, 2024

This version

0.1.8

Sep 12, 2023

0.1.7

Aug 8, 2023

0.1.6

Aug 8, 2023

0.1.5

Aug 7, 2023

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

deTELpy-0.1.8-py2.py3-none-any.whl (44.9 kB)

Uploaded Sep 12, 2023 Python 2 Python 3

Hashes for deTELpy-0.1.8-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`12380601797ce4ff830cf43825fc7bcada6712fa791dc95eaca8a471570c1e72`
MD5	`6a0db9b0cfa96c6290d4eee97b3fe774`
BLAKE2b-256	`cd14b658611e53e5670ebb7dd44d48877a90956f1d4cbb085a8ddb78acfee2ef`