
imspy - Python package for working with timsTOF raw data

Welcome to this short introduction to imspy. The package is designed to work with timsTOF raw data files, which are generated by Bruker timsTOF mass spectrometers. It provides a high-level API for accessing raw data and a chemistry module for working with peptide sequences. It also includes algorithms for ion mobility and retention time prediction, as well as machine learning algorithms for data analysis. Want to see how to build full data processing pipelines with imspy? Check out the imspy_dda and timsim command line tools.

Raw data access

Establish a connection to a timsTOF raw file and access data

import numpy as np
from imspy.timstof import TimsDataset

# you can use in-memory mode for faster access, but it requires more memory
tdf = TimsDataset("path/to/rawfolder.d", in_memory=False)

# show the global metadata table
print(tdf.global_meta_data)

# show the frame metadata table
print(tdf.meta_data)

# get the first frame (Bruker frame indices start at 1)
frame = tdf.get_tims_frame(1)

# you can also use indexing
frame = tdf[1]

# print the frame data as a pandas DataFrame
print(frame.df())

# get all spectra in the frame, ordered by scan (the scan axis encodes ion mobility)
spectra = frame.to_tims_spectra()

# get a slice of multiple frames
frames = tdf.get_tims_slice(np.array([1, 2, 3]))

# or, by using slicing
frames = tdf[1:4]
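Conceptually, a TIMS frame is a collection of peaks, each with a scan index (the ion mobility dimension), an m/z value and an intensity. Splitting a frame into spectra groups these peaks by scan. The sketch below illustrates that idea on a few hypothetical peaks in plain Python; it is not how imspy implements to_tims_spectra, and the helper split_frame_by_scan and the peak values are made up for illustration.

```python
from collections import defaultdict

# hypothetical peaks of one frame: (scan, mz, intensity)
peaks = [
    (12, 500.25, 120.0),
    (10, 412.71, 80.0),
    (12, 350.10, 45.0),
    (10, 650.33, 30.0),
    (15, 500.26, 10.0),
]


def split_frame_by_scan(frame_peaks):
    """Group (scan, mz, intensity) peaks into one m/z-sorted spectrum per scan."""
    spectra = defaultdict(list)
    for scan, mz, intensity in frame_peaks:
        spectra[scan].append((mz, intensity))
    # order spectra by scan, and peaks within each spectrum by m/z
    return {scan: sorted(spectra[scan]) for scan in sorted(spectra)}


spectra = split_frame_by_scan(peaks)
for scan, spectrum in spectra.items():
    print(scan, spectrum)
# 10 [(412.71, 80.0), (650.33, 30.0)]
# 12 [(350.1, 45.0), (500.25, 120.0)]
# 15 [(500.26, 10.0)]
```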

DDA data

from imspy.timstof import TimsDatasetDDA
# read a DDA dataset
tdf = TimsDatasetDDA("path/to/rawfolder.d", in_memory=False)

# get raw data of precursors together with their fragment ions
dda_fragments = tdf.get_pasef_fragments()

# the timsTOF re-fragments precursors whose intensity is below a certain threshold;
# aggregating all fragment spectra of the same precursor increases sensitivity:
dda_fragments_grouped = dda_fragments.groupby('precursor_id').agg({
    'frame_id': 'first',
    'time': 'first',
    'precursor_id': 'first',
    # this will sum up the raw data of all fragments with the same precursor_id
    'raw_data': 'sum',
    'scan_begin': 'first',
    'scan_end': 'first',
    'isolation_mz': 'first',
    'isolation_width': 'first',
    'collision_energy': 'first',
    'largest_peak_mz': 'first',
    'average_mz': 'first',
    'monoisotopic_mz': 'first',
    'charge': 'first',
    'average_scan': 'first',
    'intensity': 'first',
    'parent_id': 'first',
})

# for convenience, you can calculate the inverse mobility 
# of the precursor ion by finding the maximum intensity along the scan dimension
mobility = dda_fragments_grouped.apply(
    lambda r: r.raw_data.get_inverse_mobility_along_scan_marginal(), axis=1
)

# add the inverse mobility to the grouped data as a new column
dda_fragments_grouped['mobility'] = mobility
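The two steps above, summing the fragment spectra of a re-fragmented precursor and locating the precursor along the scan axis, can be sketched in plain Python. This assumes each fragment spectrum is a list of (scan, mz, intensity) peaks and uses a hypothetical binning by rounding m/z to two decimals; imspy itself sums TimsFrame objects and converts the apex scan to an inverse mobility using the instrument calibration, which is not reproduced here. The helpers sum_spectra and apex_scan are illustrative names, not imspy functions.

```python
from collections import Counter


def sum_spectra(spectra, mz_decimals=2):
    """Sum intensities of peaks that share a scan and a (rounded) m/z bin."""
    summed = Counter()
    for spectrum in spectra:
        for scan, mz, intensity in spectrum:
            summed[(scan, round(mz, mz_decimals))] += intensity
    return summed


def apex_scan(summed):
    """Return the scan with maximal intensity in the scan marginal."""
    marginal = Counter()
    for (scan, _mz), intensity in summed.items():
        marginal[scan] += intensity
    return max(marginal, key=marginal.get)


# two hypothetical PASEF fragment spectra recorded for the same precursor_id
first = [(400, 175.12, 50.0), (401, 175.12, 90.0), (402, 262.15, 20.0)]
second = [(401, 175.12, 60.0), (402, 262.15, 25.0)]

summed = sum_spectra([first, second])
print(summed[(401, 175.12)])  # 150.0
print(apex_scan(summed))      # 401
```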

DIA data

from imspy.timstof import TimsDatasetDIA
# read a DIA dataset
tdf = TimsDatasetDIA("path/to/rawfolder.d", in_memory=False)

The chemistry module

Basic usage

from imspy.chemistry.elements import ELEMENTAL_MONO_ISOTOPIC_MASSES, ELEMENTAL_ISOTOPIC_ABUNDANCES
from imspy.chemistry.sum_formula import SumFormula

# create a sum formula object that represents the molecule stachyose trihydrate
stachyose_trihydrate = SumFormula("C24H48O24")

# get the monoisotopic mass of the molecule
mono_mass = stachyose_trihydrate.monoisotopic_mass

# get the isotope distribution of the molecule, will be returned as an MzSpectrum object
mz_spec = stachyose_trihydrate.generate_isotope_distribution(charge=1)
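The monoisotopic mass of a sum formula is the count-weighted sum of the masses of each element's most abundant isotope (12C, 1H, 14N, 16O, 32S). A minimal standalone sketch, assuming a simple formula syntax (element symbol followed by an optional count) and hard-coding only the C, H, N, O and S masses; imspy's own element tables live in imspy.chemistry.elements, and parse_formula and monoisotopic_mass here are illustrative helpers, not imspy API.

```python
import re

# monoisotopic masses (Da) of the most abundant isotopes
MONO_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
}


def parse_formula(formula):
    """Parse e.g. 'C24H48O24' into {'C': 24, 'H': 48, 'O': 24}."""
    counts = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(count) if count else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the element masses weighted by their counts."""
    return sum(MONO_MASS[element] * n for element, n in parse_formula(formula).items())


print(round(monoisotopic_mass("C24H48O24"), 4))  # 720.2536 (stachyose trihydrate)
```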

This functionality combines easily with the UNIMOD modification database, which is included in the sagepy package.

from sagepy.core.unimod import modification_atomic_composition
from imspy.chemistry.sum_formula import SumFormula

# load the atomic compositions of all modifications annotated in the UNIMOD database
mods = modification_atomic_composition()

# get the atomic composition of carbamidomethylation, a common cysteine modification
carbamidomethylation = mods["[UNIMOD:4]"]

# create a sum formula object for the atoms added by carbamidomethylation
formula = SumFormula(''.join([key + str(value) for key, value in carbamidomethylation.items()]))
mono_mass = formula.monoisotopic_mass
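Joining the composition into a formula string works for carbamidomethylation, whose UNIMOD composition is H(3) C(2) N O. Neutral-loss modifications such as dehydration (UNIMOD:23, H(-2) O(-1)) have negative counts, which a joined string does not express cleanly. The sketch below sums element masses straight from the dictionary and handles both cases; the composition dictionaries are written by hand and only assumed to resemble what modification_atomic_composition returns, and composition_mass is a hypothetical helper.

```python
# monoisotopic element masses (Da)
MONO_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def composition_mass(composition):
    """Mass shift of a modification from an {element: count} dict; counts may be negative."""
    return sum(MONO_MASS[element] * count for element, count in composition.items())


# hand-written compositions, assumed to mirror the sagepy/UNIMOD format
carbamidomethyl = {"H": 3, "C": 2, "N": 1, "O": 1}  # UNIMOD:4
dehydrated = {"H": -2, "O": -1}                     # UNIMOD:23

print(round(composition_mass(carbamidomethyl), 4))  # 57.0215
print(round(composition_mass(dehydrated), 4))       # -18.0106
```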

Working with peptide sequences

from imspy.data.peptide import PeptideSequence

# create a peptide sequence object; it may contain UNIMOD-annotated modifications
sequence = PeptideSequence("PEPTIDEC[UNIMOD:4]PEPTIDE")

# get the monoisotopic mass of the peptide
mono_mass = sequence.mono_isotopic_mass

# get the product ion series of the peptide sequence, here as b- and y-ions
b_ions, y_ions = sequence.calculate_product_ion_series(
    charge=2,
    fragment_type='b',
)

# generate an annotated isotope distribution of the product ions; it holds detailed
# information about every peak in the spectrum, e.g. b-/y-ion annotation, charge and isotope number
annotated_spectrum = sequence.calculate_mono_isotopic_product_ion_spectrum_annotated(
    charge=2,
    fragment_type='b'
)
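The peptide mass and the b- and y-ion series follow from residue masses: a b-ion is the sum of the first i residues plus a proton, a y-ion is the sum of the last i residues plus water and a proton, and an ion carrying z protons appears at (M + z * m_proton) / z. The standalone sketch below uses a hand-written residue table covering only the residues in these examples and treats C[UNIMOD:4] as a single carbamidomethylated cysteine token. It is not imspy's implementation; tokenize, peptide_mass and b_y_ions are hypothetical helpers.

```python
import re

PROTON = 1.007276467
WATER = 18.0105646863

# monoisotopic residue masses (Da); C[UNIMOD:4] is cysteine + carbamidomethyl (57.021464)
RESIDUE_MASS = {
    "P": 97.052764,
    "E": 129.042593,
    "T": 101.047679,
    "I": 113.084064,
    "D": 115.026943,
    "C": 103.009185,
    "C[UNIMOD:4]": 160.030649,
}


def tokenize(sequence):
    """Split a sequence into residue tokens, keeping UNIMOD annotations attached."""
    return re.findall(r"[A-Z](?:\[UNIMOD:\d+\])?", sequence)


def peptide_mass(sequence):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[t] for t in tokenize(sequence)) + WATER


def b_y_ions(sequence, charge=1):
    """Return (b_ions, y_ions) m/z lists for fragments carrying `charge` protons."""
    masses = [RESIDUE_MASS[t] for t in tokenize(sequence)]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append((sum(masses[:i]) + charge * PROTON) / charge)
        y_ions.append((sum(masses[-i:]) + WATER + charge * PROTON) / charge)
    return b_ions, y_ions


print(round(peptide_mass("PEPTIDE"), 4))  # 799.36
b, y = b_y_ions("PEPTIDE")
print(round(b[1], 4), round(y[0], 4))     # b2 and y1: 227.1026 148.0604
```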

Algorithms and machine learning

Ion mobility and retention time prediction

from imspy.algorithm import (DeepPeptideIonMobilityApex, DeepChromatographyApex, 
                             load_deep_ccs_predictor, load_deep_retention_time_predictor)
from imspy.algorithm.utility import load_tokenizer_from_resources
from imspy.chemistry.mobility import one_over_k0_to_ccs

# some example peptide sequences
sequences = ["PEPTIDE", "PEPTIDEC[UNIMOD:4]PEPTIDE"]
mz_values = [800.37, 871.38]  # precursor m/z: [M+H]+ and [M+2H]2+
charges = [1, 2]

# the retention time predictor model
rt_predictor = DeepChromatographyApex(load_deep_retention_time_predictor(),
                                      load_tokenizer_from_resources("tokenizer-ptm"), verbose=True)

# predict retention times for peptide sequences
predicted_rt = rt_predictor.simulate_separation_times(sequences=sequences)

# the ion mobility predictor model
im_predictor = DeepPeptideIonMobilityApex(load_deep_ccs_predictor(),
                                          load_tokenizer_from_resources("tokenizer-ptm"))

# predict ion mobilities for peptide sequences and translate them to collision cross sections
predicted_inverse_mobility = im_predictor.simulate_ion_mobilities(sequences=sequences, charges=charges, mz=mz_values)
ccs = [one_over_k0_to_ccs(inv_im, mz, charge) for inv_im, mz, charge in zip(predicted_inverse_mobility, mz_values, charges)]
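Conversions from inverse reduced mobility to collision cross section are typically based on the Mason-Schamp equation, CCS = 3ze / (16 N0) * sqrt(2π / (μ k_B T)) * 1/K0, where μ is the reduced mass of ion and drift gas. The sketch below implements that relation under stated assumptions: N2 as drift gas (28.013 Da), a temperature of 305 K, the Loschmidt constant for N0, and the ion mass taken as m/z times charge. imspy's one_over_k0_to_ccs may use different defaults, so treat one_over_k0_to_ccs_sketch as illustrative only.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
BOLTZMANN = 1.380649e-23             # J/K
LOSCHMIDT = 2.6867811e25             # gas number density at 273.15 K and 1 atm (1/m^3)
DALTON = 1.66053906660e-27           # kg


def one_over_k0_to_ccs_sketch(one_over_k0, mz, charge, gas_mass=28.013, temperature=305.0):
    """Mason-Schamp: CCS in Angstrom^2 from 1/K0 in V*s/cm^2 (assumed constants)."""
    ion_mass = mz * charge  # Da, proton masses not subtracted
    reduced_mass = ion_mass * gas_mass / (ion_mass + gas_mass) * DALTON  # kg
    k0 = (1.0 / one_over_k0) * 1e-4  # cm^2/(V*s) -> m^2/(V*s)
    prefactor = 3.0 * charge * ELEMENTARY_CHARGE / (16.0 * LOSCHMIDT)
    ccs_m2 = prefactor * math.sqrt(2.0 * math.pi / (reduced_mass * BOLTZMANN * temperature)) / k0
    return ccs_m2 * 1e20  # m^2 -> Angstrom^2


# a doubly charged precursor at 1/K0 = 1.0 V*s/cm^2: roughly 400 Angstrom^2
print(round(one_over_k0_to_ccs_sketch(1.0, 871.38, 2), 1))
```

Because CCS is proportional to 1/K0 at fixed m/z and charge, doubling the inverse mobility doubles the cross section.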

Intensity prediction

We provide a wrapper for the timsTOF version of the Prosit intensity prediction model, which predicts the intensities of fragment ions. If you use this model, please give credit to the original authors.

from imspy.algorithm import Prosit2023TimsTofWrapper

# some example peptide sequences
sequences = ["PEPTIDE", "PEPTIDEC[UNIMOD:4]PEPTIDE"]
mz_values = [800.37, 871.38]  # precursor m/z: [M+H]+ and [M+2H]2+
charges = [1, 2]

# collision energies need to be calibrated; see the Prosit documentation
# or the calibrate_collision_energies function for more information
collision_energies = [20.5, 30.2]

# the Prosit model
prosit_model = Prosit2023TimsTofWrapper()

# predict expected ion intensities for peptide sequences
predicted_intensity = prosit_model.predict_intensities(
    sequences=sequences,
    charges=charges,
    collision_energies=collision_energies,
    # return the flat 174-dimensional intensity vector per sequence, as produced by Prosit
    flatten=True
)
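The 174 entries correspond to 29 fragment positions, 2 ion types (y and b) and 3 fragment charges (1+ to 3+), since 29 * 2 * 3 = 174; ions that cannot exist for a given peptide are marked with a negative value (Prosit uses -1). The sketch below unflattens such a vector into labelled intensities, assuming Prosit's ordering y1(1+), y1(2+), y1(3+), b1(1+), b1(2+), b1(3+), y2(1+), and so on. Check this ordering against the wrapper's output before relying on it; unflatten_prosit is a hypothetical helper.

```python
def unflatten_prosit(vector):
    """Map a flat 174-element Prosit vector to {(ion_type, position, charge): intensity}.

    Assumes the ordering position -> ion type (y, b) -> charge (1..3);
    negative entries mark impossible ions and are dropped.
    """
    if len(vector) != 174:
        raise ValueError(f"expected 174 values, got {len(vector)}")
    ions = {}
    index = 0
    for position in range(1, 30):
        for ion_type in ("y", "b"):
            for charge in (1, 2, 3):
                value = vector[index]
                if value >= 0:
                    ions[(ion_type, position, charge)] = value
                index += 1
    return ions


# hypothetical vector: every ion impossible except y1(1+) and b2(1+)
flat = [-1.0] * 174
flat[0] = 0.8  # y1, 1+
flat[9] = 0.3  # b2, 1+ (position 2 starts at index 6; its b block starts at 6 + 3)
print(unflatten_prosit(flat))  # {('y', 1, 1): 0.8, ('b', 2, 1): 0.3}
```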

Locality-sensitive hashing

Locality-sensitive hashing (LSH) is a technique for finding similar data points in high-dimensional spaces. We provide the option to cluster spectra by similarity using LSH.

from imspy.timstof import TimsDatasetDDA
from imspy.algorithm.hashing import TimsHasher

# read a DDA dataset
tdf = TimsDatasetDDA("path/to/raw/folder.d", in_memory=False)

# read a frame
frame = tdf.get_tims_frame(1)

# create windows from frame
scans, indices, W = frame.to_dense_windows(
    window_length=5,
    resolution=1,
    overlapping=True,
)

# create a TimsHasher object
hasher = TimsHasher(trials=256, len_trial=22, seed=42, num_dalton=5, resolution=1)

# calculate trials keys per window, each consisting of len_trial bits
K = hasher.calculate_keys(W)
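One common way to build such keys is random-hyperplane hashing (SimHash): each bit records on which side of a random hyperplane a vector lies, so similar vectors tend to share bits and, with them, whole keys. The sketch below uses that scheme with parameters named after TimsHasher's trials and len_trial. It is a standalone illustration; whether TimsHasher uses exactly this projection scheme is an assumption, and make_hyperplanes and calculate_keys here are hypothetical helpers operating on plain lists.

```python
import random


def make_hyperplanes(dim, trials, len_trial, seed=42):
    """Draw trials * len_trial random Gaussian hyperplanes of dimension `dim`."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(len_trial)]
        for _ in range(trials)
    ]


def calculate_keys(vector, hyperplanes):
    """Return one integer key per trial; bit j is 1 if the j-th projection is positive."""
    keys = []
    for trial in hyperplanes:
        key = 0
        for bit, plane in enumerate(trial):
            projection = sum(v * p for v, p in zip(vector, plane))
            if projection > 0:
                key |= 1 << bit
        keys.append(key)
    return keys


planes = make_hyperplanes(dim=8, trials=16, len_trial=22)
window = [0.0, 3.0, 1.0, 0.0, 0.0, 5.0, 2.0, 0.0]  # a hypothetical dense window
scaled = [2.0 * x for x in window]                  # same shape, twice as intense
other = [1.0, 0.0, 0.0, 4.0, 2.0, 0.0, 0.0, 3.0]    # orthogonal to window

k_window = calculate_keys(window, planes)
print(k_window == calculate_keys(scaled, planes))  # True: keys depend only on direction
# orthogonal windows share a whole 22-bit key only with very small probability
print(sum(a == b for a, b in zip(k_window, calculate_keys(other, planes))))
```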

Mixture models

Pipeline: DDA data analysis (imspy_dda)

After you have successfully installed the package, you can use the imspy_dda command line tool to analyze DDA data. Running the following command prints a list of options and arguments that you can use to analyze your data:

imspy_dda --help

Pipeline: Synthetic raw data generation (timsim)

After you have successfully installed the package, you can use the timsim command line tool to generate synthetic raw data. Running the following command prints a list of options and arguments that you can use to generate synthetic raw data:

timsim --help



Download files

Source distribution: imspy-0.2.32.tar.gz (27.9 MB)

Built distribution: imspy-0.2.32-py3-none-any.whl (27.9 MB, Python 3)
