imspy - Python package for working with timsTOF raw data

Welcome to this short introduction to imspy. The package is designed to work with raw data files generated by Bruker timsTOF mass spectrometers. It provides a high-level API for accessing raw data and a chemistry module for working with peptide sequences. It also includes algorithms for ion mobility and retention time prediction, as well as machine learning algorithms for data analysis. To see how full data processing pipelines are built with imspy, check out the imspy_dda and timsim command line tools.

Raw data access

Establish a connection to a timsTOF raw file and access data

import numpy as np
from imspy.timstof import TimsDataset

# you can use in-memory mode for faster access, but it requires more memory
tdf = TimsDataset("path/to/rawfolder.d", in_memory=False)

# show global meta data table
print(tdf.global_meta_data)

# show frame meta data
print(tdf.meta_data)

# get the first frame (bruker frame indices start at 1)
frame = tdf.get_tims_frame(1)

# you can also use indexing
frame = tdf[1]

# print data as pandas dataframe
frame.df()

# get all spectra in a tims frame (sorted by scan = ion mobility)
spectra = frame.to_tims_spectra()

# get a slice of multiple frames
frames = tdf.get_tims_slice(np.array([1, 2, 3]))

# or, by using slicing
frames = tdf[1:4]
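
To make the comment about per-scan spectra concrete, here is a minimal sketch of the idea in plain Python, without imspy. It treats a frame as a flat list of peaks and groups them by scan, so that each scan (one ion mobility step) yields one spectrum. The `Peak` tuple and `split_frame_by_scan` are hypothetical names used only for illustration; they are not part of the imspy API.

```python
from collections import defaultdict
from typing import NamedTuple


class Peak(NamedTuple):
    scan: int         # scan index within the frame (ion mobility dimension)
    mz: float
    intensity: float


def split_frame_by_scan(peaks):
    """Group the peaks of one frame into per-scan spectra, ordered by scan index."""
    by_scan = defaultdict(list)
    for peak in peaks:
        by_scan[peak.scan].append((peak.mz, peak.intensity))
    # scans in ascending order, peaks within each scan sorted by m/z
    return {scan: sorted(by_scan[scan]) for scan in sorted(by_scan)}


# toy frame with three peaks spread over two scans
toy_frame = [
    Peak(scan=12, mz=500.25, intensity=120.0),
    Peak(scan=10, mz=450.10, intensity=80.0),
    Peak(scan=12, mz=499.75, intensity=60.0),
]
spectra = split_frame_by_scan(toy_frame)
```

In imspy itself, `frame.to_tims_spectra()` does this work for you and returns spectrum objects rather than plain tuples.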

DDA data

from imspy.timstof import TimsDatasetDDA
# read a DDA dataset
tdf = TimsDatasetDDA("path/to/rawfolder.d", in_memory=False)

# get raw data of precursors together with their fragment ions
dda_fragments = tdf.get_pasef_fragments()

# the timsTOF re-fragments precursors below a certain intensity threshold,
# you can aggregate the data for increased sensitivity like so:
dda_fragments_grouped = dda_fragments.groupby('precursor_id').agg({
    'frame_id': 'first',
    'time': 'first',
    'precursor_id': 'first',
    # this will sum up the raw data of all fragments with the same precursor_id
    'raw_data': 'sum',
    'scan_begin': 'first',
    'scan_end': 'first',
    'isolation_mz': 'first',
    'isolation_width': 'first',
    'collision_energy': 'first',
    'largest_peak_mz': 'first',
    'average_mz': 'first',
    'monoisotopic_mz': 'first',
    'charge': 'first',
    'average_scan': 'first',
    'intensity': 'first',
    'parent_id': 'first',
})

# for convenience, you can calculate the inverse mobility 
# of the precursor ion by finding the maximum intensity along the scan dimension
mobility = dda_fragments_grouped.apply(
    lambda r: r.raw_data.get_inverse_mobility_along_scan_marginal(), axis=1
)

# add the inverse mobility to the grouped data as a new column
dda_fragments_grouped['mobility'] = mobility
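
The idea of finding the maximum intensity along the scan dimension can be sketched without imspy. Assuming the aggregated raw data reduces to (scan, intensity) pairs, the scan marginal is the summed intensity per scan, and the apex is the scan with the largest sum. The helper `apex_scan` is hypothetical and only illustrates the principle; the library additionally converts the scan index to an inverse mobility value.

```python
from collections import Counter


def apex_scan(scan_intensity_pairs):
    """Return the scan index with the largest summed intensity (scan marginal apex)."""
    marginal = Counter()
    for scan, intensity in scan_intensity_pairs:
        marginal[scan] += intensity
    return max(marginal, key=marginal.get)


# toy data: the same scan can occur several times after aggregation
pairs = [(100, 10.0), (101, 50.0), (100, 30.0), (102, 20.0), (101, 5.0)]
apex = apex_scan(pairs)
```

Summing before taking the maximum matters here: after re-fragmented precursors are merged, the most intense single peak and the most intense scan need not coincide.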

DIA data

from imspy.timstof import TimsDatasetDIA
# read a DIA dataset
tdf = TimsDatasetDIA("path/to/rawfolder.d", in_memory=False)

The chemistry module

Basic usage

from imspy.chemistry.elements import ELEMENTAL_MONO_ISOTOPIC_MASSES, ELEMENTAL_ISOTOPIC_ABUNDANCES
from imspy.chemistry.sum_formula import SumFormula

# create a sum formula object that represents the molecule stachyose trihydrate
stachyose_trihydrate = SumFormula("C24H48O24")

# get the monoisotopic mass of the molecule
mono_mass = stachyose_trihydrate.monoisotopic_mass

# get the isotope distribution of the molecule, will be returned as an MzSpectrum object
mz_spec = stachyose_trihydrate.generate_isotope_distribution(charge=1)
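
For readers who want to see what a monoisotopic mass calculation involves, the following is a minimal sketch in plain Python. It parses a flat sum formula (no brackets or isotope labels) and sums the monoisotopic element masses. The functions `parse_formula`, `monoisotopic_mass` and `mz` are hypothetical helpers, not imspy functions, and the m/z helper assumes a protonated ion.

```python
import re

# monoisotopic masses (in Da) of a few common elements
MONO_MASSES = {
    "H": 1.00782503223,
    "C": 12.0,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "S": 31.9720711744,
}
PROTON_MASS = 1.007276466621


def parse_formula(formula):
    """Parse a flat sum formula such as 'C24H48O24' into {element: count}."""
    composition = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        composition[element] = composition.get(element, 0) + int(count or 1)
    return composition


def monoisotopic_mass(formula):
    """Sum of monoisotopic element masses, weighted by atom count."""
    return sum(MONO_MASSES[el] * n for el, n in parse_formula(formula).items())


def mz(mass, charge):
    """m/z of the protonated ion [M + zH]^z+ (assumes protonation)."""
    return (mass + charge * PROTON_MASS) / charge


mass = monoisotopic_mass("C24H48O24")   # stachyose trihydrate
mz_1 = mz(mass, charge=1)
```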

This functionality is easily combined with the UNIMOD annotation database, which is included in the sagepy package.

from sagepy.core.unimod import modification_atomic_composition
from imspy.chemistry.sum_formula import SumFormula

# carbamidomethylation is a common modification that is annotated in the UNIMOD database
mods = modification_atomic_composition()

# get the atomic composition of carbamidomethylation
carbamidomethylation = mods["[UNIMOD:4]"]

# create a sum formula object that represents the molecule with carbamidomethylation
formula = SumFormula(''.join([key + str(value) for key, value in carbamidomethylation.items()]))
mono_mass = formula.monoisotopic_mass
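
The string join above builds a formula from positive atom counts. As a sketch of an alternative, the mass shift of a modification can also be computed directly from a composition dictionary, which handles modifications that remove atoms (negative counts) without any string round trip. Whether the dictionary returned by sagepy ever contains negative counts, and how `SumFormula` would parse them, is not something this example assumes; `composition_mass` is a hypothetical helper.

```python
# monoisotopic masses (in Da) of a few common elements
MONO_MASSES = {
    "H": 1.00782503223,
    "C": 12.0,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "S": 31.9720711744,
}


def composition_mass(composition):
    """Monoisotopic mass shift of an atomic composition; counts may be negative."""
    return sum(MONO_MASSES[el] * n for el, n in composition.items())


# carbamidomethylation (UNIMOD:4) adds C2 H3 N O
carbamidomethyl = {"C": 2, "H": 3, "N": 1, "O": 1}
delta = composition_mass(carbamidomethyl)

# a water loss as an example of a composition with negative counts
water_loss = {"H": -2, "O": -1}
loss = composition_mass(water_loss)
```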

Working with peptide sequences

from imspy.data.peptide import PeptideSequence

# create a peptide sequence object, might contain modifications
sequence = PeptideSequence("PEPTIDEC[UNIMOD:4]PEPTIDE")

# get the monoisotopic mass of the peptide
mono_mass = sequence.mono_isotopic_mass

# get the product ion series of the peptide sequence, e.g. b- and y-ions
b_ions, y_ions = sequence.calculate_product_ion_series(
    charge=2,
    fragment_type='b',
)

# generate an isotopic distribution of the peptide product ion sequence with annotations, this will hold 
# detailed information about every single peak in the spectrum like b- and y-ion annotations, charge, isotopic number, etc.
annotated_spectrum = sequence.calculate_mono_isotopic_product_ion_spectrum_annotated(
    charge=2,
    fragment_type='b'
)
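
As background for the product ion series, here is a minimal sketch of how b- and y-ion m/z values follow from residue masses. It covers unmodified sequences only and uses a small, hand-written residue table; `b_ion_mz` and `y_ion_mz` are hypothetical helpers and say nothing about how imspy computes or represents its ion series.

```python
# monoisotopic residue masses (in Da) for the amino acids in "PEPTIDE"
RESIDUE_MASSES = {
    "P": 97.052764,
    "E": 129.042593,
    "T": 101.047679,
    "I": 113.084064,
    "D": 115.026943,
}
WATER = 18.010565
PROTON = 1.007276


def b_ion_mz(sequence, charge=1):
    """m/z of b1..b(n-1): N-terminal prefix residues plus `charge` protons."""
    out, running = [], 0.0
    for aa in sequence[:-1]:
        running += RESIDUE_MASSES[aa]
        out.append((running + charge * PROTON) / charge)
    return out


def y_ion_mz(sequence, charge=1):
    """m/z of y1..y(n-1): C-terminal suffix residues plus water plus `charge` protons."""
    out, running = [], WATER
    for aa in reversed(sequence[1:]):
        running += RESIDUE_MASSES[aa]
        out.append((running + charge * PROTON) / charge)
    return out


b_ions = b_ion_mz("PEPTIDE")
y_ions = y_ion_mz("PEPTIDE")
```

A quick sanity check for singly charged ions is that complementary fragments add up to the peptide mass plus two protons, for example `b_ions[0] + y_ions[-1]`.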

Algorithms and machine learning

Ion mobility and retention time prediction

from imspy.algorithm import (DeepPeptideIonMobilityApex, DeepChromatographyApex, 
                             load_deep_ccs_predictor, load_deep_retention_time_predictor)
from imspy.algorithm.utility import load_tokenizer_from_resources
from imspy.chemistry.mobility import one_over_k0_to_ccs

# some example peptide sequences
sequences = ["PEPTIDE", "PEPTIDEC[UNIMOD:4]PEPTIDE"]
# m/z of [M+H]+ and [M+2H]2+ for the two sequences above
mz_values = [800.367, 871.377]
charges = [1, 2]

# the retention time predictor model
rt_predictor = DeepChromatographyApex(load_deep_retention_time_predictor(),
                                      load_tokenizer_from_resources("tokenizer-ptm"), verbose=True)

# predict retention times for peptide sequences
predicted_rt = rt_predictor.simulate_separation_times(sequences=sequences)

# the ion mobility predictor model
im_predictor = DeepPeptideIonMobilityApex(load_deep_ccs_predictor(),
                                          load_tokenizer_from_resources("tokenizer-ptm"))

# predict ion mobilities for peptide sequences and translate them to collision cross sections
predicted_inverse_mobility = im_predictor.simulate_ion_mobilities(sequences=sequences, charges=charges, mz=mz_values)
ccs = [one_over_k0_to_ccs(inv_im, mz, charge) for inv_im, mz, charge in zip(predicted_inverse_mobility, mz_values, charges)]
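
The conversion from inverse reduced mobility to collision cross section is usually based on the Mason-Schamp equation. The sketch below shows a form of it that is commonly used for timsTOF data. The lumped constant, the nitrogen drift gas mass and the temperature default are assumptions made for this example, and `one_over_k0_to_ccs_sketch` is a hypothetical function that is not guaranteed to match imspy's `one_over_k0_to_ccs` in signature or numerical detail.

```python
import math

# lumps physical constants and unit conversions (assumed value)
SUMMARY_CONSTANT = 18509.8632163405


def one_over_k0_to_ccs_sketch(one_over_k0, mz, charge,
                              mass_gas=28.013, temp_celsius=31.85):
    """Approximate CCS (in square angstrom) from 1/K0 (in Vs/cm^2)."""
    ion_mass = mz * charge  # approximation of the ion mass, common in this formula
    reduced_mass = ion_mass * mass_gas / (ion_mass + mass_gas)
    temp_kelvin = temp_celsius + 273.15
    return SUMMARY_CONSTANT * charge * one_over_k0 / math.sqrt(reduced_mass * temp_kelvin)


ccs_example = one_over_k0_to_ccs_sketch(1.0, 800.367, 1)
```

Because the m/z and the charge enter through the reduced mass, inaccurate m/z values shift the resulting CCS, so the values passed alongside the predicted mobilities should belong to the same ions.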

Intensity prediction

We provide a wrapper for the Prosit intensity prediction model, timsTOF version, which can be used to predict the intensity of fragment ions. If you use this model, please give credit to the original authors.

from imspy.algorithm import Prosit2023TimsTofWrapper

# some example peptide sequences
sequences = ["PEPTIDE", "PEPTIDEC[UNIMOD:4]PEPTIDE"]
# m/z of [M+H]+ and [M+2H]2+ for the two sequences above
mz_values = [800.367, 871.377]
charges = [1, 2]

# collision energies need to be calibrated; see the Prosit documentation
# or the calibrate_collision_energies function for more information
collision_energies = [20.5, 30.2]

# the Prosit model
prosit_model = Prosit2023TimsTofWrapper()

# predict expected ion intensities for peptide sequences
predicted_intensity = prosit_model.predict_intensities(
    sequences=sequences,
    charges=charges,
    collision_energies=collision_energies,
    # will return the flat 174 dimensional feature vector per sequence created by Prosit
    flatten=True
)
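
The 174 dimensions of the flat vector factor as 29 fragment positions times 2 ion types times 3 fragment charges. The sketch below maps a flat index back to an annotation, assuming the ordering used in the original Prosit output (for each position: y with charges 1 to 3, then b with charges 1 to 3). That ordering is an assumption here and should be checked against the wrapper before relying on it; `annotate_flat_index` is a hypothetical helper.

```python
MAX_FRAGMENT_POSITIONS = 29   # peptides of up to 30 residues have 29 cleavage sites
ION_TYPES = ("y", "b")        # assumed order within each position
MAX_FRAGMENT_CHARGE = 3


def annotate_flat_index(index):
    """Map a flat index (0..173) to (ion_type, fragment_number, charge)."""
    per_position = len(ION_TYPES) * MAX_FRAGMENT_CHARGE
    position, rest = divmod(index, per_position)
    ion_type_idx, charge_idx = divmod(rest, MAX_FRAGMENT_CHARGE)
    return ION_TYPES[ion_type_idx], position + 1, charge_idx + 1


vector_length = MAX_FRAGMENT_POSITIONS * len(ION_TYPES) * MAX_FRAGMENT_CHARGE
labels = [annotate_flat_index(i) for i in range(vector_length)]
```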

Locality sensitive hashing

Locality sensitive hashing is a technique to find similar data points in high-dimensional spaces. We provide the option to cluster spectra based on their similarity using the LSH algorithm.

from imspy.timstof import TimsDatasetDDA
from imspy.algorithm.hashing import TimsHasher

# read a DDA dataset
tdf = TimsDatasetDDA("path/to/raw/folder.d", in_memory=False)

# read a frame
frame = tdf.get_tims_frame(1)

# create windows from frame
scans, indices, W = frame.to_dense_windows(
    window_length=5,
    resolution=1,
    overlapping=True,
)

# create a TimsHasher object
hasher = TimsHasher(trials=256, len_trial=22, seed=42, num_dalton=5, resolution=1)

# calculate `trials` keys per window, each key having `len_trial` bits
K = hasher.calculate_keys(W)
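
To illustrate what "trials keys of len_trial bits" can mean, here is a minimal sketch of one common LSH scheme, random hyperplane (sign) hashing, in plain Python. Each trial draws `len_trial` random hyperplanes, and each bit of a key records on which side of a hyperplane the window vector falls. Whether `TimsHasher` uses exactly this scheme is an assumption; `make_hyperplanes` and `calculate_keys` below are hypothetical stand-ins, not the imspy implementation.

```python
import random


def make_hyperplanes(trials, len_trial, dim, seed=42):
    """Draw trials x len_trial random hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(len_trial)]
        for _ in range(trials)
    ]


def calculate_keys(window, hyperplanes):
    """Return one integer key per trial; each key packs len_trial sign bits."""
    keys = []
    for trial in hyperplanes:
        key = 0
        for plane in trial:
            dot = sum(p * x for p, x in zip(plane, window))
            key = (key << 1) | (1 if dot >= 0 else 0)
        keys.append(key)
    return keys


planes = make_hyperplanes(trials=8, len_trial=22, dim=5)
window = [1.0, 0.0, 2.0, 0.0, 1.0]
keys = calculate_keys(window, planes)
# scaling the intensities does not change the signs, hence not the keys
keys_scaled = calculate_keys([2.0 * x for x in window], planes)
```

Windows that share a key in at least one trial become candidates for the same cluster; more trials raise the chance that similar windows collide, while longer keys lower the chance that dissimilar ones do.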

Mixture models

Pipeline: DDA data analysis (imspy_dda)

After you have successfully installed the package, you can use the imspy_dda command line tool to analyze DDA data. The following command prints the list of available options and arguments:

imspy_dda --help

Pipeline: Synthetic raw data generation (timsim)

After you have successfully installed the package, you can use the timsim command line tool to generate synthetic raw data. The following command prints the list of available options and arguments:

timsim --help
