imspy - Python package for working with timsTOF raw data
Welcome to the imspy short introduction. This package is designed to work with timsTOF raw data files generated by Bruker timsTOF mass spectrometers. It provides a high-level API for accessing raw data as well as a chemistry module for working with peptide sequences. The package also includes algorithms for ion mobility and retention time prediction, as well as machine learning algorithms for data analysis. Want to see how to build full data processing pipelines with imspy? Check out the imspy_dda and timsim command line tools.
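imspy is distributed on PyPI, so a standard pip install is the quickest way to get started; installing the package also makes the imspy_dda and timsim command line tools shown below available:
pip install imspy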
Raw data access
Establish a connection to a timsTOF raw file and access data
import numpy as np
from imspy.timstof import TimsDataset
# you can use in-memory mode for faster access, but it requires more memory
tdf = TimsDataset("path/to/rawfolder.d", in_memory=False)
# show global meta data table
print(tdf.global_meta_data)
# show frame meta data
print(tdf.meta_data)
# get the first frame (Bruker frame indices start at 1)
frame = tdf.get_tims_frame(1)
# you can also use indexing
frame = tdf[1]
# print data as pandas dataframe
frame.df()
# get all spectra in a tims frame (sorted by scan = ion mobility)
spectra = frame.to_tims_spectra()
# get a slice of multiple frames
frames = tdf.get_tims_slice(np.array([1, 2, 3]))
# or, by using slicing
frames = tdf[1:4]
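The per-frame dataframe returned by frame.df() can be processed with regular pandas operations. A minimal sketch, assuming the dataframe exposes 'mz' and 'intensity' columns (the column names are an assumption; check frame.df().columns on your data):
df = frame.df()
# restrict to an m/z window and look at the most intense peaks
# ('mz' and 'intensity' column names are assumed, not guaranteed)
window = df[(df.mz > 500.0) & (df.mz < 900.0)]
print(window.sort_values('intensity', ascending=False).head())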
DDA data
from imspy.timstof import TimsDatasetDDA
# read a DDA dataset
tdf = TimsDatasetDDA("path/to/rawfolder.d", in_memory=False)
# get raw data of precursors together with their fragment ions
dda_fragments = tdf.get_pasef_fragments()
# the timsTOF re-fragments precursors below a certain intensity threshold,
# you can aggregate the data for increased sensitivity like so:
dda_fragments_grouped = dda_fragments.groupby('precursor_id').agg({
'frame_id': 'first',
'time': 'first',
'precursor_id': 'first',
# this will sum up the raw data of all fragments with the same precursor_id
'raw_data': 'sum',
'scan_begin': 'first',
'scan_end': 'first',
'isolation_mz': 'first',
'isolation_width': 'first',
'collision_energy': 'first',
'largest_peak_mz': 'first',
'average_mz': 'first',
'monoisotopic_mz': 'first',
'charge': 'first',
'average_scan': 'first',
'intensity': 'first',
'parent_id': 'first',
})
# for convenience, you can calculate the inverse mobility
# of the precursor ion by finding the maximum intensity along the scan dimension
mobility = dda_fragments_grouped.apply(
lambda r: r.raw_data.get_inverse_mobility_along_scan_marginal(), axis=1
)
# add the inverse mobility to the grouped data as a new column
dda_fragments_grouped['mobility'] = mobility
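Since the grouped result is a regular pandas DataFrame, you can filter and sort it with standard pandas calls, for example keeping only multiply charged precursors (a small sketch using the columns aggregated above):
# keep precursors with charge >= 2 and sort them by monoisotopic m/z
multiply_charged = dda_fragments_grouped[dda_fragments_grouped.charge >= 2]
print(multiply_charged.sort_values('monoisotopic_mz').head())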
DIA data
from imspy.timstof import TimsDatasetDIA
# read a DIA dataset
tdf = TimsDatasetDIA("path/to/rawfolder.d", in_memory=False)
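A DIA dataset gives you the same frame-level access as the base TimsDataset shown above; a minimal sketch, assuming the reader API is shared:
# frame access works as for the base dataset class
frame = tdf.get_tims_frame(1)
print(frame.df().head())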
The chemistry module
Basic usage
from imspy.chemistry.elements import ELEMENTAL_MONO_ISOTOPIC_MASSES, ELEMENTAL_ISOTOPIC_ABUNDANCES
from imspy.chemistry.sum_formula import SumFormula
# create a sum formula object that represents the molecule stachyose trihydrate
stachyose_trihydrate = SumFormula("C24H48O24")
# get the monoisotopic mass of the molecule
mono_mass = stachyose_trihydrate.monoisotopic_mass
# get the isotope distribution of the molecule, will be returned as an MzSpectrum object
mz_spec = stachyose_trihydrate.generate_isotope_distribution(charge=1)
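The returned MzSpectrum can then be inspected directly; a minimal sketch, assuming it exposes mz and intensity arrays (the attribute names are an assumption, not taken from the imspy documentation):
# peak positions and heights of the isotope pattern
# (the mz / intensity attribute names are assumed here)
print(mz_spec.mz)
print(mz_spec.intensity)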
This functionality is easily combined with the UNIMOD annotation database, which is included in the sagepy package.
from sagepy.core.unimod import modification_atomic_composition
from imspy.chemistry.sum_formula import SumFormula
# carbamidomethylation is a common modification that is annotated in the UNIMOD database
mods = modification_atomic_composition()
# get the atomic composition of carbamidomethylation
carbamidomethylation = mods["[UNIMOD:4]"]
# create a sum formula object that represents the molecule with carbamidomethylation
formula = SumFormula(''.join([key + str(value) for key, value in carbamidomethylation.items()]))
mono_mass = formula.monoisotopic_mass
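Carbamidomethylation adds C2H3NO, so the resulting monoisotopic mass should come out close to the well-known 57.02 Da mass shift:
# expected to be approximately 57.0215 Da (monoisotopic mass of C2H3NO)
print(mono_mass)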
Working with peptide sequences
from imspy.data.peptide import PeptideSequence
# create a peptide sequence object, might contain modifications
sequence = PeptideSequence("PEPTIDEC[UNIMOD:4]PEPTIDE")
# get the monoisotopic mass of the peptide
mono_mass = sequence.mono_isotopic_mass
# get the product ion series of the peptide sequence, e.g. b- and y-ions
b_ions, y_ions = sequence.calculate_product_ion_series(
    charge=2,
    fragment_type='b',
)
# generate an isotopic distribution of the peptide product ion spectrum with annotations; this holds
# detailed information about every single peak, such as b-/y-ion annotation, charge state, and isotope number
annotated_spectrum = sequence.calculate_mono_isotopic_product_ion_spectrum_annotated(
charge=2,
fragment_type='b'
)
Algorithms and machine learning
Ion mobility and retention time prediction
from imspy.algorithm import (DeepPeptideIonMobilityApex, DeepChromatographyApex,
load_deep_ccs_predictor, load_deep_retention_time_predictor)
from imspy.algorithm.utility import load_tokenizer_from_resources
from imspy.chemistry.mobility import one_over_k0_to_ccs
# some example peptide sequences
sequences = ["PEPTIDE", "PEPTIDEC[UNIMOD:4]PEPTIDE"]
mz_values = [784.58, 1423.72]
charges = [1, 2]
# the retention time predictor model
rt_predictor = DeepChromatographyApex(load_deep_retention_time_predictor(),
load_tokenizer_from_resources("tokenizer-ptm"), verbose=True)
# predict retention times for peptide sequences
predicted_rt = rt_predictor.simulate_separation_times(sequences=sequences)
# the ion mobility predictor model
im_predictor = DeepPeptideIonMobilityApex(load_deep_ccs_predictor(),
load_tokenizer_from_resources("tokenizer-ptm"))
# predict ion mobilities for peptide sequences and translate them to collision cross sections
predicted_inverse_mobility = im_predictor.simulate_ion_mobilities(sequences=sequences, charges=charges, mz=mz_values)
ccs = [one_over_k0_to_ccs(inv_im, mz, charge) for inv_im, mz, charge in zip(predicted_inverse_mobility, mz_values, charges)]
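To keep the predictions together with their inputs, you can collect everything in a small pandas DataFrame (a sketch, assuming each predictor returns one value per input sequence):
import pandas as pd
# one row per peptide with its predicted retention time, inverse mobility and CCS
predictions = pd.DataFrame({
    'sequence': sequences,
    'charge': charges,
    'rt_predicted': predicted_rt,
    'inverse_mobility_predicted': predicted_inverse_mobility,
    'ccs_predicted': ccs,
})
print(predictions)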
Intensity prediction
We provide a wrapper for the timsTOF version of the Prosit intensity prediction model, which can be used to predict fragment ion intensities. If you use this model, please give credit to the original authors.
from imspy.algorithm import Prosit2023TimsTofWrapper
# some example peptide sequences
sequences = ["PEPTIDE", "PEPTIDEC[UNIMOD:4]PEPTIDE"]
mz_values = [784.58, 1423.72]
charges = [1, 2]
# collision energies need to be calibrated; check out the Prosit documentation for more information or read the calibrate_collision_energies function
collision_energies = [20.5, 30.2]
# the Prosit model
prosit_model = Prosit2023TimsTofWrapper()
# predict expected ion intensities for peptide sequences
predicted_intensity = prosit_model.predict_intensities(
sequences=sequences,
charges=charges,
collision_energies=collision_energies,
# will return the flat 174 dimensional feature vector per sequence created by Prosit
flatten=True
)
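Each peptide yields 174 intensity values, which matches a layout of 29 fragment positions × 2 ion series × 3 fragment charges. A sketch of turning the flat vector into a matrix, assuming that layout and assuming predict_intensities returns one flat vector per sequence (both are assumptions, not confirmed by the imspy documentation):
import numpy as np
# reshape each flat 174-dimensional vector into (positions, ion series, charges);
# the (29, 2, 3) ordering is an assumption about the Prosit output layout
intensity_matrix = np.asarray(predicted_intensity).reshape(-1, 29, 2, 3)
print(intensity_matrix.shape)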
Locality sensitive hashing
Locality sensitive hashing is a technique to find similar data points in high-dimensional spaces. We provide the option to cluster spectra based on their similarity using the LSH algorithm.
from imspy.timstof import TimsDatasetDDA
from imspy.algorithm.hashing import TimsHasher
# read a DDA dataset
tdf = TimsDatasetDDA("path/to/raw/folder.d", in_memory=False)
# read a frame
frame = tdf.get_tims_frame(1)
# create windows from frame
scans, indices, W = frame.to_dense_windows(
window_length=5,
resolution=1,
overlapping=True,
)
# create a TimsHasher object
hasher = TimsHasher(trials=256, len_trial=22, seed=42, num_dalton=5, resolution=1)
# calculate trials number of keys per window, each key having len_trial bits
K = hasher.calculate_keys(W)
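The resulting keys can be compared directly: windows that share many keys are LSH collision candidates. A minimal sketch, assuming K holds one row of trials integer keys per window (an assumption about its shape):
import numpy as np
# count how many of the trials keys the first two windows have in common;
# a high count indicates similar windows
keys = np.asarray(K)
shared = int(np.sum(keys[0] == keys[1]))
print(f"windows 0 and 1 share {shared} of {keys.shape[1]} keys")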
Mixture models
Pipeline: DDA data analysis (imspy_dda)
After you successfully installed the package, you can use the imspy_dda command line tool to analyze DDA data.
This will print out a list of options and arguments that you can use to analyze your data:
imspy_dda --help
Pipeline: Synthetic raw data generation (timsim)
After you successfully installed the package, you can use the timsim
command line tool to generate synthetic raw data.
This will print out a list of options and arguments that you can use to generate synthetic raw data:
timsim --help