MARS: Mass Accuracy Recalibration System

Mass recalibration tool for DIA mass spectrometry data from the Thermo Fisher Stellar.

Overview

Mars learns m/z calibration corrections from spectral library fragment matches. The XGBoost model accounts for:

  • Fragment m/z: Mass-dependent calibration bias
  • Peak intensity: Higher intensity peaks provide more reliable calibration
  • Absolute time: Calibration drift over the acquisition run
  • Spectrum TIC: Space charge effects from high ion current
  • Ion injection time: Signal accumulation duration effects
  • Precursor m/z: DIA isolation window-specific effects
  • RF temperatures: Thermal effects from RF amplifier (RFA2) and electronics (RFC2)

How It Works

  1. Fragment matching: For each DIA MS2 spectrum, Mars finds library peptides where:

    • The precursor m/z falls within the DIA isolation window
    • The spectrum RT is within the peptide's elution window
  2. Peak selection: For each expected fragment, Mars selects the most intense peak within the m/z tolerance (not the closest one) and discards peaks below the minimum intensity

  3. Model training: Each matched fragment becomes a training point with up to 22 features (see Model Features) and delta_mz as the target

  4. Calibration: The trained model predicts m/z corrections for all peaks in the mzML
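
Steps 1 and 2 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the Mars implementation: the `LibraryPeptide` class, the function names, and the sign convention for delta_mz (observed minus expected) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class LibraryPeptide:
    """Hypothetical container for one library entry (not the Mars data model)."""
    precursor_mz: float
    rt_start: float  # start of the elution window
    rt_end: float    # end of the elution window (same unit as the spectrum RT)


def is_candidate(pep: LibraryPeptide, window_low: float, window_high: float,
                 spectrum_rt: float) -> bool:
    """Step 1: precursor inside the isolation window and RT inside the elution window."""
    in_window = window_low <= pep.precursor_mz <= window_high
    in_rt = pep.rt_start <= spectrum_rt <= pep.rt_end
    return in_window and in_rt


def select_peak(mzs: Sequence[float], intensities: Sequence[float],
                expected_mz: float, tolerance: float,
                min_intensity: float) -> Optional[Tuple[float, float, float]]:
    """Step 2: pick the most intense peak within the tolerance.

    Returns (observed_mz, intensity, delta_mz), or None if no peak qualifies.
    """
    best = None
    for mz, inten in zip(mzs, intensities):
        # Skip peaks outside the tolerance or below the intensity floor.
        if abs(mz - expected_mz) > tolerance or inten < min_intensity:
            continue
        if best is None or inten > best[1]:
            best = (mz, inten)
    if best is None:
        return None
    # Assumed sign convention: observed minus expected m/z.
    return best[0], best[1], best[0] - expected_mz


# The closest peak (500.18) loses to the more intense one (500.35).
peak = select_peak([500.18, 500.35, 500.90], [800.0, 5000.0, 9000.0],
                   expected_mz=500.20, tolerance=0.2, min_intensity=500.0)
```

Choosing by intensity rather than proximity means a small noise peak that happens to sit next to the expected m/z does not get matched when a strong peak is also within the tolerance.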

Installation

From PyPI (recommended)

pip install mars-ms

From source

git clone https://github.com/maccoss/mars.git
cd mars
pip install -e .

Requirements: Python 3.10+, pyteomics, xgboost, numpy, pandas, matplotlib, seaborn, click

Usage

With PRISM CSV (Recommended)

Use a CSV file exported from Skyline with the PRISM report to get accurate RT windows:

mars calibrate \
  --mzml data.mzML \
  --prism-csv prism_report.csv \
  --tolerance 0.2 \
  --min-intensity 500 \
  --max-isolation-window 5.0 \
  --output-dir output/

Basic Usage

mars calibrate --mzml data.mzML --library library.blib --output-dir output/

Batch Processing

# Multiple files with wildcard
mars calibrate --mzml "*.mzML" --library library.blib --output-dir output/

# All files in directory
mars calibrate --mzml-dir /path/to/data/ --library library.blib --output-dir output/

Options

| Option | Default | Description |
|---|---|---|
| --mzml | - | Path to mzML file or glob pattern |
| --mzml-dir | - | Directory containing mzML files |
| --library | - | Path to blib spectral library (ignored if using PRISM Skyline Report) |
| --prism-csv | - | PRISM Skyline CSV with Start/End Time columns |
| --tolerance | 0.7 | m/z tolerance for matching (Th), ignored if --tolerance-ppm is set |
| --tolerance-ppm | - | m/z tolerance for matching in ppm (e.g., 10 for Astral), overrides --tolerance |
| --min-intensity | 500 | Minimum peak intensity for matching |
| --max-isolation-window | - | Maximum isolation window width (m/z) to include |
| --temperature-dir | - | Directory with RF temperature CSV files |
| --output-dir | . | Output directory |
| --model-path | - | Path to save/load calibration model |
| --no-recalibrate | - | Only train model, don't write mzML |
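
The relationship between --tolerance and --tolerance-ppm can be illustrated with a short sketch. The function name is hypothetical, and treating the tolerance as a half-width around the expected m/z is an assumption; only the default of 0.7 Th and the override rule come from the table above.

```python
from typing import Optional


def matching_tolerance_th(mz: float, tolerance: float = 0.7,
                          tolerance_ppm: Optional[float] = None) -> float:
    """Return the matching tolerance in Th for a fragment at `mz`.

    A ppm tolerance, when given, overrides the absolute Th tolerance.
    """
    if tolerance_ppm is not None:
        # ppm is relative, so the tolerance in Th scales with m/z.
        return mz * tolerance_ppm / 1e6
    return tolerance


unit_resolution = matching_tolerance_th(500.0)                  # absolute, 0.7 Th
high_resolution = matching_tolerance_th(500.0, tolerance_ppm=10)  # relative, 0.005 Th
```

An absolute tolerance suits unit resolution data, where the peak width is roughly constant across the mass range, while a ppm tolerance suits high resolution data, where the expected error grows with m/z.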

RT Window Behavior

  • With --prism-csv: Uses exact Start Time and End Time from Skyline
  • Without --prism-csv: Uses ±5 seconds around the blib library RT
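
As a rough sketch of the two behaviors, the window selection might look like the following. The function and argument names are hypothetical, and all times are assumed to be in seconds already; the actual units of the Skyline and blib columns are not stated here.

```python
from typing import Optional, Tuple

DEFAULT_HALF_WIDTH_S = 5.0  # the ±5 second fallback described above


def rt_window(library_rt_s: float,
              start_time_s: Optional[float] = None,
              end_time_s: Optional[float] = None) -> Tuple[float, float]:
    """Return (start, end) of the RT window used for matching, in seconds."""
    if start_time_s is not None and end_time_s is not None:
        # PRISM CSV case: use the peak boundaries as given.
        return start_time_s, end_time_s
    # Library-only case: fixed window centered on the library RT.
    return library_rt_s - DEFAULT_HALF_WIDTH_S, library_rt_s + DEFAULT_HALF_WIDTH_S


with_prism = rt_window(1200.0, start_time_s=1188.0, end_time_s=1215.0)
without_prism = rt_window(1200.0)
```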

Isolation Window Filtering

Some DIA methods use wide isolation windows (e.g., 20-30 m/z) that may reduce calibration accuracy. Use --max-isolation-window to exclude these:

# Exclude windows wider than 5 m/z
mars calibrate --mzml data.mzML --prism-csv report.csv --max-isolation-window 5.0

This filters spectra during both model training and mzML recalibration. Typical narrow DIA windows (~1 m/z) are retained.
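
The filter amounts to a width check on each spectrum's isolation window. A minimal sketch, assuming a hypothetical `Spectrum` record with explicit lower and upper window bounds (Mars's own spectrum class may differ):

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Spectrum:
    """Hypothetical minimal spectrum record for illustration only."""
    scan: int
    window_low: float   # lower bound of the isolation window (m/z)
    window_high: float  # upper bound of the isolation window (m/z)


def filter_by_window_width(spectra: Iterable[Spectrum],
                           max_width: Optional[float] = None) -> List[Spectrum]:
    """Keep spectra whose isolation window is no wider than `max_width`."""
    if max_width is None:
        # No limit given: keep everything.
        return list(spectra)
    return [s for s in spectra if (s.window_high - s.window_low) <= max_width]


spectra = [
    Spectrum(scan=1, window_low=500.0, window_high=501.0),  # 1 m/z wide
    Spectrum(scan=2, window_low=600.0, window_high=605.0),  # 5 m/z wide
    Spectrum(scan=3, window_low=700.0, window_high=720.0),  # 20 m/z wide
]
kept = filter_by_window_width(spectra, max_width=5.0)
```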

Output Files

| File | Description |
|---|---|
| {input}-mars.mzML | Recalibrated mzML file |
| mars_model.pkl | Trained XGBoost calibration model |
| mars_qc_histogram.png | Delta m/z distribution (before/after) |
| mars_qc_heatmap.png | 2D heatmap (RT × m/z, color = delta) |
| mars_qc_intensity_vs_error.png | Intensity vs mass error hexbin |
| mars_qc_rt_vs_error.png | RT vs mass error hexbin |
| mars_qc_mz_vs_error.png | Fragment m/z vs mass error hexbin |
| mars_qc_tic_vs_error.png | TIC vs mass error hexbin |
| mars_qc_injection_time_vs_error.png | Injection time vs mass error hexbin |
| mars_qc_tic_injection_time_vs_error.png | TIC×injection time vs mass error hexbin |
| mars_qc_fragment_ions_vs_error.png | Fragment ions vs mass error hexbin |
| mars_qc_rfa2_temperature_vs_error.png | RFA2 temperature vs error (if available) |
| mars_qc_rfc2_temperature_vs_error.png | RFC2 temperature vs error (if available) |
| mars_qc_feature_importance.png | Model feature importance |
| mars_qc_summary.txt | Calibration statistics |

Model Features

The XGBoost model uses up to 22 features to predict m/z corrections:

  1. precursor_mz - DIA isolation window center
  2. fragment_mz - Fragment m/z being calibrated
  3. absolute_time - Time relative to first acquisition (seconds)
  4. log_tic - Log10 of spectrum total ion current
  5. log_intensity - Log10 of peak intensity
  6. injection_time - Ion injection time (seconds)
  7. tic_injection_time - TIC × injection time product
  8. fragment_ions - Fragment intensity × injection time (total ions, not rate)
  9. ions_above_0_1 - Total ions in (X+0.5, X+1.5] Th range above fragment m/z
  10. ions_above_1_2 - Total ions in (X+1.5, X+2.5] Th range above fragment m/z
  11. ions_above_2_3 - Total ions in (X+2.5, X+3.5] Th range above fragment m/z
  12. ions_below_0_1 - Total ions in (X-1.5, X-0.5] Th range below fragment m/z
  13. ions_below_1_2 - Total ions in (X-2.5, X-1.5] Th range below fragment m/z
  14. ions_below_2_3 - Total ions in (X-3.5, X-2.5] Th range below fragment m/z
  15. adjacent_ratio_0_1 - ions_above_0_1 / fragment_ions (relative adjacent density)
  16. adjacent_ratio_1_2 - ions_above_1_2 / fragment_ions
  17. adjacent_ratio_2_3 - ions_above_2_3 / fragment_ions
  18. adjacent_ratio_below_0_1 - ions_below_0_1 / fragment_ions
  19. adjacent_ratio_below_1_2 - ions_below_1_2 / fragment_ions
  20. adjacent_ratio_below_2_3 - ions_below_2_3 / fragment_ions
  21. rfa2_temp - RF amplifier temperature (°C)
  22. rfc2_temp - RF electronics temperature (°C)

Note: Features 6-20 are only included if injection time data is available in the mzML files. Features 21-22 are only included if temperature CSV files are provided. Features with universally missing data are automatically excluded.
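
To make the derived features concrete, the sketch below computes several of them for one matched fragment from the definitions in the list above. The function names are hypothetical, and computing "total ions" in a range as the sum of intensity × injection time over the peaks in that range is an assumption based on the fragment_ions definition. Inputs are assumed to be strictly positive.

```python
import math
from typing import Dict, Sequence


def ions_in_range(mzs: Sequence[float], intensities: Sequence[float],
                  injection_time: float, low: float, high: float) -> float:
    """Sum intensity × injection time over peaks with low < m/z <= high."""
    return sum(i * injection_time
               for mz, i in zip(mzs, intensities) if low < mz <= high)


def derived_features(fragment_mz: float, fragment_intensity: float,
                     tic: float, injection_time: float,
                     mzs: Sequence[float],
                     intensities: Sequence[float]) -> Dict[str, float]:
    """Compute the log, product, neighbor-ion, and ratio features for one fragment."""
    fragment_ions = fragment_intensity * injection_time
    feats = {
        "log_tic": math.log10(tic),
        "log_intensity": math.log10(fragment_intensity),
        "tic_injection_time": tic * injection_time,
        "fragment_ions": fragment_ions,
    }
    # Each bin is 1 Th wide and starts 0.5, 1.5, or 2.5 Th from the fragment.
    for name, offset in (("0_1", 0.5), ("1_2", 1.5), ("2_3", 2.5)):
        above = ions_in_range(mzs, intensities, injection_time,
                              fragment_mz + offset, fragment_mz + offset + 1.0)
        below = ions_in_range(mzs, intensities, injection_time,
                              fragment_mz - offset - 1.0, fragment_mz - offset)
        feats[f"ions_above_{name}"] = above
        feats[f"ions_below_{name}"] = below
        feats[f"adjacent_ratio_{name}"] = above / fragment_ions
        feats[f"adjacent_ratio_below_{name}"] = below / fragment_ions
    return feats


feats = derived_features(
    fragment_mz=500.0, fragment_intensity=1000.0, tic=1e6, injection_time=0.01,
    mzs=[499.0, 500.0, 501.0, 502.0], intensities=[200.0, 1000.0, 400.0, 100.0],
)
```

In this example the peak at 501.0 falls in the (X+0.5, X+1.5] bin, so adjacent_ratio_0_1 is its ion count divided by the fragment's own ion count.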

RF Temperature Data

Mars can incorporate RF temperature data to model thermal effects on mass accuracy. Temperature data is loaded from CSV files produced by Thermo's chromatogram export.

Temperature File Format

Temperature CSV files should be in Thermo's chromatogram export format:

  • 3 header lines (skipped)
  • Columns: Time(min), temperature value
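
A file in this layout can be read with the standard library alone. The sketch below is illustrative rather than Mars's own loader: the function name is hypothetical, the sample text is made up, and it additionally skips any non-numeric row in case a column header line follows the three skipped lines.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_temperature_csv(handle: TextIO,
                         header_lines: int = 3) -> List[Tuple[float, float]]:
    """Return (time_min, temperature) pairs from a chromatogram-style CSV."""
    # Skip the fixed number of header lines.
    for _ in range(header_lines):
        next(handle, None)
    rows: List[Tuple[float, float]] = []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # blank or malformed line
        try:
            rows.append((float(row[0]), float(row[1])))
        except ValueError:
            continue  # non-numeric line, e.g. a stray column header
    return rows


# Made-up sample content for demonstration only.
sample = io.StringIO(
    "header line 1\n"
    "header line 2\n"
    "Time(min),Temperature\n"
    "0.0,45.1\n"
    "0.5,45.3\n"
)
trace = read_temperature_csv(sample)
```

For a file on disk, pass an open handle instead, for example `open(path, newline="")`.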

Example naming convention:

RFA2-Sample_Name.csv  # RF amplifier temperature
RFC2-Sample_Name.csv  # RF electronics temperature  

Usage with Temperature Data

mars calibrate \
  --mzml data.mzML \
  --prism-csv report.csv \
  --temperature-dir /path/to/temperature_csvs/ \
  --output-dir output/

Mars automatically finds temperature files matching each mzML filename and interpolates temperature values at each spectrum's retention time.
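
The lookup and interpolation can be sketched as follows. Both functions are hypothetical: the file name pattern follows the naming convention shown earlier, and linear interpolation with clamping at the ends of the trace is an assumption, since the interpolation method is not specified here. Times must be sorted, strictly increasing, and in the same unit as the spectrum retention time.

```python
from bisect import bisect_left
from pathlib import Path
from typing import Sequence


def expected_temperature_file(mzml_path: str, prefix: str) -> str:
    """Build the temperature file name for an mzML file, e.g. RFA2-<stem>.csv."""
    return f"{prefix}-{Path(mzml_path).stem}.csv"


def interpolate_temperature(times: Sequence[float], temps: Sequence[float],
                            rt: float) -> float:
    """Linearly interpolate the temperature at `rt`, clamping outside the trace."""
    if rt <= times[0]:
        return temps[0]
    if rt >= times[-1]:
        return temps[-1]
    # Index of the first time point at or after rt; the previous one is before it.
    j = bisect_left(times, rt)
    t0, t1 = times[j - 1], times[j]
    frac = (rt - t0) / (t1 - t0)
    return temps[j - 1] + frac * (temps[j] - temps[j - 1])


name = expected_temperature_file("data/Sample_Name.mzML", "RFA2")
temp_at_rt = interpolate_temperature([0.0, 1.0, 2.0], [40.0, 42.0, 43.0], 0.5)
```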

Python API

from mars import load_blib, read_dia_spectra, match_library_to_spectra, MzCalibrator

# Load library and match
library = load_blib("library.blib")
spectra = read_dia_spectra("data.mzML")
matches = match_library_to_spectra(library, spectra, mz_tolerance=0.2, min_intensity=1500)

# Train and save model
calibrator = MzCalibrator()
calibrator.fit(matches)
calibrator.save("model.pkl")

Requirements

  • Spectral library: blib format from Skyline with fragment annotations
  • mzML files: DIA data from Thermo Stellar (or similar unit resolution instrument)
  • PRISM CSV (optional): Skyline report with Start Time, End Time, Replicate Name columns

License

MIT
