MARS: Mass Accuracy Recalibration System
Mass recalibration tool for DIA mass spectrometry data from the ThermoFisher Stellar.
Overview
Mars learns m/z calibration corrections from spectral library fragment matches. The XGBoost model accounts for:
- Fragment m/z: Mass-dependent calibration bias
- Peak intensity: Higher intensity peaks provide more reliable calibration
- Absolute time: Calibration drift over the acquisition run
- Spectrum TIC: Space charge effects from high ion current
- Ion injection time: Signal accumulation duration effects
- Precursor m/z: DIA isolation window-specific effects
- RF temperatures: Thermal effects from RF amplifier (RFA2) and electronics (RFC2)
How It Works
- Fragment matching: For each DIA MS2 spectrum, Mars finds library peptides where:
  - The precursor m/z falls within the DIA isolation window
  - The spectrum RT is within the peptide's elution window
- Peak selection: For each expected fragment, Mars selects the most intense peak within the m/z tolerance (not the closest) and discards peaks below the minimum intensity
- Model training: Each matched fragment becomes a training point with up to 16 features (see Model Features) and the target delta_mz
- Calibration: The trained model predicts m/z corrections for all peaks in the mzML
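The matching and peak selection steps can be sketched in a few lines of Python. This is a minimal illustration, not Mars's actual implementation: the function names `is_candidate` and `select_peak` are hypothetical, the peak list is assumed to be sorted by m/z, and the sign convention for delta_mz (observed minus expected) is an assumption.

```python
from bisect import bisect_left, bisect_right


def is_candidate(precursor_mz, rt_start, rt_end, window_lo, window_hi, spectrum_rt):
    """True if a library peptide could contribute fragments to this spectrum.

    The precursor must fall inside the isolation window and the spectrum RT
    must fall inside the peptide's elution window (same time units assumed).
    """
    in_window = window_lo <= precursor_mz <= window_hi
    in_elution = rt_start <= spectrum_rt <= rt_end
    return in_window and in_elution


def select_peak(mz_values, intensities, expected_mz, tolerance=0.7, min_intensity=500.0):
    """Pick the most intense peak within +/- tolerance of expected_mz.

    Returns (observed_mz, intensity, delta_mz) or None if no peak qualifies.
    mz_values must be sorted ascending. delta_mz = observed - expected is an
    assumed sign convention.
    """
    lo = bisect_left(mz_values, expected_mz - tolerance)
    hi = bisect_right(mz_values, expected_mz + tolerance)
    best = None
    for i in range(lo, hi):
        if intensities[i] < min_intensity:
            continue  # too weak to be a reliable calibration point
        if best is None or intensities[i] > intensities[best]:
            best = i
    if best is None:
        return None
    return mz_values[best], intensities[best], mz_values[best] - expected_mz


# The peak at 500.0 is closer to the expected 500.1, but 500.25 is more intense.
example_mz = [500.0, 500.25, 500.5, 501.5]
example_intensity = [800.0, 5000.0, 300.0, 9000.0]
picked = select_peak(example_mz, example_intensity, 500.1)
```

Choosing the most intense peak rather than the closest one avoids anchoring on the nominal position: on a miscalibrated unit resolution instrument the true fragment peak may sit well away from the expected m/z.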
Installation
From PyPI (recommended)
pip install mars-ms
From source
git clone https://github.com/maccoss/mars.git
cd mars
pip install -e .
Requirements: Python 3.10+, pyteomics, xgboost, numpy, pandas, matplotlib, seaborn, click
Usage
With PRISM CSV (Recommended)
Use a CSV file exported from the PRISM Skyline report for accurate RT windows:
mars calibrate \
--mzml data.mzML \
--prism-csv prism_report.csv \
--tolerance 0.2 \
--min-intensity 500 \
--max-isolation-window 5.0 \
--output-dir output/
Basic Usage
mars calibrate --mzml data.mzML --library library.blib --output-dir output/
Batch Processing
# Multiple files with wildcard
mars calibrate --mzml "*.mzML" --library library.blib --output-dir output/
# All files in directory
mars calibrate --mzml-dir /path/to/data/ --library library.blib --output-dir output/
Options
| Option | Default | Description |
|---|---|---|
| --mzml | - | Path to mzML file or glob pattern |
| --mzml-dir | - | Directory containing mzML files |
| --library | - | Path to blib spectral library (ignored if using PRISM Skyline Report) |
| --prism-csv | - | PRISM Skyline CSV with Start/End Time columns |
| --tolerance | 0.7 | m/z tolerance for matching (Th) |
| --min-intensity | 500 | Minimum peak intensity for matching |
| --max-isolation-window | - | Maximum isolation window width (m/z) to include |
| --temperature-dir | - | Directory with RF temperature CSV files |
| --output-dir | . | Output directory |
| --model-path | - | Path to save/load calibration model |
| --no-recalibrate | - | Only train model, don't write mzML |
RT Window Behavior
- With --prism-csv: Uses the exact Start Time and End Time from Skyline
- Without --prism-csv: Uses ±5 seconds around the blib library RT
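As a rough sketch of this rule, the helper below returns the elution window for one library entry. The function name `rt_window` and its parameters are hypothetical, and all times are assumed to have already been converted to seconds.

```python
def rt_window(library_rt, start_time=None, end_time=None, half_width=5.0):
    """Return (start, end) of the elution window in seconds.

    If Skyline boundaries are available (PRISM CSV case), use them as-is;
    otherwise fall back to a fixed window around the library RT.
    """
    if start_time is not None and end_time is not None:
        return start_time, end_time
    return library_rt - half_width, library_rt + half_width


blib_only = rt_window(120.0)                  # fallback: library RT +/- 5 s
with_prism = rt_window(120.0, 110.0, 140.0)   # exact Skyline boundaries
```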
Isolation Window Filtering
Some DIA methods use wide isolation windows (e.g., 20-30 m/z) that may reduce calibration accuracy. Use --max-isolation-window to exclude these:
# Exclude windows wider than 5 m/z
mars calibrate --mzml data.mzML --prism-csv report.csv --max-isolation-window 5.0
This filters spectra during both model training and mzML recalibration. Typical narrow DIA windows (~1 m/z) are retained.
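The filter itself amounts to a width comparison per spectrum. The sketch below assumes a hypothetical `Spectrum` record holding the isolation window bounds; Mars's internal data structures may differ.

```python
from dataclasses import dataclass


@dataclass
class Spectrum:
    """Hypothetical minimal spectrum record for illustration."""
    scan: int
    window_lo: float  # lower isolation window bound (m/z)
    window_hi: float  # upper isolation window bound (m/z)


def filter_by_isolation_width(spectra, max_width=None):
    """Keep spectra whose isolation window is no wider than max_width.

    With max_width=None (option not given) every spectrum is kept.
    """
    if max_width is None:
        return list(spectra)
    return [s for s in spectra if (s.window_hi - s.window_lo) <= max_width]


spectra = [
    Spectrum(scan=1, window_lo=400.0, window_hi=401.0),   # 1 m/z wide
    Spectrum(scan=2, window_lo=400.0, window_hi=405.0),   # 5 m/z wide
    Spectrum(scan=3, window_lo=400.0, window_hi=420.0),   # 20 m/z wide
]
kept = filter_by_isolation_width(spectra, max_width=5.0)
```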
Output Files
| File | Description |
|---|---|
| {input}-mars.mzML | Recalibrated mzML file |
| mars_model.pkl | Trained XGBoost calibration model |
| mars_qc_histogram.png | Delta m/z distribution (before/after) |
| mars_qc_heatmap.png | 2D heatmap (RT × m/z, color = delta) |
| mars_qc_intensity_vs_error.png | Intensity vs mass error hexbin |
| mars_qc_rt_vs_error.png | RT vs mass error hexbin |
| mars_qc_mz_vs_error.png | Fragment m/z vs mass error hexbin |
| mars_qc_tic_vs_error.png | TIC vs mass error hexbin |
| mars_qc_injection_time_vs_error.png | Injection time vs mass error hexbin |
| mars_qc_tic_injection_time_vs_error.png | TIC × injection time vs mass error hexbin |
| mars_qc_fragment_ions_vs_error.png | Fragment ions vs mass error hexbin |
| mars_qc_rfa2_temperature_vs_error.png | RFA2 temperature vs error (if available) |
| mars_qc_rfc2_temperature_vs_error.png | RFC2 temperature vs error (if available) |
| mars_qc_feature_importance.png | Model feature importance |
| mars_qc_summary.txt | Calibration statistics |
Model Features
The XGBoost model uses up to 16 features to predict m/z corrections:
1. precursor_mz - DIA isolation window center
2. fragment_mz - Fragment m/z being calibrated
3. absolute_time - Time relative to first acquisition (seconds)
4. log_tic - Log10 of spectrum total ion current
5. log_intensity - Log10 of peak intensity
6. injection_time - Ion injection time (seconds)
7. tic_injection_time - TIC × injection time product
8. fragment_ions - Fragment intensity × injection time (total ions, not rate)
9. ions_above_0_1 - Total ions in (X, X+1] Th range above fragment m/z
10. ions_above_1_2 - Total ions in (X+1, X+2] Th range above fragment m/z
11. ions_above_2_3 - Total ions in (X+2, X+3] Th range above fragment m/z
12. adjacent_ratio_0_1 - ions_above_0_1 / fragment_ions (relative adjacent density)
13. adjacent_ratio_1_2 - ions_above_1_2 / fragment_ions
14. adjacent_ratio_2_3 - ions_above_2_3 / fragment_ions
15. rfa2_temp - RF amplifier temperature (°C)
16. rfc2_temp - RF electronics temperature (°C)
Note: Features 6-14 are only included if injection time data is available in the mzML files. Features 15-16 are only included if temperature CSV files are provided. Features with universally missing data are automatically excluded.
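To make the ion-count features concrete, here is a sketch that derives several of them for one matched peak. The function name `ion_features` is hypothetical, and two details are assumptions rather than confirmed Mars behavior: TIC is taken as the sum of the peak intensities in the spectrum, and "total ions" in a range is taken as summed intensity × injection time, mirroring the fragment_ions definition.

```python
import math


def ion_features(mz_values, intensities, peak_index, injection_time):
    """Compute a subset of the listed features for the peak at peak_index.

    injection_time is in seconds. TIC as the sum of intensities and
    range ion counts as intensity * injection_time are assumptions.
    """
    x = mz_values[peak_index]
    peak_intensity = intensities[peak_index]
    tic = sum(intensities)
    fragment_ions = peak_intensity * injection_time

    features = {
        "fragment_mz": x,
        "log_tic": math.log10(tic),
        "log_intensity": math.log10(peak_intensity),
        "injection_time": injection_time,
        "tic_injection_time": tic * injection_time,
        "fragment_ions": fragment_ions,
    }
    # Ion load in the three 1 Th bins directly above the fragment: (X+lo, X+hi]
    for lo, hi in ((0, 1), (1, 2), (2, 3)):
        ions = sum(
            intensity * injection_time
            for mz, intensity in zip(mz_values, intensities)
            if x + lo < mz <= x + hi
        )
        features[f"ions_above_{lo}_{hi}"] = ions
        features[f"adjacent_ratio_{lo}_{hi}"] = ions / fragment_ions
    return features


example = ion_features(
    mz_values=[300.0, 300.5, 301.0, 301.8, 303.0, 304.0],
    intensities=[1000.0, 200.0, 300.0, 400.0, 500.0, 600.0],
    peak_index=0,
    injection_time=0.01,
)
```

The adjacent ratios normalize neighboring ion load by the fragment's own ion count, so they describe relative crowding above the peak independently of how intense the fragment itself is.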
RF Temperature Data
Mars can incorporate RF temperature data to model thermal effects on mass accuracy. Temperature data is loaded from CSV files produced by Thermo's chromatogram export.
Temperature File Format
Temperature CSV files should be in Thermo's chromatogram export format:
- 3 header lines (skipped)
- Columns: Time(min), temperature value
Example naming convention:
RFA2-Sample_Name.csv # RF amplifier temperature
RFC2-Sample_Name.csv # RF electronics temperature
Usage with Temperature Data
mars calibrate \
--mzml data.mzML \
--prism-csv report.csv \
--temperature-dir /path/to/temperature_csvs/ \
--output-dir output/
Mars automatically finds temperature files matching each mzML filename and interpolates temperature values at each spectrum's retention time.
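The interpolation step can be sketched with the standard library alone. This is an illustration under stated assumptions, not Mars's code: the function names are hypothetical, the header text in the sample is invented, the spectrum retention time is assumed to be in minutes like the Time(min) column, and values outside the recorded range are clamped to the nearest endpoint.

```python
import csv
import io
from bisect import bisect_left


def read_temperature_csv(handle, header_lines=3):
    """Parse (time_min, temperature) pairs, skipping the leading header lines."""
    for _ in range(header_lines):
        next(handle)
    times, temps = [], []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        times.append(float(row[0]))
        temps.append(float(row[1]))
    return times, temps


def interpolate_temperature(times, temps, rt_min):
    """Linearly interpolate the temperature at rt_min (minutes).

    times must be sorted ascending; out-of-range RTs are clamped.
    """
    if rt_min <= times[0]:
        return temps[0]
    if rt_min >= times[-1]:
        return temps[-1]
    j = bisect_left(times, rt_min)
    t0, t1 = times[j - 1], times[j]
    fraction = (rt_min - t0) / (t1 - t0)
    return temps[j - 1] + fraction * (temps[j] - temps[j - 1])


# Invented sample content: three header lines followed by data rows.
sample = io.StringIO(
    "header line 1\n"
    "header line 2\n"
    "Time(min),Temperature\n"
    "0.0,30.0\n"
    "1.0,31.0\n"
    "2.0,33.0\n"
)
times, temps = read_temperature_csv(sample)
temp_at_half_minute = interpolate_temperature(times, temps, 0.5)
```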
Python API
from mars import load_blib, read_dia_spectra, match_library_to_spectra, MzCalibrator
# Load library and match
library = load_blib("library.blib")
spectra = read_dia_spectra("data.mzML")
matches = match_library_to_spectra(library, spectra, mz_tolerance=0.2, min_intensity=1500)
# Train and save model
calibrator = MzCalibrator()
calibrator.fit(matches)
calibrator.save("model.pkl")
Requirements
- Spectral library: blib format from Skyline with fragment annotations
- mzML files: DIA data from Thermo Stellar (or similar unit resolution instrument)
- PRISM CSV (optional): Skyline report with Start Time, End Time, and Replicate Name columns
License
MIT