A comprehensive toolkit for MALDI-TOF mass spectrometry data preprocessing for antimicrobial resistance (AMR) prediction purposes
MaldiAMRKit
Installation • Features • Quick Start • Documentation • Tutorials • Contributing • Citing • License
Installation
pip install maldiamrkit
Optional: Batch Correction & UMAP
pip install maldiamrkit[batch]
Installs combatlearn for ComBat-based batch effect correction and umap-learn for UMAP exploratory plots.
Development Installation
git clone https://github.com/EttoreRocchi/MaldiAMRKit.git
cd MaldiAMRKit
pip install -e .[dev]
Features
Preprocessing
- Composable Pipeline: Build a custom PreprocessingPipeline from individual transformers (smoothing, baseline correction, normalization, trimming), serializable to JSON/YAML
- Multiple Binning Strategies: Uniform, proportional, adaptive, and custom bin edges
- Quality Metrics: SNR estimation, comprehensive quality reports, and alignment assessment
- Replicate Merging: Mean/median/weighted merging with correlation-based outlier detection
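To make the binning and normalization steps concrete, here is a minimal pure-Python sketch of uniform binning followed by total-ion-current (TIC) normalization. It illustrates the idea only and does not use or reproduce MaldiAMRKit's API: the function names bin_uniform and tic_normalize and the toy 2000 to 2012 Da range are assumptions made for this example.

```python
def bin_uniform(mz, intensity, mz_min, mz_max, bin_width):
    """Sum intensities into fixed-width m/z bins covering [mz_min, mz_max)."""
    n_bins = int((mz_max - mz_min) // bin_width)
    binned = [0.0] * n_bins
    upper = mz_min + n_bins * bin_width
    for m, i in zip(mz, intensity):
        if mz_min <= m < upper:
            binned[int((m - mz_min) // bin_width)] += i
    return binned


def tic_normalize(values):
    """Scale a spectrum so that its intensities sum to 1 (TIC normalization)."""
    total = sum(values)
    return [v / total for v in values] if total > 0 else list(values)


# Toy spectrum (made-up values): six points between 2000 and 2012 Da, 3 Da bins
mz = [2000.5, 2001.0, 2004.2, 2007.9, 2008.1, 2011.0]
intensity = [10.0, 30.0, 20.0, 5.0, 15.0, 20.0]

binned = bin_uniform(mz, intensity, mz_min=2000, mz_max=2012, bin_width=3)
normalized = tic_normalize(binned)
print(binned)
print(normalized)
```

Summing within bins keeps the total signal unchanged before normalization; other aggregation choices (mean or max per bin) are equally plausible and may differ from what the toolkit does.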
Alignment & Detection
- Spectral Alignment: Shift, linear, piecewise, and DTW warping for both binned and raw full-resolution spectra
- Peak Detection: Local maxima and persistent homology methods
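As a rough illustration of the local-maxima approach to peak detection, the sketch below marks every point that is strictly higher than both neighbours and at least a minimum height. It is not the toolkit's detector: find_local_maxima and min_height are hypothetical names, and real detectors typically add prominence or noise-based criteria.

```python
def find_local_maxima(intensity, min_height=0.0):
    """Return indices of points strictly higher than both neighbours."""
    peaks = []
    for k in range(1, len(intensity) - 1):
        higher_than_left = intensity[k] > intensity[k - 1]
        higher_than_right = intensity[k] > intensity[k + 1]
        if higher_than_left and higher_than_right and intensity[k] >= min_height:
            peaks.append(k)
    return peaks


# Toy intensity vector (made-up values)
signal = [0.0, 0.2, 1.0, 0.3, 0.1, 0.4, 0.2, 0.9, 0.8]

all_peaks = find_local_maxima(signal)
strong_peaks = find_local_maxima(signal, min_height=0.5)
print(all_peaks, strong_peaks)
```

The first and last points are never reported, since they have only one neighbour.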
Evaluation
- AMR Metrics: VME, ME, sensitivity, specificity, categorical agreement, and amr_classification_report following EUCAST/CLSI conventions
- Label Encoding: LabelEncoder for mapping R/I/S to binary with configurable intermediate handling
- Stratified Splitting: Species-drug stratified and case-based (patient-grouped) splitting to prevent data leakage
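For readers unfamiliar with the two error rates, a very major error (VME) is a resistant isolate predicted as susceptible, and a major error (ME) is a susceptible isolate predicted as resistant. The sketch below computes both rates under the usual convention that VME is divided by the number of resistant isolates and ME by the number of susceptible isolates. The function vme_me is a hypothetical helper written for this example, not part of MaldiAMRKit.

```python
def vme_me(y_true, y_pred, resistant="R", susceptible="S"):
    """Return (VME rate, ME rate) for paired reference and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    n_resistant = sum(1 for t, _ in pairs if t == resistant)
    n_susceptible = sum(1 for t, _ in pairs if t == susceptible)
    # Resistant isolates called susceptible (the clinically most dangerous error)
    false_susceptible = sum(1 for t, p in pairs if t == resistant and p == susceptible)
    # Susceptible isolates called resistant
    false_resistant = sum(1 for t, p in pairs if t == susceptible and p == resistant)
    vme = false_susceptible / n_resistant if n_resistant else float("nan")
    me = false_resistant / n_susceptible if n_susceptible else float("nan")
    return vme, me


# Toy labels (made-up): 4 resistant and 6 susceptible isolates
y_true = ["R", "R", "R", "R", "S", "S", "S", "S", "S", "S"]
y_pred = ["R", "R", "R", "S", "S", "S", "S", "S", "R", "R"]

vme, me = vme_me(y_true, y_pred)
print(f"VME: {vme:.3f}, ME: {me:.3f}")
```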
Differential Analysis
- DifferentialAnalysis: Per-bin statistical testing (Mann-Whitney U, Welch's t-test) between resistant and susceptible groups, with multiple-testing correction, log2 fold change, and Cohen's d effect size
- Peak Selection: top_peaks() by adjusted p-value, significant_peaks() with fold-change and p-value thresholds, compare_drugs() for multi-drug boolean significance matrices
- AMR-Aware Plots: plot_volcano(), plot_manhattan() along the m/z axis, and plot_drug_comparison() with binary heatmap or UpSet-style intersection view
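The quantities reported per bin can be sketched with the standard library alone. The example below computes a log2 fold change, Cohen's d with a pooled standard deviation, and Benjamini-Hochberg adjusted p-values. All function names and the input p-values are made up for illustration, and Benjamini-Hochberg is assumed here as the correction method; the toolkit's own choices and defaults may differ.

```python
import math
from statistics import mean, stdev


def log2_fold_change(resistant, susceptible, eps=1e-12):
    """log2 ratio of mean intensities (resistant over susceptible) for one bin."""
    return math.log2((mean(resistant) + eps) / (mean(susceptible) + eps))


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (must be non-zero)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy intensities for one bin (made-up values)
resistant = [3.0, 4.0, 5.0]
susceptible = [1.0, 2.0, 3.0]
lfc = log2_fold_change(resistant, susceptible)
d = cohens_d(resistant, susceptible)

# Raw p-values as they might come from a per-bin test (made-up values)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(lfc, d, adjusted)
```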
Drift Monitoring
- DriftMonitor: Anchor a baseline on early timestamps (default: first 20%) and track temporal drift via three complementary views: reference similarity of per-window median spectra, PCA centroid trajectory in a baseline-fitted PCA space, and Jaccard stability of top-k differential peaks over time
- Trajectory Plots: plot_reference_drift, plot_pca_drift, plot_peak_stability, plot_effect_size_drift
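The Jaccard view of peak stability can be illustrated in a few lines: take the top-k peak set from the baseline period and compare it with the top-k set of each later time window. This is a sketch of the general idea, not the DriftMonitor implementation; the helper names and the m/z values are invented for the example.

```python
def jaccard(a, b):
    """Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def peak_stability(baseline_peaks, window_peaks):
    """Jaccard index between the baseline peak set and each window's peak set."""
    return [jaccard(baseline_peaks, peaks) for peaks in window_peaks]


# Hypothetical top-4 peak positions (binned m/z) for the baseline and three windows
baseline = [2010, 4365, 6255, 7274]
windows = [
    [2010, 4365, 6255, 7274],  # identical to baseline
    [2010, 4365, 6255, 9060],  # one peak replaced
    [2010, 3120, 5380, 9060],  # only one peak shared
]

stability = peak_stability(baseline, windows)
print(stability)
```

A value of 1.0 means the window selects exactly the baseline peaks; values drifting towards 0 suggest that the discriminative peaks are changing over time.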
Data Management
- Dataset Building & Loading: DatasetBuilder and DatasetLoader with pluggable layout adapters (FlatLayout, BrukerTreeLayout, DRIAMSLayout, MARISMaLayout)
- Bruker Format Support: Read Bruker flexAnalysis binary data (fid/1r + acqus) natively via read_spectrum() on directories
- MIC Parsing: parse_mic_column() for parsing MIC strings with qualifiers and European decimals
- Composable Filters: SpeciesFilter, DrugFilter, QualityFilter, MetadataFilter combinable with & / | / ~ operators
- Spectrum Export: Save spectra to CSV or TXT via MaldiSpectrum.save() and MaldiSet.save_spectra()
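To show what parsing MIC strings with qualifiers and European decimals involves, here is a small regex-based sketch. It is a stand-alone illustration and not the behaviour of parse_mic_column(): the parse_mic function, its return format, and the set of accepted qualifiers are assumptions made for this example.

```python
import re

# Optional qualifier, then a number with "." or "," as the decimal separator
_MIC_PATTERN = re.compile(r"^\s*(<=|>=|<|>|=)?\s*(\d+(?:[.,]\d+)?)\s*$")


def parse_mic(text):
    """Return (qualifier, value) for a MIC string, or None if it cannot be parsed."""
    match = _MIC_PATTERN.match(text)
    if match is None:
        return None
    qualifier = match.group(1) or "="  # no qualifier means an exact value
    value = float(match.group(2).replace(",", "."))
    return qualifier, value


for raw in ["<=0,25", "> 16", "2", "n/a"]:
    print(repr(raw), "->", parse_mic(raw))
```

Returning None for unparseable entries, rather than raising, makes it easy to count and inspect bad rows in a metadata column before deciding how to handle them.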
Visualization & Tools
- Exploratory Plots: PCA, t-SNE, and UMAP scatter plots colored by species, resistance phenotype, or any metadata column
- Batch Effect Correction: Multi-site/multi-instrument correction via combatlearn (pip install maldiamrkit[batch])
- CLI: maldiamrkit preprocess, maldiamrkit quality, and maldiamrkit build for batch processing
- Parallel Processing: Multi-core support via the n_jobs parameter
- ML-Ready: Direct integration with scikit-learn pipelines
Documentation
Full documentation is available at maldiamrkit.readthedocs.io.
Quick Start
Load and Preprocess a Single Spectrum
from maldiamrkit import MaldiSpectrum
# Load spectrum from file
spec = MaldiSpectrum("data/spectrum.txt")
# Preprocess: smoothing, baseline removal, normalization
spec.preprocess()
# Optional: bin to reduce dimensions
spec.bin(bin_width=3) # 3 Da bins
# Visualize
from maldiamrkit.visualization import plot_spectrum
plot_spectrum(spec, binned=True)
Build a Dataset from Multiple Spectra
from maldiamrkit import MaldiSet
# Load multiple spectra with metadata
data = MaldiSet.from_directory(
    spectra_dir="data/spectra/",
    meta_file="data/metadata.csv",
    aggregate_by=dict(antibiotics="Drug", species="Escherichia coli"),
    bin_width=3
)
# Access features and labels
X = data.X # Feature matrix
y = data.get_y_single("Drug") # Target labels
Machine Learning Pipeline
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from maldiamrkit.alignment import Warping
from maldiamrkit.detection import MaldiPeakDetector
# Create ML pipeline
pipe = Pipeline([
    ("peaks", MaldiPeakDetector(binary=False, prominence=0.05)),
    ("warp", Warping(method="shift")),
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42))
])
# Cross-validation
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(f"CV Accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
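The cross-validation above splits spectra without regard to their origin. When several spectra come from the same patient, keeping each patient's spectra in a single fold avoids the data leakage mentioned under Features. The pure-Python sketch below shows the idea with a hypothetical grouped_folds helper and made-up patient IDs; it is not the toolkit's splitter. In practice, scikit-learn's GroupKFold or StratifiedGroupKFold can be passed as cv together with a groups argument.

```python
def grouped_folds(groups, n_splits):
    """Assign sample indices to folds so that each group stays in one fold."""
    unique_groups = sorted(set(groups))
    # Round-robin assignment of groups (not samples) to folds
    fold_of_group = {g: k % n_splits for k, g in enumerate(unique_groups)}
    folds = [[] for _ in range(n_splits)]
    for sample_idx, g in enumerate(groups):
        folds[fold_of_group[g]].append(sample_idx)
    return folds


# Hypothetical patient ID for each spectrum
patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"]

folds = grouped_folds(patient_ids, n_splits=2)
for k, test_idx in enumerate(folds):
    print(f"fold {k}: test samples {test_idx}")
```

Round-robin assignment ignores class balance and group size, so folds can be uneven; stratified grouped splitters address both.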
For more examples covering alignment, filtering, evaluation, CLI usage, and more, see the Quickstart Guide and API Reference.
Tutorials
For more detailed examples, see the notebooks:
- Quick Start - Loading, preprocessing, binning, and quality assessment
- Peak Detection - Local maxima and persistent homology methods
- Alignment - Warping methods and alignment quality
- Evaluation - AMR metrics, label encoding, and stratified splitting
- Exploration - PCA, t-SNE, UMAP visualizations and batch correction
- Differential Analysis - R vs. S peak testing, volcano/Manhattan plots, and multi-drug comparison
- Drift Monitoring - Baseline-anchored drift detection: reference similarity, PCA trajectory, peak stability, and effect-size drift
Contributing
Pull requests, bug reports, and feature ideas are welcome. See the Contributing Guide for how to get started.
Citing
If you use MaldiAMRKit in your research, please cite:
Rocchi, E., Nicitra, E., Calvo, M. et al. Combining mass spectrometry and machine learning models for predicting Klebsiella pneumoniae antimicrobial resistance: a multicenter experience from clinical isolates in Italy. BMC Microbiol (2026). doi:10.1186/s12866-025-04657-2
See the full publications list for more papers using MaldiAMRKit.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgements
This toolkit is inspired by:
Weis, C., Cuénod, A., Rieck, B., et al. (2022). Direct antimicrobial resistance prediction from clinical MALDI-TOF mass spectra using machine learning. Nature Medicine, 28, 164-174. https://doi.org/10.1038/s41591-021-01619-9
Download files
File details
Details for the file maldiamrkit-0.13.0.tar.gz.
File metadata
- Download URL: maldiamrkit-0.13.0.tar.gz
- Upload date:
- Size: 107.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dc17d018cbc691854adc50d87ee29d898b21ff82d8fd84ff3e22465a47cf6215 |
| MD5 | 70610940defa5a5aa628e5fb55a4ee4f |
| BLAKE2b-256 | 22e48c57fcbb2d85900ce2e059a0e7cb833f64acc646b2ba57276676acc74fc3 |
File details
Details for the file maldiamrkit-0.13.0-py3-none-any.whl.
File metadata
- Download URL: maldiamrkit-0.13.0-py3-none-any.whl
- Upload date:
- Size: 126.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5f9fa633bf3d61091fb31d66e444c7804fe6730c15be10128887812a843de30 |
| MD5 | ae3888beb5aed8204b2800fef0c62196 |
| BLAKE2b-256 | ffdcaf998c8e5ccead07dd3a1cab75fce92104f50305a0a2a24846bbdcc5ee75 |