The maldi_tof_classifier package offers a CLI and a Python 3 API for machine-learning-based classification of MALDI-TOF spectra measured with a Shimadzu 8030 MALDI-TOF mass spectrometer.
maldi-tof-classifier
Version: 0.3.0
The maldi-tof-classifier package provides functionality for:
- Reading MALDI-TOF spectra
- Preprocessing spectral data
- Machine learning based classification
It is designed for spectra generated by a Shimadzu 8030 MALDI-TOF mass spectrometer.
Source code: https://github.com/ofmk94/maldi-tof-classifier
License: MIT
Installation
Python 3.10 or later is required.
Install the package from PyPI:
pip install maldi-tof-classifier
Additionally, the R package MALDIquant must be installed on the system. It can usually be installed from within R with:
install.packages("MALDIquant")
Overview
This README consists of two parts:
- CLI tool usage
- Python API and typical workflows
Although it is distributed as a Python package, the tool is intended primarily for use from the command line.
It is strongly recommended to work with peak data and the default PeakExtractor.
Example data for download is available under: https://github.com/ofmk94/maldi-tof-classifier-data
Part 1 - CLI Tool Usage
1.1 Required directory structure
The CLI tool requires the following directory structure:
data_train/
    A/
        sample1.csv
        sample2.csv
    B/
        sample3.csv
data_predict/
    unknown1.csv
    unknown2.csv
cli_files/
    config.yaml
- data_train: Contains subdirectories for the classes to be learned, for example A, B, C. The subdirectories may contain either .txt files with spectra or .csv files with peak data as produced by a Shimadzu 8030 MALDI-TOF mass spectrometer. This directory is for training only.
- data_predict: Contains files of the same type, either .txt full spectra or .csv peak data, to be classified. This directory is for prediction.
- cli_files: Contains the files necessary for the CLI setup and the files with results. It must contain config.yaml.
Training and prediction must use the same file type and the same extractor.
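The layout described above can be checked before running the CLI. The following is a minimal sketch using only the standard library; the function check_layout is hypothetical and not part of maldi-tof-classifier, and the rule that .csv and .txt files must not be mixed is this sketch's reading of the requirement that training and prediction use the same file type.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".csv", ".txt"}


def check_layout(root: Path) -> list:
    """Return a list of layout problems (empty if none were found).

    Hypothetical helper, not part of maldi-tof-classifier.
    """
    problems = []
    train_dir = root / "data_train"
    predict_dir = root / "data_predict"

    if not (root / "cli_files" / "config.yaml").is_file():
        problems.append("cli_files/config.yaml is missing")

    # Each class to be learned is a subdirectory of data_train.
    class_dirs = [d for d in train_dir.iterdir() if d.is_dir()] if train_dir.is_dir() else []
    if not class_dirs:
        problems.append("data_train has no class subdirectories")

    # Collect the file suffixes used across training and prediction data.
    data_dirs = class_dirs + ([predict_dir] if predict_dir.is_dir() else [])
    suffixes = {f.suffix for d in data_dirs for f in d.iterdir() if f.is_file()}

    unexpected = sorted(suffixes - ALLOWED_SUFFIXES)
    if unexpected:
        problems.append(f"unexpected file types: {unexpected}")
    if ALLOWED_SUFFIXES <= suffixes:
        problems.append("mixed .csv and .txt files; use one file type throughout")
    return problems
```

Calling check_layout(Path(".")) from the project directory returns an empty list when the structure matches the description.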
1.2 Output files
The following files are created inside cli_files during usage:
- cli_files/pipeline.joblib: the trained classification pipeline, created once training has finished.
- cli_files/training_performance.csv: test-set performance of the pipeline, including accuracy, precision, recall, F1 score, and confusion matrix.
- cli_files/predictions.csv: predictions for the files in data_predict.
1.3 CLI commands
There are two commands available:
Train the model:
mtc train
Predict on new data:
mtc predict
Both commands need to be executed in an environment with the directories described above.
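When the two commands are driven from a script, they can be wrapped with subprocess. This is a sketch only: the helper names are hypothetical, and it assumes that mtc looks for data_train, data_predict and cli_files relative to the current working directory, which the text above suggests but does not state explicitly.

```python
import shutil
import subprocess
from pathlib import Path


def mtc_command(step: str) -> list:
    """Build the argument list for one of the two documented commands."""
    if step not in ("train", "predict"):
        raise ValueError(f"unknown step: {step!r}")
    return ["mtc", step]


def run_mtc(step: str, project_dir: Path) -> int:
    """Run `mtc <step>` with project_dir as working directory; return the exit code.

    Assumption: mtc resolves its directories relative to the working directory.
    """
    if shutil.which("mtc") is None:
        raise RuntimeError("mtc not found on PATH; is the package installed?")
    return subprocess.run(mtc_command(step), cwd=project_dir).returncode
```

A typical sequence would be run_mtc("train", project_dir) followed by run_mtc("predict", project_dir), checking that each call returns 0.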
1.4 Configuration via config.yaml
The training setup is configured through cli_files/config.yaml.
All parameters are optional; any parameter that is omitted falls back to its default value.
1.5 Extractor settings
1.5.1 extractor_cls
Type of extractor.
Options:
- "PeakExtractor" for working with .csv files containing peak data
- "FullSpectraExtractor" for full-spectra .txt files
Default:
extractor_cls: "PeakExtractor"
This setting must be the same for mtc train and mtc predict.
1.5.2 extractor_params
Additional optional parameters for PeakExtractor or FullSpectraExtractor.
Default:
extractor_params: null
These parameters are passed directly to the selected extractor constructor.
For PeakExtractor, the main parameters are:
- snr_thresh, default 3.0
- rel_shift_tolerance, default 0.002
- min_peak_freq, default 0.25
Example:
extractor_params:
  snr_thresh: 3.0
  rel_shift_tolerance: 0.002
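The README does not define these parameters in detail. Assuming that rel_shift_tolerance is a relative m/z tolerance within which peaks from different spectra count as the same peak (an interpretation based on the parameter name, not confirmed by the package), the default of 0.002 would behave as in the following sketch. The function name is hypothetical.

```python
def within_relative_tolerance(mz_a: float, mz_b: float, tol: float = 0.002) -> bool:
    """True if two m/z values differ by at most tol relative to their mean.

    Illustrative interpretation of rel_shift_tolerance; the package may differ.
    """
    mean = (mz_a + mz_b) / 2.0
    return abs(mz_a - mz_b) <= tol * mean


# Around m/z 5000 a relative tolerance of 0.002 corresponds to roughly 10 units.
same_peak = within_relative_tolerance(5000.0, 5008.0)
different_peak = within_relative_tolerance(5000.0, 5012.0)
```

Under this reading, the absolute window grows with mass, so peaks at higher m/z may shift further and still be matched.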
For FullSpectraExtractor, the main parameters are:
- use_mz_cutoff, default false
- mz_cutoff_mass, default 20000.0
Example:
extractor_params:
  use_mz_cutoff: true
  mz_cutoff_mass: 20000.0
The dataclasses for file location and file parsing are advanced options and should generally not be set by the CLI user.
1.6 Scaling and dimensionality reduction
1.6.1 scaler_cls
Scaling object.
Available options from sklearn.preprocessing:
- "StandardScaler"
- "MinMaxScaler"
Optional.
Default:
scaler_cls: null
1.6.2 dim_reducer_cls
Optional dimensionality reduction.
Options:
- "PCA" from sklearn.decomposition
- "SVD" using sklearn.decomposition.TruncatedSVD
Default:
dim_reducer_cls: null
1.6.3 n_components
Number of components to use in optional dimensionality reduction.
Type:
int
Default:
n_components: 20
1.7 Classifier settings
1.7.1 classifier_cls
Classification model.
Available options:
- sklearn models: LogisticRegression, LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis, PLSRegression as PLS-DA, SVC, RandomForestClassifier
- xgboost model: XGBClassifier
- special option: OPLS-DA, implemented via LogisticRegression
Default:
classifier_cls: "RandomForestClassifier"
1.7.2 classifier_params
Optional parameters for the classifier.
Default:
classifier_params: null
These parameters are passed directly to the selected classifier constructor. Only parameters valid for the selected classifier should be used here.
Examples:
classifier_params:
  n_estimators: 200
  max_depth: 10
or:
classifier_params:
  C: 1.0
  max_iter: 1000
Refer to the documentation of scikit-learn or xgboost for the full list of supported constructor arguments.
1.8 Train/test split and balancing
1.8.1 test_size
Fraction of the training data held out as the test set.
Type:
float
Default:
test_size: 0.2
1.8.2 oversample
Whether simple oversampling should be performed for balancing training classes.
Type:
bool
Default:
oversample: true
1.9 Notes
- It is strongly recommended to work with peak data and the default PeakExtractor.
- Training and prediction must use the same extractor and the same data format.
- The advanced file parsing options are usually not needed for standard CLI usage.
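The defaults documented in sections 1.5 to 1.8 can be collected in one place. The sketch below mirrors them as a Python dictionary and overlays user settings on top; resolve_config is a hypothetical helper, and rejecting unknown keys is a choice made for this sketch, not documented behavior of the CLI.

```python
from typing import Optional

# Defaults as documented in sections 1.5 to 1.8 (YAML null becomes None).
DEFAULTS = {
    "extractor_cls": "PeakExtractor",
    "extractor_params": None,
    "scaler_cls": None,
    "dim_reducer_cls": None,
    "n_components": 20,
    "classifier_cls": "RandomForestClassifier",
    "classifier_params": None,
    "test_size": 0.2,
    "oversample": True,
}


def resolve_config(user_config: Optional[dict]) -> dict:
    """Overlay user settings on the documented defaults.

    Hypothetical helper; unknown keys raise an error to catch typos early.
    """
    user_config = user_config or {}
    unknown = sorted(set(user_config) - set(DEFAULTS))
    if unknown:
        raise KeyError(f"unknown config keys: {unknown}")
    return {**DEFAULTS, **user_config}


config = resolve_config({"classifier_cls": "SVC", "test_size": 0.3})
```

An empty or missing config.yaml then corresponds to resolve_config(None), which returns the defaults unchanged.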
Part 2 - Python API and typical workflows
2.1 Overview
The maldi-tof-classifier package provides functionality for:
- Reading MALDI-TOF spectra
- Preprocessing spectral data
- Machine learning based classification
Source code: https://github.com/ofmk94/maldi-tof-classifier
Example data: https://github.com/ofmk94/maldi-tof-classifier-data
Docstrings contain more detailed information on parameters and behavior. This section illustrates typical usage.
The directory structure is the same as in Part 1.
2.2 Step 1 - Loading and preprocessing data
Recommended: peak data using PeakExtractor
from pathlib import Path

from sklearn.model_selection import train_test_split

from maldi_tof_classifier.extractors import PeakExtractor

TRAIN_DIR = Path(".") / "data_train"

extractor = PeakExtractor(snr_thresh=3.0)
peaks_dfs, class_labels = extractor.extract_train_data(TRAIN_DIR)

# Split the raw peak tables, then transform each part with the matching method.
# test_size=0.2 is the CLI default; adjust as needed.
X_train, X_test, y_train, y_test = train_test_split(
    peaks_dfs, class_labels, test_size=0.2
)
X_train = extractor.transform_train_data(X_train)
X_test = extractor.transform_predict_data(X_test)
Alternative: full spectra using FullSpectraExtractor
from maldi_tof_classifier.extractors import FullSpectraExtractor

# TRAIN_DIR and train_test_split are defined and imported as in the example above.
extractor = FullSpectraExtractor(use_mz_cutoff=True, mz_cutoff_mass=20000.0)
spectra, class_labels, spots = extractor.extract_train_data(TRAIN_DIR)
X_train, X_test, y_train, y_test = train_test_split(
    spectra, class_labels, test_size=0.2
)
2.3 Step 2 - Label encoding
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_train = le.fit_transform(y_train)
y_test = le.transform(y_test)
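The encoder is also needed later to turn integer predictions back into class names. A small self-contained illustration with made-up labels:

```python
from sklearn.preprocessing import LabelEncoder

labels = ["B", "A", "B", "C"]  # illustrative class names, as in data_train/A, B, C
le = LabelEncoder()
encoded = le.fit_transform(labels)

# LabelEncoder sorts the classes, so "A" -> 0, "B" -> 1, "C" -> 2.
class_names = list(le.classes_)

# Map integer predictions back to the original class names.
decoded = list(le.inverse_transform([2, 0]))
```

In the workflow above, le.inverse_transform(y_pred) would give the predicted class names for the test set.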
2.4 Step 3 - Handle class imbalance (optional)
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler()
X_train, y_train = ros.fit_resample(X_train, y_train)
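Random oversampling duplicates randomly chosen samples of the smaller classes until every class is as large as the largest one. The following pure-Python sketch illustrates the idea; it is not the imblearn implementation and not necessarily what the package does internally when oversample is true.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class samples until all classes have equal size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of the original samples belonging to this class.
        indices = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(indices)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


X_bal, y_bal = random_oversample([[1.0], [2.0], [3.0], [4.0]], [0, 0, 0, 1])
```

Oversampling should be applied to the training split only; duplicating samples before the split would place copies of the same spectrum in both training and test data.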
2.5 Step 4 - Scaling (optional)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
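The scaler is fitted on the training data only, and the same statistics are then applied to the test data. The NumPy sketch below spells out what this means for standardization, using a small made-up matrix:

```python
import numpy as np

X_train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
X_test = np.array([[3.0, 12.0]])

# Statistics come from the training data only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
std[std == 0.0] = 1.0  # constant columns are left unscaled, as StandardScaler does

X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
```

Here the second feature is constant in the training data, so the test value 12.0 is only centered, giving 2.0.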
2.6 Step 5 - Dimensionality reduction (optional)
from sklearn.decomposition import PCA
pca = PCA(n_components=20)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
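With its default solver, scikit-learn's PCA requires an integer n_components to be no larger than min(n_samples, n_features) of the training matrix, so the value 20 can fail on very small training sets. A hypothetical helper that clamps the requested value:

```python
def usable_n_components(n_samples: int, n_features: int, requested: int = 20) -> int:
    """Clamp the requested PCA component count to what the training matrix allows."""
    limit = min(n_samples, n_features)
    if limit < 1:
        raise ValueError("training matrix is empty")
    return min(requested, limit)


small = usable_n_components(n_samples=12, n_features=300)   # limited by sample count
large = usable_n_components(n_samples=100, n_features=300)  # requested value fits
```

In the example above this would be used as PCA(n_components=usable_n_components(*X_train.shape)), assuming X_train is a two-dimensional array.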
2.7 Step 6 - Classification
RandomForestClassifier
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
XGBClassifier
from xgboost import XGBClassifier
classifier = XGBClassifier()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
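The predictions can then be evaluated with sklearn.metrics. The sketch below computes the same kinds of metrics that the CLI reports (accuracy, precision, recall, F1 score, confusion matrix) on made-up labels; macro averaging is an assumption here, since the README does not say how the CLI averages over classes.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Illustrative integer-encoded labels; in the workflow above these would be
# y_test and y_pred.
y_true = [0, 0, 1, 1, 2, 2]
y_hat = [0, 1, 1, 1, 2, 0]

accuracy = accuracy_score(y_true, y_hat)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_hat, average="macro", zero_division=0
)
cm = confusion_matrix(y_true, y_hat)  # rows: true class, columns: predicted class
```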
2.8 Neural network models
Available in maldi_tof_classifier.nn:
- CNN1DClassifier
- LSTMClassifier
For neural networks, a train/validation/test split and one-hot encoding are typically used.
Split
from sklearn.model_selection import train_test_split

X_train, X_val_test, y_train, y_val_test = train_test_split(
    spectra, class_labels, test_size=0.3
)
X_val, X_test, y_val, y_test = train_test_split(
    X_val_test, y_val_test, test_size=0.333
)
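The two successive splits result in roughly 70 % training, 20 % validation and 10 % test data. The arithmetic, as a small hypothetical helper:

```python
def split_fractions(first_test_size: float, second_test_size: float) -> dict:
    """Fractions of the full dataset after two successive train_test_split calls."""
    holdout = first_test_size            # part set aside by the first split
    test = holdout * second_test_size    # part of the holdout that becomes the test set
    return {"train": 1.0 - holdout, "val": holdout - test, "test": test}


fractions = split_fractions(0.3, 0.333)
```

These are nominal fractions; the actual sample counts are rounded by train_test_split.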
One-hot encoding
from tensorflow.keras.utils import to_categorical

# Labels must already be integer-encoded NumPy arrays (see 2.3 Step 2).
n_classes = y_train.max() + 1
y_train = to_categorical(y_train, n_classes)
y_val = to_categorical(y_val, n_classes)
y_test = to_categorical(y_test, n_classes)
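One-hot encoding turns each integer label into an indicator vector. The NumPy sketch below shows the resulting matrix for made-up labels and how to get back to integer labels with argmax. Note that deriving n_classes from y_train.max() assumes the highest class index occurs in the training split; len(le.classes_) from the fitted LabelEncoder is a safer source.

```python
import numpy as np

y = np.array([0, 2, 1, 2])  # integer-encoded labels, e.g. from LabelEncoder
n_classes = int(y.max()) + 1

# Row i of the identity matrix is the indicator vector of class i.
one_hot = np.eye(n_classes, dtype="float32")[y]

# An array of shape (n_samples, n_classes) maps back to integer labels via argmax.
recovered = one_hot.argmax(axis=1)
```

The same argmax step applies to class-probability outputs of a network, if the model's predict method returns one row of scores per sample; whether the classes in maldi_tof_classifier.nn do so is not stated here.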
Example
from maldi_tof_classifier.nn import CNN1DClassifier

model = CNN1DClassifier(X_train, y_train)
model.fit(
    X_train,
    y_train,
    epochs=20,
    validation_data=(X_val, y_val),
)
y_pred = model.predict(X_test)
2.9 Pipeline API
Sections 2.5 to 2.7 (Steps 4 to 6) can be combined into a pipeline:
from maldi_tof_classifier.pipelines import generate_pipeline
Components
- Scaler (optional)
- Dimensionality Reduction (optional)
- Classifier (required)
Parameters
- classifier_cls: Instantiable class of the classifier.
- classifier_params: Parameters passed to the classifier.
- scaler_cls: Optional scaler class.
- dim_reducer_cls: Optional dimensionality reduction class.
- n_components: Number of components for dimensionality reduction.
Example
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from maldi_tof_classifier.core import generate_pipeline

pipeline = generate_pipeline(
    classifier_cls=RandomForestClassifier,
    classifier_params={"n_estimators": 100},
    scaler_cls=StandardScaler,
    dim_reducer_cls=PCA,
    n_components=20,
)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
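For comparison, the same three components can be chained with scikit-learn's own Pipeline class. This is a sketch on synthetic data, not the implementation of generate_pipeline; the step names and the synthetic matrix are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature matrix: 40 samples, 30 features, 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 30))
y = np.repeat([0, 1], 20)
X[y == 1] += 2.0  # shift class 1 so the two classes are separable

sk_pipeline = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("dim_reducer", PCA(n_components=5)),
        ("classifier", RandomForestClassifier(n_estimators=100, random_state=0)),
    ]
)
sk_pipeline.fit(X, y)
train_accuracy = sk_pipeline.score(X, y)
```

Bundling the steps this way ensures that scaling and dimensionality reduction are fitted on the training data only and are applied unchanged at prediction time.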
Author
Oliver Klein
oliver.klein@stud.hcw.ac.at
oliverfmklein@gmail.com
License
This project is licensed under the MIT License.
Copyright (c) 2026 Oliver Felix Matthias Klein (GitHub username: ofmk94)
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Disclaimer
This README was written based on the original draft and revised into English Markdown format with assistance from ChatGPT (Version 5.3).
No liability is assumed for the provided software or for the contents of this README.
Last edited: April 16th, 2026