The maldi_tof_classifier package offers a CLI and a Python 3 API for machine-learning-based classification of MALDI-TOF spectra measured with a Shimadzu 8030 MALDI-TOF mass spectrometer.

maldi-tof-classifier

Version: 0.3.0

The maldi-tof-classifier package provides functionality for:

Reading MALDI-TOF spectra
Preprocessing spectral data
Machine learning based classification

It is designed for spectra generated by a Shimadzu 8030 MALDI-TOF mass spectrometer.

Source code: https://github.com/ofmk94/maldi-tof-classifier

License: MIT

Installation

Python 3.10 or later is required.

Install the package from PyPI:

pip install maldi-tof-classifier

Additionally, the R package MALDIquant must be installed on the system. It can usually be installed from within R with:

install.packages("MALDIquant")

Overview

This README consists of two parts:

CLI tool usage
Python API and typical workflows

The package can be used from Python, but it is first and foremost a CLI tool.

It is strongly recommended to work with peak data and the default PeakExtractor.

Example data for download is available under: https://github.com/ofmk94/maldi-tof-classifier-data

Part 1 - CLI Tool Usage

1.1 Required directory structure

The CLI tool requires the following directory structure:

data_train/
    A/
        sample1.csv
        sample2.csv
    B/
        sample3.csv

data_predict/
    unknown1.csv
    unknown2.csv

cli_files/
    config.yaml

data_train: Contains one subdirectory per class to be learned, for example A, B, C. Each subdirectory holds either .txt files with full spectra or .csv files with peak data, as produced by a Shimadzu 8030 MALDI-TOF mass spectrometer. This directory is used for training only.
data_predict: Contains the files to be classified, of the same type as the training data (either .txt full spectra or .csv peak data). This directory is used for prediction.
cli_files: Contains the files needed for the CLI setup and receives the result files. It must contain config.yaml.

Training and prediction must use the same file type and the same extractor.
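
The layout rules above can be checked before running the CLI. The following is a minimal sketch using only the standard library; check_layout is a hypothetical helper written for this README and is not part of the package.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".csv", ".txt"}  # peak data / full spectra, as described above


def check_layout(root):
    """Return a list of problems found in the directory layout under root."""
    root = Path(root)
    problems = []

    train_dir = root / "data_train"
    predict_dir = root / "data_predict"
    config_file = root / "cli_files" / "config.yaml"

    if not config_file.is_file():
        problems.append("cli_files/config.yaml is missing")

    # Every class is a subdirectory of data_train that contains data files.
    class_dirs = [d for d in train_dir.iterdir() if d.is_dir()] if train_dir.is_dir() else []
    if not class_dirs:
        problems.append("data_train has no class subdirectories")

    train_files = [f for d in class_dirs for f in d.iterdir() if f.is_file()]
    predict_files = [f for f in predict_dir.iterdir() if f.is_file()] if predict_dir.is_dir() else []

    suffixes = {f.suffix.lower() for f in train_files + predict_files}
    if suffixes - ALLOWED_SUFFIXES:
        problems.append(f"unexpected file types: {sorted(suffixes - ALLOWED_SUFFIXES)}")
    if len(suffixes & ALLOWED_SUFFIXES) > 1:
        problems.append("training and prediction data mix .csv and .txt files")

    return problems
```

An empty list means that the layout matches the description; each entry otherwise names one deviation.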

1.2 Output files

The following files are created inside cli_files during usage:

cli_files/pipeline.joblib
    created once the model with the classification pipeline is trained.

cli_files/training_performance.csv
    classification performance of the pipeline on the test set;
    includes accuracy, precision, recall, F1 score and the confusion matrix.

cli_files/predictions.csv
    predictions on data_predict data.

1.3 CLI commands

There are two commands available:

Train the model:

mtc train

Predict on new data:

mtc predict

Both commands need to be executed in an environment with the directories described above.
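
When the commands are driven from a script, the call can be wrapped as sketched below. This assumes that mtc looks for the directories in the current working directory and that training needs data_train while prediction needs data_predict; both are readings of the text above, not confirmed behaviour. build_mtc_call and run_mtc are hypothetical helpers.

```python
import subprocess
from pathlib import Path

# Assumption: directories each command needs in the working directory.
REQUIRED_DIRS = {
    "train": ("data_train", "cli_files"),
    "predict": ("data_predict", "cli_files"),
}


def build_mtc_call(command, workdir="."):
    """Validate the working directory and return the argv list for mtc."""
    if command not in REQUIRED_DIRS:
        raise ValueError(f"unknown command {command!r}; expected 'train' or 'predict'")
    workdir = Path(workdir)
    for name in REQUIRED_DIRS[command]:
        if not (workdir / name).is_dir():
            raise FileNotFoundError(f"{name}/ not found in {workdir}")
    return ["mtc", command]


def run_mtc(command, workdir="."):
    """Run `mtc train` or `mtc predict` with workdir as the current directory."""
    argv = build_mtc_call(command, workdir)
    # check=True raises CalledProcessError if mtc exits with a non-zero status.
    return subprocess.run(argv, cwd=workdir, check=True)
```

A typical sequence would be run_mtc("train") followed by run_mtc("predict") in the same directory.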

1.4 Configuration via config.yaml

The training setup is configured through cli_files/config.yaml.

All parameters are optional. Every setting has a default value, so an empty config.yaml is valid.
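
The defaults documented in sections 1.5 to 1.8 can be collected in one place. The sketch below shows how user settings could be overlaid on them; resolve_config is a hypothetical helper, and whether the package itself rejects unknown keys is an assumption.

```python
# Defaults as documented in sections 1.5 to 1.8 of this README.
DEFAULT_CONFIG = {
    "extractor_cls": "PeakExtractor",
    "extractor_params": None,
    "scaler_cls": None,
    "dim_reducer_cls": None,
    "n_components": 20,
    "classifier_cls": "RandomForestClassifier",
    "classifier_params": None,
    "test_size": 0.2,
    "oversample": True,
}


def resolve_config(user_config=None):
    """Overlay user settings on the defaults and reject unknown keys."""
    user_config = user_config or {}
    unknown = set(user_config) - set(DEFAULT_CONFIG)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return {**DEFAULT_CONFIG, **user_config}


# Only two settings are overridden; everything else keeps its default.
config = resolve_config({"classifier_cls": "SVC", "test_size": 0.25})
print(config["classifier_cls"], config["test_size"], config["n_components"])
# → SVC 0.25 20
```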

1.5 Extractor settings

1.5.1 extractor_cls

Type of extractor.

Options:

"PeakExtractor" for working with .csv files containing peak data
"FullSpectraExtractor" for full spectra .txt files

Default:

extractor_cls: "PeakExtractor"

This setting must be the same for mtc train and mtc predict.

1.5.2 extractor_params

Additional optional parameters for PeakExtractor or FullSpectraExtractor.

Default:

extractor_params: null

These parameters are passed directly to the selected extractor constructor.

For PeakExtractor, the main parameters are:

snr_thresh (default: 3.0)
rel_shift_tolerance (default: 0.002)
min_peak_freq (default: 0.25)

Example:

extractor_params:
    snr_thresh: 3.0
    rel_shift_tolerance: 0.002
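
To give an intuition for the three PeakExtractor parameters, the sketch below implements one plausible reading of them: drop peaks below a signal-to-noise threshold, group peaks from different spectra whose m/z values differ by at most a relative tolerance, and keep only groups that occur in a minimum fraction of spectra. This is an illustration under those assumptions, not the package's algorithm; the docstrings describe the actual behaviour. All function names are hypothetical.

```python
def filter_by_snr(peaks, snr_thresh=3.0):
    """Keep the m/z of (mz, snr) peaks whose signal-to-noise ratio reaches the threshold."""
    return [mz for mz, snr in peaks if snr >= snr_thresh]


def bin_peaks(spectra_mz, rel_shift_tolerance=0.002):
    """Group m/z values across spectra. A value joins the current bin if its
    relative distance to the bin centre is within the tolerance and the bin
    has no peak from the same spectrum yet. Each bin maps spectrum index -> m/z."""
    tagged = sorted((mz, i) for i, mzs in enumerate(spectra_mz) for mz in mzs)
    bins = []
    for mz, idx in tagged:
        if bins:
            centre = sum(bins[-1].values()) / len(bins[-1])
            if abs(mz - centre) / centre <= rel_shift_tolerance and idx not in bins[-1]:
                bins[-1][idx] = mz
                continue
        bins.append({idx: mz})
    return bins


def frequent_bins(bins, n_spectra, min_peak_freq=0.25):
    """Keep bins that contain a peak from at least min_peak_freq of the spectra."""
    return [b for b in bins if len(b) / n_spectra >= min_peak_freq]


# Toy spectra as lists of (m/z, SNR) pairs.
spectra = [
    [(1000.0, 5.0), (2000.0, 2.0)],  # second peak is below the SNR threshold
    [(1001.0, 4.0), (3000.0, 6.0)],
    [(999.5, 7.0)],
    [(1000.5, 3.5)],
    [(5000.0, 9.0)],
]
mz_lists = [filter_by_snr(s) for s in spectra]
bins = bin_peaks(mz_lists)
kept = frequent_bins(bins, n_spectra=len(spectra))
print(len(bins), len(kept))  # → 3 1
```

In this toy data only the peak near m/z 1000 survives, because it appears in four of the five spectra.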

For FullSpectraExtractor, the main parameters are:

use_mz_cutoff (default: false)
mz_cutoff_mass (default: 20000.0)

Example:

extractor_params:
    use_mz_cutoff: true
    mz_cutoff_mass: 20000.0
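
As a rough illustration of an m/z cutoff, the sketch below truncates a spectrum at mz_cutoff_mass. It assumes that points at or below the cutoff are kept and everything above is dropped; the extractor's exact rule is not stated here. apply_mz_cutoff is a hypothetical helper.

```python
def apply_mz_cutoff(mz, intensity, mz_cutoff_mass=20000.0):
    """Drop all points of a spectrum with m/z above the cutoff (assumed semantics)."""
    kept = [(m, i) for m, i in zip(mz, intensity) if m <= mz_cutoff_mass]
    return [m for m, _ in kept], [i for _, i in kept]


mz = [5000.0, 15000.0, 20000.0, 25000.0]
intensity = [10.0, 42.0, 7.0, 3.0]
cut_mz, cut_intensity = apply_mz_cutoff(mz, intensity)
print(cut_mz)  # → [5000.0, 15000.0, 20000.0]
```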

The dataclasses for file location and file parsing are advanced options and should generally not be set by the CLI user.

1.6 Scaling and dimensionality reduction

1.6.1 scaler_cls

Scaling object.

Available options from sklearn.preprocessing:

"StandardScaler"
"MinMaxScaler"

Optional.

Default:

scaler_cls: null
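
The two scalers differ in what they normalise. The dependency-free sketch below reproduces the per-feature formulas for a single column: StandardScaler subtracts the mean and divides by the population standard deviation, while MinMaxScaler maps the values linearly onto [0, 1] with its default settings. The function names are hypothetical.

```python
from statistics import fmean, pstdev


def standard_scale(column):
    """Zero mean and unit variance, using the population standard deviation."""
    mean, std = fmean(column), pstdev(column)
    return [(x - mean) / std if std else 0.0 for x in column]


def min_max_scale(column):
    """Map the column linearly onto the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in column]


intensities = [2.0, 4.0, 6.0, 8.0]
print(standard_scale(intensities))
print(min_max_scale(intensities))
```

Standard scaling suits features with roughly symmetric distributions, while min-max scaling preserves zero-bounded intensities but is sensitive to single large peaks.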

1.6.2 dim_reducer_cls

Optional dimensionality reduction.

Options:

"PCA" from sklearn.decomposition
"SVD" using sklearn.decomposition.TruncatedSVD

Default:

dim_reducer_cls: null

1.6.3 n_components

Number of components to use in optional dimensionality reduction.

Type:

int

Default:

n_components: 20
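
With small data sets the default of 20 components can be too large: scikit-learn's PCA raises an error when an integer n_components exceeds min(n_samples, n_features). The sketch below computes the largest usable value; effective_n_components is a hypothetical helper, and whether the package performs such a check itself is not stated here.

```python
def effective_n_components(requested, n_samples, n_features):
    """Largest component count PCA accepts for an (n_samples, n_features) matrix."""
    return min(requested, n_samples, n_features)


# 15 training spectra with 300 peak features: only 15 components are possible.
print(effective_n_components(20, n_samples=15, n_features=300))  # → 15
print(effective_n_components(20, n_samples=80, n_features=300))  # → 20
```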

1.7 Classifier settings

1.7.1 classifier_cls

Classification model.

Available options:

sklearn models: LogisticRegression, LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis, PLSRegression as PLS-DA, SVC, RandomForestClassifier
xgboost model: XGBClassifier
special option: OPLS-DA, implemented via LogisticRegression

Default:

classifier_cls: "RandomForestClassifier"

1.7.2 classifier_params

Optional parameters for the classifier.

Default:

classifier_params: null

These parameters are passed directly to the selected classifier constructor. Only parameters valid for the selected classifier should be used here.

Examples:

classifier_params:
    n_estimators: 200
    max_depth: 10

or:

classifier_params:
    C: 1.0
    max_iter: 1000

Refer to the documentation of scikit-learn or xgboost for the full list of supported constructor arguments.
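
Because the parameters are passed straight to the constructor, a misspelled or misplaced key only fails at training time. The sketch below checks keys against a constructor signature beforehand. unexpected_params and ToyForest are hypothetical names; ToyForest merely stands in for a real classifier so that the example has no dependencies.

```python
import inspect


def unexpected_params(cls, params):
    """Return the keys of params that cls.__init__ does not accept."""
    signature = inspect.signature(cls.__init__)
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in signature.parameters.values()):
        return []  # constructor takes **kwargs, so nothing can be ruled out
    accepted = set(signature.parameters) - {"self"}
    return sorted(set(params or {}) - accepted)


class ToyForest:
    """Stand-in for a real classifier class."""

    def __init__(self, n_estimators=100, max_depth=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth


print(unexpected_params(ToyForest, {"n_estimators": 200, "max_depth": 10}))  # → []
print(unexpected_params(ToyForest, {"n_estimators": 200, "C": 1.0}))         # → ['C']
```

Constructors that accept arbitrary keyword arguments cannot be checked this way, so the helper reports nothing for them.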

1.8 Train/test split and balancing

1.8.1 test_size

Size of test set.

Type:

float

Default:

test_size: 0.2
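
The resulting set sizes can be estimated in advance. The sketch assumes the split is done with scikit-learn's train_test_split, as in Part 2, which rounds a fractional test size up to the next whole sample. split_sizes is a hypothetical helper.

```python
from math import ceil


def split_sizes(n_samples, test_size=0.2):
    """Return (n_train, n_test) for a fractional test_size, rounding the test set up."""
    n_test = ceil(test_size * n_samples)
    return n_samples - n_test, n_test


print(split_sizes(50))  # → (40, 10)
print(split_sizes(43))  # → (34, 9)
```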

1.8.2 oversample

Whether simple oversampling should be performed for balancing training classes.

Type:

bool

Default:

oversample: true
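
Simple oversampling means duplicating samples of the smaller classes until every class is as large as the largest one. The sketch below shows the idea with the standard library only; it is an illustration, not the package's implementation, and random_oversample is a hypothetical name.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all
    classes are as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


X = [[1.0], [1.1], [0.9], [5.0]]
y = ["A", "A", "A", "B"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling should be applied to the training portion only, so that duplicated samples cannot end up in both the training and the test set.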

1.9 Notes

It is strongly recommended to work with peak data and the default PeakExtractor.
Training and prediction must use the same extractor and the same data format.
The advanced file parsing options are usually not needed for standard CLI usage.

Part 2 - Python API and typical workflows

2.1 Overview

Docstrings contain more detailed information on parameters and behavior. This section illustrates typical usage.

The directory structure is the same as in Part 1.

2.2 Step 1 - Loading and preprocessing data

Recommended: peak data using PeakExtractor

from maldi_tof_classifier.extractors import PeakExtractor
from pathlib import Path
from sklearn.model_selection import train_test_split

TRAIN_DIR = Path(".") / "data_train"

extractor = PeakExtractor(snr_thresh=3.0)

peaks_dfs, class_labels = extractor.extract_train_data(TRAIN_DIR)

X_train, X_test, y_train, y_test = train_test_split(
    peaks_dfs, class_labels, test_size=0.2  # 0.2 matches the CLI default
)

X_train = extractor.transform_train_data(X_train)
X_test = extractor.transform_predict_data(X_test)

Alternative: full spectra using FullSpectraExtractor

from maldi_tof_classifier.extractors import FullSpectraExtractor

extractor = FullSpectraExtractor(use_mz_cutoff=True, mz_cutoff_mass=20000.0)

spectra, class_labels, spots = extractor.extract_train_data(TRAIN_DIR)

X_train, X_test, y_train, y_test = train_test_split(
    spectra, class_labels, test_size=0.2  # 0.2 matches the CLI default
)

2.3 Step 2 - Label encoding

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

y_train = le.fit_transform(y_train)
y_test = le.transform(y_test)
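
The fitted encoder is also needed after prediction, to turn integer predictions back into class names. A small self-contained sketch with toy labels:

```python
from sklearn.preprocessing import LabelEncoder

labels = ["B", "A", "C", "A"]  # toy class names

le = LabelEncoder()
encoded = le.fit_transform(labels)  # classes are sorted, so A=0, B=1, C=2

print(le.classes_.tolist())  # → ['A', 'B', 'C']
print(encoded.tolist())      # → [1, 0, 2, 0]

# Map integer predictions back to the original class names.
predicted = [2, 0, 1]
print(le.inverse_transform(predicted).tolist())  # → ['C', 'A', 'B']
```

Note that transform raises a ValueError for labels that were not seen during fit, so every class must be present in the training labels.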

2.4 Step 3 - Handle class imbalance (optional)

from imblearn.over_sampling import RandomOverSampler

ros = RandomOverSampler()
X_train, y_train = ros.fit_resample(X_train, y_train)

2.5 Step 4 - Scaling (optional)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

2.6 Step 5 - Dimensionality reduction (optional)

from sklearn.decomposition import PCA

pca = PCA(n_components=20)

X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)

2.7 Step 6 - Classification

RandomForestClassifier

from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier()

classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)

XGBClassifier

from xgboost import XGBClassifier

classifier = XGBClassifier()

classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)
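
The predictions can then be scored with the metrics listed in section 1.2. The sketch below uses synthetic data as a stand-in for real spectra so that it runs on its own; macro averaging is a choice made for this example and not necessarily what the CLI reports.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix; replace with the real X and y.
X, y = make_classification(n_samples=120, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

classifier = RandomForestClassifier(random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0
)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(cm)
```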

2.8 Neural network models

Available in maldi_tof_classifier.nn:

CNN1DClassifier
LSTMClassifier

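For neural networks, a train/validation/test split and one-hot encoding are typically used. The labels must be integer-encoded first (see 2.3), because to_categorical expects integer class indices.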

Split

from sklearn.model_selection import train_test_split

X_train, X_val_test, y_train, y_val_test = train_test_split(
    spectra, class_labels, test_size=0.3
)

X_val, X_test, y_val, y_test = train_test_split(
    X_val_test, y_val_test, test_size=0.333
)
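
The two nested splits above yield roughly 70 % training, 20 % validation and 10 % test data. The arithmetic, as a short check:

```python
first_test_size = 0.3     # hold-out share of the first split
second_test_size = 0.333  # share of the hold-out that becomes the test set

train_frac = 1 - first_test_size
test_frac = first_test_size * second_test_size
val_frac = first_test_size - test_frac

print(f"train={train_frac:.3f} val={val_frac:.3f} test={test_frac:.3f}")
# → train=0.700 val=0.200 test=0.100
```

The exact sample counts deviate slightly from these fractions because set sizes are rounded to whole samples.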

One-hot encoding

from tensorflow.keras.utils import to_categorical

n_classes = y_train.max() + 1

y_train = to_categorical(y_train, n_classes)
y_val = to_categorical(y_val, n_classes)
y_test = to_categorical(y_test, n_classes)
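
For readers unfamiliar with one-hot encoding, the same transformation can be written in plain NumPy. The sketch also shows the reverse step with argmax, which is the usual way to turn per-class scores back into integer labels; whether the package's predict method returns such scores or labels is not stated here.

```python
import numpy as np

y = np.array([0, 2, 1, 2])    # integer-encoded labels, e.g. from LabelEncoder
n_classes = int(y.max()) + 1  # assumes the highest class index occurs in y

# Row i of the identity matrix is the unit vector for class i.
one_hot = np.eye(n_classes, dtype="float32")[y]
print(one_hot)

# argmax along the class axis recovers the integer labels.
recovered = one_hot.argmax(axis=1)
print(recovered.tolist())  # → [0, 2, 1, 2]
```

If a class could be missing from the training labels, taking the number of classes from the fitted LabelEncoder (len(le.classes_)) is safer than y.max() + 1.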

Example

from maldi_tof_classifier.nn import CNN1DClassifier

model = CNN1DClassifier(X_train, y_train)

model.fit(
    X_train,
    y_train,
    epochs=20,
    validation_data=(X_val, y_val)
)

y_pred = model.predict(X_test)

2.9 Pipeline API

Sections 2.5 to 2.7 (Steps 4 to 6) can be combined into a single pipeline:

from maldi_tof_classifier.pipelines import generate_pipeline

Components

Scaler (optional)
Dimensionality Reduction (optional)
Classifier (required)

Parameters

classifier_cls: Instantiable class of the classifier.
classifier_params: Parameters passed to the classifier.
scaler_cls: Optional scaler class.
dim_reducer_cls: Optional dimensionality reduction class.
n_components: Number of components for dimensionality reduction.

Example

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

from maldi_tof_classifier.core import generate_pipeline

pipeline = generate_pipeline(
    classifier_cls=RandomForestClassifier,
    classifier_params={"n_estimators": 100},
    scaler_cls=StandardScaler,
    dim_reducer_cls=PCA,
    n_components=20
)

pipeline.fit(X_train, y_train)

y_pred = pipeline.predict(X_test)
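
For comparison, the same three components can be chained with a plain scikit-learn Pipeline. This is a sketch of an equivalent construction under the assumption that generate_pipeline builds something similar; the step names are arbitrary and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Step names are arbitrary labels chosen for this sketch.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("dim_reducer", PCA(n_components=20)),
    ("classifier", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Synthetic stand-in data so that the sketch runs on its own.
X, y = make_classification(n_samples=100, n_features=50, n_informative=10, random_state=0)

pipeline.fit(X[:80], y[:80])
y_pred = pipeline.predict(X[80:])
print(y_pred.shape)  # → (20,)
```

Fitting the scaler and the dimensionality reduction inside the pipeline ensures that they only ever see training data, which avoids leaking test-set statistics into the model.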

Author

Oliver Klein
oliver.klein@stud.hcw.ac.at
oliverfmklein@gmail.com

License

This project is licensed under the MIT License.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Disclaimer

This README was written based on the original draft and revised into English Markdown format with assistance from ChatGPT (Version 5.3).

No liability is assumed for the provided software or for the contents of this README.

Last edited: April 16th, 2026
