A package for predicting chemical formulas from tandem mass spectra

msfiddle

msfiddle is the PyPI package for FIDDLE, a deep learning method for chemical formula prediction from tandem mass spectra (MS/MS).

Highlights

  • Predict molecular formulas from MS/MS spectra with pre-trained FIDDLE models.
  • Use the package from the command line, from native Python arrays, or from MGF files.
  • Reuse loaded models for efficient batched prediction in Python applications.
  • Incorporate BUDDY and SIRIUS candidate outputs in file-based workflows.

Paper: https://www.nature.com/articles/s41467-025-66060-9

Documentation: https://msfiddle.readthedocs.io

For the full experimental codebase, see https://github.com/JosieHong/FIDDLE.

Installation

pip install msfiddle

PyTorch is required for inference. Install the optional inference extra, or install PyTorch separately for your platform:

pip install "msfiddle[inference]"

See the official PyTorch installation guide for custom CUDA builds: https://pytorch.org/get-started/locally/.

Usage

Command-line interface

Download the pre-trained checkpoints before running predictions:

# Download models to the default location (~/.msfiddle/check_point)
msfiddle-download-models

# Or specify a custom location and models
msfiddle-download-models --destination /path/to/models \
                          --models fiddle_tcn_qtof fiddle_rescore_qtof

msfiddle 2.0.1 reuses the FIDDLE v2.0.0 checkpoint assets.

Run the packaged demo:

msfiddle --demo --result_path ./output_demo.csv --device 0

Run the demo on CPU:

msfiddle --demo --result_path ./output_demo.csv --device 0 --no_cuda

Run prediction on your own MGF file:

msfiddle --test_data /path/to/data.mgf \
         --instrument_type orbitrap \
         --result_path /path/to/results.csv \
         --device 0

--instrument_type accepts orbitrap (default) or qtof. If checkpoints are missing, the CLI exits with instructions to run msfiddle-download-models.
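
When the CLI is driven from a script, it can help to assemble the command in one place and validate the instrument type before launching. The following is a minimal sketch that only uses the flags documented above; build_msfiddle_command is a hypothetical helper and is not part of msfiddle.

```python
import shlex
import subprocess

VALID_INSTRUMENTS = ("orbitrap", "qtof")


def build_msfiddle_command(test_data, result_path, instrument_type="orbitrap",
                           device=0, no_cuda=False):
    """Assemble the argument list for the msfiddle CLI (hypothetical helper)."""
    if instrument_type not in VALID_INSTRUMENTS:
        raise ValueError(f"instrument_type must be one of {VALID_INSTRUMENTS}")
    cmd = [
        "msfiddle",
        "--test_data", str(test_data),
        "--instrument_type", instrument_type,
        "--result_path", str(result_path),
        "--device", str(device),
    ]
    if no_cuda:
        cmd.append("--no_cuda")
    return cmd


cmd = build_msfiddle_command("/path/to/data.mgf", "/path/to/results.csv",
                             instrument_type="qtof", no_cuda=True)
print(shlex.join(cmd))

# To actually run it (requires msfiddle and its checkpoints to be installed):
# subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.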

Python API

Use predict_from_spectrum for one-off prediction from native MS/MS arrays:

from msfiddle import predict_from_spectrum

candidates = predict_from_spectrum(
    mz_array=[60.0, 85.0, 100.0, 125.0, 150.0],
    intensity_array=[10.0, 50.0, 20.0, 35.0, 15.0],
    precursor_mz=180.063,
    adduct="[M+H]+",
    top_k=5,
    instrument_type="orbitrap",
    collision_energy="Unknown",
    device="cpu",
)

For repeated or batched prediction, reuse MsFiddlePredictor so checkpoints are loaded once:

from msfiddle import MsFiddlePredictor

predictor = MsFiddlePredictor(instrument_type="orbitrap", device="cpu")

results = predictor.predict_batch(
    [
        {
            "id": "sample-1",
            "mz_array": [60.0, 85.0, 100.0, 125.0, 150.0],
            "intensity_array": [10.0, 50.0, 20.0, 35.0, 15.0],
            "precursor_mz": 180.063,
            "adduct": "[M+H]+",
            "collision_energy": "Unknown",
        }
    ]
)

Python APIs do not download model checkpoints unless download_models=True is passed.
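
The input records for predict_batch can be built from whatever in-memory representation your application uses. The sketch below assumes each spectrum is available as a list of (m/z, intensity) pairs; make_record is a hypothetical helper, not part of msfiddle, and it only produces the dictionary keys shown in the example above.

```python
def make_record(spec_id, peaks, precursor_mz, adduct, collision_energy="Unknown"):
    """Build one input dict with the keys used in the predict_batch example."""
    if not peaks:
        raise ValueError(f"spectrum {spec_id!r} has no peaks")
    # Split the (m/z, intensity) pairs into the two parallel arrays.
    mz_values, intensities = zip(*peaks)
    return {
        "id": spec_id,
        "mz_array": [float(mz) for mz in mz_values],
        "intensity_array": [float(i) for i in intensities],
        "precursor_mz": float(precursor_mz),
        "adduct": adduct,
        "collision_energy": collision_energy,
    }


peaks = [(60.0, 10.0), (85.0, 50.0), (100.0, 20.0), (125.0, 35.0), (150.0, 15.0)]
records = [make_record("sample-1", peaks, 180.063, "[M+H]+")]

# With msfiddle installed and checkpoints available, the records would be
# passed to an already constructed predictor:
# results = predictor.predict_batch(records)
```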

Input and output formats

CSV output

The CLI writes a CSV file with one row per spectrum. Key columns include:

Column                   Description
ID                       Spectrum title from the MGF file.
Mass                     Neutral mass calculated from precursor m/z and adduct.
Pred Formula             Initial formula predicted by the neural model.
Pred Mass                Model-predicted mass.
Pred Atom Num            Model-predicted atom count.
Pred H/C Num             Model-predicted H/C count.
Refined Formula (0..4)   Ranked refined formula candidates for the default top-5 output.
Refined Mass (0..4)      Masses for the default top-5 refined candidates.
Rescore (0..4)           Confidence scores for the default top-5 refined candidates.
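
Because the ranked candidates are spread across numbered columns, downstream code often wants them regrouped per spectrum. The sketch below does this with the standard csv module. It assumes the numbered columns are literally named "Refined Formula (0)", "Refined Mass (0)", "Rescore (0)" and so on; check the header of your own results file, since the exact spelling is an assumption here. The sample row uses made-up values for illustration only.

```python
import csv
import io
import re

# Assumed header spelling, e.g. "Refined Formula (0)"; adjust if your file differs.
COLUMN_PATTERN = re.compile(r"^(Refined Formula|Refined Mass|Rescore) \((\d+)\)$")
KEYS = {"Refined Formula": "formula", "Refined Mass": "mass", "Rescore": "score"}


def ranked_candidates(row):
    """Regroup the numbered columns of one CSV row into a ranked list of dicts."""
    by_rank = {}
    for column, value in row.items():
        match = COLUMN_PATTERN.match(column or "")
        if not match or not value:
            continue  # skip unrelated columns and empty cells
        name, rank = match.group(1), int(match.group(2))
        parsed = value if name == "Refined Formula" else float(value)
        by_rank.setdefault(rank, {})[KEYS[name]] = parsed
    return [by_rank[rank] for rank in sorted(by_rank)]


# Made-up two-candidate row, for illustration only.
sample = io.StringIO(
    "ID,Refined Formula (0),Refined Mass (0),Rescore (0),"
    "Refined Formula (1),Refined Mass (1),Rescore (1)\n"
    "demo-1,C8H10O,122.073,0.94,C7H6O2,122.037,0.03\n"
)
rows = list(csv.DictReader(sample))
candidates = ranked_candidates(rows[0])
for rank, candidate in enumerate(candidates):
    print(rank, candidate["formula"], candidate["mass"], candidate["score"])
```

For a real results file, replace the StringIO object with open("/path/to/results.csv", newline="").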

API output

The Python predict_from_spectrum() API returns a list of candidate dictionaries:

[
    {
        "formula": "C8H10O",
        "score": 0.94,
        "mass": 122.073,
        "metadata": {...},
    }
]

predict_batch() returns one record per input spectrum with id, candidates, and metadata.
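
As a sanity check on a returned candidate, the monoisotopic mass can be recomputed from the formula string and compared with the reported mass. The sketch below is independent of msfiddle: monoisotopic_mass is a hypothetical helper, the element table covers only C, H, N, O, P, and S, and the candidate dictionary simply mirrors the example output above.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes; deliberately small table.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "P": 30.97376199842,
    "S": 31.9720711744,
}


def monoisotopic_mass(formula):
    """Sum element masses for a simple formula such as 'C8H10O' (no brackets or charges)."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula syntax: {formula!r}")
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in MONOISOTOPIC:
            raise ValueError(f"element {element!r} is not in the mass table")
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


candidate = {"formula": "C8H10O", "score": 0.94, "mass": 122.073}
computed = monoisotopic_mass(candidate["formula"])
difference = abs(computed - candidate["mass"])
print(f"{candidate['formula']}: computed {computed:.4f} Da, "
      f"reported {candidate['mass']} Da, difference {difference:.4f} Da")
```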

MGF input

The required MGF fields are TITLE, PRECURSOR_MZ, PRECURSOR_TYPE, and COLLISION_ENERGY. The example record below also carries additional fields such as PEPMASS, SMILES, and FORMULA:

BEGIN IONS
TITLE=EMBL_MCF_2_0_HRMS_Library000529
PEPMASS=111.02016
CHARGE=1-
PRECURSOR_TYPE=[M-H]-
PRECURSOR_MZ=111.02016
COLLISION_ENERGY=50.0
SMILES=[H]c1c([H])n([H])c(=O)n([H])c1=O
FORMULA=C4H4N2O2
THEORETICAL_PRECURSOR_MZ=111.019453
PPM=6.368253318682487
SIMULATED_PRECURSOR_MZ=111.01946768634916
41.0148 0.329893 
41.9986 89.226766 
55.8055 0.200544 
56.2625 0.194617 
67.0304 0.330612 
68.0258 0.402906 
111.0203 100.0 
112.0515 1.2809 
END IONS
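
Before running prediction on a large file, it can be useful to confirm that every record carries the four required fields. The following is a minimal sketch of such a check using a small hand-written parser; parse_mgf, missing_fields, and ppm_error are hypothetical helpers, not msfiddle functions, and the parser handles only the plain KEY=VALUE and peak-line layout shown above. The record is an abbreviated copy of the example, and the ppm calculation reproduces its PPM field from PRECURSOR_MZ and THEORETICAL_PRECURSOR_MZ.

```python
REQUIRED_FIELDS = ("TITLE", "PRECURSOR_MZ", "PRECURSOR_TYPE", "COLLISION_ENERGY")


def parse_mgf(text):
    """Parse MGF text into a list of {'fields': dict, 'peaks': [(mz, intensity)]}."""
    spectra, current = [], None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"fields": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            continue  # ignore anything outside a BEGIN/END block
        elif "=" in line and not line[0].isdigit():
            key, value = line.split("=", 1)
            current["fields"][key.strip().upper()] = value.strip()
        else:
            mz, intensity = line.split()[:2]
            current["peaks"].append((float(mz), float(intensity)))
    return spectra


def missing_fields(spectrum):
    """Return the required field names absent from one parsed spectrum."""
    return [name for name in REQUIRED_FIELDS if name not in spectrum["fields"]]


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


MGF_TEXT = """\
BEGIN IONS
TITLE=EMBL_MCF_2_0_HRMS_Library000529
PRECURSOR_TYPE=[M-H]-
PRECURSOR_MZ=111.02016
COLLISION_ENERGY=50.0
SMILES=[H]c1c([H])n([H])c(=O)n([H])c1=O
THEORETICAL_PRECURSOR_MZ=111.019453
41.9986 89.226766
111.0203 100.0
END IONS
"""

spectra = parse_mgf(MGF_TEXT)
first = spectra[0]
print("missing required fields:", missing_fields(first))
ppm = ppm_error(float(first["fields"]["PRECURSOR_MZ"]),
                float(first["fields"]["THEORETICAL_PRECURSOR_MZ"]))
print(f"precursor mass error: {ppm:.3f} ppm")
```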

Advanced Usage

Inspect checkpoint paths:

msfiddle-checkpoint-paths

Use custom config and checkpoint paths:

msfiddle --test_data /path/to/data.mgf \
         --config_path /path/to/config.yml \
         --resume_path /path/to/tcn_model.pt \
         --rescore_resume_path /path/to/rescore_model.pt \
         --result_path /path/to/results.csv \
         --device 0

Citation

@article{hong2025fiddle,
  title={FIDDLE: a deep learning method for chemical formulas prediction from tandem mass spectra},
  author={Hong, Yuhui and Li, Sujun and Ye, Yuzhen and Tang, Haixu},
  journal={Nature Communications},
  volume={16},
  number={1},
  pages={11102},
  year={2025},
  publisher={Nature Publishing Group UK London}
}
