A Python package for finding molecular formula candidates from a mass and error window

find-mfs: Accurate mass ➜ Molecular Formulae

Python 3.10+ · License: GPL v3

find-mfs is a simple Python package for finding molecular formula candidates that fit a given mass within an error window (specified in ppm or Da). It implements Böcker & Lipták's algorithm for efficient formula finding, as implemented in SIRIUS.
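To make the task concrete, here is a brute-force sketch of mass decomposition restricted to C, H, N and O. It is not Böcker & Lipták's algorithm and is not how find-mfs works internally; it only illustrates what "finding formulae that fit a mass" means. The function name `brute_force_chno` is hypothetical, and the monoisotopic masses are standard reference values rounded to eight decimals.

```python
# Illustrative brute-force decomposition; NOT the algorithm used by find-mfs.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELECTRON_MASS = 0.00054858

def brute_force_chno(mz, charge=1, error_ppm=5.0):
    """Return (C, H, N, O) count tuples whose ion m/z lies within error_ppm of mz."""
    z = abs(charge) or 1
    # Sum of atomic masses the formula must reach: undo the electron loss or gain.
    target = mz * z + charge * ELECTRON_MASS
    tol = mz * z * error_ppm * 1e-6
    hits = []
    for c in range(int((target + tol) / MONO["C"]) + 1):
        rest_c = target - c * MONO["C"]
        for n in range(int((rest_c + tol) / MONO["N"]) + 1):
            rest_n = rest_c - n * MONO["N"]
            for o in range(int((rest_n + tol) / MONO["O"]) + 1):
                rest = rest_n - o * MONO["O"]
                # Fill the remaining mass with hydrogens and check the residual.
                h = round(rest / MONO["H"])
                if h >= 0 and abs(rest - h * MONO["H"]) <= tol:
                    hits.append((c, h, n, o))
    return hits

candidates = brute_force_chno(613.2391, charge=1, error_ppm=5.0)
print((31, 37, 2, 11) in candidates)  # is C31H37N2O11+ among the candidates?
```

Nested loops like these scale poorly as more elements are allowed, which is the motivation for using a dedicated decomposition algorithm instead.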

find-mfs also implements other methods for filtering the MF candidate lists:

  • Octet rule
  • Ring/double bond equivalents (RDBEs)
  • Predicted isotope envelopes, generated using Łącki and Startek's algorithm as implemented in IsoSpecPy
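As a rough illustration of the RDBE filter, the sketch below uses the standard valence-based expression RDBE = C − (H + X)/2 + (N + P)/2 + 1, where X counts halogens. Whether find-mfs computes RDBE in exactly this way is an assumption; the function name `rdbe` is hypothetical. The expression does reproduce the RDBE values listed in the example outputs in this README.

```python
def rdbe(counts):
    """Ring/double-bond equivalents from a dict of element counts."""
    c = counts.get("C", 0)
    h = counts.get("H", 0)
    halogens = sum(counts.get(x, 0) for x in ("F", "Cl", "Br", "I"))
    trivalent = counts.get("N", 0) + counts.get("P", 0)
    # O and S are treated as divalent here, so they do not contribute.
    return c - (h + halogens) / 2 + trivalent / 2 + 1

novobiocin_ion = {"C": 31, "H": 37, "N": 2, "O": 11}
value = rdbe(novobiocin_ion)
print(value)             # 14.5
print(0 <= value <= 20)  # True: passes an RDBE window of (0, 20)
```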

Motivation:

I needed to perform mass decomposition and, surprisingly, could not find a Python library for it, even though it is a routine task. find-mfs is intended for anyone who wants to incorporate molecular formula finding into a Python project.

Installation

pip install find-mfs

Example Usage:

Simple queries

# For simple queries, one can use this convenience function
from find_mfs import find_chnops

find_chnops(
    mass=613.2391,         # Novobiocin [M+H]+ ion; C31H37N2O11+
    charge=1,              # Charge should be specified - electron mass matters
    error_ppm=5.0,         # Can also specify error_da instead
                           # --- FORMULA FILTERS ----
    check_octet=True,      # Candidates must obey the octet rule
    filter_rdbe=(0, 20),   # Candidates must have 0 to 20 ring/double-bond equivalents
    max_counts='C*H*N*O*P0S2'      # Element constraints: unlimited C/H/N/O,
                                   # no phosphorus atoms, up to two sulfurs.
)

Output:

FormulaSearchResults(query_mass=613.2391, n_results=38)

Formula                   Error (ppm)     Error (Da)      RDBE
----------------------------------------------------------------------
[C6H25N30O4S]+                     -0.12       0.000073       9.5
[C31H37N2O11]+                      0.14       0.000086      14.5
[C14H29N24OS2]+                     0.18       0.000110      12.5
[C16H41N10O11S2]+                   0.20       0.000121       1.5
[C29H33N12S2]+                     -0.64       0.000392      19.5
... and 33 more
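The error columns can be checked by hand. The sketch below computes the theoretical m/z of C31H37N2O11+ from standard monoisotopic masses, subtracts one electron mass for the positive charge, and compares the result with the query mass. The helper name `ion_mz` is hypothetical, and the sign convention (theoretical minus observed) is an assumption chosen to match the sign in the ppm column above.

```python
# Standard monoisotopic masses (u) and the electron mass (u)
MONO = {"C": 12.0, "H": 1.00782503223, "N": 14.0030740048, "O": 15.99491461957}
ELECTRON_MASS = 0.000548579909

def ion_mz(counts, charge):
    """Theoretical m/z of an ion given element counts and a signed charge."""
    atom_sum = sum(MONO[element] * n for element, n in counts.items())
    z = abs(charge) or 1
    # A positive ion has lost electrons; a negative ion has gained them.
    return (atom_sum - charge * ELECTRON_MASS) / z

observed = 613.2391
theoretical = ion_mz({"C": 31, "H": 37, "N": 2, "O": 11}, charge=1)
error_da = theoretical - observed
error_ppm = error_da / observed * 1e6

print(f"{theoretical:.6f}")  # 613.239186
print(f"{error_da:.6f}")     # 0.000086
print(f"{error_ppm:.2f}")    # 0.14
```

Omitting the electron mass would shift the result by about 0.00055 Da, roughly 0.9 ppm at this m/z, which is why the charge should be specified.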

Batch Queries

# If processing many masses, it's better to instantiate a FormulaFinder object
from find_mfs import FormulaFinder

finder = FormulaFinder()
finder.find_formulae(
    mass=613.2391,         # Novobiocin [M+H]+ ion; C31H37N2O11+
    charge=1,              
    error_ppm=5.0,         
    # ... etc
)

Including Isotope Envelope Information

If an isotope envelope is available, the candidate list can be dramatically reduced.

import numpy as np

# STEP 1: Retrieve isotope envelope from experimental data
observed_envelope = np.array(
    [  #  m/z    , relative intensity
        [613.2397,    1.00],
        [614.2429,    0.35],
        [615.2456,    0.10],
    ]
)

# STEP 2: define isotope matching parameters
from find_mfs import SingleEnvelopeMatch
iso_config = SingleEnvelopeMatch(
    envelope=observed_envelope,     # np.ndarray with an m/z column and an intensity column
    mz_tolerance_da=0.005,          # Tolerance for aligning isotope signals. Should be very generous. Can also use mz_tolerance_ppm
    minimum_rmse=0.05,              # Default is 0.05, i.e. instrument reproduces isotope envelope w/ 5% fidelity
)

# STEP 3: include isotope matching parameters when performing a search
from find_mfs import FormulaFinder
finder = FormulaFinder()
finder.find_formulae(
    mass=613.2391,         # Novobiocin [M+H]+ ion; C31H37N2O11+
    charge=1,              # Charge should be specified - electron mass matters
    error_ppm=3.0,         # Can also specify error_da instead
                           # --- FORMULA FILTERS ----
    check_octet=True,      # Candidates must obey the octet rule
    filter_rdbe=(0, 20),   # Candidates must have 0 to 20 ring/double-bond equivalents
    max_counts={
        'P': 0,            # Candidates must not have any phosphorus atoms
        'S': 2,            # Candidates can have up to two sulfur atoms
    },
    isotope_match=iso_config,
)

Output:

FormulaSearchResults(query_mass=613.2391, n_results=5)

Formula                   Error (ppm)     Error (Da)      RDBE       Iso. Matches   Iso. RMSE 
------------------------------------------------------------------------------------------------------
[C31H37N2O11]+                      0.14       0.000086      14.5           3/3    0.0121
[C23H41N4O13S]+                    -0.92       0.000565       5.5           3/3    0.0478
[C24H37N8O9S]+                      1.26       0.000772      10.5           3/3    0.0311
[C32H33N6O7]+                       2.32       0.001424      19.5           3/3    0.0230
[C25H33N12O5S]+                     3.44       0.002110      15.5           3/3    0.0146
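To give a feel for what the "Iso. Matches" and "Iso. RMSE" columns express, here is a minimal sketch of envelope scoring: each predicted peak is paired with the nearest observed peak within an m/z tolerance, and the RMSE is taken over the intensity differences. This is not the package's actual scoring code, which is not described here. The function `score_envelope` and the predicted envelope values are hypothetical and are not IsoSpecPy output.

```python
import math

def score_envelope(observed, predicted, mz_tolerance_da):
    """Return (n_matched, rmse) for a predicted envelope against observed peaks.

    Both envelopes are lists of (m/z, relative intensity) pairs.
    """
    matched = 0
    squared_errors = []
    for mz_pred, int_pred in predicted:
        # Nearest observed peak on the m/z axis
        mz_obs, int_obs = min(observed, key=lambda peak: abs(peak[0] - mz_pred))
        if abs(mz_obs - mz_pred) <= mz_tolerance_da:
            matched += 1
            squared_errors.append((int_obs - int_pred) ** 2)
        else:
            # Unmatched predicted peak: treat the observed intensity as zero
            squared_errors.append(int_pred ** 2)
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    return matched, rmse

observed = [(613.2397, 1.00), (614.2429, 0.35), (615.2456, 0.10)]   # envelope from STEP 1
predicted = [(613.2392, 1.00), (614.2425, 0.36), (615.2450, 0.09)]  # hypothetical values
n_matched, rmse = score_envelope(observed, predicted, mz_tolerance_da=0.005)
print(f"{n_matched}/{len(predicted)} peaks matched, RMSE = {rmse:.4f}")
# 3/3 peaks matched, RMSE = 0.0082
```

Under this scheme, a candidate would be kept only if its RMSE falls at or below the configured threshold (0.05 in the example above).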

Jupyter Notebook:

See this Jupyter notebook for more thorough examples/demonstrations


If you use this package, make sure to cite:

Contributing

Contributions are welcome. Here's a list of features I feel should be implemented eventually. The bold items are what I'm currently working on.

  • Statistics-based isotope envelope fitting
  • Fragmentation constraints
  • Bayesian formula candidate ranking
  • Element ratio constraints
  • GUI app

License

This project is distributed under the GPL-3 license.
