A Python package for finding molecular formula candidates from a mass and error window
find-mfs: Accurate mass ➜ Molecular Formulae
find-mfs is a simple Python package for finding
molecular formula candidates that fit a given mass (+/- an error window).
It implements Böcker & Lipták's algorithm for efficient formula finding, as
implemented in SIRIUS.
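To make the mass decomposition problem concrete, here is a minimal brute-force sketch. It is not the Böcker & Lipták algorithm and it is not find-mfs code: it simply loops over C, N and O counts and solves for H. The names `brute_force_chno` and `to_formula`, the parameters, and the caps on N and O are made up for illustration. No chemical filtering is applied, so some hits may be chemically implausible.

```python
# Monoisotopic masses (Da) of the lightest stable isotope of each element
MONO = {"C": 12.0, "H": 1.00782503223, "N": 14.00307400443, "O": 15.99491461957}


def to_formula(counts):
    """Render {'C': 6, 'H': 12, 'N': 0, 'O': 6} as 'C6H12O6'."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in counts.items() if n > 0)


def brute_force_chno(mass, error_ppm=5.0, max_n=10, max_o=10):
    """Naively enumerate neutral CHNO formulae within error_ppm of mass."""
    tol = mass * error_ppm * 1e-6  # ppm window converted to Da
    hits = []
    for c in range(int(mass // MONO["C"]) + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                heavy = c * MONO["C"] + n * MONO["N"] + o * MONO["O"]
                rest = mass - heavy
                if rest < -tol:
                    break  # more oxygen atoms only overshoot further
                # Fill whatever mass is left with hydrogen atoms
                h = max(0, round(rest / MONO["H"]))
                delta = heavy + h * MONO["H"] - mass
                if abs(delta) <= tol:
                    counts = {"C": c, "H": h, "N": n, "O": o}
                    hits.append((to_formula(counts), delta))
    # Best mass match first
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Neutral monoisotopic mass of a hexose such as glucose (C6H12O6)
candidates = brute_force_chno(180.0634)
for formula, delta in candidates:
    print(f"{formula:<12} {delta:+.6f} Da")
```

Nested loops like these scale poorly as more elements and larger masses are allowed, which is the cost an efficient decomposition algorithm avoids.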
find-mfs also implements other methods
for filtering the MF candidate lists:
- Octet rule
- Ring/double bond equivalents (RDBEs)
- Predicted isotope envelopes, generated using Łącki and Startek's algorithm as implemented in IsoSpecPy
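As a rough illustration of the RDBE filter, the sketch below applies the textbook formula RDBE = 1 + C - H/2 + (N + P)/2. It is an assumption that find-mfs computes RDBE this way; the helpers `parse_formula` and `rdbe` are hypothetical and not part of the package. Sulfur and oxygen are treated as divalent, phosphorus as trivalent, and the charge is ignored, which reproduces the half-integer RDBE values shown for the ions in the example output below.

```python
import re

# Contribution of one atom to the RDBE: (valence - 2) / 2
RDBE_WEIGHT = {"C": 1.0, "H": -0.5, "N": 0.5, "O": 0.0, "P": 0.5, "S": 0.0}


def parse_formula(formula):
    """Parse a simple formula such as 'C31H37N2O11' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def rdbe(formula):
    """Ring/double-bond equivalents of a CHNOPS formula (charge ignored)."""
    counts = parse_formula(formula)
    return 1.0 + sum(RDBE_WEIGHT[el] * n for el, n in counts.items())


print(rdbe("C31H37N2O11"))  # novobiocin [M+H]+ composition
print(rdbe("C6H6"))         # benzene: one ring plus three double bonds
```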
Motivation:
I needed to perform mass decomposition and, shockingly, could not find a Python library for it,
even though it is a routine task. find-mfs is intended for anyone looking to incorporate
molecular formula finding into their Python project.
Installation
pip install find-mfs
Example Usage:
Simple queries
# For simple queries, one can use this convenience function
from find_mfs import find_chnops
find_chnops(
    mass=613.2391,              # Novobiocin [M+H]+ ion; C31H37N2O11+
    charge=1,                   # Charge should be specified - electron mass matters
    error_ppm=5.0,              # Can also specify error_da instead
    # --- FORMULA FILTERS ----
    check_octet=True,           # Candidates must obey the octet rule
    filter_rdbe=(0, 20),        # Candidates must have 0 to 20 ring/double-bond equivalents
    max_counts='C*H*N*O*P0S2'   # Element constraints: unlimited C/H/N/O,
                                # no phosphorus atoms, up to two sulfurs.
)
Output:
FormulaSearchResults(query_mass=613.2391, n_results=38)
Formula Error (ppm) Error (Da) RDBE
----------------------------------------------------------------------
[C6H25N30O4S]+ -0.12 0.000073 9.5
[C31H37N2O11]+ 0.14 0.000086 14.5
[C14H29N24OS2]+ 0.18 0.000110 12.5
[C16H41N10O11S2]+ 0.20 0.000121 1.5
[C29H33N12S2]+ -0.64 0.000392 19.5
... and 33 more
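The numbers in this example can be checked by hand. The sketch below is independent of find-mfs: the helper names `ion_mz`, `ppm_to_da` and `ppm_error` are hypothetical, and the sign convention of the error is an assumption (the table above may use the opposite sign). It shows why the charge matters: one electron weighs about 0.00055 Da, which is close to 0.9 ppm at this m/z.

```python
# Monoisotopic masses (Da) and the electron rest mass (Da)
MONO = {"C": 12.0, "H": 1.00782503223, "N": 14.00307400443, "O": 15.99491461957}
ELECTRON_MASS = 0.000548579909


def ion_mz(counts, charge):
    """m/z of an ion with the given element counts; charge must be nonzero."""
    neutral = sum(MONO[el] * n for el, n in counts.items())
    # A positive charge means electrons were removed, so their mass is subtracted
    return (neutral - charge * ELECTRON_MASS) / abs(charge)


def ppm_to_da(mass, error_ppm):
    """Half-width of a ppm error window, in Da."""
    return mass * error_ppm * 1e-6


def ppm_error(observed, theoretical):
    """Signed error in ppm (observed minus theoretical; convention assumed)."""
    return (observed - theoretical) / theoretical * 1e6


novobiocin_cation = {"C": 31, "H": 37, "N": 2, "O": 11}
theoretical = ion_mz(novobiocin_cation, charge=1)

print(f"theoretical m/z : {theoretical:.6f}")
print(f"5 ppm window    : +/- {ppm_to_da(613.2391, 5.0):.6f} Da")
print(f"error of query  : {ppm_error(613.2391, theoretical):+.2f} ppm")
```

The magnitude of the computed error, roughly 0.14 ppm or 0.000086 Da, agrees with the row for [C31H37N2O11]+ in the output above.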
Batch Queries
# If processing many masses, it's better to instantiate a FormulaFinder object
from find_mfs import FormulaFinder
finder = FormulaFinder()
finder.find_formulae(
    mass=613.2391,   # Novobiocin [M+H]+ ion; C31H37N2O11+
    charge=1,
    error_ppm=5.0,
    # ... etc
)
Including Isotope Envelope Information
If an isotope envelope is available, the candidate list can be dramatically reduced.
import numpy as np
# STEP 1: Retrieve isotope envelope from experimental data
observed_envelope = np.array(
    [   # m/z    , relative intsy.
        [613.2397, 1.00],
        [614.2429, 0.35],
        [615.2456, 0.10],
    ]
)
# STEP 2: define isotope matching parameters
from find_mfs import SingleEnvelopeMatch
iso_config = SingleEnvelopeMatch(
    envelope=observed_envelope,   # np.ndarray with an m/z column and an intensity column
    mz_tolerance_da=0.005,        # Tolerance for aligning isotope signals. Should be very generous. Can also use mz_tolerance_ppm
    minimum_rmse=0.05,            # Default is 0.05, i.e. instrument reproduces isotope envelope w/ 5% fidelity
)
# STEP 3: include isotope matching parameters when performing a search
from find_mfs import FormulaFinder
finder = FormulaFinder()
finder.find_formulae(
    mass=613.2391,         # Novobiocin [M+H]+ ion; C31H37N2O11+
    charge=1,              # Charge should be specified - electron mass matters
    error_ppm=3.0,         # Can also specify error_da instead
    # --- FORMULA FILTERS ----
    check_octet=True,      # Candidates must obey the octet rule
    filter_rdbe=(0, 20),   # Candidates must have 0 to 20 ring/double-bond equivalents
    max_counts={
        'P': 0,            # Candidates must not have any phosphorus atoms
        'S': 2,            # Candidates can have up to two sulfur atoms
    },
    isotope_match=iso_config,
)
Output:
FormulaSearchResults(query_mass=613.2391, n_results=5)
Formula Error (ppm) Error (Da) RDBE Iso. Matches Iso. RMSE
------------------------------------------------------------------------------------------------------
[C31H37N2O11]+ 0.14 0.000086 14.5 3/3 0.0121
[C23H41N4O13S]+ -0.92 0.000565 5.5 3/3 0.0478
[C24H37N8O9S]+ 1.26 0.000772 10.5 3/3 0.0311
[C32H33N6O7]+ 2.32 0.001424 19.5 3/3 0.0230
[C25H33N12O5S]+ 3.44 0.002110 15.5 3/3 0.0146
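To see why the isotope envelope is so discriminating, the M+1 peak can be estimated by hand from natural isotope abundances. The sketch below is a first-order check, not the IsoSpecPy calculation that find-mfs uses: `estimate_m_plus_1` is a hypothetical helper, the abundances are approximate, and all isotopologues one nominal mass unit above the monoisotopic peak are lumped together.

```python
# Approximate natural abundance of the +1 isotope divided by that of the
# lightest isotope, per atom
PLUS_ONE_RATIO = {
    "C": 0.0107 / 0.9893,      # 13C / 12C
    "H": 0.000115 / 0.999885,  # 2H / 1H
    "N": 0.00364 / 0.99636,    # 15N / 14N
    "O": 0.00038 / 0.99757,    # 17O / 16O
    "P": 0.0,                  # phosphorus has a single stable isotope
    "S": 0.0075 / 0.9499,      # 33S / 32S
}


def estimate_m_plus_1(counts):
    """Intensity of the M+1 peak relative to the monoisotopic peak.

    For n atoms of an element, exactly one heavy atom occurs with
    probability n * p_heavy / p_light relative to the all-light case,
    and the contributions of different elements add up.
    """
    return sum(PLUS_ONE_RATIO[el] * n for el, n in counts.items())


novobiocin_cation = {"C": 31, "H": 37, "N": 2, "O": 11}
ratio = estimate_m_plus_1(novobiocin_cation)
print(f"predicted M+1 / M = {ratio:.3f}")
```

For C31H37N2O11 this gives a ratio close to the observed relative intensity of 0.35 at m/z 614.2429, with carbon contributing almost all of it, so candidates with substantially different carbon counts fit the envelope poorly.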
Jupyter Notebook:
See this Jupyter notebook for more thorough examples/demonstrations
If you use this package, make sure to cite:
- Böcker & Lipták, 2007 - this package uses their algorithm for formula finding...
- ...as implemented in SIRIUS: Böcker et al., 2008
- Łącki, Valkenborg & Startek 2020 - this package uses IsoSpecPy to quickly simulate isotope envelopes
- Gohlke, 2025 - this package uses molmass, which provides very convenient methods for handling chemical formulae
Contributing
Contributions are welcome. Here's a list of features I feel should be implemented eventually. The bold items are what I'm currently working on.
- Statistics-based isotope envelope fitting
- Fragmentation constraints
- Bayesian formula candidate ranking
- Element ratio constraints
- GUI app
License
This project is distributed under the GPL-3 license.