Skip to main content

CartMFP: tools for database construction and formula prediction

Project description

CartMFP

CartMFP or Cartesian molecular formula prediction, is a python tool that can perform molecular formula predictions on custom databases.

How does CartMFP work

CartMFP consists of two steps.

  1. Constructing a database space2cart.py : A local database is constructed.
  2. Molecular formula prediction cart2form.py : Molecular formulas that are predicted based on input masses.

1. Constructing a database

A database is constructed by enumerating all combinations of elements within a certain range. The compositional space is described with a specific syntax: Element[min,max]. This can be any element in the periodic table for which the monoisotopic mass is described in the NIST database. "H[200]C[75]N[50]O[50]P[10]S[10]" is used default elemental space. This would eqaute to 0-200 Hydrogen, 0-75 Carbon, 0-50 Nitrogen, etc.

Apart from max element constraints, the elemental composition space is further limited by the maximum mass max_mass and ring double bond equivalents (RDBE) min_rdbe,max_rdbe.

Base chemical constraints:

Parameter Default value Description
-composition "H[200]C[75]N[50]O[50]P[10]S[10]" composition string describing minimum and maximum element counts
-max_mass 1000 maximum mass (Da)
-min_rdbe -5 minimum RDBE
-max_rdbe 80 maximum RDBE

Additional chemical constraints are provided by implementing some of Fiehn's 7 Golden rules, which filters unrealistic or impossible compositions. This can drastically reduce the size of your composition space. These include: Rule #2 – LEWIS and SENIOR check; Rule #4 – Hydrogen/Carbon element ratio check; Rule #5 heteroatom ratio check and Rule #6 – element probability check.

Parameter Default value Description
-filt_7gr True Toggle global to apply or remove 7 golden rules filtering
-filt_LewisSenior True Golden Rule #2: Filter compositions with non integer dbe (based on max valence)
-filt_ratios "HC[0.1,6]FC[0,6]ClC[0,2]BrC[0,2]NC[0,4]OC[0,3]PC[0,2]SC[0,3]SiC[0,1]" #Golden Rules #4,5: Filter on chemical ratios with extended range 99.9% coverage
-filt_NOPS True #6 – element probability check.

Additional arguments can be supplied to affect the performance and output paths:

Parameter Default value Description
-maxmem 10e9 Amount of memory used in bytes
-mass_blowup 100000 blowup factor to convert float masses to integers
-write_mass True construct a mass lookup table (faster MFP but larger database)
-Cartesian_output_folder "Cart_Output" Path to output folder
-Cartesian_output_file Output database name

Example use space2cart (In command line)

Contstruct default database:

python "space2cart.py" 

Contstruct database with halogens:

python "space2cart.py" -composition "H[200]C[75]N[50]O[50]P[10]S[10]F[5]Cl[5]I[3]Br[3]"

space2cart can also be executed by running the script in an IDE, such as Spyder.

2. Molecular formula prediction

After the composition database has been constructed with space2cart.py , molecular formula prediction can be done using cart2form.py. To run cart2form, an input mass list has to be supplied, which can be linked to a file in txt or any tabular format. The file either has to have a single column, or a column titled "Mz" or "Mass". Alternatively cart2form can be imported as a module within a script, and executed on a float mass or iterable set of masses with the function predict_formula(input_file,composition_file).

required inputs

Parameter Default value Description
-input_file "test_mass_CASMI2022.txt" path to list of masses
-composition_file "H[200]C[75]N[50]O[50]P[10]S[10]_b100000max1000rdbe-5_80_7gr_comp.npy" path to the database composition file

Optional arguments can supplied to tune which massses will be returned, this includes the polariy, which adducts to consider, which charge states to consider. Other key arguments include the ppm tolerance of the mass error of returned composition, and the maximum number of compositions returned per mass. Adducts use the following syntax: "sign(+ addition of /- loss of) elemental composition charge(+/-)"

Parameter Default value Description
-mode "pos" ionization mode. Options: " ", "positive", "negative"
-adducts ["+H+","+Na+","+K", "+-","+Cl-","-H+"] default positive adducts "--","+H+","+Na+","+K", default negative adducts "+-","-H+","+Cl-"
-charges [1] Charge states to consider
-ppm 5 maximum mass error (ppm) of predicted compositions
-top_candidates 20 maxmimum number of compositions returned per mass

Example use cart2form

Molecular formula prediction from command line:

python "cart2form.py" -input_file "test_mass_CASMI2022.txt" -composition_file "H[200]C[75]N[50]O[50]P[10]S[10]_b100000max1000rdbe-5_80_7gr_comp.npy"

Alternatively cart2form.py can be imported.

import cart2form
cart2form.predict_formula(input_file=124.56 ,   #float mass or iterable (array/list/DataFrame)
                          composition_file="H[200]C[75]N[50]O[50]P[10]S[10]_b100000max1000rdbe-5_80_7gr_comp.npy")

Licensing

The pipeline is licensed with standard MIT-license.
If you would like to use this pipeline in your research, please cite the following papers:

Contact:

-Hugo Kleikamp (Developer): hugo.kleikamp@uantwerpen.be

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cartmfp-0.1.2.tar.gz (29.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cartmfp-0.1.2-py3-none-any.whl (30.2 kB view details)

Uploaded Python 3

File details

Details for the file cartmfp-0.1.2.tar.gz.

File metadata

  • Download URL: cartmfp-0.1.2.tar.gz
  • Upload date:
  • Size: 29.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for cartmfp-0.1.2.tar.gz
Algorithm Hash digest
SHA256 14c1bba0497a5034a48cf9583d64da4a14ba7696f195c9cc93536e13cef9ccba
MD5 ff2adbd016c1d87444e5cf714a5d8e2f
BLAKE2b-256 0d050462182dc37f4b7e53f46aa99ec71aca75c96c540f72fb43e99b9df4b331

See more details on using hashes here.

File details

Details for the file cartmfp-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: cartmfp-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 30.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for cartmfp-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 fb0aa89ba9f06c309804acab3a7aa5cb2a6394d1246ce623401614a60e304595
MD5 10cec0d27e6a22c167ff1a13dfb7babb
BLAKE2b-256 80ec715b92147487d095a9722ecfcaf240994346c182ec3f1bf67e5f2af2d719

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page