Automated Targeted Feature Extraction & Adduct Verification Tool for LC-MS Data.

LCMS Adduct Finder

This Python tool performs targeted analysis of LC-MS data. Given a list of chemical formulas, it scans for a range of adduct forms (e.g., [M+H]+, [M+Na]+), generates an Extracted Ion Chromatogram (EIC) for each target, and scores peak quality with a Gaussian fit to indicate how reliable each detected signal is.


Key Features

  • Targeted Extraction: Instantly converts chemical formulas (e.g., C6H12O6) into target m/z values, enabling precise extraction of specific metabolites or compounds.
  • Multi-Adduct Verification:
    • Automatically scans for 14+ different adduct types (Monomers, Dimers, Na/NH4 adducts, etc.) simultaneously.
    • Helps confirm the identity of a substance by checking if multiple adducts elute at the same Retention Time (RT).
  • Peak Quality & Existence Check:
    • Gaussian Scoring: Fits a Gaussian curve to the raw peak data and calculates an R² score.
    • Distinguishes high-quality peaks ("Excellent/Good") from noise or irregular shapes ("Poor/Noise").
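  • Precision Mass Calculation: Accounts for the electron mass when converting a neutral mass to m/z: $$m/z = \frac{(M \times n + \Delta) - (\mathrm{Charge} \times m_e)}{|\mathrm{Charge}|}$$ Here M is the neutral monoisotopic mass, n is the number of molecules in the adduct (1 for monomers, 2 for dimers), Δ is the mass change introduced by the adduct, Charge is the signed ion charge, and m_e is the electron mass.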
  • Visual Inspection (Optional): Saves EIC plots as PNG images with Gaussian fit overlay.
  • MS2 Matching: Links MS1 features to MS2 events for additional confirmation.
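
As a worked illustration of the m/z formula above, the following sketch computes a few adduct m/z values for glucose in plain Python. The small formula parser, the `ADDUCTS` table, and `adduct_mz` are hypothetical names made up for this example and are not lib_eic's API (the package itself depends on molmass for mass calculations). The monoisotopic masses are standard reference values, and Δ is assumed to be the mass of the neutral atoms gained or lost.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes; standard reference values.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "Na": 22.98976928,
}
ELECTRON_MASS = 0.00054858  # u

# Hypothetical adduct table: name -> (n, delta mass, signed charge).
ADDUCTS = {
    "[M+H]+": (1, MONOISOTOPIC["H"], +1),
    "[M+Na]+": (1, MONOISOTOPIC["Na"], +1),
    "[2M+H]+": (2, MONOISOTOPIC["H"], +1),
    "[M-H]-": (1, -MONOISOTOPIC["H"], -1),
}


def monoisotopic_mass(formula: str) -> float:
    """Sum element masses for a simple formula such as 'C6H12O6' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def adduct_mz(neutral_mass: float, adduct: str) -> float:
    """Apply m/z = ((M * n + delta) - charge * m_e) / |charge|."""
    n, delta, charge = ADDUCTS[adduct]
    return ((neutral_mass * n + delta) - charge * ELECTRON_MASS) / abs(charge)


glucose = monoisotopic_mass("C6H12O6")
for name in ADDUCTS:
    print(f"{name:8s} {adduct_mz(glucose, name):.4f}")
```

For glucose this gives roughly 181.0707 for [M+H]+ and 179.0561 for [M-H]-, in line with the example values used elsewhere in this README.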

Installation

From PyPI (recommended)

pip install lib_eic

With YAML configuration support:

pip install "lib_eic[yaml]"

From Source

git clone https://github.com/SNUFML/lib_eic.git
cd lib_eic
pip install -e .

Dependencies

  • pandas, openpyxl (Excel I/O)
  • molmass (mass calculations)
  • scipy, numpy (numerical processing)
  • matplotlib (plotting)
  • fisher-py, pythonnet (Thermo .raw file reading)

Supported Adducts

The tool automatically detects the ionization mode (Positive/Negative) and scans for the following adducts:

| Mode | Adduct Types |
|---|---|
| Positive (+) | [M+H]+, [M+Na]+, [M+NH4]+, [M+ACN+H]+, [2M+H]+, [M-H2O+H]+, etc. |
| Negative (-) | [M-H]-, [M+FA-H]-, [M-H2O-H]-, [2M-H]-, etc. |

Usage

Quick Start

  1. Place your Thermo .raw files in the ./raw folder
    • Nested layouts are supported, e.g. ./raw/{RP,HILIC}/{1st,2nd}/*.raw
  2. Create an input Excel file (file_list.xlsx) with your compound list
  3. Run the tool:
lib_eic
# or
python -m lib_eic

Input Excel Format

lib_eic supports two input formats (auto-detected by column names).

A) Direct m/z format (recommended for EIC plot generation)

The Excel file contains separate sheets for chromatography modes:

  • RP (Reversed Phase)
  • HILIC (Hydrophilic Interaction Liquid Chromatography)

Row 1 of each sheet may contain merged cells; the column headers are read from row 2, with data in the rows below.

| num | File name | mixture | Compound name | Polarity | m/z |
|---|---|---|---|---|---|
| 1 | Library_POS_Mix121 | 121 | Spermine | POS | 203.223 |
| 2 | Library_POS_Mix121 | 121 | Putrescine | POS | 89.107 |
| 3 | Library_NEG_Mix121 | 121 | Glucose | NEG | 179.056 |

  • num: Optional ordering number (used to prefix plot filenames for easier sorting)
  • File name: Partial raw filename prefix used for matching (e.g., matches File name.raw, File name_2nd.raw, ...)
  • mixture: Mixture identifier (used in plot filenames)
  • Compound name: Display name for plots and Excel output
  • Polarity: POS or NEG
  • m/z: Direct target m/z value

EIC plots are saved under: EIC_Plots_Export/{LC mode}/{Polarity}/{File name}/[{num}_]{Compound name}_{Polarity}_{mixture}{suffix}.png
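
To make the path template above concrete, here is a minimal sketch that assembles a plot path from one input row. `build_plot_path` is a hypothetical helper written for this README, not a function exported by lib_eic, and the form of `suffix` (shown here as `_2nd`) is an assumption.

```python
from pathlib import PurePosixPath


def build_plot_path(lc_mode, polarity, file_name, compound, mixture,
                    num=None, suffix="", root="EIC_Plots_Export"):
    """Fill in {root}/{LC mode}/{Polarity}/{File name}/[{num}_]{Compound}_{Polarity}_{mixture}{suffix}.png."""
    prefix = f"{num}_" if num is not None else ""  # the num prefix is optional
    filename = f"{prefix}{compound}_{polarity}_{mixture}{suffix}.png"
    return PurePosixPath(root) / lc_mode / polarity / file_name / filename


first = build_plot_path("RP", "POS", "Library_POS_Mix121", "Spermine", "121", num=1)
rerun = build_plot_path("RP", "POS", "Library_POS_Mix121", "Spermine", "121", suffix="_2nd")
print(first)
print(rerun)
```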

Notes:

  • For direct m/z input (separate RP/HILIC sheets), if --raw-folder contains an {LC mode} subfolder, the tool searches that first.
  • If raw files are further split by run folders (e.g. 1st/, 2nd/), the run label is carried into the output (and plot filenames) to avoid overwrites.
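
The lookup described in these notes can be pictured with the sketch below. It is a simplified reading of that behavior, not the tool's actual code: `find_raw_files` is a hypothetical function, and falling back to the whole raw folder when the LC-mode subfolder yields nothing is an assumption.

```python
import tempfile
from pathlib import Path


def find_raw_files(raw_folder, file_name_prefix, lc_mode=None):
    """Return .raw files whose name starts with the prefix, trying the LC-mode subfolder first."""
    root = Path(raw_folder)
    search_roots = []
    if lc_mode and (root / lc_mode).is_dir():
        search_roots.append(root / lc_mode)
    search_roots.append(root)  # assumed fallback: the whole raw folder
    for folder in search_roots:
        hits = sorted(
            p for p in folder.rglob("*")
            if p.is_file()
            and p.suffix.lower() == ".raw"
            and p.name.startswith(file_name_prefix)
        )
        if hits:
            return hits
    return []


# Demo on a throwaway folder that mimics ./raw/{RP,HILIC}/{1st,2nd}/*.raw
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("RP/1st/Library_POS_Mix121.raw",
                "RP/2nd/Library_POS_Mix121.raw",
                "HILIC/1st/Library_POS_Mix121.raw"):
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    found = find_raw_files(tmp, "Library_POS_Mix121", lc_mode="RP")
    # The parent folder name serves as the run label that keeps outputs apart.
    labels = [p.parent.name for p in found]
    print(labels)
```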

B) Formula-based format (legacy)

| RawFile | Mode | Formula |
|---|---|---|
| sample_01.raw | POS | C6H12O6 |
| sample_01.raw | POS | C10H16N5O13P3 |
| sample_02.raw | NEG | C6H12O6 |

  • RawFile: Filename (extension can be omitted; .raw is appended if missing)
  • Mode: POS or NEG (uppercase)
  • Formula: Chemical formula to analyze
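
The column rules above can be expressed as a small normalization step. The sketch below is illustrative only: `normalize_row` is a hypothetical helper, and uppercasing the mode is a convenience of this example (the tool itself may expect uppercase input as stated).

```python
def normalize_row(raw_file: str, mode: str, formula: str) -> dict:
    """Apply the formula-based column rules to one input row."""
    name = raw_file.strip()
    if not name.lower().endswith(".raw"):
        name += ".raw"  # extension can be omitted in the sheet
    mode = mode.strip().upper()
    if mode not in {"POS", "NEG"}:
        raise ValueError(f"Mode must be POS or NEG, got {mode!r}")
    return {"RawFile": name, "Mode": mode, "Formula": formula.strip()}


row = normalize_row("sample_01", "pos", "C6H12O6")
print(row)
```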

Configuration

Option 1: Command Line Arguments

lib_eic --raw-folder ./my_raw_files --input compounds.xlsx --output results.xlsx --ppm 10.0 -v

Common options:

  • --raw-folder: Path to folder containing .raw files (default: ./raw)
  • --input: Input Excel file path (default: file_list.xlsx)
  • --output: Output Excel file path (default: Final_Result_With_Plots.xlsx)
  • --no-pivots: Disable per-target pivot table sheets (faster for large runs)
  • --ppm: Mass tolerance in ppm (default: 10.0)
  • --no-plots: Disable EIC plot generation
  • --no-fitting: Disable Gaussian fitting
  • --no-ms2: Disable MS2 indexing/matching
  • --workers N: Number of worker processes (default: auto; use 1 for sequential)
  • --sequential: Force sequential processing (equivalent to --workers 1)
  • --no-progress: Disable the tqdm progress bar
  • -v, --verbose: Enable verbose output
  • --help: Show all available options
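
To show what a ppm tolerance means in practice, the sketch below converts `--ppm` into an absolute m/z window and builds a toy EIC from a list of scans. `ppm_window` and `toy_eic` are hypothetical names, the scan data is invented for illustration, and summing the intensities inside the window is one common choice that may differ from what lib_eic does internally.

```python
def ppm_window(target_mz: float, ppm: float):
    """Return the (low, high) m/z bounds for a tolerance given in ppm."""
    half_width = target_mz * ppm / 1e6
    return target_mz - half_width, target_mz + half_width


def toy_eic(scans, target_mz: float, ppm: float):
    """scans: iterable of (rt_min, [(mz, intensity), ...]); returns (rts, intensities)."""
    low, high = ppm_window(target_mz, ppm)
    rts, intensities = [], []
    for rt, peaks in scans:
        rts.append(rt)
        # Keep only the signal that falls inside the m/z window for this scan.
        intensities.append(sum(i for mz, i in peaks if low <= mz <= high))
    return rts, intensities


scans = [
    (1.00, [(181.0705, 1000.0), (181.0800, 5000.0)]),  # second peak lies outside 10 ppm
    (1.01, [(181.0710, 2000.0)]),
]
low, high = ppm_window(181.0707, 10.0)
rts, eic = toy_eic(scans, 181.0707, 10.0)
print(round(high - low, 5), eic)
```

At m/z 181.0707, a 10 ppm tolerance corresponds to a window of about ±0.0018.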

Performance notes:

  • Parallelism is file-level (one worker per raw file); if you only have 1 raw file, speedup is limited.
  • If CPU usage stays low, the run is likely bottlenecked by disk I/O (.raw reads) or output writing (Excel/plots); try fewer workers and/or an SSD, and consider --no-plots/--no-pivots.
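
Because parallelism is per raw file, the useful number of workers is bounded by the number of files. The sketch below shows one plausible way to resolve a worker setting under that constraint; `resolve_worker_count` is a hypothetical function and the "auto" heuristic (CPU count capped by file count) is an assumption, not the tool's documented behavior.

```python
import os


def resolve_worker_count(num_workers: int, n_raw_files: int) -> int:
    """Map a worker setting (0 = auto, 1 = sequential, N = N workers) to an effective count."""
    if num_workers < 0:
        raise ValueError("num_workers must be >= 0")
    if num_workers == 1 or n_raw_files <= 1:
        return 1  # nothing to parallelize across
    requested = num_workers if num_workers > 0 else (os.cpu_count() or 1)
    # More workers than raw files would just sit idle.
    return max(1, min(requested, n_raw_files))


print(resolve_worker_count(8, 3))
```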

Option 2: YAML Configuration File

Generate a default configuration template:

lib_eic --generate-config config.yaml

Then edit and use:

lib_eic --config config.yaml

Example config.yaml:

raw_data_folder: "./raw"
input_excel: "file_list.xlsx"
input_sheets: ["RP", "HILIC"]
output_excel: "Final_Result_With_Plots.xlsx"
include_pivot_tables: true
show_progress: true
num_workers: 0          # 0 = auto, 1 = sequential, N = N workers
parallel_mode: "auto"   # "auto", "sequential", "file" (file-level multiprocessing)
ppm_tolerance: 10.0
min_peak_intensity: 100000
enable_fitting: true
enable_plotting: true
enable_ms2: true
export_plot_folder: "EIC_Plots_Export"
area_method: "sum"  # or "trapz"
ms2_match_mode: "rt_linked"  # or "global"
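
The two `area_method` values can give quite different numbers for the same peak. The sketch below contrasts them under the assumption that "sum" adds up the intensities and "trapz" integrates intensity over retention time with the trapezoidal rule; both function names are hypothetical and the data is invented.

```python
def area_sum(intensity):
    """Sum of intensities; ignores the spacing between scans."""
    return float(sum(intensity))


def area_trapz(rt, intensity):
    """Trapezoidal integration of intensity over retention time."""
    return sum(
        (rt[i + 1] - rt[i]) * (intensity[i] + intensity[i + 1]) / 2.0
        for i in range(len(rt) - 1)
    )


rt = [1.0, 1.1, 1.2, 1.3]            # minutes
intensity = [0.0, 100.0, 100.0, 0.0]
print(area_sum(intensity), area_trapz(rt, intensity))
```

The summed area depends on the scan rate, while the trapezoidal area carries units of intensity × time, so areas are only comparable between runs processed with the same method.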

Option 3: Python API

from lib_eic.config import Config
from lib_eic.processor import process_all

# Create configuration
config = Config(
    raw_data_folder="./raw",
    input_excel="file_list.xlsx",
    output_excel="results.xlsx",
    ppm_tolerance=10.0,
    enable_plotting=True,
    enable_fitting=True
)

# Run analysis
process_all(config)

Advanced: Direct Module Access

from lib_eic.chemistry.mass import get_exact_mass, calculate_target_mz
from lib_eic.chemistry.adducts import get_enabled_adducts
from lib_eic.analysis.eic import build_targets, extract_eic
from lib_eic.analysis.fitting import fit_gaussian_and_score
from lib_eic.io.raw_file import RawFileReader

# Calculate exact mass
mass = get_exact_mass("C6H12O6")  # ~180.0634

# Get enabled adducts for positive mode
adducts = get_enabled_adducts("POS")

# Build targets from formulas
targets = build_targets(["C6H12O6"], adducts)

# Read raw file and extract EIC
with RawFileReader("sample.raw") as reader:
    rt, intensity = reader.get_chromatogram(target_mz=181.0707, ppm=10.0)

Output

1. Excel Report (Final_Result_With_Plots.xlsx)

  • All_Features Sheet: Complete results table

    • Formula-based: RawFile, Mode, Formula, Adduct, mz_theoretical, RT_min, Intensity, Area, GaussianScore, PeakQuality, HasMS2
    • Direct m/z: RawFile, File name, mixture, Compound name, Polarity, mz_target, RT_min, Intensity, Area, GaussianScore, PeakQuality, HasMS2
  • Target_Status Sheet: Per-target processing status table (includes targets that were not reported as features)

    • Adds EICGenerated (whether chromatogram extraction returned data) and FilteredOut (below --min-intensity)
    • Helps identify compounds present in the input Excel that failed EIC extraction or were excluded by filtering
  • Per-Target Sheets: Pivot tables for each Formula / Compound name

    • Area Table: Peak areas across samples and adducts
    • Retention Time Table: RT consistency verification across adducts
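
The retention time table is meant for checking that adducts of one compound co-elute. A minimal sketch of such a check is shown below; `check_coelution` is a hypothetical helper, the RT values are invented, and the 0.1 min tolerance is an arbitrary assumption that should be tuned to the chromatography.

```python
from statistics import median


def check_coelution(apex_rt_by_adduct, tolerance_min=0.1):
    """Split adducts into those near the median apex RT and those that deviate."""
    reference_rt = median(apex_rt_by_adduct.values())
    consistent, outliers = [], []
    for adduct, rt in apex_rt_by_adduct.items():
        if abs(rt - reference_rt) <= tolerance_min:
            consistent.append(adduct)
        else:
            outliers.append(adduct)
    return reference_rt, consistent, outliers


apex_rts = {"[M+H]+": 5.02, "[M+Na]+": 5.04, "[M+NH4]+": 5.03, "[2M+H]+": 7.80}
reference_rt, consistent, outliers = check_coelution(apex_rts)
print(consistent, outliers)
```

Several adducts sharing one apex RT support the assignment, whereas an adduct that elutes elsewhere is more likely an unrelated ion with a similar m/z.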

2. EIC Plots (EIC_Plots_Export/)

Visual validation of detected peaks (when plotting is enabled):

  • Direct m/z: EIC_Plots_Export/{LC mode}/{Polarity}/{File name}/[{num}_]{Compound name}_{Polarity}_{mixture}{suffix}.png
  • Blue Line: Raw EIC data
  • Red Marker: Apex RT indicator
  • Dashed Line: Gaussian fit curve (when fitting is enabled)

Quality Scoring System

The Gaussian fitting provides an R² score to assess peak quality:

| R² Score | Label | Interpretation |
|---|---|---|
| > 0.8 | Excellent | High-quality Gaussian peak, reliable signal |
| 0.5-0.8 | Good | Adequate fit, signal is likely real |
| < 0.5 | Poor | Irregular shape, may be artifact or noise |
| N/A | Not Fitted | Fitting was disabled |
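
The scoring idea can be sketched without any dependencies. The example below is not the package's fitting routine: instead of a least-squares fit it estimates the Gaussian from the peak's moments (intensity-weighted mean and standard deviation, with the apex as amplitude), then computes R² and maps it to the labels in the table. All function names are hypothetical, and treating exactly 0.8 as "Good" is an assumption about the boundary.

```python
import math


def gaussian(t, amplitude, mu, sigma):
    return amplitude * math.exp(-((t - mu) ** 2) / (2 * sigma ** 2))


def moment_gaussian_r2(rt, intensity):
    """R² between the data and a Gaussian estimated by the method of moments."""
    total = sum(intensity)
    if total <= 0:
        return None
    mu = sum(t * y for t, y in zip(rt, intensity)) / total
    variance = sum(y * (t - mu) ** 2 for t, y in zip(rt, intensity)) / total
    if variance <= 0:
        return None
    fitted = [gaussian(t, max(intensity), mu, math.sqrt(variance)) for t in rt]
    mean_y = total / len(intensity)
    ss_res = sum((y - f) ** 2 for y, f in zip(intensity, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in intensity)
    return None if ss_tot == 0 else 1.0 - ss_res / ss_tot


def quality_label(r2):
    """Map an R² score to a label; None stands for 'no fit was run'."""
    if r2 is None:
        return "Not Fitted"
    if r2 > 0.8:
        return "Excellent"
    if r2 >= 0.5:
        return "Good"
    return "Poor"


rt = [i * 0.01 for i in range(201)]                   # 0.00 to 2.00 min
clean = [gaussian(t, 1e6, 1.0, 0.1) for t in rt]      # ideal peak
jagged = [1000.0 if i % 2 == 0 else 0.0 for i in range(21)]
clean_r2 = moment_gaussian_r2(rt, clean)
jagged_r2 = moment_gaussian_r2(list(range(21)), jagged)
print(quality_label(clean_r2), quality_label(jagged_r2))
```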

Testing

Run the test suite:

pytest tests/
pytest -v tests/  # verbose output
pytest --cov=lib_eic tests/  # with coverage report

Project Structure

lib_eic/
├── chemistry/          # Mass calculations and adduct definitions
│   ├── mass.py         # Exact mass and m/z calculations
│   └── adducts.py      # Adduct type definitions
├── analysis/           # Core analysis modules
│   ├── eic.py          # EIC extraction and target building
│   ├── fitting.py      # Gaussian peak fitting
│   └── ms2.py          # MS2 precursor matching
├── io/                 # Input/output operations
│   ├── raw_file.py     # Thermo .raw file reader
│   ├── excel.py        # Excel I/O
│   └── plotting.py     # EIC visualization
├── config.py           # Configuration management
├── processor.py        # Main processing pipeline
├── cli.py              # Command-line interface
└── validation.py       # Input validation

Limitations

  • Supported File Format: Thermo .raw files only (via fisher-py)
  • mzML Not Supported: Open formats such as mzML cannot be read; provide the original Thermo .raw files
  • Platform: Requires .NET runtime (Windows native, or Mono on Linux/macOS)

License

This project is licensed under the MIT License - see the LICENSE file for details.


Author

Jihyun Chun (jihyun5311@snu.ac.kr)

Repository: https://github.com/SNUFML/lib_eic
