Automated Targeted Feature Extraction & Adduct Verification Tool for LC-MS Data.
LCMS Adduct Finder
This Python tool performs targeted analysis of LC-MS data. Given a list of chemical formulas, it scans for
multiple adduct forms (e.g., [M+H]+, [M+Na]+), extracts an Extracted Ion Chromatogram (EIC) for each target,
and scores peak quality by Gaussian fitting to indicate how reliable each detected signal is.
Key Features
- Targeted Extraction: Converts chemical formulas (e.g., C6H12O6) into target m/z values, enabling precise extraction of specific metabolites or compounds.
- Multi-Adduct Verification:
  - Automatically scans for 14+ different adduct types (Monomers, Dimers, Na/NH4 adducts, etc.) simultaneously.
  - Helps confirm the identity of a substance by checking if multiple adducts elute at the same Retention Time (RT).
- Peak Quality & Existence Check:
  - Gaussian Scoring: Fits a Gaussian curve to the raw peak data and calculates an R² score.
  - Distinguishes high-quality peaks ("Excellent/Good") from noise or irregular shapes ("Poor/Noise").
- Precision Mass Calculation: Accounts for the electron mass when computing m/z, where M is the neutral monoisotopic mass, n the number of molecules in the adduct, Δ the adduct mass shift, and m_e the electron mass: $$m/z = \frac{(M \times n + \Delta) - (Charge \times m_e)}{|Charge|}$$
- Visual Inspection (Optional): Saves EIC plots as PNG images with Gaussian fit overlay.
- MS2 Matching: Links MS1 features to MS2 events for additional confirmation.
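As a worked instance of the Precision Mass Calculation formula in the list above, consider the [M+H]+ ion of glucose (C6H12O6). The masses used here (M ≈ 180.06339 for glucose, Δ ≈ 1.00783 for the added hydrogen atom, m_e ≈ 0.00055) are standard monoisotopic reference values rather than numbers taken from the tool's source, with n = 1 and Charge = +1:

$$m/z = \frac{(180.06339 \times 1 + 1.00783) - (1 \times 0.00055)}{|1|} \approx 181.0707$$

This agrees with the target m/z of 181.0707 used in the Direct Module Access example.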
Installation
From PyPI (recommended)
pip install lib_eic
With YAML configuration support:
pip install lib_eic[yaml]
From Source
git clone https://github.com/SNUFML/lib_eic.git
cd lib_eic
pip install -e .
Dependencies
- pandas, openpyxl (Excel I/O)
- molmass (mass calculations)
- scipy, numpy (numerical processing)
- matplotlib (plotting)
- fisher-py, pythonnet (Thermo .raw file reading)
Supported Adducts
The tool automatically detects the ionization mode (Positive/Negative) and scans for the following adducts:
| Mode | Adduct Types |
|---|---|
| Positive (+) | [M+H]+, [M+Na]+, [M+NH4]+, [M+ACN+H]+, [2M+H]+, [M-H2O+H]+, etc. |
| Negative (-) | [M-H]-, [M+FA-H]-, [M-H2O-H]-, [2M-H]-, etc. |
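The sketch below illustrates how a mode-specific adduct scan could be organized. It does not reproduce lib_eic's internal adduct table: the `ADDUCTS` dictionary, `adduct_mz`, and `scan_adducts` are hypothetical names, only a handful of adducts are listed, and the atomic masses are standard monoisotopic reference values.

```python
# Monoisotopic masses in Da (standard reference values, not read from lib_eic).
ELECTRON_MASS = 0.00054858
H = 1.00782503
NA = 22.98976928
NH4 = 18.0343741  # N + 4 H

# Hypothetical adduct table: name -> (n molecules, mass shift delta, charge).
ADDUCTS = {
    "[M+H]+": (1, H, 1),
    "[M+Na]+": (1, NA, 1),
    "[M+NH4]+": (1, NH4, 1),
    "[2M+H]+": (2, H, 1),
    "[M-H]-": (1, -H, -1),
    "[2M-H]-": (2, -H, -1),
}


def adduct_mz(neutral_mass, n, delta, charge):
    """m/z = ((M * n + delta) - charge * m_e) / |charge|."""
    return ((neutral_mass * n + delta) - charge * ELECTRON_MASS) / abs(charge)


def scan_adducts(neutral_mass, mode):
    """Return {adduct name: m/z} for the given mode ("POS" or "NEG")."""
    sign = 1 if mode == "POS" else -1
    return {
        name: adduct_mz(neutral_mass, n, delta, charge)
        for name, (n, delta, charge) in ADDUCTS.items()
        if charge * sign > 0  # keep only adducts whose charge matches the mode
    }


GLUCOSE = 180.0633881  # monoisotopic mass of C6H12O6
positive = scan_adducts(GLUCOSE, "POS")
negative = scan_adducts(GLUCOSE, "NEG")
for name, mz in {**positive, **negative}.items():
    print(f"{name:10s} {mz:.4f}")
```

Keeping n, the mass shift, and the charge as plain data means one formula covers monomers, dimers, and both polarities.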
Usage
Quick Start
- Place your Thermo .raw files in the ./raw folder
  - Nested layouts are supported, e.g. ./raw/{RP,HILIC}/{1st,2nd}/*.raw
- Create an input Excel file (file_list.xlsx) with your compound list
- Run the tool:
lib_eic
# or
python -m lib_eic
Input Excel Format
lib_eic supports two input formats (auto-detected by column names).
A) Direct m/z format (recommended for EIC plot generation)
The Excel file contains separate sheets for chromatography modes:
- RP (Reverse Phase)
- HILIC (Hydrophilic Interaction Liquid Chromatography)
The Excel file may contain merged cells in row 1; headers/data start from row 2.
| num | File name | mixture | Compound name | Polarity | m/z |
|---|---|---|---|---|---|
| 1 | Library_POS_Mix121 | 121 | Spermine | POS | 203.223 |
| 2 | Library_POS_Mix121 | 121 | Putrescine | POS | 89.107 |
| 3 | Library_NEG_Mix121 | 121 | Glucose | NEG | 179.056 |
- num: Optional ordering number (used to prefix plot filenames for easier sorting)
- File name: Partial raw filename prefix used for matching (e.g., matches File name.raw, File name_2nd.raw, ...)
- mixture: Mixture identifier (used in plot filenames)
- Compound name: Display name for plots and Excel output
- Polarity: POS or NEG
- m/z: Direct target m/z value
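The prefix matching described for the File name column can be pictured with the sketch below. It is an illustration under assumptions, not lib_eic's matching code: `find_raw_files` is a hypothetical helper, the recursive search and the temporary folder layout are chosen only for the demonstration.

```python
import tempfile
from pathlib import Path


def find_raw_files(raw_folder, file_name_prefix):
    """Return .raw files under raw_folder whose stem starts with the prefix.

    The search is recursive so nested layouts such as RP/1st/ are covered.
    """
    root = Path(raw_folder)
    return sorted(
        p for p in root.rglob("*.raw") if p.stem.startswith(file_name_prefix)
    )


# Demonstration on a throwaway folder with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "RP" / "1st"
    run_dir.mkdir(parents=True)
    for name in (
        "Library_POS_Mix121.raw",
        "Library_POS_Mix121_2nd.raw",
        "Library_NEG_Mix121.raw",
    ):
        (run_dir / name).touch()
    demo_matches = [p.name for p in find_raw_files(tmp, "Library_POS_Mix121")]

print(demo_matches)
```

A plain prefix test would also match a longer mixture number (for example a file starting with Library_POS_Mix1210), so a stricter rule may be needed if such names occur.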
EIC plots are saved under:
EIC_Plots_Export/{LC mode}/{Polarity}/{File name}/[{num}_]{Compound name}_{Polarity}_{mixture}{suffix}.png
Notes:
- For direct m/z input (separate RP/HILIC sheets), if --raw-folder contains an {LC mode} subfolder, the tool searches that first.
- If raw files are further split by run folders (e.g. 1st/, 2nd/), the run label is carried into the output (and plot filenames) to avoid overwrites.
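To make the plot path template under "EIC plots are saved under" concrete, here is a small sketch that assembles such a path. `build_plot_path` is a hypothetical helper, and the "_2nd" value passed as the suffix is only a guess at how a run label might appear; the tool's actual suffix format is not documented here.

```python
from pathlib import Path


def build_plot_path(lc_mode, polarity, file_name, compound, mixture,
                    num=None, suffix="", root="EIC_Plots_Export"):
    """Assemble {root}/{LC mode}/{Polarity}/{File name}/[{num}_]{Compound}_{Polarity}_{mixture}{suffix}.png."""
    prefix = f"{num}_" if num is not None else ""  # optional ordering prefix
    filename = f"{prefix}{compound}_{polarity}_{mixture}{suffix}.png"
    return Path(root) / lc_mode / polarity / file_name / filename


example_path = build_plot_path(
    "RP", "POS", "Library_POS_Mix121", "Spermine", 121, num=1, suffix="_2nd"
)
print(example_path.as_posix())
```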
B) Formula-based format (legacy)
| RawFile | Mode | Formula |
|---|---|---|
| sample_01.raw | POS | C6H12O6 |
| sample_01.raw | POS | C10H16N5O13P3 |
| sample_02.raw | NEG | C6H12O6 |
- RawFile: Filename (extension can be omitted; .raw is appended if missing)
- Mode: POS or NEG (uppercase)
- Formula: Chemical formula to analyze
Configuration
Option 1: Command Line Arguments
lib_eic --raw-folder ./my_raw_files --input compounds.xlsx --output results.xlsx --ppm 10.0 -v
Common options:
- --raw-folder: Path to folder containing .raw files (default: ./raw)
- --input: Input Excel file path (default: file_list.xlsx)
- --output: Output Excel file path (default: Final_Result_With_Plots.xlsx)
- --ppm: Mass tolerance in ppm (default: 10.0)
- --no-plots: Disable EIC plot generation
- --no-fitting: Disable Gaussian fitting
- --no-ms2: Disable MS2 indexing/matching
- -v, --verbose: Enable verbose output
- --help: Show all available options
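For a sense of what the --ppm tolerance means in absolute terms, the sketch below converts a ppm value into an m/z window. It assumes a symmetric window around the target; `ppm_window` and `within_ppm` are hypothetical helper names, not part of the lib_eic API.

```python
def ppm_window(target_mz, ppm):
    """Return (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = target_mz * ppm / 1e6
    return target_mz - delta, target_mz + delta


def within_ppm(observed_mz, target_mz, ppm):
    """True if observed_mz falls inside the ppm window around target_mz."""
    low, high = ppm_window(target_mz, ppm)
    return low <= observed_mz <= high


low, high = ppm_window(181.0707, 10.0)
print(f"{low:.4f} - {high:.4f}")  # 181.0689 - 181.0725
```

At the default of 10 ppm, a target near m/z 181 therefore accepts signals within roughly ±0.0018 of the target.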
Option 2: YAML Configuration File
Generate a default configuration template:
lib_eic --generate-config config.yaml
Then edit and use:
lib_eic --config config.yaml
Example config.yaml:
raw_data_folder: "./raw"
input_excel: "file_list.xlsx"
input_sheets: ["RP", "HILIC"]
output_excel: "Final_Result_With_Plots.xlsx"
ppm_tolerance: 10.0
min_peak_intensity: 100000
enable_fitting: true
enable_plotting: true
enable_ms2: true
export_plot_folder: "EIC_Plots_Export"
area_method: "sum" # or "trapz"
ms2_match_mode: "rt_linked" # or "global"
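The two area_method values can be illustrated as follows. This is a sketch of the usual meaning of the names (a plain sum of intensities versus trapezoidal integration over retention time); how lib_eic defines them internally is an assumption, and `area_sum` and `area_trapz` are hypothetical helpers.

```python
def area_sum(intensity):
    """Sum of intensities across scans (ignores scan spacing)."""
    return float(sum(intensity))


def area_trapz(rt, intensity):
    """Trapezoidal integration of intensity over retention time."""
    return sum(
        (rt[i + 1] - rt[i]) * (intensity[i] + intensity[i + 1]) / 2.0
        for i in range(len(rt) - 1)
    )


rt = [5.00, 5.02, 5.04, 5.06, 5.08]    # minutes
intensity = [0.0, 4e5, 1e6, 4e5, 0.0]  # counts

print(area_sum(intensity))        # 1800000.0
print(area_trapz(rt, intensity))  # about 36000 (counts x minutes)
```

The sum depends on how densely the peak was sampled, while the trapezoidal area carries units of intensity times time, so areas from the two methods are not directly comparable.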
Option 3: Python API
from lib_eic.config import Config
from lib_eic.processor import process_all
# Create configuration
config = Config(
    raw_data_folder="./raw",
    input_excel="file_list.xlsx",
    output_excel="results.xlsx",
    ppm_tolerance=10.0,
    enable_plotting=True,
    enable_fitting=True,
)
# Run analysis
process_all(config)
Advanced: Direct Module Access
from lib_eic.chemistry.mass import get_exact_mass, calculate_target_mz
from lib_eic.chemistry.adducts import get_enabled_adducts
from lib_eic.analysis.eic import build_targets, extract_eic
from lib_eic.analysis.fitting import fit_gaussian_and_score
from lib_eic.io.raw_file import RawFileReader
# Calculate exact mass
mass = get_exact_mass("C6H12O6") # ~180.0634
# Get enabled adducts for positive mode
adducts = get_enabled_adducts("POS")
# Build targets from formulas
targets = build_targets(["C6H12O6"], adducts)
# Read raw file and extract EIC
with RawFileReader("sample.raw") as reader:
    # 181.0707 is the [M+H]+ m/z of C6H12O6
    rt, intensity = reader.get_chromatogram(target_mz=181.0707, ppm=10.0)
Output
1. Excel Report (Final_Result_With_Plots.xlsx)
- All_Features Sheet: Complete results table
  - Formula-based: RawFile, Mode, Formula, Adduct, mz_theoretical, RT_min, Intensity, Area, GaussianScore, PeakQuality, HasMS2
  - Direct m/z: RawFile, File name, mixture, Compound name, Polarity, mz_target, RT_min, Intensity, Area, GaussianScore, PeakQuality, HasMS2
- Target_Status Sheet: Per-target processing status table (includes targets that were not reported as features)
  - Adds EICGenerated (whether chromatogram extraction returned data) and FilteredOut (below --min-intensity)
  - Helps identify compounds present in the input Excel that failed EIC extraction or were excluded by filtering
- Per-Target Sheets: Pivot tables for each Formula / Compound name
  - Area Table: Peak areas across samples and adducts
  - Retention Time Table: RT consistency verification across adducts
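The RT consistency check that the Retention Time Table supports can also be done programmatically. The sketch below flags adducts whose apex RT deviates from the median; `coeluting_adducts`, the 0.1 min tolerance, and the example RT values are all hypothetical and not taken from lib_eic.

```python
from statistics import median


def coeluting_adducts(apex_rt_by_adduct, rt_tolerance_min=0.1):
    """Split adducts into those eluting near the median apex RT and outliers."""
    center = median(apex_rt_by_adduct.values())
    consistent = {
        name: rt
        for name, rt in apex_rt_by_adduct.items()
        if abs(rt - center) <= rt_tolerance_min
    }
    outliers = {
        name: rt
        for name, rt in apex_rt_by_adduct.items()
        if name not in consistent
    }
    return center, consistent, outliers


# Made-up apex RTs (minutes) for one compound in one sample.
apex_rts = {"[M+H]+": 5.02, "[M+Na]+": 5.04, "[M+NH4]+": 5.03, "[2M+H]+": 7.80}
center, consistent, outliers = coeluting_adducts(apex_rts)
print(sorted(consistent), sorted(outliers))
```

Several adducts agreeing on one RT supports the assignment, while an outlier such as the dimer here more likely belongs to a different, isobaric signal.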
2. EIC Plots (EIC_Plots_Export/)
Visual validation of detected peaks (when plotting is enabled):
- Direct m/z: EIC_Plots_Export/{Polarity}/{File name}/{Compound name}_{Polarity}_{mixture}{suffix}.png
- Blue Line: Raw EIC data
- Red Marker: Apex RT indicator
- Dashed Line: Gaussian fit curve (when fitting is enabled)
Quality Scoring System
The Gaussian fitting provides an R² score to assess peak quality:
| R² Score | Label | Interpretation |
|---|---|---|
| > 0.8 | Excellent | High-quality Gaussian peak, reliable signal |
| 0.5-0.8 | Good | Adequate fit, signal is likely real |
| < 0.5 | Poor | Irregular shape, may be artifact or noise |
| N/A | Not Fitted | Fitting was disabled |
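The scoring idea in the table above can be sketched without external dependencies. Note the swap: instead of a least-squares fit (which the tool presumably performs with scipy, an assumption), this example estimates the Gaussian parameters from intensity-weighted moments, then computes R² and maps it to the labels in the table. All function names are hypothetical, and the handling of values exactly at 0.5 and 0.8 is a guess.

```python
import math


def gaussian(t, amplitude, center, sigma):
    """Gaussian peak model evaluated at retention time t."""
    return amplitude * math.exp(-((t - center) ** 2) / (2.0 * sigma ** 2))


def estimate_gaussian(rt, intensity):
    """Estimate (amplitude, center, sigma) from intensity-weighted moments.

    Assumes at least two scans with non-zero intensity.
    """
    total = sum(intensity)
    center = sum(t * y for t, y in zip(rt, intensity)) / total
    variance = sum(y * (t - center) ** 2 for t, y in zip(rt, intensity)) / total
    return max(intensity), center, math.sqrt(variance)


def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def quality_label(r2):
    """Map an R² score to the labels used in the table above."""
    if r2 > 0.8:
        return "Excellent"
    if r2 >= 0.5:
        return "Good"
    return "Poor"


def score_peak(rt, intensity):
    """Return (R², label) for one extracted ion chromatogram."""
    amplitude, center, sigma = estimate_gaussian(rt, intensity)
    fitted = [gaussian(t, amplitude, center, sigma) for t in rt]
    r2 = r_squared(intensity, fitted)
    return r2, quality_label(r2)


rt = [4.5 + 0.02 * i for i in range(51)]  # minutes
clean = [gaussian(t, 1e6, 5.0, 0.05) for t in rt]  # ideal peak
sawtooth = [1e5 if i % 2 == 0 else 9e5 for i in range(51)]  # no peak shape

clean_r2, clean_label = score_peak(rt, clean)
noise_r2, noise_label = score_peak(rt, sawtooth)
print(clean_label, noise_label)
```

A moment-based estimate is sensitive to baseline and tailing, so a proper least-squares fit is the better choice on real data; the R² and labelling steps stay the same either way.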
Testing
Run the test suite:
pytest tests/
pytest -v tests/ # verbose output
pytest --cov=lib_eic tests/ # with coverage report
Project Structure
lib_eic/
├── chemistry/ # Mass calculations and adduct definitions
│ ├── mass.py # Exact mass and m/z calculations
│ └── adducts.py # Adduct type definitions
├── analysis/ # Core analysis modules
│ ├── eic.py # EIC extraction and target building
│ ├── fitting.py # Gaussian peak fitting
│ └── ms2.py # MS2 precursor matching
├── io/ # Input/output operations
│ ├── raw_file.py # Thermo .raw file reader
│ ├── excel.py # Excel I/O
│ └── plotting.py # EIC visualization
├── config.py # Configuration management
├── processor.py # Main processing pipeline
├── cli.py # Command-line interface
└── validation.py # Input validation
Limitations
- Supported File Format: Thermo .raw files only (via fisher-py)
- mzML Not Supported: Use vendor-specific raw files for best results
- Platform: Requires .NET runtime (Windows native, or Mono on Linux/macOS)
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Jihyun Chun (jihyun5311@snu.ac.kr)
Repository: https://github.com/SNUFML/lib_eic