A Python toolkit for excitation-emission matrix (EEM) analysis
eempy
Author: Yongmin Hu (yongminhu@outlook.com)
Last update: 2026-01
This package provides tools for Excitation-Emission Matrix (EEM) fluorescence analysis, including data I/O, preprocessing, decomposition (PARAFAC/NMF with optional solvers/regularizations), and validation workflows.
Before jumping into coding
If you wish to do the analysis in an app without writing code yourself, you are welcome to try eempy-vis (https://github.com/YongminHu/eempy-vis). It provides interactive visualization, preprocessing, and various EEM interpretation options with a clean UI.
Get Started
(English documentation: https://yongminhu.github.io/eempy/)
Installation
pip install eem-python
Read EEMs and absorbance from text files (.csv, .dat, .txt, etc.)
from pathlib import Path
from eempy.read_data import read_eem_dataset, read_abs_dataset
# ---------Read EEMs and absorbance from text files (.csv, .dat, .txt, etc.)-----------
data_dir = Path("tests") / "sample_data" # path to data folder
eem_stack, ex_range, em_range, indexes = read_eem_dataset(
str(data_dir),
# mandatory_keywords: all filenames must include these substrings
# optional_keywords: filenames must include at least one of these substrings
# For example, with:
# - mandatory_keywords=["PEM"]
# - optional_keywords=["2021-02-02", "2021-02-01"]
# only files whose filenames meet the following criteria are included:
# - filename must contain "PEM"
# - filename must also contain either "2021-02-01" or "2021-02-02"
mandatory_keywords=["PEM"],
optional_keywords=["2021-02-01", "2021-02-02"],
# file_first_row: "ex" means the first row of the text files lists excitation wavelengths;
# use "em" if the first row lists emission wavelengths. For .dat files generated by a
# HORIBA Aqualog, use "ex" by default.
file_first_row="ex",
# index_pos: (start, end) positions in filenames (1-based, end inclusive) used to extract
# sample labels from filenames. In the example, a string (something like
# "2021-02-01-1400_R1") would be extracted from position 1 to 19 for each sample. It would
# be used later in the output tables of EEM analysis.
index_pos=(1, 19)
)
abs_stack, ex_range_abs, _ = read_abs_dataset(
str(data_dir),
mandatory_keywords=["ABS"],
optional_keywords=["2021-02-01", "2021-02-02"],
)
Other data I/O helpers include: read_eem for a single EEM file, read_abs for a single absorbance spectrum,
read_eem_dataset_from_json for loading a saved EEMDataset from JSON, and read_reference_from_text for
1D reference variables (e.g., DOC) used in correlation or calibration workflows.
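The file_first_row="ex" convention above implies a simple grid layout: the first row carries the excitation axis and the first column the emission axis. As a rough, self-contained sketch of that layout (not eempy's actual parser), such a file can be read with plain numpy:

```python
import io
import numpy as np

# Hypothetical mini EEM file in the layout described above: first row lists
# excitation wavelengths, first column lists emission wavelengths.
text = (
    ",240,250,260\n"     # excitation axis (nm)
    "300,1.0,2.0,3.0\n"  # each following row: emission wavelength, then intensities
    "310,1.5,2.5,3.5\n"
)
raw = np.genfromtxt(io.StringIO(text), delimiter=",")
ex_range = raw[0, 1:]  # excitation wavelengths from the first row
em_range = raw[1:, 0]  # emission wavelengths from the first column
eem = raw[1:, 1:]      # intensity matrix, shape (n_em, n_ex)
print(eem.shape)       # (2, 3)
```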
Build EEM dataset
from eempy.eem_processing import EEMDataset
# Using eem_stack, ex_range, em_range and indexes generated from above
dataset = EEMDataset(
eem_stack=eem_stack, # 3d array of EEMs
ex_range=ex_range,
em_range=em_range,
index=indexes, # sample labels
)
You can also attach reference data (ref) such as concentration tables and sample classes (cluster).
These labels propagate to outputs (e.g., Fmax tables) and enable correlation analysis and group-wise filtering.
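The key point when attaching reference data is that reference rows must be keyed by the same sample labels as the EEM stack. As a minimal sketch (the exact EEMDataset signature for ref/cluster may differ), pandas reindexing guarantees the row order matches:

```python
import pandas as pd

# Hypothetical sample indexes as extracted via index_pos above
indexes = ["2021-02-01-1400_R1", "2021-02-02-1400_R1"]

# Reference table (e.g., DOC concentrations) keyed by the same sample labels;
# reindexing reorders rows to match the EEM stack
ref = pd.DataFrame(
    {"DOC (mg/L)": [2.1, 3.4]},
    index=["2021-02-02-1400_R1", "2021-02-01-1400_R1"],
)
ref_aligned = ref.reindex(indexes)
print(ref_aligned["DOC (mg/L)"].tolist())  # [3.4, 2.1]
```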
Preprocessing: inner filter effect (IFE) correction, scattering removal, etc.
dataset = dataset.ife_correction(
# absorbance/ex_range_abs: absorbance spectra and their wavelength axis used for IFE
# correction
absorbance=abs_stack,
ex_range_abs=ex_range_abs,
inplace=False
)
dataset = dataset.rayleigh_scattering_removal(
# width_*: width of scattering bands to be interpolated for first-order and second-order
# scattering.
width_o1=10,
width_o2=10,
# interpolation_method_*: how to fill the masked Rayleigh bands ("zero", "linear",
# "cubic", "nan") for first-order and second-order scatterings.
interpolation_method_o1="zero",
interpolation_method_o2="zero",
inplace=False,
)
dataset = dataset.raman_scattering_removal(
width=5,
# interpolation_method: how to fill the Raman band ("zero", "linear", "cubic", "nan")
interpolation_method="zero",
interpolation_dimension="2d",
inplace=False,
)
dataset = dataset.cutting(
# ex_min/ex_max/em_min/em_max: wavelength window retained after cutting (nm)
ex_min=240,
ex_max=450,
em_min=300,
em_max=550,
inplace=False,
)
Additional preprocessing options include:
- threshold masking to clip extreme intensities,
- median/Gaussian filtering for noise suppression,
- NaN imputation to fill masked pixels,
- interpolation to a new excitation/emission grid,
- total-fluorescence or Raman normalization for intensity scaling.
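As a concept sketch of the Raman normalization mentioned above (following the common Raman-unit convention of Lawaetz & Stedmon, 2009; eempy's exact implementation may differ), the EEM is divided by the area under a water blank's Raman peak at excitation 350 nm:

```python
import numpy as np

rng = np.random.default_rng(0)
em_range = np.arange(371, 430, 1.0)                # emission axis of a water blank at ex = 350 nm
blank_scan = np.exp(-((em_range - 397) / 8) ** 2)  # synthetic Raman peak near 397 nm

# Raman-unit normalization: divide the EEM by the area under the blank's Raman
# peak (trapezoidal integration), so intensities become Raman units (R.U.)
raman_area = float(np.sum(0.5 * (blank_scan[1:] + blank_scan[:-1]) * np.diff(em_range)))
eem = rng.random((10, 10))
eem_ru = eem / raman_area
print(round(raman_area, 2))
```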
Peak picking and regional integration
# peak_picking: target excitation/emission (nm); returns the closest grid point
peak_fi, ex_actual, em_actual = dataset.peak_picking(ex=350, em=450)
print(ex_actual, em_actual)
# regional_integration: sum of fluorescence over a rectangular ex/em region (nm)
ri = dataset.regional_integration(ex_min=250, ex_max=300, em_min=380, em_max=450)
print(peak_fi.head())
print(ri.head())
eempy also has functions to calculate other fluorescence indicators, including HIX, BIX, FI, AQY, and total fluorescence.
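For orientation, the fluorescence index (FI) is conventionally the ratio of emission intensities at 470 nm and 520 nm for excitation at 370 nm (McKnight et al., 2001). A self-contained sketch on a synthetic grid (not eempy's API; rows = excitation is an assumed layout):

```python
import numpy as np

def fluorescence_index(eem, ex_range, em_range):
    """FI: em 470 nm / em 520 nm at ex 370 nm, each taken at the nearest grid point."""
    i = np.abs(ex_range - 370).argmin()
    j1 = np.abs(em_range - 470).argmin()
    j2 = np.abs(em_range - 520).argmin()
    return eem[i, j1] / eem[i, j2]

ex_range = np.arange(240, 500, 10.0)
em_range = np.arange(300, 600, 10.0)
eem = np.ones((ex_range.size, em_range.size))
# Make the (370, 470) point twice as bright as the (370, 520) point
eem[np.abs(ex_range - 370).argmin(), np.abs(em_range - 470).argmin()] = 2.0
print(fluorescence_index(eem, ex_range, em_range))  # 2.0
```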
PARAFAC analysis
from eempy.eem_processing import PARAFAC
parafac = PARAFAC(
n_components=3, # n_components: number of components (rank)
solver="hals", # recommended solver
init="svd", # SVD-based initialization
# max_iter_nnls/max_iter_als: iteration budgets for the inner NNLS solver and
# the outer ALS loop, trading computation time against accuracy
max_iter_nnls=300,
max_iter_als=200,
random_state=0, # random seed
)
parafac.fit(dataset)
eempy offers several regularization options that can improve the accuracy and physical interpretability of the analysis:
- Non-negativity.
- Elastic-net regularization (an L1/L2 mix) on any factor.
- Quadratic priors on sample, excitation, or emission loadings. This is useful when the fitted scores or spectral components should stay close (but not necessarily identical) to prior knowledge.
- Ratio constraint on paired rows of sample scores: score[idx_top] ≈ beta * score[idx_bot]. This is useful when the ratio of component amplitudes between two sets of samples is expected to be constant. For example, if each sample is measured both unquenched and quenched with a fixed quencher dosage, then for a chemically consistent component the ratio between unquenched and quenched amplitudes may be approximately constant across samples (Hu et al., ES&T, 2025). In this case, passing the unquenched and quenched sample indices to idx_top and idx_bot encourages a constant ratio, and lam controls the strength of this regularization.
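To make the ratio constraint concrete, here is a small numpy sketch of one possible quadratic penalty of this form (the exact formulation inside eempy is an assumption):

```python
import numpy as np

# Sample-score matrix A: rows 0-2 unquenched, rows 3-5 quenched (paired by order)
A = np.array([[1.0, 4.0],
              [2.0, 6.0],
              [3.0, 8.0],
              [0.5, 2.0],
              [1.0, 3.0],
              [1.5, 4.0]])
idx_top, idx_bot = [0, 1, 2], [3, 4, 5]

# Per-component ratio implied by the paired rows (here exactly 2 throughout)
beta = A[idx_top] / A[idx_bot]
print(beta)

# A quadratic penalty of the form lam * ||A[idx_top] - beta * A[idx_bot]||^2
# is zero exactly when the ratios are constant across all sample pairs
lam = 1.0
penalty = lam * np.sum((A[idx_top] - 2.0 * A[idx_bot]) ** 2)
print(penalty)  # 0.0
```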
Split-half validation
from eempy.eem_processing import SplitValidation
validator = SplitValidation(
base_model=parafac,
n_splits=4, # 4 splits and 6 combinations
combination_size="half",
rule="random",
random_state=0,
)
validator.fit(dataset)
similarities = validator.compare_parafac_loadings()
print(similarities[0].head())
For NMF models, use compare_components() to assess component similarity across split-half models; PARAFAC
also supports excitation/emission loading comparisons via compare_parafac_loadings().
Other validation methods include variance explained and core consistency.
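Variance explained can be sketched independently of eempy as one minus the ratio of residual to total sum of squares over the whole EEM stack (a standard definition; eempy's exact computation may differ):

```python
import numpy as np

def variance_explained(x, x_hat):
    """Fraction of the data captured by the reconstruction:
    1 - ||X - X_hat||^2 / ||X||^2 (squared Frobenius norms over the stack)."""
    return 1.0 - np.sum((x - x_hat) ** 2) / np.sum(x ** 2)

rng = np.random.default_rng(0)
x = rng.random((4, 8, 8))                        # EEM stack: samples x ex x em
x_hat = x + 0.01 * rng.standard_normal(x.shape)  # hypothetical model reconstruction
print(round(variance_explained(x, x_hat), 3))
```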
Outputs and visualization
from eempy.plot import plot_eem, plot_fi, plot_loadings, plot_fmax
# Compare a raw EEM and the processed EEM (sample 0)
plot_eem(
eem_stack[0],
ex_range,
em_range,
title="Raw EEM",
display=True
)
plot_eem(
dataset.eem_stack[0],
dataset.ex_range,
dataset.em_range,
title="Processed EEM",
display=True
)
# Visualize peak picking
plot_fi(peak_fi)
# Visualize PARAFAC loadings and Fmax
plot_loadings(
{"PARAFAC": parafac},
)
plot_fmax(parafac)
Other plotting helpers include EEM stack grids (plot_eem_stack), absorbance curves (plot_abs), and score plots
(plot_score). Most plotting functions support both matplotlib and plotly backends.