
A Python toolkit for excitation-emission matrix (EEM) analysis


eempy

Author: Yongmin Hu (yongminhu@outlook.com)

Last update: 2026-01

This package provides tools for excitation-emission matrix (EEM) fluorescence analysis, including data I/O, preprocessing, decomposition (PARAFAC/NMF with a choice of solvers and optional regularizations), and validation workflows.

Before jumping into coding

If you would like to run the analysis through an app without writing code, you are welcome to try eempy-vis (https://github.com/YongminHu/eempy-vis). It provides instant visualization, preprocessing, and various EEM interpretation options with a clean UI.

Get Started

(English documentation: https://yongminhu.github.io/eempy/)

Installation

pip install eem-python

Read EEMs and absorbance from text files (.csv, .dat, .txt, etc.)

from pathlib import Path
from eempy.read_data import read_eem_dataset, read_abs_dataset

# ---------Read EEMs and absorbance from text files (.csv, .dat, .txt, etc.)-----------

data_dir = Path("tests") / "sample_data" # path to data folder
eem_stack, ex_range, em_range, indexes = read_eem_dataset(
    str(data_dir),
    # mandatory_keywords: all filenames must include these substrings
    # optional_keywords: filenames must include at least one of these substrings
    # For example, with:
    # - mandatory_keywords=["PEM"]
    # - optional_keywords=["2021-02-02", "2021-02-01"]
    # only files whose filenames meet the following criteria are included:
    # - filename must contain "PEM"
    # - filename must also contain either "2021-02-01" or "2021-02-02"
    mandatory_keywords=["PEM"],
    optional_keywords=["2021-02-01", "2021-02-02"],
    # file_first_row: "ex" means the first row of each text file lists excitation wavelengths;
    # use "em" if the first row lists emission wavelengths. For .dat files generated by a
    # HORIBA Aqualog, use "ex".
    file_first_row="ex",
    # index_pos: (start, end) character positions in each filename (1-based, end inclusive)
    # used to extract sample labels. Here, a label (something like "2021-02-01-1400_R1")
    # would be extracted from positions 1 to 19 of each filename; it is used later in the
    # output tables of the EEM analysis.
    index_pos=(1, 19)
)
abs_stack, ex_range_abs, _ = read_abs_dataset(
    str(data_dir),
    mandatory_keywords=["ABS"],
    optional_keywords=["2021-02-01", "2021-02-02"],
)

Other data I/O helpers include read_eem for a single EEM file, read_abs for a single absorbance spectrum, read_eem_dataset_from_json for loading a saved EEMDataset from JSON, and read_reference_from_text for 1D reference variables (e.g., DOC) used in correlation or calibration workflows.
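For instance, reading a single file might look like the following sketch, assuming read_eem mirrors read_eem_dataset and returns the intensity matrix with its wavelength axes (the file name and return signature here are illustrative, not confirmed):

from eempy.read_data import read_eem

# Read one EEM text file (hypothetical file name); first row lists excitation wavelengths
intensity, ex_range_single, em_range_single = read_eem(
    "tests/sample_data/2021-02-01-1400_R1PEM.dat",
    file_first_row="ex",
)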

Build EEM dataset

from eempy.eem_processing import EEMDataset

# Using eem_stack, ex_range, em_range and indexes generated from above
dataset = EEMDataset(
    eem_stack=eem_stack, # 3d array of EEMs
    ex_range=ex_range,
    em_range=em_range,
    index=indexes, # sample labels
)

You can also attach reference data (ref) such as concentration tables and sample classes (cluster). These labels propagate to outputs (e.g., Fmax tables) and enable correlation analysis and group-wise filtering.
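A minimal sketch of attaching both, assuming ref accepts a table indexed by sample label and cluster a per-sample label sequence (both keyword names come from the description above; the shapes and values are illustrative):

import pandas as pd

# Hypothetical reference table with one row per sample, indexed by the sample labels
ref_table = pd.DataFrame({"DOC": [2.1, 3.4, 1.8]}, index=indexes[:3])

dataset_labeled = EEMDataset(
    eem_stack=eem_stack[:3],
    ex_range=ex_range,
    em_range=em_range,
    index=indexes[:3],
    ref=ref_table,                            # reference data, e.g., concentrations
    cluster=["site_A", "site_A", "site_B"],   # assumed per-sample class labels
)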

Preprocessing: inner filter effect (IFE) correction, scattering removal, etc.

dataset = dataset.ife_correction(
    # absorbance/ex_range_abs: absorbance spectra and their wavelength axis, used for IFE
    # correction
    absorbance=abs_stack,
    ex_range_abs=ex_range_abs,
    inplace=False,
)
dataset = dataset.rayleigh_scattering_removal(
    # width_*: width of scattering bands to be interpolated for first-order and second-order
    # scattering.
    width_o1=10,
    width_o2=10,
    # interpolation_method_*: how to fill the masked Rayleigh bands ("zero", "linear",
    # "cubic", "nan") for first- and second-order scattering.
    interpolation_method_o1="zero",
    interpolation_method_o2="zero",
    inplace=False,
)
dataset = dataset.raman_scattering_removal(
    width=5,
    # interpolation_method: how to fill the Raman band ("zero", "linear", "cubic", "nan")
    interpolation_method="zero",
    interpolation_dimension="2d",
    inplace=False,
)
dataset = dataset.cutting(
    # ex_min/ex_max/em_min/em_max: wavelength window retained after cutting (nm)
    ex_min=240,
    ex_max=450,
    em_min=300,
    em_max=550,
    inplace=False,
)

Additional preprocessing options include the following (a usage sketch follows the list):

  • threshold masking to clip extreme intensities,
  • median/Gaussian filtering for noise suppression,
  • NaN imputation to fill masked pixels,
  • interpolation to a new excitation/emission grid,
  • total-fluorescence or Raman normalization for intensity scaling.
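As a rough sketch of how such steps might chain together (the method names below are hypothetical illustrations of the listed options, not confirmed API; check the documentation for the actual names and signatures):

# All method names here are assumptions based on the option list above
dataset = dataset.gaussian_filter(sigma=1, inplace=False)       # noise suppression
dataset = dataset.nan_imputing(method="linear", inplace=False)  # fill masked pixels
dataset = dataset.raman_normalization(inplace=False)            # intensity scaling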

Peak picking and regional integration

# peak_picking: target excitation/emission (nm); returns the closest grid point
peak_fi, ex_actual, em_actual = dataset.peak_picking(ex=350, em=450)
print(ex_actual, em_actual)
print(peak_fi.head())

# regional_integration: sum of fluorescence over a rectangular ex/em region (nm)
ri = dataset.regional_integration(ex_min=250, ex_max=300, em_min=380, em_max=450)
print(ri.head())

eempy also provides functions for calculating other fluorescence indicators, including HIX, BIX, FI, AQY, and total fluorescence.
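If you want to sanity-check such an indicator directly from the arrays, here is a minimal sketch of BIX based on its standard definition (emission intensity at 380 nm divided by that at 430 nm, at 310 nm excitation), bypassing eempy's built-in helpers and assuming each EEM is indexed [excitation, emission]:

import numpy as np

def bix(eem, ex_range, em_range):
    """Biological index: I(em=380) / I(em=430) at ex=310 nm (standard definition)."""
    ex_idx = np.argmin(np.abs(np.asarray(ex_range) - 310))
    em_380 = np.argmin(np.abs(np.asarray(em_range) - 380))
    em_430 = np.argmin(np.abs(np.asarray(em_range) - 430))
    return eem[ex_idx, em_380] / eem[ex_idx, em_430]

print(bix(dataset.eem_stack[0], dataset.ex_range, dataset.em_range))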

PARAFAC analysis

from eempy.eem_processing import PARAFAC

parafac = PARAFAC(
    n_components=3, # n_components: number of components (rank)
    solver="hals", # recommended solver
    init="svd", # SVD-based initialization
    # max_iter_nnls/max_iter_als: iteration budgets for the NNLS subproblems and the ALS
    # loop, trading computation time against accuracy
    max_iter_nnls=300,
    max_iter_als=200,
    random_state=0, # random seed
)
parafac.fit(dataset)

eempy offers several regularization options that can strengthen the accuracy and physical interpretability of the analysis (a sketch of the ratio constraint follows the list):

  • Non-negativity

  • Elastic-net regularization on any factor (L1/L2 mix)

  • Quadratic priors on sample, excitation, or emission loadings.
    This is useful when fitted scores or spectral components should stay close (but not necessarily identical) to prior knowledge.

  • Ratio constraint on paired rows of sample scores:
    score[idx_top] ≈ beta * score[idx_bot]

    This is useful when the ratio of component amplitudes between two sets of samples is expected to be constant.
    For example, if each sample is measured both unquenched and quenched with a fixed quencher dosage, then for a given chemically consistent component the ratio between unquenched and quenched amplitudes may be approximately constant across samples (Hu et al., ES&T, 2025).

    In this case, passing the unquenched and quenched sample indices to idx_top and idx_bot encourages a constant ratio.
    lam controls the strength of this regularization.
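A minimal sketch of the ratio constraint, assuming idx_top, idx_bot, and lam are passed as PARAFAC keyword arguments (the keyword names follow the description above, but the exact signature is an assumption; check the documentation):

# Suppose samples 0..9 are unquenched and samples 10..19 are the paired quenched
# measurements; keyword names here are inferred from the text, not confirmed API.
parafac_ratio = PARAFAC(
    n_components=3,
    solver="hals",
    init="svd",
    idx_top=list(range(0, 10)),   # unquenched sample indices
    idx_bot=list(range(10, 20)),  # paired quenched sample indices
    lam=1.0,                      # strength of the ratio regularization
    random_state=0,
)
parafac_ratio.fit(dataset)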

Split-half validation

from eempy.eem_processing import SplitValidation

validator = SplitValidation(
    base_model=parafac,
    n_splits=4, # number of random splits
    combination_size="half", # each model fits half the splits: C(4, 2) = 6 combinations
    rule="random",
    random_state=0,
)
validator.fit(dataset)
similarities = validator.compare_parafac_loadings()
print(similarities[0].head())

For NMF models, use compare_components() to assess component similarity across split-half models; PARAFAC also supports excitation/emission loading comparisons via compare_parafac_loadings().
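An NMF-based validation might look like the following sketch, assuming an NMF class analogous to PARAFAC is available in eempy.eem_processing (the import path and constructor arguments are assumptions):

from eempy.eem_processing import NMF  # assumed import path, by analogy with PARAFAC

nmf = NMF(n_components=3, random_state=0)  # assumed constructor arguments
validator_nmf = SplitValidation(
    base_model=nmf,
    n_splits=4,
    combination_size="half",
    rule="random",
    random_state=0,
)
validator_nmf.fit(dataset)
similarities_nmf = validator_nmf.compare_components()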

Other validation methods include variance explained and core consistency.
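Variance explained can also be computed by hand as 1 minus the ratio of residual to total sum of squares, given a reconstructed EEM stack. A sketch, assuming the fitted model can produce a reconstruction (the reconstruct() call below is hypothetical; use whatever reconstruction the fitted model actually exposes):

import numpy as np

X = dataset.eem_stack
X_hat = parafac.reconstruct()  # hypothetical method returning the fitted EEM stack
var_explained = 1 - np.nansum((X - X_hat) ** 2) / np.nansum(X ** 2)
print(f"Variance explained: {var_explained:.3f}")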

Outputs and visualization

from eempy.plot import plot_eem, plot_fi, plot_loadings, plot_fmax

# Compare a raw EEM and the processed EEM (sample 0)
plot_eem(
    eem_stack[0],
    ex_range,
    em_range,
    title="Raw EEM",
    display=True,
)
plot_eem(
    dataset.eem_stack[0],
    dataset.ex_range,
    dataset.em_range,
    title="Processed EEM",
    display=True,
)

# Visualize peak picking
plot_fi(peak_fi)

# Visualize PARAFAC loadings and Fmax
plot_loadings({"PARAFAC": parafac})
plot_fmax(parafac)

Other plotting helpers include EEM stack grids (plot_eem_stack), absorbance curves (plot_abs), and score plots (plot_score). Most plotting functions support both matplotlib and plotly backends.
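For example, plotting the absorbance spectra read earlier might look like this (the argument order is assumed by analogy with plot_eem; consult the documentation for the exact signature):

from eempy.plot import plot_abs

# Assumed call pattern: one absorbance spectrum plus its wavelength axis
plot_abs(abs_stack[0], ex_range_abs, display=True)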
