A Python library for spectral-zone-level explanations of machine learning models trained on spectral data (XRF, GRS, Raman, etc.)
SMX
This is the official repository for the spectral-model-explainer (SMX) library, an eXplainable AI tool designed to provide explanations for machine learning models trained on spectral data (e.g., XRF, GRS, Raman, and related modalities).
SMX is a post-hoc, global, model-agnostic framework that explains spectral-based ML classifiers directly in terms of expert-informed spectral zones. It aggregates each zone via PCA, formulates quantile-based logical predicates, estimates their relevance through perturbation experiments within stochastic subsamples, and integrates the results into a directed weighted graph whose global structure is summarized by Local Reaching Centrality. A distinctive feature is threshold spectrum reconstruction, which back-projects each predicate's decision boundary into the original spectral domain in natural measurement units, enabling practitioners to visually compare their spectra against the model-related boundaries.
Method Overview in the Library
The high-level workflow is implemented in the SMX pipeline class and can also be executed component-by-component through the public API:
- spectral zone extraction
- zone aggregation (typically PCA-based)
- predicate generation from quantiles
- bagging-based robustness evaluation
- predicate relevance scoring
- directed graph construction
- centrality-based ranking and optional mapping back to natural scale
This implementation allows both:
- end-to-end execution through a single pipeline object
- advanced control through direct use of dedicated classes/functions
Spectral Zone Construction
The method starts by partitioning the spectral axis into zones using extract_spectral_zones. Input spectra are expected as a DataFrame in which columns represent numeric spectral positions (energies, wavelengths, channels, etc.).
How zones must be provided
The cuts argument accepts multiple valid formats:
- (start, end)
- (name, start, end)
- (name, start, end, group)
- {name, start, end}
- {name, start, end, group}
Important behavior:
- boundaries are interpreted numerically and inclusively
- if start > end, the library automatically reorders them
- grouped cuts (same group) are concatenated into one merged zone
- non-grouped cuts are kept as independent zones
This flexibility enables both physically meaningful elemental regions and composite regions such as aggregated background segments.
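The inclusive slicing and group-merging semantics described above can be sketched with plain pandas. This is an illustration of the behavior, not the library's `extract_spectral_zones` implementation; `slice_zones` is a hypothetical helper:

```python
import numpy as np
import pandas as pd

def slice_zones(spectra: pd.DataFrame, cuts):
    """Select columns whose numeric position falls inside [start, end]
    (inclusive), merging cuts that share a group into one zone."""
    positions = spectra.columns.astype(float)
    merged = {}
    for name, start, end, *rest in cuts:
        if start > end:                      # reorder inverted boundaries
            start, end = end, start
        key = rest[0] if rest else name      # cuts with a group share a key
        mask = (positions >= start) & (positions <= end)
        merged.setdefault(key, []).extend(spectra.columns[mask])
    return {k: spectra[v] for k, v in merged.items()}

# Toy spectra: 4 samples x 6 channels at positions 10..60
X = pd.DataFrame(np.arange(24).reshape(4, 6), columns=[10, 20, 30, 40, 50, 60])
zones = slice_zones(X, [("F1", 10, 30), ("bg1", 40, 40, "bg"), ("bg2", 60, 50, "bg")])
print(sorted(zones))      # ['F1', 'bg']
print(zones["bg"].shape)  # (4, 3): two grouped cuts merged into one zone
```

Note how the two "bg" cuts collapse into a single composite zone, and the inverted (60, 50) boundaries are silently reordered.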
Predicate Construction from Zone Scores
After extraction, each zone is transformed into one scalar score per sample (default strategy: PC1 score via ZoneAggregator(method="pca")). These zone-level summaries are the basis for predicate generation.
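The default PC1 aggregation can be sketched with a plain NumPy SVD. This is an illustrative stand-in for `ZoneAggregator(method="pca")`, not the library's code; `pc1_scores` is a hypothetical helper:

```python
import numpy as np

def pc1_scores(zone: np.ndarray) -> np.ndarray:
    """Project a (n_samples, n_channels) zone onto its first principal
    component, yielding one scalar score per sample."""
    centered = zone - zone.mean(axis=0)
    # SVD of the centered data: the first right-singular vector is the PC1 loading
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

rng = np.random.default_rng(0)
zone = rng.normal(size=(50, 8))   # 50 samples, 8 channels in one zone
scores = pc1_scores(zone)
print(scores.shape)  # (50,)
```

Because the projection is taken on centered data, the resulting scores have zero mean by construction.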
PredicateGenerator creates binary threshold predicates from a user-defined set of quantiles. For each zone and each quantile value q, two complementary predicates are produced:
- zone <= threshold(q)
- zone > threshold(q)
Therefore, if k quantiles are provided, the initial candidate set is 2k predicates per zone (before duplicate removal). Duplicate rules are automatically removed when quantiles collapse to identical threshold values.
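The 2k-candidates-per-zone rule and the duplicate collapse can be sketched as follows. This is an illustration of the logic, not the `PredicateGenerator` API; `quantile_predicates` is a hypothetical helper:

```python
import numpy as np

def quantile_predicates(scores: np.ndarray, quantiles):
    """Build complementary threshold predicates from a quantile grid,
    dropping the pair when a quantile collapses to an existing threshold."""
    preds, seen = [], set()
    for q in quantiles:
        t = float(np.quantile(scores, q))
        if t in seen:        # identical threshold -> duplicate rule pair
            continue
        seen.add(t)
        preds.append(("<=", t))
        preds.append((">", t))
    return preds

# With 3 quantiles the candidate set would be 6 predicates, but here the
# 0.25 and 0.50 quantiles both give threshold 1.0, so one pair is dropped.
scores = np.array([1.0, 1.0, 1.0, 5.0])
preds = quantile_predicates(scores, [0.25, 0.50, 0.75])
print(len(preds))  # 4
```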
Bagging and Robustness Hyperparameters
SMX estimates predicate robustness through repeated bagging cycles. In the high-level pipeline, this is controlled primarily by:
- n_bags: number of bags generated per repetition (seed)
- n_repetitions: number of independent repetitions (seed loop)
- n_samples_fraction: fraction of samples drawn in each bag
- quantiles: quantile grid that defines predicate thresholds
Operationally:
- each repetition creates a new random context for bag generation
- each bag evaluates which predicates are sufficiently supported by sampled data
- predicates with very low support in a bag are discarded for that bag
- final rankings are aggregated across valid repetitions to reduce seed sensitivity
This design makes the explanation less dependent on a single random split and more representative of stable decision behavior.
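The repetition/bag sampling loop above can be sketched with NumPy. This is an illustrative mock of the sampling scheme, not the pipeline's internals; `bagged_indices` is a hypothetical helper:

```python
import numpy as np

def bagged_indices(n_samples, n_bags, n_repetitions, fraction, replace=False):
    """Yield (repetition, bag, indices): each repetition opens a new random
    context, and each bag draws a fraction of the samples from it."""
    size = int(round(fraction * n_samples))
    for rep in range(n_repetitions):
        rng = np.random.default_rng(rep)   # fresh random context per repetition
        for bag in range(n_bags):
            yield rep, bag, rng.choice(n_samples, size=size, replace=replace)

bags = list(bagged_indices(n_samples=100, n_bags=10, n_repetitions=4, fraction=0.8))
print(len(bags))         # 40 = 4 repetitions x 10 bags
print(bags[0][2].shape)  # (80,) = 0.8 * 100 samples per bag
```

Aggregating predicate scores across these 40 index sets is what decouples the final ranking from any single random split.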
Predicate Relevance and Graph Construction
Within each bag, predicates are ranked by an importance metric based on perturbation experiments:
- perturbation-based relevance (PerturbationMetric), using a fitted estimator
PredicateGraphBuilder then constructs a directed graph from ranked predicates:
- consecutive predicates in a ranking induce directed edges
- edge weights are accumulated across bags
- terminal class nodes are linked from last predicates in each path
- bidirectional conflicts are resolved by keeping the stronger direction (ties are randomized)
- edge weighting can incorporate zone-level explained variance from PCA (var_exp=True), which constrains the graph structure to reflect both predictive relevance and variance importance of zones
Finally, the graph is summarized through Local Reaching Centrality (LRC), producing a ranked list of influential predicates/zones. Accordingly, the final output is a DataFrame of predicates ranked by their LRC scores, together with their corresponding natural-scale thresholds and zone information. This lets practitioners identify which spectral zones and thresholds most influence the model's decisions, revealing the underlying spectral features that drive predictions. Beyond identifying relevant zones, the predicates' threshold values, which live in PCA space, are back-projected to the original domain as per-zone multivariate thresholds that can be overlaid on measured spectra, translating an abstract condition into a physically readable boundary. SMX thus goes beyond numerical importances by delivering condition-aware, subset-aware explanations that support validation, hypothesis generation, and more actionable domain conclusions.
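The unweighted form of Local Reaching Centrality, i.e. the fraction of the other nodes reachable from a node by directed paths, can be sketched in pure Python. This is an illustration of the ranking principle; the library's weighted variant over the accumulated edge weights may differ:

```python
from collections import deque

def local_reaching_centrality(graph, node):
    """Fraction of the other nodes reachable from `node` via directed
    edges (breadth-first search over an adjacency-dict graph)."""
    seen, queue = {node}, deque([node])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return (len(seen) - 1) / (len(graph) - 1)

# Toy predicate graph: two predicate chains ending in a terminal class node
g = {"a": ["b"], "b": ["class_1"], "c": ["class_1"], "class_1": []}
ranking = sorted(g, key=lambda n: local_reaching_centrality(g, n), reverse=True)
print(ranking[0])  # 'a': it reaches 2 of the 3 other nodes
```

Predicates that sit higher upstream in the decision structure reach more of the graph and therefore rank higher.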
Model Compatibility Note
At the current stage, SMX is primarily designed for use with scikit-learn-style estimators. In practical terms, this means that when the perturbation-based relevance strategy is employed, the estimator passed to the pipeline is expected to be already fitted and to expose the standard prediction interface required by the selected perturbation metric.
More specifically, the minimum requirement is a valid predict method. In addition, some perturbation metrics require richer interfaces: probability_shift requires predict_proba, while decision_function_shift requires decision_function. Consequently, any model class that follows this contract can be integrated in a technically consistent manner, independently of the specific learning algorithm (for example, SVMs, tree ensembles, linear models, and related scikit-learn-compatible estimators).
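The interface contract above can be verified with simple duck typing before running the pipeline. This is an illustrative check written against the requirements stated here, not an SMX function; `check_estimator_interface` and `Dummy` are hypothetical:

```python
def check_estimator_interface(estimator, perturbation_metric):
    """Verify the estimator exposes predict() plus any extra method the
    chosen perturbation metric requires."""
    required = {"probability_shift": "predict_proba",
                "decision_function_shift": "decision_function"}
    if not callable(getattr(estimator, "predict", None)):
        raise TypeError("estimator must implement predict()")
    extra = required.get(perturbation_metric)
    if extra and not callable(getattr(estimator, extra, None)):
        raise TypeError(f"{perturbation_metric} requires {extra}()")

class Dummy:
    """Minimal estimator satisfying the probability_shift contract."""
    def predict(self, X): return [0] * len(X)
    def predict_proba(self, X): return [[0.5, 0.5]] * len(X)

check_estimator_interface(Dummy(), "probability_shift")  # passes silently
```

The same check would raise a TypeError for "decision_function_shift", since Dummy does not expose decision_function.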
Ongoing development is focused on extending this compatibility layer beyond the current scikit-learn-centric workflow, with the objective of supporting additional model ecosystems and API styles in Python while preserving methodological consistency and interpretability guarantees.
Installation and Optional Plotting Dependency
SMX is intentionally distributed with a lightweight core dependency set, where visualization is treated as an optional capability rather than a mandatory runtime requirement. This design ensures that users interested exclusively in methodological analysis (zone extraction, predicate construction, bagging, graph construction, and centrality-based ranking) can install and execute the framework without incurring additional graphical dependencies.
Base installation:
pip install spectral-model-explainer
Installation with plotting support:
pip install "spectral-model-explainer[plotting]"
In practical terms, the plotting extra enables functions that generate interactive visual outputs (for example, threshold-spectrum overlays used to inspect reconstructed multivariate decision boundaries in the natural spectral domain). The analytical SMX pipeline remains fully functional without this extra.
If plotting routines are invoked in an environment where the plotting extra has not been installed, SMX raises an explicit import-related error with installation guidance. This behavior is intentional: it preserves minimal installation overhead for non-visual workflows while providing clear and immediate feedback when visualization features are requested.
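The optional-dependency behavior described above follows a common lazy-import pattern, sketched here for illustration (this mirrors the described behavior, not SMX's actual internals; `require_optional` is a hypothetical helper):

```python
import importlib

def require_optional(module_name, extra):
    """Import an optional module, or raise an ImportError that names the
    pip extra to install (the guidance-on-failure pattern)."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is required for plotting. Install it with:\n"
            f'    pip install "spectral-model-explainer[{extra}]"'
        ) from err

math_mod = require_optional("math", "plotting")  # stdlib module: succeeds
```

When the requested module is absent, the raised error carries the exact install command, which is the "clear and immediate feedback" behavior the paragraph describes.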
Easy Usage
import pandas as pd
from sklearn.svm import SVC
from smx import SMX
# X_cal_prep: preprocessed calibration spectra (DataFrame)
# X_cal_natural: original calibration spectra before preprocessing (DataFrame)
# y_cal_labels: class labels for calibration samples (Series)
spectral_cuts = [
    ("F1", 1.0, 100.0),
    ("background", 100.0, 200.0, "background_group"),
    ("F2", 200.0, 300.0),
]
model = SVC(kernel="rbf", probability=True, random_state=42)
model.fit(X_cal_prep, y_cal_labels)
# Example: probability of the first class as continuous output
y_pred_cal = model.predict_proba(X_cal_prep)[:, 0]
smx = SMX(
    spectral_cuts=spectral_cuts,
    quantiles=[0.25, 0.50, 0.75],
    n_repetitions=4,
    n_bags=10,
    n_samples_fraction=0.8,
    replace=False,
    metric="perturbation",
    estimator=model,
    perturbation_mode="median",
    perturbation_metric="probability_shift",
)
smx.fit(X_cal_prep, y_pred_cal, X_cal_natural=X_cal_natural)
# Main result (ranked predicates with natural-scale thresholds)
results = smx.lrc_natural_
print(results.head())
For a complete, executable walkthrough with synthetic data and visualization outputs, see the quickstart notebook.
License
This project is licensed under the MIT License. See the LICENSE file for details.