SpectraSherpa — local-first spectroscopy platform for chemometricians

SpectraSherpa by Spectra Scientific LLC

The open chemometrics workbench — visual, reproducible, and local-first.

PCA, PLS, MCR-ALS, SIMCA, PLS-DA, variable selection, and calibration transfer in a drag-and-drop workflow builder — with full version history, provenance you can defend in an audit, and one-click export for supported workflows. Open source and free to run, entirely on your own machine.

📖 Documentation · 🚀 Local onboarding · 🧩 Node Library · 🔬 For developers

Want to try it before installing? Visit the hosted demo at demo.spectrascientific.ai and register with access code welcome_to_spectra_sherpa.

Why SpectraSherpa

A single open workbench for the chemometrics you actually run, built around five strengths:

  • A first-class chemometric toolkit. Preprocessing, decomposition, calibration, classification, variable selection, and calibration transfer are core features, purpose-built for spectroscopy. (Full list below.)
  • Numbers you can defend. PCA and PLS workflows are checked against their underlying SpectroChemPy execution paths, with scikit-learn parity tests for exported artifact helpers where scikit-learn is the reference implementation. MCR-ALS is checked against synthetic ground truth and reference workflows.
  • Reproducible by construction. Every workflow is versioned on save and every run is an immutable, provenance-tracked record, so a result always traces back to its exact recipe and data.
  • From exploration to production. Export supported workflows to standalone Python or a Jupyter notebook; trained models are first-class artifacts you can batch-apply and deploy.
  • Open and local-first. AGPL-3.0, reads your instrument files directly (.spg, .spa, .jdx, .opus, .mat, …), and runs entirely on your machine with network egress denied by default in local mode.

Built to be an open foundation that labs and instrument makers can standardize on, extend, and embed. Bundled public benchmark datasets (corn, diesel NIR, NIR shootout, and SpectroChemPy examples) let you reproduce familiar results on day one.
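To make the idea of a parity check concrete, here is a minimal sketch that computes PCA scores by two independent routes and requires them to agree up to a sign flip per component. It is not SpectraSherpa's test suite: the function names are illustrative, the data is random, and plain NumPy stands in for SpectroChemPy and scikit-learn.

```python
import numpy as np


def pca_scores_svd(X: np.ndarray, n_components: int) -> np.ndarray:
    """PCA scores from the SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def pca_scores_eig(X: np.ndarray, n_components: int) -> np.ndarray:
    """PCA scores from the eigendecomposition of the covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order]


def scores_match(a: np.ndarray, b: np.ndarray, atol: float = 1e-8) -> bool:
    """Compare score matrices column by column, ignoring sign flips."""
    if a.shape != b.shape:
        return False
    for j in range(a.shape[1]):
        same = np.allclose(a[:, j], b[:, j], atol=atol)
        flipped = np.allclose(a[:, j], -b[:, j], atol=atol)
        if not (same or flipped):
            return False
    return True


# Toy data standing in for a (n_samples, n_features) spectra matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6)) @ rng.normal(size=(6, 6))
parity_ok = scores_match(pca_scores_svd(X, 3), pca_scores_eig(X, 3))
```

The sign-insensitive comparison matters because principal components are only defined up to sign, so two correct implementations can legitimately return mirrored scores.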

Install & run

pip install "spectra-sherpa[scp,hitran]"
spectra-sherpa

Running spectra-sherpa opens http://localhost:8000 in your browser, with no login required. The first launch takes 30–90 s to initialize a local database and caches; later launches are fast.
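If you script around the launcher, the slow first start means you may want to wait for the server before doing anything else. The following is a minimal sketch using only the standard library. The helper names, the 120 s default timeout, and the assumption that any HTTP response at the root URL means "ready" are illustrative, not part of SpectraSherpa.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:8000") -> bool:
    """Return True if the URL answers an HTTP request at all."""
    try:
        with urllib.request.urlopen(url, timeout=2):
            return True
    except urllib.error.HTTPError:
        return True    # server is up, even if this path returns an error status
    except (urllib.error.URLError, OSError):
        return False   # nothing is listening yet


def wait_until_ready(probe: Callable[[], bool],
                     timeout_s: float = 120.0,
                     interval_s: float = 2.0) -> bool:
    """Poll `probe` until it succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A call such as `wait_until_ready(http_probe)` blocks until the local server responds or the timeout passes. Passing the probe as a callable keeps the polling loop easy to exercise without a running server.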

Other install options — minimal, from source, extras
# Minimal core (no SpectroChemPy examples/readers, no HITRAN downloads)
pip install spectra-sherpa && spectra-sherpa

# From source (contributors)
git clone https://github.com/Spectra-Scientific-LLC/Spectra-Sherpa.git
cd Spectra-Sherpa
pip install poetry
poetry env use python3.11          # supported: 3.11 or 3.12
poetry install --with dev --extras "scp hitran"
poetry run spectra-sherpa

| Extra | Adds |
|---|---|
| scp | SpectroChemPy algorithms + instrument file readers (.spc, .spa, .spg, .jdx, .opus, …) |
| hitran | HITRAN/HAPI clients for Data → Synthesis live line-table downloads |

Requires Python 3.11 or 3.12. Full requirements, CLI flags, and troubleshooting are in 30 Minutes to Local Compute.
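A quick way to confirm the interpreter before installing is a check like the one below. This is only a sketch: `is_supported` is a hypothetical helper, and the version set simply restates the requirement above.

```python
import sys

# Supported (major, minor) pairs, taken from the stated requirement.
SUPPORTED_MINORS = {(3, 11), (3, 12)}


def is_supported(version_info=sys.version_info) -> bool:
    """True if the interpreter's (major, minor) pair is supported."""
    return tuple(version_info[:2]) in SUPPORTED_MINORS


if not is_supported():
    major, minor = sys.version_info[:2]
    print(f"Python {major}.{minor} is outside the supported 3.11/3.12 range")
```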

HITRAN live downloads require your own HITRAN API key, saved in Settings > API Keys, plus Settings > Integrations > HITRAN/HAPI Queries enabled.

Hosted Demo, Pro, Hybrid, and Organization deployments are operated separately by Spectra Scientific. This README covers the local workstation install.

The chemometric toolkit

Over 60 nodes across the workflow you actually run:

  • Preprocessing — Savitzky-Golay smoothing/derivatives, baseline correction, MSC, SNV, OSC, normalization, scaling.
  • Exploratory — PCA, MCR-ALS, SIMPLISMA, EFA, hierarchical clustering.
  • Calibration & classification — PLS regression, PLS-DA, SIMCA, KNN.
  • Variable selection — iPLS, CARS, SPA, UVE, VIP.
  • Calibration transfer — PDS and direct standardization for instrument-to-instrument transfer.
  • Validation & deployment — cross-validation, nested CV, selection stability, model comparison, batch prediction, deploy-readiness checks.

See the Node Library for node parameters and ports, and Current Capabilities for the supported production scope.
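As an illustration of what one of the preprocessing steps listed above does, here is a dependency-free sketch of standard normal variate (SNV) scaling. It is not the node's implementation; it assumes the common definition in which each spectrum is centered on its own mean and divided by its own sample standard deviation.

```python
from statistics import mean, stdev


def snv(spectra: list[list[float]]) -> list[list[float]]:
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    corrected = []
    for row in spectra:
        mu = mean(row)
        sigma = stdev(row)   # sample standard deviation (n - 1 denominator)
        if sigma == 0:
            raise ValueError("SNV is undefined for a constant spectrum")
        corrected.append([(x - mu) / sigma for x in row])
    return corrected


raw = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],   # same shape, ten times the intensity
]
scaled = snv(raw)
```

Because the second spectrum is a scaled copy of the first, both rows of `scaled` come out essentially identical, which is the multiplicative scatter effect SNV is meant to remove.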

Core concepts

Everything lives inside a Project. Five objects make up a complete analysis — each explained in depth in the Projects, Datasets, and Runs guide.

| Object | What it is |
|---|---|
| Project | The durable container grouping your datasets, workflows, runs, reports, scripts, and trained models, with versioned snapshots and provenance. |
| Data | The spectra or feature tables you work on. Import instrument files into My Dataset, pull from reference/example datasets, or synthesize FTIR time series. |
| Workflow | The analysis recipe: a drag-and-drop graph (DAG) of nodes, versioned on every save so any result traces back to the exact recipe that produced it. |
| Run | One execution of a workflow: an immutable record of parameters, node status, diagnostics, and any model Artifacts (frozen PCA/PLS/MCR/PLS-DA/KNN/SIMCA models) it produced. |
| Report | A shareable summary assembled from a workflow and its runs. Toggle sections, then export to PDF, HTML, Markdown, or JSON for publication, hand-off, or validation packages. |
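The link between a Run and its exact recipe and data can be pictured as content hashing. The sketch below is an assumption-laden illustration, not SpectraSherpa's schema: `RunRecord`, `fingerprint`, and the field names are hypothetical, and the real record holds more (node status, diagnostics, artifacts).

```python
import hashlib
import json
from dataclasses import dataclass


def fingerprint(obj) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, fixed separators)."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical immutable run record; not SpectraSherpa's actual schema."""
    workflow_hash: str
    data_hash: str
    params_hash: str

    @property
    def run_id(self) -> str:
        # The id changes if the recipe, the data, or the parameters change.
        return fingerprint([self.workflow_hash, self.data_hash, self.params_hash])


workflow = {"nodes": ["snv", "pca"], "edges": [["snv", "pca"]]}
record = RunRecord(
    workflow_hash=fingerprint(workflow),
    data_hash=hashlib.sha256(b"raw instrument bytes").hexdigest(),
    params_hash=fingerprint({"pca": {"n_components": 3}}),
)
```

Canonical JSON makes the hash independent of dictionary key order, and `frozen=True` rejects attribute assignment after creation, which mirrors the "immutable record" idea at the Python level.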

For Python analysts & chemometricians

SpectraSherpa matches your existing methods rather than replacing them. The internal container is a thin wrapper over a (n_samples, n_features) NumPy array with labeled wavelength and sample axes, so your scikit-learn and pandas code works directly on dataset.data. Bring a working notebook function, and running make node-scaffold turns it into a toolbar node in minutes.
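To show what "works directly on dataset.data" looks like in practice, here is a small sketch. `SpectraDataset` is a hypothetical stand-in for the real container: only the `data` attribute and the (n_samples, n_features) layout come from the description above, while the other field names are assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SpectraDataset:
    """Hypothetical stand-in for the container described above."""
    data: np.ndarray          # shape (n_samples, n_features)
    wavelengths: np.ndarray   # shape (n_features,), labels the feature axis
    sample_ids: list[str]     # length n_samples, labels the sample axis

    def __post_init__(self) -> None:
        n_samples, n_features = self.data.shape
        if len(self.sample_ids) != n_samples:
            raise ValueError("one sample id per row is required")
        if self.wavelengths.shape != (n_features,):
            raise ValueError("one wavelength per column is required")


def mean_center(X: np.ndarray) -> np.ndarray:
    """A typical notebook function: subtract the mean spectrum from every row."""
    return X - X.mean(axis=0, keepdims=True)


dataset = SpectraDataset(
    data=np.array([[0.10, 0.20, 0.30], [0.30, 0.40, 0.50]]),
    wavelengths=np.array([1100.0, 1102.0, 1104.0]),
    sample_ids=["s1", "s2"],
)
centered = mean_center(dataset.data)   # plain NumPy in, plain NumPy out
```

The point of the design is that an existing function like `mean_center` needs no adapter: it never sees the wrapper, only the array.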

Start here: Writing a Plugin Node — notebook to node to pull request, no web development required.

Because every step is a typed, provenance-tracked artifact, SpectraSherpa is also a clean foundation for AI assistance — the commercial Sherpa Advisor and Guidance layers build LLM-assisted analysis on top of this deterministic core, which remains fully usable on its own.

Built on the work of others

SpectraSherpa stands on established open science, and keeps citation guidance close to generated outputs:

  • SpectroChemPy — spectroscopic algorithms and instrument-file readers, by Arnaud Travert and Christian Fernandez at the Laboratoire Catalyse et Spectrochimie (LCS), ENSICAEN / Université de Caen / CNRS. Licensed CeCILL-B (BSD-compatible).
  • HITRAN / HAPI — the high-resolution molecular spectroscopic database used by Data → Synthesis to build physically grounded FTIR line tables.
  • Eigenvector Research data sets — recommended NIR/OES chemometrics teaching and validation datasets. SpectraSherpa catalogs these datasets and can download them at runtime when egress is enabled; it does not redistribute the raw Eigenvector data in the wheel.
  • NIST Chemistry WebBook (SRD 69) and the NIST Quantitative Infrared Database (SRD 79) — reference IR spectra for synthesis.

These databases are not owned by Spectra Scientific. Cite NIST, HITRAN, and HAPI in any report, publication, or validation package that uses synthetic datasets — Reference Libraries and Synthesis and the Attributions page list the recommended attributions.

Documentation

Full docs at docs.spectrascientific.ai.

Contributing

We welcome contributions — see CONTRIBUTING.md.

Important: This project requires a signed Contributor License Agreement (CLA). When you open a PR, a bot comments with instructions; sign by replying with this exact text: I have read the CLA Document and I hereby sign the CLA

License

Copyright (C) 2026 Spectra Scientific LLC. Licensed under AGPL-3.0 — see LICENSE. If you distribute a modified version (including as a network service), you must release your modifications under the same license. SpectroChemPy is CeCILL-B; see NOTICE.md for full third-party terms. Enterprise features and commercial licensing are available from Spectra Scientific.

Warning: Provided "AS IS" without warranty of any kind. Spectra Scientific LLC disclaims all liability for damages arising from use, including reliance on analytical results. See DISCLAIMER.
