Skip to main content

Interactive visualization components for mass spectrometry data in Streamlit

Project description

OpenMS-Insight

PyPI version Python 3.9+ Tests

Interactive visualization components for mass spectrometry data in Streamlit, backed by Vue.js.

Features

  • Cross-component selection linking via shared identifiers
  • Memory-efficient preprocessing via subprocess isolation
  • Automatic disk caching with config-based invalidation
  • Cache reconstruction - components can be restored from cache without re-specifying configuration
  • Table component (Tabulator.js) with filtering, sorting, go-to, pagination, CSV export
  • Line plot component (Plotly.js) with highlighting, annotations, zoom
  • Heatmap component (Plotly scattergl) with multi-resolution downsampling for millions of points
  • Sequence view component for peptide visualization with fragment ion matching and auto-zoom

Installation

pip install openms-insight

Quick Start

import streamlit as st
from openms_insight import Table, LinePlot, StateManager

# Create state manager for cross-component linking
state_manager = StateManager()

# Create a table - clicking a row sets the 'item' selection
table = Table(
    cache_id="items_table",
    data_path="items.parquet",
    interactivity={'item': 'item_id'},
    column_definitions=[
        {'field': 'item_id', 'title': 'ID', 'sorter': 'number'},
        {'field': 'name', 'title': 'Name'},
    ],
)
table(state_manager=state_manager)

# Create a linked plot - filters by the selected 'item'
plot = LinePlot(
    cache_id="values_plot",
    data_path="values.parquet",
    filters={'item': 'item_id'},
    x_column='x',
    y_column='y',
)
plot(state_manager=state_manager)

Cross-Component Linking

Components communicate through identifiers using three mechanisms:

  • filters: INPUT - filter this component's data by the selection
  • filter_defaults: INPUT - default value when selection is None
  • interactivity: OUTPUT - set a selection when user clicks
# Master table: no filters, sets 'spectrum' on click
master = Table(
    cache_id="spectra",
    data_path="spectra.parquet",
    interactivity={'spectrum': 'scan_id'},  # Click -> sets spectrum=scan_id
)

# Detail table: filters by 'spectrum', sets 'peak' on click
detail = Table(
    cache_id="peaks",
    data_path="peaks.parquet",
    filters={'spectrum': 'scan_id'},        # Filters where scan_id = selected spectrum
    interactivity={'peak': 'peak_id'},      # Click -> sets peak=peak_id
)

# Plot: filters by 'spectrum', highlights selected 'peak'
plot = LinePlot(
    cache_id="plot",
    data_path="peaks.parquet",
    filters={'spectrum': 'scan_id'},
    interactivity={'peak': 'peak_id'},
    x_column='mass',
    y_column='intensity',
)

# Table with filter defaults - shows unannotated data when no identification selected
annotations = Table(
    cache_id="annotations",
    data_path="annotations.parquet",
    filters={'identification': 'id_idx'},
    filter_defaults={'identification': -1},  # Use -1 when identification is None
)

Components

Table

Interactive table using Tabulator.js with filtering dialogs, sorting, pagination, and CSV export.

Table(
    cache_id="spectra_table",
    data_path="spectra.parquet",
    interactivity={'spectrum': 'scan_id'},
    column_definitions=[
        {'field': 'scan_id', 'title': 'Scan', 'sorter': 'number'},
        {'field': 'rt', 'title': 'RT (min)', 'sorter': 'number', 'hozAlign': 'right',
         'formatter': 'money', 'formatterParams': {'precision': 2, 'symbol': ''}},
        {'field': 'precursor_mz', 'title': 'm/z', 'sorter': 'number'},
    ],
    index_field='scan_id',
    go_to_fields=['scan_id'],
    initial_sort=[{'column': 'scan_id', 'dir': 'asc'}],
    default_row=0,
    pagination=True,
    page_size=100,
)

Key parameters:

  • column_definitions: List of Tabulator column configs (field, title, sorter, formatter, etc.)
  • index_field: Column used as unique row identifier (default: 'id')
  • go_to_fields: Columns available in "Go to" navigation
  • initial_sort: Default sort configuration
  • pagination: Enable pagination for large tables (default: True)
  • page_size: Rows per page (default: 100)

LinePlot

Stick-style line plot using Plotly.js for mass spectra visualization.

LinePlot(
    cache_id="spectrum_plot",
    data_path="peaks.parquet",
    filters={'spectrum': 'scan_id'},
    interactivity={'peak': 'peak_id'},
    x_column='mass',
    y_column='intensity',
    highlight_column='is_annotated',
    annotation_column='ion_label',
    title="MS/MS Spectrum",
    x_label="m/z",
    y_label="Intensity",
    styling={
        'highlightColor': '#E4572E',
        'selectedColor': '#F3A712',
        'unhighlightedColor': 'lightblue',
    },
)

Key parameters:

  • x_column, y_column: Column names for x/y values
  • highlight_column: Boolean/int column indicating which points to highlight
  • annotation_column: Text column for labels on highlighted points
  • styling: Color configuration dict

Heatmap

2D scatter heatmap using Plotly scattergl with multi-resolution downsampling for large datasets (millions of points).

Heatmap(
    cache_id="peaks_heatmap",
    data_path="all_peaks.parquet",
    x_column='retention_time',
    y_column='mass',
    intensity_column='intensity',
    interactivity={'spectrum': 'scan_id', 'peak': 'peak_id'},
    min_points=30000,
    x_bins=400,
    y_bins=50,
    title="Peak Map",
    x_label="Retention Time (min)",
    y_label="m/z",
    colorscale='Portland',
)

Key parameters:

  • x_column, y_column, intensity_column: Column names for axes and color
  • min_points: Target size for downsampling (default: 20000)
  • x_bins, y_bins: Grid resolution for spatial binning
  • colorscale: Plotly colorscale name (default: 'Portland')

SequenceView

Peptide sequence visualization with fragment ion matching. Supports both dynamic (filtered by selection) and static sequences.

# Dynamic: sequence from DataFrame filtered by selection
SequenceView(
    cache_id="peptide_view",
    sequence_data_path="sequences.parquet",  # columns: scan_id, sequence, precursor_charge
    peaks_data_path="peaks.parquet",         # columns: scan_id, peak_id, mass, intensity
    filters={'spectrum': 'scan_id'},
    interactivity={'peak': 'peak_id'},
    deconvolved=False,  # peaks are m/z values, consider charge states
    title="Fragment Coverage",
)

# Static: single sequence with optional peaks
SequenceView(
    cache_id="static_peptide",
    sequence_data=("PEPTIDEK", 2),  # (sequence, charge) tuple
    peaks_data=peaks_df,            # Optional: LazyFrame with mass, intensity columns
    deconvolved=True,               # peaks are neutral masses
)

# Simplest: just a sequence string
SequenceView(
    cache_id="simple_seq",
    sequence_data="PEPTIDEK",  # charge defaults to 1
)

Key parameters:

  • sequence_data: LazyFrame, (sequence, charge) tuple, or sequence string
  • sequence_data_path: Path to parquet with sequence data
  • peaks_data / peaks_data_path: Optional peak data for fragment matching
  • deconvolved: If False (default), peaks are m/z and matching considers charge states
  • annotation_config: Dict with ion_types, tolerance, neutral_losses settings

Features:

  • Automatic fragment ion matching (a/b/c/x/y/z ions)
  • Configurable mass tolerance (ppm or Da)
  • Neutral loss support (-H2O, -NH3)
  • Auto-zoom for short sequences (≤20 amino acids)
  • Fragment coverage statistics
  • Click-to-select peaks with cross-component linking

Shared Component Arguments

All components accept these common arguments:

Argument Type Default Description
cache_id str Required Unique identifier for disk cache
data_path str None Path to parquet file (preferred for memory efficiency)
data pl.LazyFrame None Polars LazyFrame (alternative to data_path)
filters Dict[str, str] None Map identifier -> column for filtering
filter_defaults Dict[str, Any] None Default values when selection is None
interactivity Dict[str, str] None Map identifier -> column for click actions
cache_path str "." Base directory for cache storage
regenerate_cache bool False Force cache regeneration

Memory-Efficient Preprocessing

When working with large datasets (especially heatmaps with millions of points), use data_path instead of data to enable subprocess preprocessing:

# Subprocess preprocessing (recommended for large datasets)
# Memory is fully released after cache creation
heatmap = Heatmap(
    data_path="large_peaks.parquet",  # triggers subprocess
    cache_id="peaks_heatmap",
    ...
)

# In-process preprocessing (for smaller datasets or debugging)
# Memory may be retained by allocator after preprocessing
heatmap = Heatmap(
    data=pl.scan_parquet("large_peaks.parquet"),  # runs in main process
    cache_id="peaks_heatmap",
    ...
)

Why this matters: Memory allocators like mimalloc (used by Polars) retain freed memory for performance. For large datasets, this can cause memory usage to stay high even after preprocessing completes. Running preprocessing in a subprocess guarantees all memory is returned to the OS when the subprocess exits.

Cache Reconstruction

Components can be reconstructed from cache using only cache_id and cache_path. All configuration is restored from the cached manifest:

# First run: create component with data and config
table = Table(
    cache_id="my_table",
    data_path="data.parquet",
    filters={'spectrum': 'scan_id'},
    column_definitions=[...],
    cache_path="./cache",
)

# Subsequent runs: reconstruct from cache only
table = Table(
    cache_id="my_table",
    cache_path="./cache",
)
# All config (filters, column_definitions, etc.) restored from cache

Rendering

All components are callable. Pass a StateManager to enable cross-component linking:

from openms_insight import StateManager

state_manager = StateManager()

table(state_manager=state_manager, height=300)
plot(state_manager=state_manager, height=400)

Development

Building the Vue Component

cd js-component
npm install
npm run build

Development Mode (Hot Reload)

# Terminal 1: Vue dev server
cd js-component
npm run dev

# Terminal 2: Streamlit with dev mode
SVC_DEV_MODE=true SVC_DEV_URL=http://localhost:5173 streamlit run app.py

Running Tests

# Python tests
pip install -e ".[dev]"
pytest tests/ -v

# TypeScript type checking
cd js-component
npm run type-check

Linting and Formatting

# Python
ruff check .
ruff format .

# JavaScript/TypeScript
cd js-component
npm run lint
npm run format

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

openms_insight-0.1.3.tar.gz (4.1 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

openms_insight-0.1.3-py3-none-any.whl (4.1 MB view details)

Uploaded Python 3

File details

Details for the file openms_insight-0.1.3.tar.gz.

File metadata

  • Download URL: openms_insight-0.1.3.tar.gz
  • Upload date:
  • Size: 4.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for openms_insight-0.1.3.tar.gz
Algorithm Hash digest
SHA256 1b83a195c24c22ea02bb21b5b4ba3e4fda8d78b335ca87820427d78802e3c526
MD5 d18b311962c38b908c0460a5b323192f
BLAKE2b-256 a05b5627fb5019ecc3dfe2db9382f1b1fba75a8615e57b62a55c1c139e86b9b6

See more details on using hashes here.

Provenance

The following attestation bundles were made for openms_insight-0.1.3.tar.gz:

Publisher: publish.yml on t0mdavid-m/OpenMS-Insight

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file openms_insight-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: openms_insight-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 4.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for openms_insight-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 1c295378e8c3be65aa518d47cd17cb0949618e3a66051471640699dd31b74796
MD5 7bb902981ae44b467685bc79a9f91406
BLAKE2b-256 861992b7c6e610982e8cfce755e46716a7b2ec8e652705491e5011ec5a5ab21d

See more details on using hashes here.

Provenance

The following attestation bundles were made for openms_insight-0.1.3-py3-none-any.whl:

Publisher: publish.yml on t0mdavid-m/OpenMS-Insight

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page