A terminal UI for monitoring Snakemake workflows

snakesee

snakesee provides a rich TUI dashboard for passively monitoring Snakemake workflows. It reads directly from the .snakemake/ directory, requiring no special flags or configuration when running Snakemake.

Features

  • Zero configuration - Works on any existing workflow without modification
  • Historical browsing - Navigate through past workflow executions
  • Time estimation - Predicts remaining time from historical data
  • Rich TUI - Vim-style keyboard controls, filtering, and sorting
  • Multiple layouts - Full, compact, and minimal display modes

Why snakesee?

| Tool | Approach | Requirements | Status |
|---|---|---|---|
| snakesee | Passive (reads .snakemake/) | None | Active |
| snkmt | Active (logger plugin) | --logger snkmt + SQLite | Active |
| Panoptes | Active (WMS monitor) | --wms-monitor + server | Early dev |
| snakemake-terminal-monitor | Passive (reads logs) | Requires running workflow | Maintained |
| snk | CLI wrapper | Workflow installation | Active |
| Built-in --dag/--rulegraph | Static visualization | Graphviz | Built-in |

Installation

pip (recommended)

pip install snakesee

pip with logo support

pip install snakesee[logo]

conda / mamba

conda install -c bioconda snakesee

Usage

Watch a workflow in real-time

# In a workflow directory
snakesee watch

# Or specify a path
snakesee watch /path/to/workflow

Get a one-time status snapshot

snakesee status
snakesee status /path/to/workflow

Options

snakesee watch --refresh 5.0      # Refresh every 5 seconds (default: 2.0)
snakesee watch --no-estimate      # Disable time estimation
snakesee status --no-estimate     # Status without ETA

Time Estimation

snakesee predicts remaining workflow time using historical execution data from .snakemake/metadata/. The estimation uses multiple strategies depending on available data:

Estimation Methods

| Method | When Used | Confidence |
|---|---|---|
| Weighted | Historical data available | High (0.5-0.9) |
| Simple | No historical data, some jobs completed | Medium (0.3-0.7) |
| Bootstrap | No jobs completed yet | Low (0.05) |
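
The table above can be read as a simple decision rule. The following is a minimal sketch of that rule, not snakesee's actual code: the function name `choose_method`, the `Estimate` class, and the order in which the conditions are checked are assumptions made for illustration. Only the method names and confidence ranges come from the table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Hypothetical container for the chosen method and its confidence range."""

    method: str
    confidence_range: tuple[float, float]


def choose_method(historical_runs: int, completed_jobs: int) -> Estimate:
    """Pick an estimation method from the amount of data available."""
    if historical_runs > 0:
        # Historical data available: use the weighted method.
        return Estimate("weighted", (0.5, 0.9))
    if completed_jobs > 0:
        # No history, but this run has already finished some jobs.
        return Estimate("simple", (0.3, 0.7))
    # Nothing to go on yet.
    return Estimate("bootstrap", (0.05, 0.05))


print(choose_method(historical_runs=0, completed_jobs=4).method)
```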

How It Works

  1. Per-rule timing: Historical execution times are tracked for each rule (e.g., align, sort, index)
  2. Recency weighting: Recent runs are weighted more heavily using exponential decay
  3. Pending rule inference: Assumes remaining jobs follow the same rule distribution as completed jobs
  4. Parallelism adjustment: Estimates concurrent job execution from historical completion rates
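
To make the four steps concrete, here is a rough sketch of how they might combine into one number. It is an illustration under stated assumptions, not snakesee's implementation: the function name, the argument shapes, and the use of a plain mean in place of recency weighting (step 2) are all hypothetical simplifications.

```python
from __future__ import annotations

from collections import Counter
from statistics import mean


def estimate_remaining_seconds(
    completed: list[tuple[str, float]],  # (rule, duration in seconds) from this run
    history: dict[str, list[float]],     # rule -> durations from past runs
    pending_jobs: int,
    parallelism: float,
) -> float | None:
    """Estimate the remaining wall-clock seconds, or None if nothing has completed."""
    if not completed:
        return None

    # Step 1: per-rule timing. Prefer historical durations, else this run's own.
    this_run: dict[str, list[float]] = {}
    for rule, duration in completed:
        this_run.setdefault(rule, []).append(duration)
    per_rule = {rule: mean(history.get(rule) or ds) for rule, ds in this_run.items()}

    # Step 3: assume pending jobs follow the rule mix of the completed jobs.
    counts = Counter(rule for rule, _ in completed)
    total = sum(counts.values())
    expected_per_job = sum(per_rule[r] * n / total for r, n in counts.items())

    # Step 4: divide the total work by the estimated number of concurrent jobs.
    return pending_jobs * expected_per_job / max(parallelism, 1.0)


completed = [("align", 100.0), ("align", 120.0), ("sort", 30.0), ("index", 10.0)]
history = {"align": [90.0, 110.0], "sort": [30.0]}
print(estimate_remaining_seconds(completed, history, pending_jobs=8, parallelism=2.0))
```

With these inputs the expected cost per pending job is 60 seconds (half align at 100s, a quarter sort at 30s, a quarter index at 10s), so 8 pending jobs at a parallelism of 2 give 240 seconds.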

ETA Display Formats

| Format | Meaning |
|---|---|
| ~5m | High confidence estimate |
| 3m - 8m | Medium confidence, shows range |
| ~10m (rough) | Low confidence estimate |
| ~15m (very rough) | Very low confidence |
| unknown | Insufficient data |
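
As an illustration of how a confidence value could map onto these formats, consider the sketch below. The confidence cut-offs (0.7, 0.4, 0.2) and the `spread` used for the range are assumptions chosen for the example. The README does not state the thresholds snakesee uses.

```python
from __future__ import annotations


def _minutes(seconds: float) -> str:
    """Render a duration as whole minutes, never less than 1m."""
    return f"{max(1, round(seconds / 60))}m"


def format_eta(seconds: float | None, confidence: float, spread: float = 0.4) -> str:
    """Format an ETA in one of the display styles from the table above.

    The thresholds and `spread` are hypothetical values for illustration.
    """
    if seconds is None:
        return "unknown"
    if confidence >= 0.7:
        return f"~{_minutes(seconds)}"
    if confidence >= 0.4:
        low, high = seconds * (1 - spread), seconds * (1 + spread)
        return f"{_minutes(low)} - {_minutes(high)}"
    if confidence >= 0.2:
        return f"~{_minutes(seconds)} (rough)"
    return f"~{_minutes(seconds)} (very rough)"


for secs, conf in [(300, 0.8), (330, 0.5), (600, 0.3), (900, 0.05), (None, 0.0)]:
    print(format_eta(secs, conf))
```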

Weighting Strategies

snakesee supports two strategies for weighting historical timing data:

Index-Based Weighting (Default)

Weights runs by how many runs ago they occurred, regardless of actual time elapsed:

  • Most recent run has the highest weight
  • Older runs (by log index) progressively contribute less
  • Default half-life: 10 logs (a run's weight is halved for every 10 newer runs)

This is ideal for active development where each pipeline run may fix issues:

snakesee watch --weighting-strategy index --half-life-logs 10
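
A half-life in "logs" corresponds to an exponential decay over the run index. The sketch below shows that arithmetic under the assumption that the most recent run has index 0 and weight 1. The function names are hypothetical and this is not snakesee's code.

```python
def index_weight(runs_ago: int, half_life_logs: float = 10.0) -> float:
    """Weight of a run that happened `runs_ago` runs before the latest one."""
    return 0.5 ** (runs_ago / half_life_logs)


def index_weighted_mean(durations_newest_first: list[float],
                        half_life_logs: float = 10.0) -> float:
    """Weighted mean duration, with newer runs counting more."""
    weights = [index_weight(i, half_life_logs)
               for i in range(len(durations_newest_first))]
    weighted_sum = sum(w * d for w, d in zip(weights, durations_newest_first))
    return weighted_sum / sum(weights)


print(index_weight(0), index_weight(10), index_weight(20))  # 1.0 0.5 0.25
# A recent fast run (60s) pulls the estimate below the plain mean of 90s.
print(index_weighted_mean([60.0, 120.0], half_life_logs=1.0))
```

A shorter half-life makes the estimate react faster to a pipeline fix, at the cost of more noise from any single run.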

Time-Based Weighting

Weights runs by wall-clock time since each run:

  • Recent runs (within the last week) have the highest influence
  • Default half-life: 7 days (after 7 days, a run's weight is halved)

This is better for stable pipelines where old data should naturally age out:

snakesee watch --weighting-strategy time --half-life-days 7
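
The time-based variant applies the same decay to a run's age in days rather than its position in the log list. A minimal sketch, assuming ages are measured from a supplied `now` (the function name and signature are hypothetical):

```python
from datetime import datetime, timedelta, timezone


def time_weight(run_time: datetime, now: datetime,
                half_life_days: float = 7.0) -> float:
    """Weight of a run based on wall-clock age; halves every `half_life_days`."""
    age_seconds = max((now - run_time).total_seconds(), 0.0)
    age_days = age_seconds / 86400.0
    return 0.5 ** (age_days / half_life_days)


now = datetime(2024, 1, 15, tzinfo=timezone.utc)
for days in (0, 7, 14):
    print(days, time_weight(now - timedelta(days=days), now))
```

Unlike the index-based scheme, a burst of many runs on one afternoon all receive nearly the same weight here, while a run from a month ago fades regardless of how few runs happened since.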

Both strategies help adapt to:

  • Hardware changes (new machine, more cores)
  • Software updates (faster tool versions)
  • Pipeline improvements and bug fixes

Wildcard Conditioning

When enabled, snakesee tracks timing separately for each wildcard value (e.g., sample=A, sample=B). This improves estimates when different inputs have significantly different runtimes.

# Enable via CLI flag
snakesee watch --wildcard-timing

# Or toggle in TUI with 'w' key

When to use: Enable when your workflow processes inputs of varying sizes (e.g., genome samples, dataset batches) and execution times vary significantly between them.
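
Conceptually, wildcard conditioning means keying timing data on the rule plus its wildcard values instead of the rule alone. The sketch below illustrates the idea. The fallback to rule-level timing when a wildcard combination has too few samples, and the `min_samples` parameter, are assumptions for the example rather than documented snakesee behaviour.

```python
from __future__ import annotations

from statistics import mean


def expected_duration(
    history: dict[tuple[str, tuple[tuple[str, str], ...]], list[float]],
    rule: str,
    wildcards: dict[str, str],
    min_samples: int = 2,
) -> float | None:
    """Mean duration for (rule, wildcards), falling back to the rule as a whole."""
    key = (rule, tuple(sorted(wildcards.items())))
    specific = history.get(key, [])
    if len(specific) >= min_samples:
        return mean(specific)
    # Too little wildcard-specific data: pool every sample recorded for the rule.
    pooled = [d for (r, _), ds in history.items() if r == rule for d in ds]
    return mean(pooled) if pooled else None


history = {
    ("align", (("sample", "A"),)): [100.0, 110.0],  # small sample
    ("align", (("sample", "B"),)): [400.0],         # large sample, seen once
}
print(expected_duration(history, "align", {"sample": "A"}))  # 105.0
print(expected_duration(history, "align", {"sample": "B"}))  # pooled rule mean
```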

Portable Timing Profiles

Export timing data to share across machines or bootstrap new runs:

# Export profile from current workflow
snakesee profile-export

# Export to a specific file
snakesee profile-export --output timing.json

# Merge with existing profile (combine data)
snakesee profile-export --merge

# View profile contents
snakesee profile-show .snakesee-profile.json

# Use a profile for estimation
snakesee watch --profile timing.json

Profiles are auto-discovered: snakesee searches for .snakesee-profile.json in the workflow directory and parent directories.
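
The auto-discovery described above amounts to an upward directory search. A minimal sketch of such a search using pathlib follows. The function name `find_profile` is hypothetical and the real lookup may differ in detail (for example, in how far up it walks).

```python
from __future__ import annotations

from pathlib import Path

PROFILE_NAME = ".snakesee-profile.json"


def find_profile(workflow_dir: Path) -> Path | None:
    """Return the nearest profile in workflow_dir or any of its parents."""
    start = workflow_dir.resolve()
    for directory in (start, *start.parents):
        candidate = directory / PROFILE_NAME
        if candidate.is_file():
            return candidate
    return None


print(find_profile(Path.cwd()))
```

Because the nearest match wins, a profile placed in a project root applies to every workflow beneath it unless a workflow directory carries its own.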

Tool-Specific Progress Plugins

snakesee includes plugins that parse tool-specific log files to show real-time progress within running jobs. This is particularly useful for long-running bioinformatics tools.

Built-in plugins:

| Tool | Progress Detection |
|---|---|
| BWA | Processed reads count |
| STAR | Finished reads count |
| samtools sort | Records processed |
| samtools index | Records indexed |
| fastp | Reads processed/passed |
| fgbio | Records processed |

How it works:

  1. When a job is running, snakesee searches for its log file
  2. Plugins detect the tool from rule name or log content
  3. Progress is extracted and displayed in the TUI

Creating custom plugins:

Create a Python file in ~/.snakesee/plugins/ or ~/.config/snakesee/plugins/:

# ~/.snakesee/plugins/my_tool.py
import re
from snakesee.plugins.base import ToolProgress, ToolProgressPlugin

class MyToolPlugin(ToolProgressPlugin):
    @property
    def tool_name(self) -> str:
        return "mytool"

    def can_parse(self, rule_name: str, log_content: str) -> bool:
        return "mytool" in rule_name.lower()

    def parse_progress(self, log_content: str) -> ToolProgress | None:
        # Parse your tool's log format
        match = re.search(r"Processed (\d+) items", log_content)
        if match:
            return ToolProgress(
                items_processed=int(match.group(1)),
                unit="items"
            )
        return None

User plugins are automatically discovered and loaded when snakesee starts.

Entry-point plugins (for package authors):

Third-party packages can register plugins via setuptools entry points. Add to your pyproject.toml:

[project.entry-points."snakesee.plugins"]
my_tool = "my_package.plugins:MyToolPlugin"

Entry-point plugins are discovered automatically when the package is installed.

Enhanced Monitoring with Real-Time Events

For real-time event streaming (instead of log polling), you can enable event-based monitoring:

Snakemake 9.0+ (Logger Plugin)

Install the optional Snakemake logger plugin:

pip install snakemake-logger-plugin-snakesee

Then run Snakemake with the logger:

snakemake --logger snakesee --cores 4

Snakemake 8.x (Log Handler Script)

Use the built-in log handler script:

snakemake --log-handler-script $(snakesee log-handler-path) --cores 4

Note: The log handler script is optimized for local execution where jobs start immediately after submission. For cluster/cloud executors (SLURM, AWS Batch, etc.), jobs shown as "running" may still be queued. For accurate queue tracking on clusters, use Snakemake 9+ with the logger plugin.

Monitoring

In another terminal, monitor with snakesee:

snakesee watch

Benefits of real-time events:

| Feature | Log Parsing | Real-Time Events |
|---|---|---|
| Job detection | Polling (delayed) | Immediate |
| Start times | Approximate (log mtime) | Exact timestamp |
| Durations | Calculated from logs | Precise from events |
| Failed jobs | Pattern matching | Direct notification |

Real-time events are optional - snakesee works without them using log parsing, and automatically uses events when available.

Workflow Status Detection

snakesee determines whether a workflow is actively running by checking whether:

  1. Lock files exist in .snakemake/locks/
  2. Incomplete markers exist in .snakemake/incomplete/ (jobs in progress)
  3. The log file was modified recently (within the stale threshold)

If lock files AND incomplete markers exist, the workflow is considered RUNNING regardless of log age. This handles very long-running jobs that don't update the log file.

If lock files exist but no incomplete markers, snakesee falls back to checking log freshness. The stale threshold defaults to 30 minutes (1800 seconds). If the log hasn't been updated within this threshold, the workflow is considered interrupted (INCOMPLETE status).
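
The decision logic in the two paragraphs above can be sketched as follows. This is an illustration, not snakesee's implementation: the function names, the use of the newest file under .snakemake/log/ as "the log", and the `NOT_RUNNING` label for the no-lock case are assumptions. Only RUNNING, INCOMPLETE, and the 1800-second default come from the text.

```python
from __future__ import annotations

import time
from pathlib import Path


def _has_entries(directory: Path) -> bool:
    """True if the directory exists and contains at least one entry."""
    return directory.is_dir() and any(directory.iterdir())


def detect_status(workflow_dir: Path, stale_threshold: float = 1800.0,
                  now: float | None = None) -> str:
    """Classify a workflow from lock files, incomplete markers, and log age."""
    smk = Path(workflow_dir) / ".snakemake"
    if not _has_entries(smk / "locks"):
        return "NOT_RUNNING"  # hypothetical label; not named in the README
    if _has_entries(smk / "incomplete"):
        # Locks plus incomplete markers: running, regardless of log age.
        return "RUNNING"
    # Locks only: fall back to log freshness.
    log_mtimes = [p.stat().st_mtime for p in (smk / "log").glob("*.log")]
    if not log_mtimes:
        return "INCOMPLETE"
    current = time.time() if now is None else now
    age = current - max(log_mtimes)
    return "RUNNING" if age <= stale_threshold else "INCOMPLETE"


print(detect_status(Path.cwd()))
```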

TUI Keyboard Shortcuts

| Key | Action |
|---|---|
| q | Quit |
| ? | Show help |
| p | Pause/resume auto-refresh |
| e | Toggle time estimation |
| w | Toggle wildcard conditioning |
| r | Force refresh |
| Ctrl+r | Hard refresh (reload historical data) |

Refresh Rate (vim-style)

| Key | Action |
|---|---|
| h | Decrease by 5s (faster) |
| j | Decrease by 0.5s (faster) |
| k | Increase by 0.5s (slower) |
| l | Increase by 5s (slower) |
| 0 | Reset to default (1s) |
| G | Set to minimum (0.5s, fastest) |

Layout & Navigation

| Key | Action |
|---|---|
| Tab | Cycle layout (full/compact/minimal) |
| / | Filter rules by name |
| n / N | Next/previous filter match |
| Esc | Clear filter, return to latest log |
| [ / ] | View older/newer log (1 step) |
| { / } | View older/newer log (5 steps) |
| s | Cycle sort table |
| 1-4 | Sort by column |

Development

See CONTRIBUTING.md for development setup and guidelines.

Disclaimer

This codebase was written with the assistance of AI (Claude). All code has been reviewed and tested, but users should evaluate fitness for their use case.

License

MIT License - Copyright (c) 2024 Fulcrum Genomics LLC

