
Offline analysis of Nextflow run artifacts - audit, report, and compare workflow runs


jps-nextflow-utils


Offline analysis of Nextflow run artifacts - audit, report, and compare workflow runs without network calls or runtime dependencies.

🚀 Overview

jps-nextflow-utils provides a Typer-based CLI for read-only analysis of Nextflow workflow run artifacts. It helps pipeline developers, platform engineers, and QA teams understand what happened during workflow execution by analyzing logs, traces, configs, and other static outputs.

Key Features

  • 🔍 Artifact Discovery - Automatically finds and fingerprints Nextflow run artifacts
  • 📊 Metadata Extraction - Extracts run details, versions, executors, and timings from logs
  • ⚠️ Failure Detection - Classifies errors (OOM, timeouts, container failures, filesystem issues, etc.)
  • 📈 Performance Analysis - Computes process-level statistics from trace files
  • 🔄 Run Comparison - Diffs two runs to identify behavioral and performance changes
  • 📋 Batch Processing - Audits multiple runs and generates aggregate summaries
  • 🎯 Rules Engine - Extensible YAML-based pattern matching for custom checks
  • 📝 Multiple Output Formats - JSON, Markdown, text, CSV, NDJSON
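To show roughly how regex-based failure classification can work, here is a minimal Python sketch. The category names, the patterns, and the `classify_line` / `classify_log` helpers are hypothetical and are not taken from jps-nextflow-utils. The tool's built-in rules may differ. The `137` pattern reflects the common, but not universal, case where a task killed with SIGKILL (128 + 9) was exceeding its memory limit.

```python
import re

# Hypothetical category -> pattern table, for illustration only.
# These are NOT the tool's built-in rules.
FAILURE_PATTERNS = {
    # Exit status 137 (128 + SIGKILL) often, but not always, means an OOM kill.
    "oom": re.compile(r"OutOfMemoryError|exit status \(?137\)?|Cannot allocate memory", re.IGNORECASE),
    "timeout": re.compile(r"time limit|timed out|\btimeout\b", re.IGNORECASE),
    "container": re.compile(r"docker: Error|image not found|singularity.*(error|failed)", re.IGNORECASE),
    "filesystem": re.compile(r"No space left on device|Permission denied|No such file or directory", re.IGNORECASE),
}


def classify_line(line: str) -> list:
    """Return every failure category whose pattern matches the log line."""
    return [category for category, pattern in FAILURE_PATTERNS.items() if pattern.search(line)]


def classify_log(lines) -> dict:
    """Map each matching 1-based line number to the categories it triggered."""
    hits = {}
    for lineno, line in enumerate(lines, start=1):
        categories = classify_line(line)
        if categories:
            hits[lineno] = categories
    return hits
```

A line can match more than one category, so the sketch returns a list per line rather than forcing a single label. Ranking or deduplicating overlapping matches is left to the caller.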

Out of Scope

  • Triggering or modifying workflow runs
  • Network calls (Tower API, cloud APIs)
  • Domain-specific scientific validation

📦 Installation

From Source

git clone https://github.com/jai-python3/jps-nextflow-utils.git
cd jps-nextflow-utils
pip install -e .

Using Make

make install

🎯 Quick Start

Audit a Single Run

# Basic audit with text output
nf-audit audit run --run-dir /path/to/nextflow/run

# Generate JSON report
nf-audit audit run --run-dir ./my_run --format json --outdir ./reports

# Generate Markdown report
nf-audit audit run -d ./my_run -f md -o ./reports

Batch Audit Multiple Runs

# Discover and audit all runs in a directory
nf-audit audit batch --base-dir ./all_runs --glob "*" --outdir ./batch_reports

# Audit specific directories
nf-audit audit batch --run-dir ./run1 --run-dir ./run2 --outdir ./reports

Compare Two Runs

# Diff two runs
nf-audit diff --run-a ./run_baseline --run-b ./run_test

# Save diff report
nf-audit diff -a ./run_baseline -b ./run_test --outdir ./diffs

Work with Rules

# List built-in rules
nf-audit rules list

# List rules from custom pack
nf-audit rules list --rules ./examples/custom_rules.yaml

# Test rules against a run
nf-audit rules test --target ./my_run --rules ./examples/nfcore_rules.yaml

📖 Usage Examples

Example 1: Audit with Custom Rules

nf-audit audit run \
  --run-dir /data/pipeline_runs/2024-01-15_rnaseq \
  --format json \
  --outdir /reports \
  --rules ./examples/custom_rules.yaml \
  --rules ./examples/nfcore_rules.yaml

Example 2: Batch Processing with CSV Summary

nf-audit audit batch \
  --base-dir /data/all_runs \
  --glob "run_*" \
  --outdir /batch_output \
  --summary-format csv

Example 3: Performance Comparison

# Compare before and after optimization
nf-audit diff \
  --run-a /runs/before_optimization \
  --run-b /runs/after_optimization \
  --outdir /comparison
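At its core, a performance comparison like this lines up a per-process metric from both runs. The sketch below assumes each run has already been reduced to a `{process: value}` mapping. Total task wall time in seconds would be one such metric, but that choice is hypothetical here. It reports percent changes at or above a threshold, plus processes that appear in only one run. `diff_metric` and its output shape are hypothetical and do not describe the `nf-audit diff` report format.

```python
def diff_metric(run_a: dict, run_b: dict, threshold_pct: float = 10.0) -> dict:
    """Compare a per-process metric between a baseline run (a) and a test run (b)."""
    changed = {}
    for name in sorted(set(run_a) & set(run_b)):
        baseline = run_a[name]
        if baseline == 0:
            continue  # percent change is undefined for a zero baseline
        pct = 100.0 * (run_b[name] - baseline) / baseline
        if abs(pct) >= threshold_pct:
            changed[name] = round(pct, 1)  # negative means the test run is lower
    return {
        "changed_pct": changed,
        "only_in_a": sorted(set(run_a) - set(run_b)),  # processes dropped in run b
        "only_in_b": sorted(set(run_b) - set(run_a)),  # processes new in run b
    }
```

Reporting processes that exist in only one run matters for comparisons: a renamed or removed process would otherwise vanish silently from a percent-change table.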

📁 Expected Artifacts

The tool discovers and analyzes these common Nextflow artifacts:

  • .nextflow.log - Main Nextflow log
  • nextflow.config - Configuration files
  • trace.txt - Process execution trace
  • report.html - HTML report
  • timeline.html - Timeline visualization
  • dag.html / dag.dot - Workflow DAG
  • params.json / params.yaml - Parameters
  • Additional *.config and *.log files
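Here is a minimal sketch of artifact discovery and fingerprinting. It assumes a flat, non-recursive scan of the run directory and uses SHA-256 content hashes. `KNOWN_NAMES`, `discover_artifacts`, and the inventory shape are hypothetical. The tool's actual discovery rules and fingerprinting scheme are not described here.

```python
import hashlib
from pathlib import Path

# Hypothetical lookup table mirroring the artifact names listed above.
KNOWN_NAMES = {
    ".nextflow.log", "nextflow.config", "trace.txt", "report.html",
    "timeline.html", "dag.html", "dag.dot", "params.json", "params.yaml",
}
EXTRA_SUFFIXES = {".config", ".log"}  # catch-all for additional *.config / *.log files


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large logs never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def discover_artifacts(run_dir) -> dict:
    """Return {file_name: {"size": bytes, "sha256": hex}} for recognized top-level artifacts."""
    inventory = {}
    for path in sorted(Path(run_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories such as work/
        if path.name in KNOWN_NAMES or path.suffix in EXTRA_SUFFIXES:
            inventory[path.name] = {"size": path.stat().st_size, "sha256": sha256_of(path)}
    return inventory
```

Content hashes make it straightforward to tell whether two runs share an identical config or params file, regardless of timestamps or file paths.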

🔧 Output Formats

JSON Report Schema

{
  "schema_version": "1.0.0",
  "tool_version": "0.1.0",
  "generated_at": "2024-01-17T10:30:00",
  "run_dir": "/path/to/run",
  "overall_status": "ERROR",
  "metadata": {
    "run_name": "angry_euler",
    "nextflow_version": "23.04.1",
    "executor": "slurm",
    "duration_seconds": 3600.5
  },
  "findings": [...],
  "process_rollups": [...],
  "inventory": {...}
}
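The `metadata` block above could be populated by scanning `.nextflow.log` with a few regular expressions. The sketch below assumes three kinds of log lines: one containing "N E X T F L O W  ~  version 23.04.1", a launch line containing the script name in backticks followed by the run name in brackets (for example "[angry_euler]"), and a line containing "executor > slurm". These shapes are typical of Nextflow logs but are assumptions here. `extract_metadata` is a hypothetical helper and not the tool's parser. `duration_seconds` is left out.

```python
import re

# Assumed log-line shapes; adjust them to the logs you actually have.
VERSION_RE = re.compile(r"N E X T F L O W\s+~\s+version\s+(\S+)")
LAUNCH_RE = re.compile(r"Launching `[^`]+` \[([^\]]+)\]")
EXECUTOR_RE = re.compile(r"executor >\s+(\S+)")

FIELDS = (("nextflow_version", VERSION_RE), ("run_name", LAUNCH_RE), ("executor", EXECUTOR_RE))


def extract_metadata(lines) -> dict:
    """Return the first match for each metadata field, or None if it never appears."""
    metadata = {name: None for name, _ in FIELDS}
    for line in lines:
        for name, pattern in FIELDS:
            if metadata[name] is None:
                match = pattern.search(line)
                if match:
                    metadata[name] = match.group(1)
    return metadata
```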

Markdown Report

Human-readable report with:

  • Run metadata summary
  • Findings by severity
  • Process statistics table
  • Artifact inventory
  • Evidence excerpts
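The following sketch shows one way to build a process statistics table: roll up task statuses per process from `trace.txt`, then render the counts as Markdown. It assumes three things: the trace is tab-separated, its header row contains `name` and `status` columns (both are among Nextflow's default trace fields), and task names follow the `PROCESS (tag)` convention. `rollup_trace` and `to_markdown` are hypothetical names.

```python
import csv
import io
from collections import Counter, defaultdict


def process_name(task_name: str) -> str:
    """Drop the per-task tag: 'NFCORE:FASTQC (sample1)' -> 'NFCORE:FASTQC'."""
    return task_name.split(" (", 1)[0]


def rollup_trace(trace_text: str) -> dict:
    """Count task statuses per process from tab-separated trace text."""
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    rollups = defaultdict(Counter)
    for row in reader:
        rollups[process_name(row["name"])][row["status"]] += 1
    return {name: dict(counts) for name, counts in rollups.items()}


def to_markdown(rollups: dict) -> str:
    """Render {process: {status: count}} as a Markdown table, one column per status."""
    statuses = sorted({status for counts in rollups.values() for status in counts})
    lines = [
        "| process | " + " | ".join(statuses) + " |",
        "|" + "---|" * (len(statuses) + 1),
    ]
    for name in sorted(rollups):
        cells = [str(rollups[name].get(status, 0)) for status in statuses]
        lines.append(f"| {name} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```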

Batch CSV Summary

Aggregate metrics across runs:

  • Run status and duration
  • Finding counts by severity
  • Top failing processes
  • Task success/failure rates
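A batch CSV summary along these lines can be produced with the standard `csv` module. This sketch assumes each audited run is a dict with `run_dir`, `overall_status`, `duration_seconds`, and a `findings` list whose items have a `severity` key, loosely mirroring the JSON schema above. The column names and the INFO/WARN/ERROR/FATAL severity levels are assumptions, and `summarize_runs` is hypothetical.

```python
import csv
import io
from collections import Counter

# Assumed severity levels; the tool's actual levels may differ.
SEVERITIES = ("INFO", "WARN", "ERROR", "FATAL")


def summarize_runs(runs) -> str:
    """Return CSV text with one row per run and a finding count per severity."""
    buffer = io.StringIO()
    fieldnames = ["run_dir", "overall_status", "duration_seconds"]
    fieldnames += [f"n_{level.lower()}" for level in SEVERITIES]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for run in runs:
        counts = Counter(finding["severity"] for finding in run["findings"])
        row = {
            "run_dir": run["run_dir"],
            "overall_status": run["overall_status"],
            "duration_seconds": run["duration_seconds"],
        }
        # Severities with no findings get an explicit 0 rather than an empty cell.
        row.update({f"n_{level.lower()}": counts.get(level, 0) for level in SEVERITIES})
        writer.writerow(row)
    return buffer.getvalue()
```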

🎨 Custom Rules

Create custom YAML rule packs:

version: "1.0"

rules:
  - id: custom_error_pattern
    category: process_failure
    severity: ERROR
    description: "Custom error detection"
    scope: log
    confidence: 0.9
    patterns:
      - "CUSTOM_ERROR_\\d+"
      - "MyPipeline failed"
    remediation: |
      Check pipeline logs for specific error details.
      Contact support if issue persists.
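The rule above can be read as "flag any log line that matches at least one of the patterns." Below is a minimal sketch of that matching step. The rule is written as a Python dict with the same keys as the YAML. Loading the YAML itself would need a parser such as PyYAML, which is omitted to keep the example dependency-free. The finding shape and the `apply_rule` name are hypothetical. Note that the double-quoted YAML string "CUSTOM_ERROR_\\d+" decodes to the regex CUSTOM_ERROR_\d+, which a Python raw string writes directly.

```python
import re


def apply_rule(rule: dict, log_lines: list) -> list:
    """Return one hypothetical finding per log line matching any of the rule's patterns."""
    if rule.get("scope", "log") != "log":
        return []  # this sketch only handles log-scoped rules
    regexes = [re.compile(pattern) for pattern in rule["patterns"]]
    findings = []
    for lineno, line in enumerate(log_lines, start=1):
        if any(regex.search(line) for regex in regexes):
            findings.append({
                "rule_id": rule["id"],
                "severity": rule["severity"],
                "confidence": rule.get("confidence", 1.0),
                "line": lineno,
                "evidence": line.strip(),
            })
    return findings


# Same fields as the YAML rule above, as they would look after parsing.
CUSTOM_RULE = {
    "id": "custom_error_pattern",
    "category": "process_failure",
    "severity": "ERROR",
    "scope": "log",
    "confidence": 0.9,
    "patterns": [r"CUSTOM_ERROR_\d+", "MyPipeline failed"],
}
```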

See examples/ for more rule pack examples.

🧪 Development

Setup

# Clone and install with dev dependencies
git clone https://github.com/jai-python3/jps-nextflow-utils.git
cd jps-nextflow-utils
pip install -e ".[dev]"

Testing

# Run tests
make test

# Run with coverage
pytest --cov=src/jps_nextflow_utils tests/

# Lint and format
make fix
make format
make lint

Code Quality

# Format code
black src/ tests/
isort src/ tests/

# Type checking
mypy src/

# Security scan
bandit -r src/

📊 Exit Codes

The CLI uses exit codes to indicate run status:

  • 0 - OK (no findings above INFO)
  • 1 - WARN (warnings present)
  • 2 - FAIL (errors or fatal findings)
  • ≥3 - Tool/runtime error
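The mapping from findings to the documented exit codes can be sketched as a small precedence check. The severity strings are assumptions, and `exit_code_for` is hypothetical. Codes 3 and above (tool or runtime errors) fall outside this sketch.

```python
def exit_code_for(severities) -> int:
    """Map finding severities (assumed names) to the documented 0/1/2 exit codes."""
    found = set(severities)
    if found & {"ERROR", "FATAL"}:
        return 2  # FAIL: errors or fatal findings
    if "WARN" in found:
        return 1  # WARN: warnings present
    return 0  # OK: nothing above INFO
```

Because the highest severity wins, a CI job can gate on the exit code alone. For example, it could fail only on 2 while tolerating 1.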

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

See docs/Development_SOP.md for details.

🐛 Reporting Issues

Found a bug or have a feature request? Please open an issue on GitHub: https://github.com/jai-python3/jps-nextflow-utils/issues

📜 License

MIT License © Jaideep Sundaram

🙏 Acknowledgments

Designed for the Nextflow and nf-core communities.


Built with 🐍 Python | 🎨 Typer | 📊 Nextflow


Download files

Source Distribution

jps_nextflow_utils-0.1.0.tar.gz (29.4 kB)

Uploaded Source

Built Distribution

jps_nextflow_utils-0.1.0-py3-none-any.whl (27.3 kB)

Uploaded Python 3

File details

Details for the file jps_nextflow_utils-0.1.0.tar.gz.

File metadata

  • Download URL: jps_nextflow_utils-0.1.0.tar.gz
  • Size: 29.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for jps_nextflow_utils-0.1.0.tar.gz
Algorithm Hash digest
SHA256 aef17e12581927895cf3828fd211004dd1057532c26d9a4b1a9455986f72194f
MD5 9476f3e844faa05e00e31ebb7890ea17
BLAKE2b-256 7bedd73314c225fb7ad3a1c06588514802aa7ecd1dbe9dcbafb97f1a60893f02

File details

Details for the file jps_nextflow_utils-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for jps_nextflow_utils-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1f8c5bb7022435fa10ce0ba35f6aad85fc9c93d4c12ae9fbf71ea6cb14d6528b
MD5 dc7a4ff0ea704e24bd79c762b360bb11
BLAKE2b-256 d4d049133a059e8941b98ce836d0b9482f9676305f7e63a7825cc00edb004fd7
