jps-nextflow-utils

Offline analysis of Nextflow run artifacts - audit, report, and compare workflow runs without network calls or runtime dependencies.

🚀 Overview

jps-nextflow-utils provides a Typer-based CLI for read-only analysis of Nextflow workflow run artifacts. It helps pipeline developers, platform engineers, and QA teams understand what happened during workflow execution by analyzing logs, traces, configs, and other static outputs.

Key Features

  • 🔍 Artifact Discovery - Automatically finds and fingerprints Nextflow run artifacts
  • 📊 Metadata Extraction - Extracts run details, versions, executors, timings from logs
  • ⚠️ Failure Detection - Classifies errors (OOM, timeouts, containers, filesystem, etc.)
  • 📈 Performance Analysis - Computes process-level statistics from trace files
  • 🔄 Run Comparison - Diff two runs to identify behavioral/performance changes
  • 📋 Batch Processing - Audit multiple runs and generate aggregate summaries
  • 🎯 Rules Engine - Extensible YAML-based pattern matching for custom checks
  • 📝 Multiple Output Formats - JSON, Markdown, text, CSV, NDJSON

Out of Scope

  • Triggering or modifying workflow runs
  • Network calls (Tower API, cloud APIs)
  • Domain-specific scientific validation

📦 Installation

From Source

git clone https://github.com/jai-python3/jps-nextflow-utils.git
cd jps-nextflow-utils
pip install -e .

Using Make

make install

🎯 Quick Start

Audit a Single Run

# Basic audit with text output
nf-audit audit run --run-dir /path/to/nextflow/run

# Generate JSON report
nf-audit audit run --run-dir ./my_run --format json --outdir ./reports

# Generate Markdown report
nf-audit audit run -d ./my_run -f md -o ./reports

# Audit from a list of artifact paths
nf-audit audit run --paths artifact_paths.txt --format json

Batch Audit Multiple Runs

# Discover and audit all runs in a directory
nf-audit audit batch --base-dir ./all_runs --glob "*" --outdir ./batch_reports

# Audit specific directories
nf-audit audit batch --run-dir ./run1 --run-dir ./run2 --outdir ./reports

Compare Two Runs

# Diff two runs
nf-audit diff --run-a ./run_baseline --run-b ./run_test

# Save diff report
nf-audit diff -a ./run_baseline -b ./run_test --outdir ./diffs

Work with Rules

# List built-in rules
nf-audit rules list

# List rules from custom pack
nf-audit rules list --rules ./examples/custom_rules.yaml

# Test rules against a run
nf-audit rules test --target ./my_run --rules ./examples/nfcore_rules.yaml

📖 Usage Examples

Example 1: Audit with Custom Rules

nf-audit audit run \
  --run-dir /data/pipeline_runs/2024-01-15_rnaseq \
  --format json \
  --outdir /reports \
  --rules ./examples/custom_rules.yaml \
  --rules ./examples/nfcore_rules.yaml

Example 2: Batch Processing with CSV Summary

nf-audit audit batch \
  --base-dir /data/all_runs \
  --glob "run_*" \
  --outdir /batch_output \
  --summary-format csv

Example 3: Performance Comparison

# Compare before and after optimization
nf-audit diff \
  --run-a /runs/before_optimization \
  --run-b /runs/after_optimization \
  --outdir /comparison

📁 Expected Artifacts

The tool discovers and analyzes these common Nextflow artifacts:

  • .nextflow.log - Main Nextflow log
  • nextflow.config - Configuration files
  • trace.txt - Process execution trace
  • report.html - HTML report
  • timeline.html - Timeline visualization
  • dag.html / dag.dot - Workflow DAG
  • params.json / params.yaml - Parameters
  • Additional *.config and *.log files

Alternative: Specify Artifact Paths

If artifacts are scattered across different locations, create a paths file:

# artifact_paths.txt
/data/logs/.nextflow.log
/archive/traces/trace.txt
/configs/nextflow.config
/reports/report.html

Then audit using:

nf-audit audit run --paths artifact_paths.txt

🔧 Output Formats

JSON Report Schema

{
  "schema_version": "1.0.0",
  "tool_version": "0.1.0",
  "generated_at": "2024-01-17T10:30:00",
  "run_dir": "/path/to/run",
  "overall_status": "ERROR",
  "metadata": {
    "run_name": "angry_euler",
    "nextflow_version": "23.04.1",
    "executor": "slurm",
    "duration_seconds": 3600.5
  },
  "findings": [...],
  "process_rollups": [...],
  "inventory": {...}
}
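
A report with this shape can be consumed by downstream scripts. The following is a minimal sketch, assuming only the top-level keys shown above; `summarize_report` is a hypothetical helper, not part of the package, and the structure of individual `findings` entries is not assumed.

```python
import json


def summarize_report(report: dict) -> str:
    """Build a one-line summary from the top-level keys of a JSON report."""
    meta = report.get("metadata", {})
    findings = report.get("findings", [])
    run_name = meta.get("run_name", "unknown")
    status = report.get("overall_status", "UNKNOWN")
    duration = float(meta.get("duration_seconds", 0.0))
    return f"{run_name}: {status}, {duration:.1f} s, {len(findings)} finding(s)"


# Illustrative data only: the empty lists and dict stand in for the
# elided sections of the schema example.
sample = json.loads("""
{
  "schema_version": "1.0.0",
  "overall_status": "ERROR",
  "metadata": {"run_name": "angry_euler", "duration_seconds": 3600.5},
  "findings": [],
  "process_rollups": [],
  "inventory": {}
}
""")

print(summarize_report(sample))
```

In practice the dictionary would come from `json.load()` on a report file written to the `--outdir` directory; the report file name is not specified here.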

Markdown Report

Human-readable report with:

  • Run metadata summary
  • Findings by severity
  • Process statistics table
  • Artifact inventory
  • Evidence excerpts

Batch CSV Summary

Aggregate metrics across runs:

  • Run status and duration
  • Finding counts by severity
  • Top failing processes
  • Task success/failure rates

🎨 Custom Rules

Create custom YAML rule packs:

version: "1.0"

rules:
  - id: custom_error_pattern
    category: process_failure
    severity: ERROR
    description: "Custom error detection"
    scope: log
    confidence: 0.9
    patterns:
      - "CUSTOM_ERROR_\\d+"
      - "MyPipeline failed"
    remediation: |
      Check pipeline logs for specific error details.
      Contact support if issue persists.

See examples/ for more rule pack examples.
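
To preview what a rule's patterns would catch before adding them to a pack, they can be tried against sample log lines. This is a rough sketch that assumes the patterns are Python regular expressions searched line by line; the tool's actual matching semantics are not documented here. `match_rule` and the sample log lines are hypothetical.

```python
import re

# In-memory copy of the rule above. The YAML string "CUSTOM_ERROR_\\d+"
# decodes to the regex CUSTOM_ERROR_\d+ once the file is parsed.
rule = {
    "id": "custom_error_pattern",
    "severity": "ERROR",
    "patterns": [r"CUSTOM_ERROR_\d+", "MyPipeline failed"],
}


def match_rule(rule: dict, lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, text) for every line matched by any pattern."""
    compiled = [re.compile(p) for p in rule["patterns"]]
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if any(rx.search(line) for rx in compiled):
            hits.append((lineno, line.rstrip("\n")))
    return hits


# Made-up log lines for illustration.
log_lines = [
    "INFO  Starting workflow",
    "ERROR CUSTOM_ERROR_42 in process ALIGN",
    "WARN  retrying task",
    "ERROR MyPipeline failed",
]

hits = match_rule(rule, log_lines)
for lineno, text in hits:
    print(f"{rule['severity']} {rule['id']} line {lineno}: {text}")
```

Once the patterns behave as expected, `nf-audit rules test` can check the full pack against a real run.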

🧪 Development

Setup

# Clone and install with dev dependencies
git clone https://github.com/jai-python3/jps-nextflow-utils.git
cd jps-nextflow-utils
pip install -e ".[dev]"

Testing

# Run tests
make test

# Run with coverage
pytest --cov=src/jps_nextflow_utils tests/

# Lint and format
make fix
make format
make lint

Code Quality

# Format code
black src/ tests/
isort src/ tests/

# Type checking
mypy src/

# Security scan
bandit -r src/

📊 Exit Codes

The CLI uses exit codes to indicate run status:

  • 0 - OK (no findings above INFO)
  • 1 - WARN (warnings present)
  • 2 - FAIL (errors or fatal findings)
  • ≥3 - Tool/runtime error
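
These codes make it straightforward to gate a CI job on audit results. A minimal sketch of a wrapper follows; `describe_exit_code` and `should_block` are hypothetical helpers, and the `TOOL_ERROR` label is an assumption for any code outside 0 to 2.

```python
def describe_exit_code(code: int) -> str:
    """Map an nf-audit exit code to the status labels listed above."""
    labels = {0: "OK", 1: "WARN", 2: "FAIL"}
    # Anything else (>= 3, or a negative value reported for a signal)
    # is treated as a tool/runtime error.
    return labels.get(code, "TOOL_ERROR")


def should_block(code: int, allow_warnings: bool = True) -> bool:
    """Decide whether a CI step should fail for the given exit code."""
    status = describe_exit_code(code)
    if status == "OK":
        return False
    if status == "WARN":
        return not allow_warnings
    return True


# Example wiring (not executed here), using flags shown in Quick Start:
#   import subprocess
#   result = subprocess.run(["nf-audit", "audit", "run", "--run-dir", "./my_run"])
#   if should_block(result.returncode):
#       raise SystemExit(result.returncode)

for code in (0, 1, 2, 3):
    print(code, describe_exit_code(code), should_block(code))
```

Setting `allow_warnings=False` gives a stricter gate in which warning-level findings also fail the job.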

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

See docs/Development_SOP.md for details.

🐛 Reporting Issues

Found a bug or have a feature request? Please open an issue on GitHub: https://github.com/jai-python3/jps-nextflow-utils/issues

📜 License

MIT License © Jaideep Sundaram

🙏 Acknowledgments

Designed for the Nextflow and nf-core communities.


Built with 🐍 Python | 🎨 Typer | 📊 Nextflow
