Offline analysis of Nextflow run artifacts - audit, report, and compare workflow runs
jps-nextflow-utils
Offline analysis of Nextflow run artifacts - audit, report, and compare workflow runs without network calls or runtime dependencies.
🚀 Overview
jps-nextflow-utils provides a Typer-based CLI for read-only analysis of Nextflow workflow run artifacts. It helps pipeline developers, platform engineers, and QA teams understand what happened during workflow execution by analyzing logs, traces, configs, and other static outputs.
Key Features
- 🔍 Artifact Discovery - Automatically finds and fingerprints Nextflow run artifacts
- 📊 Metadata Extraction - Extracts run details, versions, executors, timings from logs
- ⚠️ Failure Detection - Classifies errors (OOM, timeouts, containers, filesystem, etc.)
- 📈 Performance Analysis - Computes process-level statistics from trace files
- 🔄 Run Comparison - Diff two runs to identify behavioral/performance changes
- 📋 Batch Processing - Audit multiple runs and generate aggregate summaries
- 🎯 Rules Engine - Extensible YAML-based pattern matching for custom checks
- 📝 Multiple Output Formats - JSON, Markdown, text, CSV, NDJSON
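To make the performance-analysis feature concrete, here is a minimal sketch of a process-level rollup computed from trace data. It is not the tool's implementation. It assumes a tab-separated trace with `name` and `status` columns (as in Nextflow's default trace output), and the function name `status_counts_by_process` is hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict


def status_counts_by_process(trace_text: str) -> dict[str, Counter]:
    """Count task statuses per process from tab-separated trace text."""
    counts: dict[str, Counter] = defaultdict(Counter)
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    for row in reader:
        # Task names look like "PROCESS (tag)"; keep the part before the tag.
        process = row["name"].split(" (", 1)[0]
        counts[process][row["status"]] += 1
    return dict(counts)


# Made-up sample rows for illustration only.
sample = (
    "task_id\tname\tstatus\texit\n"
    "1\tFASTQC (s1)\tCOMPLETED\t0\n"
    "2\tFASTQC (s2)\tFAILED\t137\n"
    "3\tMULTIQC\tCOMPLETED\t0\n"
)
for process, statuses in status_counts_by_process(sample).items():
    print(process, dict(statuses))
```

Grouping on the process name rather than the full task name is what turns per-task rows into per-process statistics.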
Out of Scope
- Triggering or modifying workflow runs
- Network calls (Tower API, cloud APIs)
- Domain-specific scientific validation
📦 Installation
From Source
git clone https://github.com/jai-python3/jps-nextflow-utils.git
cd jps-nextflow-utils
pip install -e .
Using Make
make install
🎯 Quick Start
Audit a Single Run
# Basic audit with text output
nf-audit audit run --run-dir /path/to/nextflow/run
# Generate JSON report
nf-audit audit run --run-dir ./my_run --format json --outdir ./reports
# Generate Markdown report
nf-audit audit run -d ./my_run -f md -o ./reports
# Audit from a list of artifact paths
nf-audit audit run --paths artifact_paths.txt --format json
Batch Audit Multiple Runs
# Discover and audit all runs in a directory
nf-audit audit batch --base-dir ./all_runs --glob "*" --outdir ./batch_reports
# Audit specific directories
nf-audit audit batch --run-dir ./run1 --run-dir ./run2 --outdir ./reports
Compare Two Runs
# Diff two runs
nf-audit diff --run-a ./run_baseline --run-b ./run_test
# Save diff report
nf-audit diff -a ./run_baseline -b ./run_test --outdir ./diffs
Work with Rules
# List built-in rules
nf-audit rules list
# List rules from custom pack
nf-audit rules list --rules ./examples/custom_rules.yaml
# Test rules against a run
nf-audit rules test --target ./my_run --rules ./examples/nfcore_rules.yaml
📖 Usage Examples
Example 1: Audit with Custom Rules
nf-audit audit run \
  --run-dir /data/pipeline_runs/2024-01-15_rnaseq \
  --format json \
  --outdir /reports \
  --rules ./examples/custom_rules.yaml \
  --rules ./examples/nfcore_rules.yaml
Example 2: Batch Processing with CSV Summary
nf-audit audit batch \
  --base-dir /data/all_runs \
  --glob "run_*" \
  --outdir /batch_output \
  --summary-format csv
Example 3: Performance Comparison
# Compare before and after optimization
nf-audit diff \
  --run-a /runs/before_optimization \
  --run-b /runs/after_optimization \
  --outdir /comparison
📁 Expected Artifacts
The tool discovers and analyzes these common Nextflow artifacts:
- `.nextflow.log` - Main Nextflow log
- `nextflow.config` - Configuration files
- `trace.txt` - Process execution trace
- `report.html` - HTML report
- `timeline.html` - Timeline visualization
- `dag.html` / `dag.dot` - Workflow DAG
- `params.json` / `params.yaml` - Parameters
- Additional `*.config` and `*.log` files
Alternative: Specify Artifact Paths
If artifacts are scattered across different locations, create a paths file:
# artifact_paths.txt
/data/logs/.nextflow.log
/archive/traces/trace.txt
/configs/nextflow.config
/reports/report.html
Then audit using:
nf-audit audit run --paths artifact_paths.txt
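The discovery step described in this section can be pictured with a short sketch. It is an illustration only, not the tool's actual logic. The function name `find_artifacts` is hypothetical, and using a SHA-256 digest as the fingerprint is an assumption. The file names are a subset of the artifacts named in this section.

```python
import hashlib
import tempfile
from pathlib import Path

# Artifact names taken from this section of the README.
EXPECTED = [
    ".nextflow.log",
    "nextflow.config",
    "trace.txt",
    "report.html",
    "timeline.html",
]


def find_artifacts(run_dir: Path) -> dict[str, str]:
    """Map each expected artifact present in run_dir to a SHA-256 digest."""
    found = {}
    for name in EXPECTED:
        path = run_dir / name
        if path.is_file():
            # Assumption: a content hash serves as the artifact fingerprint.
            found[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return found


# Demo on a throwaway directory that contains only a trace file.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "trace.txt").write_text("task_id\tname\n")
    demo_found = find_artifacts(demo_dir)
    print(sorted(demo_found))
```

Missing artifacts are skipped rather than treated as errors, which mirrors the idea that a run directory may contain only some of the expected files.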
🔧 Output Formats
JSON Report Schema
```json
{
  "schema_version": "1.0.0",
  "tool_version": "0.1.0",
  "generated_at": "2024-01-17T10:30:00",
  "run_dir": "/path/to/run",
  "overall_status": "ERROR",
  "metadata": {
    "run_name": "angry_euler",
    "nextflow_version": "23.04.1",
    "executor": "slurm",
    "duration_seconds": 3600.5
  },
  "findings": [...],
  "process_rollups": [...],
  "inventory": {...}
}
```
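A downstream script might consume this report roughly as follows. This is a minimal sketch that relies only on the top-level fields shown in the schema (`overall_status`, `metadata.run_name`, `metadata.duration_seconds`, `findings`). The helper name `summarize_report` is hypothetical, and the structure of individual findings is not assumed.

```python
import json


def summarize_report(report: dict) -> str:
    """Build a one-line summary from the top-level report fields."""
    meta = report.get("metadata", {})
    minutes = meta.get("duration_seconds", 0) / 60
    return (
        f"{meta.get('run_name', '?')}: {report.get('overall_status', '?')} "
        f"({len(report.get('findings', []))} findings, {minutes:.1f} min)"
    )


# A trimmed-down report using values from the schema example.
example = json.loads("""
{
  "overall_status": "ERROR",
  "metadata": {"run_name": "angry_euler", "duration_seconds": 3600.5},
  "findings": []
}
""")
summary = summarize_report(example)
print(summary)
```

Using `dict.get` with defaults keeps the summary usable even if a report lacks optional fields.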
Markdown Report
Human-readable report with:
- Run metadata summary
- Findings by severity
- Process statistics table
- Artifact inventory
- Evidence excerpts
Batch CSV Summary
Aggregate metrics across runs:
- Run status and duration
- Finding counts by severity
- Top failing processes
- Task success/failure rates
🎨 Custom Rules
Create custom YAML rule packs:
version: "1.0"
rules:
  - id: custom_error_pattern
    category: process_failure
    severity: ERROR
    description: "Custom error detection"
    scope: log
    confidence: 0.9
    patterns:
      - "CUSTOM_ERROR_\\d+"
      - "MyPipeline failed"
    remediation: |
      Check pipeline logs for specific error details.
      Contact support if issue persists.
See examples/ for more rule pack examples.
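To show how a rule like the one above could be evaluated, here is a small sketch. It assumes each pattern is a regular expression searched against every log line; the tool's real matching semantics may differ. The dict is a hand-written stand-in for the parsed YAML, and `match_rule` is a hypothetical helper.

```python
import re

# Hypothetical in-memory form of the rule above. The YAML string
# "CUSTOM_ERROR_\\d+" decodes to the regex CUSTOM_ERROR_\d+.
rule = {
    "id": "custom_error_pattern",
    "severity": "ERROR",
    "patterns": [r"CUSTOM_ERROR_\d+", "MyPipeline failed"],
}


def match_rule(rule: dict, lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs where any rule pattern matches."""
    compiled = [re.compile(pattern) for pattern in rule["patterns"]]
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if any(regex.search(line) for regex in compiled)
    ]


# Made-up log lines for illustration only.
log_lines = [
    "INFO  starting workflow",
    "ERROR CUSTOM_ERROR_42 in step align",
    "WARN  retrying task",
    "MyPipeline failed after 3 attempts",
]
hits = match_rule(rule, log_lines)
for number, line in hits:
    print(f"{rule['severity']} {rule['id']} line {number}: {line}")
```

Trying candidate patterns against a few known log lines in this way is a cheap sanity check before adding them to a rule pack.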
🧪 Development
Setup
# Clone and install with dev dependencies
git clone https://github.com/jai-python3/jps-nextflow-utils.git
cd jps-nextflow-utils
pip install -e ".[dev]"
Testing
# Run tests
make test
# Run with coverage
pytest --cov=src/jps_nextflow_utils tests/
# Lint and format
make fix
make format
make lint
Code Quality
# Format code
black src/ tests/
isort src/ tests/
# Type checking
mypy src/
# Security scan
bandit -r src/
📊 Exit Codes
The CLI uses exit codes to indicate run status:
- `0` - OK (no findings above INFO)
- `1` - WARN (warnings present)
- `2` - FAIL (errors or fatal findings)
- `≥3` - Tool/runtime error
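A wrapper script, for example in CI, could interpret these exit codes along the following lines. This is a sketch under the mapping listed in this section. The function names and the `allow_warnings` policy are hypothetical, and anything outside 0 to 2 is treated as a tool error.

```python
def describe_exit_code(code: int) -> str:
    """Translate an nf-audit exit code into a status label."""
    if code == 0:
        return "OK"
    if code == 1:
        return "WARN"
    if code == 2:
        return "FAIL"
    # 3 and above are tool/runtime errors; unexpected values land here too.
    return "TOOL_ERROR"


def should_block_pipeline(code: int, allow_warnings: bool = True) -> bool:
    """Decide whether a calling pipeline should stop for this exit code."""
    status = describe_exit_code(code)
    if status == "OK":
        return False
    if status == "WARN":
        return not allow_warnings
    return True


for code in (0, 1, 2, 3):
    print(code, describe_exit_code(code), should_block_pipeline(code))
```

The `code` value would typically come from the `returncode` of `subprocess.run(["nf-audit", "audit", "run", "--run-dir", "./my_run"])`.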
🤝 Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
See docs/Development_SOP.md for details.
🐛 Reporting Issues
Found a bug or have a feature request? Please open an issue on GitHub: https://github.com/jai-python3/jps-nextflow-utils/issues
📜 License
MIT License © Jaideep Sundaram
🙏 Acknowledgments
Designed for the Nextflow and nf-core communities.
Built with 🐍 Python | 🎨 Typer | 📊 Nextflow