Pharmacometric modeling workflow CLI

PKflow

A composable command-line workflow tool for pharmacometric modeling.

PKflow turns the run → diagnose → compare → report loop of population PK/PD modeling into a handful of scriptable commands. Fit a NONMEM model, collect its results into a tidy, file-based format, and generate goodness-of-fit plots, VPCs, bootstrap confidence intervals, shrinkage tables, η–covariate plots, and a shareable report — all from the terminal or as a Python library.

It is a clean-room Python rewrite of the ideas behind the classic Pirana workbench, with three deliberate design choices:

  • File-based, not a database. Every run is a self-contained directory (results.yaml + parquet sidecars) — diffable, reproducible, and git-friendly.
  • A thin backend protocol. Modeling-engine specifics live behind a small parse / run / collect interface. Today that backend is NONMEM (via pharmpy); the diagnostics, workflows, and report layers are engine-agnostic.
  • Pure functions you can test. The statistics (VPC binning, bootstrap CIs, shrinkage, comparison tables) are pure and unit-tested without needing NONMEM.

Status: early alpha (0.1.0a6). The NONMEM workflow below works end-to-end against a real nmfe binary. APIs may still change.



Install

Requires Python ≥ 3.10.

pip install pkflow

To actually run models you also need:

  • A NONMEM installation with an nmfe script on PATH (or point at it via pkflow config set nmfe <path> — see Configuration).
  • pandoc (system package) — only for report --format html|docx|pdf. Markdown reports and everything else need no extra tooling.
  • A PDF engine — only for report --format pdf. pkflow drives whichever it finds first: weasyprint (pip install weasyprint), wkhtmltopdf, or a LaTeX engine (tectonic / xelatex / pdflatex).

Python dependencies (installed automatically): pharmpy-core, pandas, pyarrow, plotnine, scikit-misc, jinja2, pyyaml, typer, tomli-w.


Quickstart

# 0. Tell pkflow where NONMEM lives (once, stored in ~/.config/pkflow/config.toml)
pkflow config set nmfe /opt/nm760/run/nmfe76

# 1. Fit a model — creates runs/<name>_<timestamp>/
pkflow run model.ctl

# 2..N. Downstream commands default to the LAST successful run, so you can
#       drop the run-dir argument entirely:
pkflow show
pkflow diagnose
pkflow vpc
pkflow shrinkage
pkflow report --format pdf --gof

Every command is independent and operates on a saved run directory, so you can re-run, re-collect, and re-diagnose without re-fitting. Pass an explicit <run_dir> whenever you want a run other than the most recent one:

pkflow show      runs/model_20260609_120000/
pkflow diagnose  runs/model_20260609_120000/

Configuration

pkflow resolves each setting from four layers: built-in defaults, two TOML files (user-global and project-local), and CLI flags. Later layers override earlier ones:

built-in defaults  <  ~/.config/pkflow/config.toml  <  ./pkflow.toml  <  CLI flags

Manage it with pkflow config — no need to hand-edit files:

pkflow config show                              # resolved values + their source
pkflow config set nmfe /opt/nm760/run/nmfe76    # user-global (default scope)
pkflow config set --project runs_dir output     # project-local ./pkflow.toml
pkflow config get nmfe
pkflow config unset nmfe
pkflow config path                              # where the user file lives

Key       Default  Meaning
backend   nonmem   Modeling engine (only NONMEM today)
executor  local    Local subprocess runner
nmfe      nmfe75   Path to your NONMEM nmfe script
runs_dir  runs     Where run directories are created

Use the user scope for machine-wide settings like the nmfe path, and the project scope (--project, writes ./pkflow.toml) for per-study overrides. Per-invocation flags such as --backend / --runs-dir win over both.
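
The precedence rule can be illustrated with a small sketch. This is not pkflow's loader: resolve_config is a hypothetical helper, and the layer names are labels chosen here. It only shows how "last wins" plays out and how a resolved value can carry its source, as pkflow config show reports.

```python
# Hypothetical helper (not part of pkflow): later layers override earlier ones.
def resolve_config(layers):
    """Return {key: (value, source)} given an ordered list of (source, mapping)."""
    resolved = {}
    for source, values in layers:
        for key, value in values.items():
            resolved[key] = (value, source)
    return resolved


layers = [
    ("default", {"backend": "nonmem", "executor": "local",
                 "nmfe": "nmfe75", "runs_dir": "runs"}),
    ("user", {"nmfe": "/opt/nm760/run/nmfe76"}),   # ~/.config/pkflow/config.toml
    ("project", {"runs_dir": "output"}),            # ./pkflow.toml
    ("cli", {}),                                    # no flags given in this example
]

config = resolve_config(layers)
for key, (value, source) in config.items():
    print(f"{key} = {value}  (from {source})")
```

Here nmfe resolves from the user file and runs_dir from the project file, while backend and executor fall back to the built-in defaults.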


Examples

The examples below use a 2-compartment IV model warfarin.ctl. Replace it with your own control stream — PKflow reads $INPUT, $DATA, parameter blocks, and result files (.lst, .ext, .phi) through pharmpy.

1. Run a model

pkflow run warfarin.ctl
→ runs/warfarin_20260609_120000
status: ok  ofv: 1234.56  (21.9s)

run creates an isolated run directory, copies the dataset in and rewrites $DATA so models with relative data paths just work, executes NONMEM, then collects everything into results.yaml + parquet sidecars (parameters, predictions, η estimates, covariates).
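
As a rough illustration of the runs/<name>_<timestamp>/ naming, the sketch below builds such a path from a control-stream file name. The function run_dir_for is hypothetical, and the timestamp format is inferred from the example directory name above rather than taken from pkflow's source.

```python
from datetime import datetime
from pathlib import Path


# Hypothetical helper: the %Y%m%d_%H%M%S format is inferred from
# "warfarin_20260609_120000" and may differ from what pkflow actually uses.
def run_dir_for(control_stream, runs_dir="runs", now=None):
    now = now or datetime.now()
    stem = Path(control_stream).stem          # "warfarin.ctl" -> "warfarin"
    return Path(runs_dir) / f"{stem}_{now:%Y%m%d_%H%M%S}"


example = run_dir_for("warfarin.ctl", now=datetime(2026, 6, 9, 12, 0, 0))
print(example.as_posix())
```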

2. Inspect saved results

pkflow show runs/warfarin_20260609_120000/
run      : warfarin_20260609_120000
backend  : nonmem
status   : ok
ofv      : 1234.56
aic/bic  : 1250.56 / 1278.10
cond #   : 18.3

parameters:
     name   type  estimate      se  rse_pct
       CL  theta     0.134  0.0042      3.1
       V1  theta     8.110  0.2100      2.6
        Q  theta     0.220  0.0180      8.2
 OMEGA_1_1  omega     0.091  0.0150     16.4

show reads only the saved files — no NONMEM needed. Use pkflow collect <run_dir> to re-parse the NONMEM output of an existing run without re-fitting.

3. Compare runs

Rank competing models side by side. ΔOFV is relative to the best (lowest) run; failed runs are excluded from the "best" calculation.

pkflow compare runs/base_*/ runs/covCL_*/ runs/covCL_V_*/ --sort ofv --gof
          run_id status     ofv  delta_ofv  n_params     aic     bic  condition_number
covCL_V_20260609     ok  1208.9        0.0         9  1226.9  1236.9              18.3
  covCL_20260609     ok  1210.2        1.3         7  1224.2  1234.2              18.3
   base_20260609     ok  1234.5       25.6         5  1244.5  1254.5              18.3

→ compare/comparison.csv
→ compare/compare_gof.png      # overlaid DV-vs-PRED, colored by run
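
The ΔOFV column can be sketched in a few lines. This is an illustration, not pkflow's build_table: add_delta_ofv and the dict layout are assumptions, as is the choice to still report a ΔOFV for a failed run that has an OFV.

```python
# Hypothetical sketch: the reference OFV comes from successful runs only.
def add_delta_ofv(runs):
    ok_ofvs = [r["ofv"] for r in runs
               if r["status"] == "ok" and r["ofv"] is not None]
    best = min(ok_ofvs) if ok_ofvs else None
    rows = []
    for r in runs:
        delta = None
        if best is not None and r["ofv"] is not None:
            delta = round(r["ofv"] - best, 6)
        rows.append({**r, "delta_ofv": delta})
    # Successful runs first, then by OFV; runs without an OFV sort last.
    return sorted(rows, key=lambda row: (row["status"] != "ok",
                                         row["ofv"] is None,
                                         row["ofv"] or 0.0))


runs = [
    {"run_id": "base", "status": "ok", "ofv": 1234.5},
    {"run_id": "covCL", "status": "ok", "ofv": 1210.2},
    {"run_id": "covCL_V", "status": "ok", "ofv": 1208.9},
    {"run_id": "broken", "status": "failed", "ofv": 1100.0},
]
table = add_delta_ofv(runs)
for row in table:
    print(row["run_id"], row["status"], row["ofv"], row["delta_ofv"])
```

The failed run has the lowest OFV here, yet the reference stays at covCL_V because only successful runs are considered.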

4. Bootstrap confidence intervals

Nonparametric case-resampling bootstrap: subjects are resampled with replacement (and relabeled to keep duplicates distinct), the model is refit on each replicate, and percentile CIs are reported. Non-converged replicates are excluded and counted.

pkflow bootstrap warfarin.ctl --n 200 --ci 0.95
→ runs/warfarin_20260609_121500  (200 replicates)
converged: 196/200

     name  original_est  boot_median   boot_se   ci_lo   ci_hi  n_success
       CL         0.134        0.135    0.0051   0.125   0.145        196
       V1         8.110        8.090    0.2400   7.640   8.580        196
 OMEGA_1_1         0.091        0.087    0.0190   0.052   0.128        196

→ runs/.../bootstrap/bootstrap_summary.csv

Per-replicate run directories are cleaned up automatically; the summary and the raw per-replicate estimates (replicate_params.parquet) are kept.
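
The two statistical steps, resampling subjects with relabeling and taking percentile intervals, can be sketched with the standard library. The function names, the ID column name, and the linear-interpolation quantile are assumptions made for illustration; pkflow's own implementation may differ in these details.

```python
import random


# Hypothetical sketch of case resampling: whole subjects are drawn with
# replacement, and each draw gets a fresh ID so duplicates stay distinct.
def resample_subjects(records, id_key="ID", seed=None):
    rng = random.Random(seed)
    by_subject = {}
    for row in records:
        by_subject.setdefault(row[id_key], []).append(row)
    subject_ids = list(by_subject)
    drawn = [rng.choice(subject_ids) for _ in subject_ids]
    resampled = []
    for new_id, old_id in enumerate(drawn, start=1):
        for row in by_subject[old_id]:
            resampled.append({**row, id_key: new_id})
    return resampled


def percentile_ci(estimates, level=0.95):
    """Percentile interval over the converged replicates (linear interpolation)."""
    values = sorted(estimates)

    def quantile(q):
        pos = q * (len(values) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        return values[lo] + (values[hi] - values[lo]) * (pos - lo)

    alpha = (1.0 - level) / 2.0
    return quantile(alpha), quantile(1.0 - alpha)


data = [{"ID": i, "TIME": t, "DV": 1.0 * i + t} for i in (1, 2, 3) for t in (0, 1)]
replicate = resample_subjects(data, seed=1234)
ci_lo, ci_hi = percentile_ci([float(x) for x in range(101)], level=0.90)
```

Each replicate dataset would then be refit, and percentile_ci applied per parameter to the estimates from the replicates that converged.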

5. Goodness-of-fit plots

The standard 4-panel GOF (DV-vs-PRED, DV-vs-IPRED, CWRES-vs-PRED, CWRES-vs-TIME), rendered with plotnine:

pkflow diagnose runs/warfarin_20260609_120000/
  runs/.../diagnostics/dv_vs_pred.png
  runs/.../diagnostics/dv_vs_ipred.png
  runs/.../diagnostics/cwres_vs_pred.png
  runs/.../diagnostics/cwres_vs_time.png
→ 4 plot(s) in runs/.../diagnostics

GOF needs a $TABLE with DV PRED IPRED CWRES TIME written to an sdtab-style file so PKflow can find it.

6. Visual Predictive Check (VPC)

PKflow converts the fitted model to a simulation ($SIMULATION with N subproblems), runs it, bins observations by time, and overlays the observed 5/50/95 percentiles on the simulated prediction intervals.

pkflow vpc runs/warfarin_20260609_120000/ --n-sim 500 --n-bins 10
→ runs/.../diagnostics/vpc.png    (+ vpc.csv with the binned percentiles)
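
The binning step can be sketched as follows. The text above does not say how pkflow chooses bin edges, so equal-count bins are an assumption here, and bin_percentiles and quantile are hypothetical names.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


# Hypothetical sketch: equal-count time bins, then 5/50/95 percentiles per bin.
def bin_percentiles(times, values, n_bins, probs=(0.05, 0.50, 0.95)):
    pairs = sorted(zip(times, values))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than observations
        dv = sorted(v for _, v in chunk)
        row = {"t_mid": (chunk[0][0] + chunk[-1][0]) / 2}
        for p in probs:
            row[f"p{round(p * 100)}"] = quantile(dv, p)
        rows.append(row)
    return rows


times = list(range(20))
observed = bin_percentiles(times, [float(t) for t in times], n_bins=2)
print(observed)
```

Applying the same function to every simulated replicate and then taking percentiles of each column across replicates would give the prediction intervals that the observed percentiles are overlaid on.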

7. η / ε shrinkage

A shrinkage table (flagging values above a threshold, default 30%) plus a faceted histogram of the individual η estimates.

pkflow shrinkage runs/warfarin_20260609_120000/ --threshold 0.30
parameter kind  shrinkage  shrinkage_pct   high
    ETA_1  eta     0.0868          8.68   False
    ETA_2  eta     0.4171         41.71    True
    ETA_3  eta     0.6388         63.88    True

→ runs/.../diagnostics/shrinkage_table.csv
→ runs/.../diagnostics/eta_distributions.png
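
For readers unfamiliar with the quantity, here is a minimal sketch of SD-based η-shrinkage, 1 - SD(η̂) / ω, and the threshold flag. Whether pkflow uses this definition, the sample standard deviation, or values reported by NONMEM is not stated above, so treat the formula choice and the function names as assumptions.

```python
import math
import statistics


# Hypothetical sketch: SD-based shrinkage using the sample standard deviation.
def eta_shrinkage(eta_estimates, omega_variance):
    return 1.0 - statistics.stdev(eta_estimates) / math.sqrt(omega_variance)


def is_high(shrinkage, threshold=0.30):
    """Flag values above the threshold, as the table's 'high' column does."""
    return shrinkage > threshold


etas = [-0.2, -0.1, 0.0, 0.1, 0.2]      # individual estimates for one η
shr = eta_shrinkage(etas, omega_variance=0.1)
print(f"{shr:.4f}", is_high(shr))
```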

8. η–covariate plots

Scatter of each η against each subject-level covariate, with a linear trend. Covariates are auto-detected (constant-within-subject, varying across subjects); override with --cov.

# auto-detect covariates
pkflow etacov runs/warfarin_20260609_120000/

# or name them explicitly
pkflow etacov runs/warfarin_20260609_120000/ --cov WT --cov SEX --cov AGE
→ runs/.../diagnostics/eta_covariates.png   (facet grid: η rows × covariate cols)
→ runs/.../diagnostics/eta_covariates.csv
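
The auto-detection rule (constant within each subject, varying across subjects) can be sketched like this. detect_covariates is a hypothetical function operating on plain dict rows; pkflow's own detection works on its saved data and may apply extra filters.

```python
# Hypothetical sketch of the rule described above.
def detect_covariates(rows, id_key="ID", exclude=()):
    per_column = {}  # column -> {subject -> set of observed values}
    for row in rows:
        for col, value in row.items():
            if col == id_key or col in exclude:
                continue
            per_column.setdefault(col, {}).setdefault(row[id_key], set()).add(value)
    covariates = []
    for col, by_subject in per_column.items():
        constant_within = all(len(vals) == 1 for vals in by_subject.values())
        if not constant_within:
            continue  # time-varying columns such as TIME or DV
        distinct_across = {next(iter(vals)) for vals in by_subject.values()}
        if len(distinct_across) > 1:
            covariates.append(col)
    return covariates


rows = [
    {"ID": 1, "TIME": 0, "DV": 1.0, "WT": 70, "STUDY": 1},
    {"ID": 1, "TIME": 1, "DV": 2.0, "WT": 70, "STUDY": 1},
    {"ID": 2, "TIME": 0, "DV": 1.5, "WT": 82, "STUDY": 1},
    {"ID": 2, "TIME": 1, "DV": 2.5, "WT": 82, "STUDY": 1},
]
print(detect_covariates(rows))
```

WT qualifies; TIME and DV change within a subject, and STUDY never varies, so neither would produce an informative plot.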

9. Reports (md / html / docx / pdf)

Assemble fit summary, parameter table, shrinkage, any bootstrap result, and the diagnostic plots into one document. Markdown is the canonical render; HTML, Word, and PDF are produced via pandoc.

# Markdown (no extra dependencies)
pkflow report runs/warfarin_20260609_120000/ --format md

# Word document, generating GOF plots first and embedding them
pkflow report runs/warfarin_20260609_120000/ --format docx --gof

# PDF (needs a PDF engine: weasyprint / wkhtmltopdf / a LaTeX engine)
pkflow report --format pdf --gof          # defaults to the last run
→ runs/.../report/report.pdf

# HTML
pkflow report --format html

docx/html/pdf require pandoc on PATH; pdf additionally needs a PDF engine — pkflow auto-detects weasyprint, wkhtmltopdf, tectonic, xelatex, or pdflatex (in that order) and gives a clear, actionable error if none is installed.
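
First-match detection of this kind is commonly written with shutil.which. The sketch below is an illustration under assumptions: find_pdf_engine is a hypothetical name, and the executable names are assumed to equal the tool names listed above.

```python
import shutil

# Order taken from the text above; executable names are an assumption.
PDF_ENGINES = ("weasyprint", "wkhtmltopdf", "tectonic", "xelatex", "pdflatex")


# Hypothetical helper: returns the first engine found on PATH.
def find_pdf_engine(which=shutil.which):
    for name in PDF_ENGINES:
        if which(name):
            return name
    raise RuntimeError(
        "No PDF engine found on PATH. Install one of: " + ", ".join(PDF_ENGINES)
    )


try:
    print("using", find_pdf_engine())
except RuntimeError as err:
    print(err)
```

Passing which as a parameter keeps the function testable without any of the engines installed.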

10. Use it as a Python library

Everything the CLI does is available as importable functions. The statistics are pure — feed them a Results object (from a saved run or constructed in memory):

from pathlib import Path
from pkflow import backends
from pkflow.executors import LocalExecutor
from pkflow.model import Results
from pkflow.compare import build_table
from pkflow.diagnostics import save_gof, shrinkage_table
from pkflow.workflows import bootstrap

be = backends.get("nonmem")
ex = LocalExecutor({"nmfe": "/opt/nm760/run/nmfe76"})

# parse → run → collect
model  = be.parse(Path("warfarin.ctl"))
handle = be.run(model, Path("runs/wf"), ex)
res    = be.collect(model, Path("runs/wf"), handle)
res.save(Path("runs/wf"))

# load a saved run later
res = Results.load(Path("runs/wf"))

# pure analytics
table = build_table([Results.load(p) for p in Path("runs").glob("*/")])
shr   = shrinkage_table(res, threshold=0.3)
save_gof(res, Path("runs/wf/diagnostics"))

# a full bootstrap workflow
boot = bootstrap(model, res, be, ex, Path("runs/wf"), n=200, seed=1234)
print(boot.summary)

Run directory layout

A run directory is the unit of reproducibility:

runs/warfarin_20260609_120000/
├── results.yaml            # fit metadata: status, ofv, aic/bic, cond#, shrinkage
├── parameters.parquet      # estimates + SE + RSE%
├── predictions.parquet     # $TABLE output (DV/PRED/IPRED/CWRES/...)
├── etas.parquet            # individual η estimates
├── covariates.parquet      # per-subject covariates
├── warfarin.ctl            # the control stream that was run
├── diagnostics/            # GOF, VPC, shrinkage, η-covariate PNGs + CSVs
├── bootstrap/              # bootstrap_summary.csv + replicate_params.parquet
└── report/                 # report.md / .html / .docx / .pdf

Architecture

pkflow/
├── cli.py              # typer entrypoint — every command is a thin wrapper
├── config.py           # pkflow.toml loader
├── compare.py          # cross-run table + overlaid GOF (pure functions)
├── model/
│   ├── base.py         # backend-agnostic Model
│   └── results.py      # unified Results + save/load (yaml + parquet)
├── backends/
│   ├── base.py         # Backend protocol: parse / run / collect / simulate
│   └── nonmem.py       # pharmpy-backed NONMEM implementation
├── executors/
│   └── local.py        # local subprocess runner
├── diagnostics/
│   ├── gof.py          # 4-panel goodness-of-fit
│   ├── vpc.py          # backend-agnostic VPC (compute + plot)
│   └── shrinkage.py    # shrinkage table, η distributions, η-covariate plots
├── workflows/
│   └── bootstrap.py    # case-resampling bootstrap (pure stats + orchestrator)
└── report/
    ├── render.py       # context builder + Jinja2 markdown + pandoc convert
    └── templates/      # run_report.md.j2

Extending it is meant to be small:

  • A new backend (e.g. another estimation engine) = one file implementing parse / run / collect.
  • A new executor (e.g. Slurm/SGE) = one file implementing submit / wait.

The diagnostics, comparison, bootstrap, and report layers consume the unified Results object and don't care which engine produced it.
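
To make the extension points concrete, here is a sketch of what the two interfaces might look like as typing.Protocol classes. The method names come from the text above; every signature, return type, and the toy classes are assumptions, and pkflow/backends/base.py remains the authoritative definition.

```python
from pathlib import Path
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Executor(Protocol):
    # Signatures are hypothetical; only the method names come from the README.
    def submit(self, command: list[str], cwd: Path) -> Any: ...
    def wait(self, job: Any) -> int: ...


@runtime_checkable
class Backend(Protocol):
    def parse(self, path: Path) -> Any: ...
    def run(self, model: Any, run_dir: Path, executor: Executor) -> Any: ...
    def collect(self, model: Any, run_dir: Path, handle: Any) -> Any: ...


class NoopExecutor:
    """Toy executor that pretends every job succeeds."""
    def submit(self, command, cwd):
        return {"command": command, "cwd": cwd}

    def wait(self, job):
        return 0


class EchoBackend:
    """Toy engine used only to show the shape of a backend."""
    def parse(self, path):
        return {"name": path.stem}

    def run(self, model, run_dir, executor):
        job = executor.submit(["echo", model["name"]], run_dir)
        return {"returncode": executor.wait(job)}

    def collect(self, model, run_dir, handle):
        status = "ok" if handle["returncode"] == 0 else "failed"
        return {"run_id": model["name"], "status": status}


be, ex = EchoBackend(), NoopExecutor()
model = be.parse(Path("warfarin.ctl"))
result = be.collect(model, Path("runs/wf"), be.run(model, Path("runs/wf"), ex))
print(result)
```

Structural typing means a new backend or executor needs no base class; it only has to provide the methods.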


Development

pip install -e ".[dev]"
python -m pytest          # full suite

The test suite covers every module. Pure-function tests (config, results, compare, bootstrap, VPC math, shrinkage, report rendering) run without NONMEM using in-memory Results; NONMEM-dependent paths are exercised with a real .mod template and stubbed/faked boundaries. Pandoc-dependent report tests skip automatically when pandoc is absent.


Contributing

Contributions are welcome! Please see CONTRIBUTING.md for details. In short:

  1. Open an issue to discuss bugs or feature ideas before large changes.
  2. Follow test-driven development — add a failing test first, then the implementation. Keep statistics as pure functions where possible.
  3. Run python -m pytest and make sure the suite is green before opening a PR.

Roadmap

  • Categorical-covariate boxplots in η–covariate plots
  • Cluster executors (slurm, sge)
  • Additional report sections and templating hooks

The backend protocol is intentionally general, but the project is focused on NONMEM for now.


Citation

If you use PKflow in your research, please cite it:

@software{zhang_pkflow,
  author  = {Zhang, Yufeng},
  title   = {PKflow: A composable command-line workflow tool for pharmacometric modeling},
  year    = {2026},
  url     = {https://github.com/kinginsun/pkflow}
}

Acknowledgements

PKflow stands on the shoulders of excellent open-source work:

  • pharmpy — NONMEM control-stream parsing and result handling.
  • plotnine — grammar-of-graphics plotting for all diagnostics.
  • pandas, Typer, Jinja2, and pandoc.
  • The original Pirana workbench, whose workflow inspired this rewrite.

Author

Yufeng Zhang
School of Pharmacy, The Chinese University of Hong Kong (CUHK)
Contact: zhangyf@cuhk.edu.hk

License

Released under the MIT License — see LICENSE.

MIT License

Copyright (c) 2026 Yufeng Zhang

PKflow is an independent Python project and is not affiliated with the original Pirana software.
