Skip to main content

Neoclassical transport solver with CPU/GPU and differentiable JAX workflows

Project description

sfincs_jax

CI Docs PyPI Coverage Python versions License

sfincs_jax is a standalone neoclassical transport code for radially local drift-kinetic calculations in stellarator and tokamak geometry. It combines high-fidelity kinetic models, CPU/GPU execution, modern matrix-free numerics, parallel workflows, and optional differentiable solve paths in one codebase.

On the current main branch, the audited 39-case example suite runs cleanly on CPU and GPU. Release-facing runtime/memory plots use the canonical reference-runtime-window rows from that suite, while research-scale performance claims use the separate production-resolution benchmark tier; larger finite-beta/profile-current cases remain tracked separately from fast smoke and parity checks. The default CLI path is tuned for robust explicit solves and practical throughput, while the Python API can opt into differentiable solve paths when gradients matter.

Current release claims intentionally exclude production-resolution QI CPU/GPU seed ladders, true differentiable device-QI closure, and single-case multi-GPU strong scaling; those remain bounded or deferred research lanes until their promotion artifacts pass.

It is designed for:

  • high-performance runs on CPU/GPU,
  • research and production transport workflows,
  • memory-efficient large solves,
  • end-to-end differentiable workflows.

Installation

Install from PyPI:

pip install sfincs_jax

Install from source:

git clone https://github.com/uwplasma/sfincs_jax.git
cd sfincs_jax
pip install .

After installing from a source checkout, you can run the CLI immediately on the bundled tiny example. The suffix of --out selects the output format: .h5/.hdf5 for the Fortran-compatible HDF5 file, .nc/.netcdf for NetCDF4, and .npz for a fast NumPy archive.

cd sfincs_jax
sfincs_jax write-output \
  --input examples/getting_started/input.namelist \
  --out sfincsOutput.h5 \
  --solver-trace solver_trace.json \
  --geometry-only
sfincs_jax --plot sfincsOutput.h5

This is the fast installation smoke test. It writes sfincsOutput.h5 and then writes a multi-page PDF diagnostics panel next to it as sfincsOutput_summary.pdf. The optional --solver-trace sidecar records the backend, selected solve lane, timing, and output metadata without changing the parity-oriented output file. The same command works for NetCDF and NPZ:

sfincs_jax write-output --input examples/getting_started/input.namelist --out sfincsOutput.nc --geometry-only
sfincs_jax write-output --input examples/getting_started/input.namelist --out sfincsOutput.npz --geometry-only
sfincs_jax --plot sfincsOutput.nc

For larger non-differentiable RHSMode=1 production runs, the CLI can leave the matrix-free Krylov path when a measured policy is safer or faster. Audited CPU 3D full-FP systems auto-select sparse-PC GMRES inside a measured size window when it is faster and lower-memory than dense FP. Bounded CPU tokamak electric-field systems auto-select dense LU only in the validated active-size window where it avoids the slow Krylov/strong/sparse-rescue ladder; this is a runtime win with a higher transient memory footprint, so it is CPU-only and size-capped. Measured GPU tokamak full-FP no-Er/Er production-floor rows auto-select structured x-block sparse-PC GMRES when that is the parity-clean, lower-memory route; this avoids the global dense-velocity sparse-pattern setup that dominated earlier production-floor runs. On the non-differentiable full-FP x-block path, per-x/TZ blocks up to size 30000 use exact sparse LU before falling back to ILU; this measured policy removes the medium-QI and production-floor full-trajectory cliffs without changing PAS or autodiff paths. For explicit large-QI requests that opt into JAX-native x-block Krylov, the documented non-autodiff host fallback can rewrite the solve to the accepted host x-block auto policy before JAX factors are built, keeping the measured side-probe seed plus LGMRES rescue. That is the production escape hatch for robust large RHSMode=1 QI solves today; it is distinct from the still-deferred true differentiable device-QI lane. For tokamak full-FP Er and 3D full-FP full-trajectory x-block solves, the same right-preconditioned path is now the default on CPU and GPU. For one-species decks, the Fortran-style preconditioner_species = 0 setting is treated as equivalent to the compact per-species x-block because there is no inter-species coupling to preserve. The production-floor CPU Er rows now run in about 1 s logged / 1.7-1.9 s cold external time with about 0.42-0.44 GB RSS instead of multi-GB dense-assembly setup. On an RTX A4000, the same rows are residual-clean at about 23.3 s for DKES trajectories and 11.0 s for full trajectories, with active RSS deltas near 1.1 GB; the full-trajectory row uses right-preconditioned GMRES with a short restart. Large constrained-PAS profile-current decks also auto-select sparse-PC GMRES when the problem size is in the validated production window. Tokamak PAS no-Er production-floor sparse-PC runs use the measured MMD_ATA SuperLU ordering by default; PAS+Er full-trajectory runs use the lower-fill measured MMD_AT_PLUS_A ordering, and the one-species PAS+Er row caps the default sparse-PC GMRES restart at 40 unless the user explicitly overrides it. Both routes factor only the active Nxi_for_x degrees of freedom, which cuts sparse-factor fill while preserving strict Fortran parity on the audited CPU and RTX A4000 GPU rows. On the public no-Er production rows, this lowers the two-species CPU active RSS to about 0.39 GB, cuts the two-species GPU runtime to about 5.2 s, and cuts the one-species Nx=4 GPU runtime from about 28.8 s to about 3.0 s, all with zero Fortran output mismatches. Explicit sparse-host LU and x-block sparse PC remain available:

sfincs_jax write-output \
  --input /path/to/input.namelist \
  --out sfincsOutput.h5 \
  --solve-method sparse_host_safe \
  --solver-trace solver_trace.json

sfincs_jax write-output \
  --input /path/to/input.namelist \
  --out sfincsOutput.h5 \
  --solve-method xblock_sparse_pc_gmres \
  --solver-trace solver_trace.json

This path keeps the full SFINCS system, builds a conservative sparse pattern, probes only colored structural nonzeros, and first tries host sparse LU. The sparse_host_safe mode falls back only for constrained PAS systems where sparse LU exposes a singular/gauge-sensitive branch; that fallback is labeled as PETSc-compatible minimum-norm output instead of pretending the true residual target was met. RHSMode=1 outputs include linearSolverMethod, linearSolverResidualNorm, linearSolverResidualTarget, linearSolverResidualTargetRatio, linearSolverConverged, linearSolverAccepted, and linearSolverAcceptanceCriterion in the main output file. Sparse-PC runs also write setup/solve/factorization timings and sparse-pattern counters such as linearSolverMatvecs, linearSolverSetupTime, linearSolverSolveTime, linearSolverSparsePCFactorTime, linearSolverSparsePatternNnz, linearSolverSparsePCXBlockPreconditionerXi, and linearSolverSparsePCXBlockAssembledHost. The production benchmark manifest now enforces large research-scale floors: 25 x 51 x 4 x 100 (Ntheta x Nzeta x Nx x Nxi) for 3D cases and 25 x 1 x 4 x 100 for tokamak cases. Public production timing rows target SFINCS Fortran v3 runtimes of at least 10 s; earlier 17 x 21 x 5 x 12 finite-beta/profile-current timings were lower-resolution bring-up checks for this solver lane, not public production baselines.

Runtime and Memory Summary

Runtime and memory comparison against SFINCS Fortran v3

The release benchmark above compares SFINCS Fortran v3, sfincs_jax CPU cold, sfincs_jax CPU warm, sfincs_jax GPU cold, and sfincs_jax GPU warm only for reference-runtime-window rows whose Fortran v3 reference runtime is at least 10 s; the summary JSON records which legacy frozen rows still need a full production-resolution rerun. The left panel shows wall-clock runtime and the right panel shows active solver memory, both on log axes. Fortran memory is process maximum RSS; JAX memory uses profiler RSS deltas over the fixed Python/JAX/XLA runtime baseline, while the full process peak remains in the JSON reports as jax_max_rss_mb. Cold is the first external suite command. Warm runtime uses jax_runtime_s_warm when reports were generated with --jax-repeats >= 2; for the current frozen release reports, it falls back to the CLI jax_logged_elapsed_s field. Cases are ordered by best warm sfincs_jax speedup over the Fortran v3 runtime, so the strongest JAX wins appear first. The plot and README runtime/memory table are checked against the same canonical filtered rows emitted by the benchmark-summary generator. Regenerate the plot with python examples/publication_figures/generate_fortran_suite_benchmark_summary.py.

Scope note: the full 39-case frozen suite remains the parity and CI smoke audit, but sub-10 s Fortran rows are not shown as public performance comparisons. They must either be rerun from the higher-resolution benchmark tier in benchmarks/production_resolution_inputs_2026-05-04 or remain CI/regression checks only. The production tier enforces 25 x 51 x 4 x 100 3D grids and 25 x 1 x 4 x 100 tokamak grids, including public examples and optional user-supplied production-resolution workloads. The manual GitHub workflow Production Benchmark Inputs validates and uploads the production input tree without running expensive solves; full CPU/GPU/Fortran runtime and memory sweeps should be launched on local, office, or cluster hardware with explicit resource budgets.

Physics in One Page

sfincs_jax solves the radially local, steady, linearized drift-kinetic equation for the non-adiabatic distribution-function perturbation f_s1 on a flux surface. In normalized form the solved kinetic balance is

(parallel streaming + mirror force + E x B drift + magnetic drift
 + energy/pitch-angle drifts - linearized collisions) f_s1 = thermodynamic drives.

The unknown distribution can be coupled to the flux-surface electrostatic potential variation Phi1(theta,zeta) through quasineutrality when requested. The output fluxes, flows, transport matrices, and diagnostics are moments of this solved f_s1. The full equations, normalizations, switches, and source-code mapping are documented in docs/system_equations.rst, docs/physics_models.rst, and docs/method.rst.

Quick Start (CLI)

You can run sfincs_jax from anywhere in your terminal. You do not need to be inside the repository folder.

Run an input file:

sfincs_jax /path/to/input.namelist

Write output explicitly:

sfincs_jax write-output --input /path/to/input.namelist --out /path/to/sfincsOutput.h5

Plot an existing output file:

sfincs_jax --plot /path/to/sfincsOutput.h5

By default this writes /path/to/sfincsOutput_summary.pdf, a multi-page panel with geometry, radial profiles, particle/heat/momentum fluxes, NTV, moments, and transport-matrix diagnostics when those datasets are present. Use sfincs_jax plot-output --input-h5 ... --out custom.pdf to choose a filename.

Override the equilibrium file at the CLI without changing input.namelist:

sfincs_jax write-output \
  --input /path/to/input.namelist \
  --out /path/to/sfincsOutput.h5 \
  --wout-path /path/to/wout.nc

The bare sfincs_jax /path/to/input.namelist form accepts the same --equilibrium-file and --wout-path overrides.

Quick Start (Python)

Read a namelist, run sfincs_jax, write an output file, and inspect results directly in memory:

from pathlib import Path

from sfincs_jax.io import write_sfincs_jax_output_h5

input_namelist = Path("input.namelist")
out_path, results = write_sfincs_jax_output_h5(
    input_namelist=input_namelist,
    output_path=Path("sfincsOutput.h5"),
    return_results=True,
)

print("Wrote:", out_path)
print("Available datasets:", len(results))
print("Example key:", "particleFlux_vm_psiHat" in results)

Set output_path=Path("sfincsOutput.nc") for NetCDF4 or output_path=Path("sfincsOutput.npz") for a fast NumPy archive. The calculation is identical; only the writer changes.

If you need to override the equilibrium file without editing the namelist, pass equilibrium_file=... or the VMEC-friendly alias wout_path=...:

write_sfincs_jax_output_h5(
    input_namelist=input_namelist,
    output_path=Path("sfincsOutput.h5"),
    wout_path=Path("/path/to/wout.nc"),
)

sfincs_jax write-output and the scan utilities use the explicit performance-oriented solve path by default. When calling write_sfincs_jax_output_h5(...) directly, pass differentiable=False for the same fast path or request the implicit/differentiable linear-solve path only when you need gradients:

write_sfincs_jax_output_h5(
    input_namelist=input_namelist,
    output_path=Path("sfincsOutput.h5"),
    differentiable=False,
)

write_sfincs_jax_output_h5(
    input_namelist=input_namelist,
    output_path=Path("sfincsOutput.h5"),
    solve_method="sparse_host",
    differentiable=False,
)

write_sfincs_jax_output_h5(
    input_namelist=input_namelist,
    output_path=Path("sfincsOutput.h5"),
    differentiable=True,
)

Repository examples that map directly onto common first tasks:

  • run the bundled tiny CLI example: sfincs_jax examples/getting_started/input.namelist
  • write a tiny tokamak output: python examples/getting_started/write_sfincs_output_tokamak.py
  • write a tiny VMEC output with wout_path: python examples/getting_started/write_sfincs_output_vmec.py
  • run a finite-beta vmec_jax equilibrium into convergence-gated sfincs_jax radial profiles of ambipolar E_r and bootstrap current: python examples/vmec_jax_finite_beta/finite_beta_vmec_to_sfincs.py
  • plot the finite-beta kinetic/angular/root-bracket convergence scan from cached outputs: python examples/vmec_jax_finite_beta/plot_convergence_scan.py
  • plot an output file: python examples/getting_started/plot_sfincs_output.py
  • write HDF5/NetCDF/NPZ and plot a PDF panel: python examples/getting_started/write_and_plot_multiple_formats.py
  • run autodiff examples: python examples/autodiff/autodiff_gradient_nu_n_residual.py
  • run the optional VMEC/Boozer differentiable geometry handoff: python examples/autodiff/vmec_jax_to_boozer_sfincs_pipeline.py --wout /path/to/wout.nc
  • benchmark CPU/GPU parallel solves: python examples/performance/benchmark_sharded_solve_scaling.py --backend cpu --devices 1 2 --inner-warmup-solves 1 --sample-timeout-s 300 ...

Parallel CLI controls are now first-class:

# Multi-core CPU host sharding on one node
sfincs_jax --cores 8 --shard-axis auto /path/to/input.namelist

# Parallel transport-matrix RHS solves
sfincs_jax transport-matrix-v3 \
  --input /path/to/input.namelist \
  --transport-workers 4

# High-nu LHD/W7-X campaign pilot on a dual-GPU node
CUDA_VISIBLE_DEVICES=0,1 \
python examples/publication_figures/generate_sfincs_paper_figs.py \
  --case lhd \
  --collision-operators 0 \
  --nuprime-min 17.78279101649707 \
  --nuprime-max 17.78279101649707 \
  --n-points 1 \
  --transport-workers 2 \
  --transport-parallel-backend gpu \
  --transport-sparse-direct-max 30000 \
  --require-residuals \
  --max-transport-residual 1e-6 \
  --max-transport-relative-residual 1e-6 \
  --scan-only

# The current office dual-GPU LHD pilot for that point is residual-clean in
# ~262 s, compared with ~345 s on one GPU and ~569 s on the older implicit path.
# For the first W7-X FP high-nu point, use the bounded one-worker sparse-LU lane
# below: it closes all three RHS residual gates in ~9.7 min on one office GPU
# with sparse-helper factor reuse, compared with ~33.8 min before reuse.

# W7-X FP high-nu residual-clean pilot, intentionally one worker to limit sparse
# LU memory pressure:
CUDA_VISIBLE_DEVICES=0 \
SFINCS_JAX_TRANSPORT_SPARSE_FACTOR_DTYPE=float32 \
python examples/publication_figures/generate_sfincs_paper_figs.py \
  --case w7x \
  --collision-operators 0 \
  --nuprime-min 17.78332923601508 \
  --nuprime-max 17.78332923601508 \
  --n-points 1 \
  --transport-workers 1 \
  --transport-parallel-backend gpu \
  --transport-sparse-direct-max 40000 \
  --transport-maxiter 800 \
  --require-residuals \
  --max-transport-residual 1e-6 \
  --max-transport-relative-residual 1e-6 \
  --scan-only

# To compare candidate preconditioners before widening W7-X high-nu scans,
# isolate single-RHS behavior:
CUDA_VISIBLE_DEVICES=0 \
python examples/performance/benchmark_w7x_high_nu_preconditioners.py \
  --preconditioners auto,fp_tzfft,xmg \
  --which-rhs 2 \
  --sparse-direct-max 40000 \
  --sparse-factor-dtype float32 \
  --maxiter 800 \
  --timeout-s 900

W7-X high-nu sparse-helper factor reuse

The W7-X high-nu figure is generated by python examples/publication_figures/generate_w7x_high_nu_performance.py. The checked run preserves the previous residual-clean transport matrix exactly, reduces the full one-point wall time from about 2028 s to 582 s, and lowers measured peak RSS from about 19.9 GB to 15.3 GB.

# One-node multi-GPU sharded solve (experimental for very large single-RHS cases)
CUDA_VISIBLE_DEVICES=0,1 \
sfincs_jax write-output \
  --input /path/to/input.namelist \
  --shard-axis theta \
  --distributed-gmres auto

# Multi-host JAX distributed bootstrap
sfincs_jax write-output \
  --input /path/to/input.namelist \
  --distributed \
  --process-count 8 \
  --process-id ${RANK} \
  --coordinator-address node0 \
  --coordinator-port 1234

Use -v to have the executable print the active parallel runtime summary (cores, shard axis, transport workers, distributed Krylov mode, and multi-host bootstrap fields) before the solve starts.

Current recommendation:

  • CPU host sharding is supported and deterministic, but the measured speedup is still case-dependent.
  • The current sharded RHSMode=1 CPU path uses a wider Schwarz patch rule plus a bounded multilevel residual correction to avoid the worst 4/8-device fragmentation failures seen in earlier releases.
  • Use one GPU per case or scan point for production throughput today.
  • Multi-GPU single-case sharding is available for benchmarking and very large runs, but it remains experimental and is not yet the default recommendation.
  • The sharded-solve benchmark helper supports both --backend cpu and --backend gpu; the GPU path uses CUDA_VISIBLE_DEVICES and disables JAX preallocation in the subprocess, with cuda_malloc_async enabled for the benchmark subprocess allocator, so one-node GPU scaling experiments are more reproducible.
  • For practical multi-GPU usage today, the strongest measured path is transport-worker parallelism with one worker per GPU on RHSMode=2/3 runs. On the fresh office 2-GPU rerun of examples/performance/transport_parallel_2min.input.namelist, this path measured 351.1s -> 237.7s from 1 -> 2 GPU workers, i.e. 1.48x speedup on a 3-RHS case, essentially at the finite-task ideal of 1.5x.
  • Multi-GPU single-case sharding remains experimental. Use it for research and benchmarking, not as the default production scaling path.

You can reproduce the recommended multi-GPU transport-worker benchmark with:

python examples/performance/benchmark_transport_parallel_scaling.py \
  --input examples/performance/transport_parallel_2min.input.namelist \
  --backend gpu \
  --workers 1 2

To regenerate only the checked-in figure from the saved JSON payload without rerunning the multi-minute GPU benchmark:

python examples/performance/benchmark_transport_parallel_scaling.py \
  --from-json examples/performance/output/transport_parallel_scaling_gpu.json \
  --out-dir docs/_static/figures/parallel \
  --figure-name transport_parallel_scaling_gpu.png

GPU transport scaling

Compare two outputs:

sfincs_jax compare-h5 --a sfincsOutput_jax.h5 --b sfincsOutput_fortran.h5

Advanced CLI, plotting, and solver options are documented in docs/usage.rst, docs/outputs.rst, and docs/performance_techniques.rst.

Models, Numerics, and Validation

sfincs_jax solves the same class of neoclassical drift-kinetic problems as mature SFINCS workflows, but it is documented and maintained as its own code. In particular:

  • the public executable favors bounded, performance-oriented solve strategies,
  • the Python API can switch to differentiable solve paths when end-to-end sensitivities are needed,
  • CPU runs lean on JIT-cached kernels and selected host sparse factorizations for hard linear branches,
  • repeated RHSMode=1 output-writing runs reuse prebuilt grids, geometry, and operator state to cut setup cost on large HSX/geometry11 cases,
  • GPU runs keep operator applications on device, then fall back to accelerator-safe or host rescue paths only when conditioning or memory demands it,
  • and the documentation maps the governing equations directly onto the source tree.

The main documentation entry points are:

  • physics and equations: docs/physics_models.rst, docs/system_equations.rst, docs/physics_reference.rst
  • geometry and numerics: docs/geometry.rst, docs/method.rst, docs/numerics.rst
  • inputs and outputs: docs/inputs.rst, docs/outputs.rst
  • parallel and performance workflows: docs/parallelism.rst, docs/performance.rst
  • examples, applications, and testing: docs/examples.rst, docs/applications.rst, docs/testing.rst
  • external trust-building comparisons: docs/fortran_comparison.rst

Current Example-Suite Audit

Regenerate this block from the current main working tree with:

python scripts/run_scaled_example_suite.py \
  --examples-root examples/sfincs_examples \
  --resolution-reference-root /Users/rogeriojorge/local/tests/sfincs_original/fortran/version3/examples \
  --reference-results-root tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix \
  --out-root tests/scaled_example_suite_release_cpu_2026-05-08_production_tokamak \
  --scale-factor 1.0 \
  --runtime-target-basis fortran \
  --fortran-min-runtime-s 0.0 \
  --runtime-adjustment-iters 0 \
  --runtime-baseline-report tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix/suite_report.json \
  --jax-profile-marks on
python scripts/generate_readme_fast_branch_audit.py \
  --out-root tests/scaled_example_suite_release_cpu_2026-05-08_production_tokamak \
  --gpu-out-root tests/scaled_example_suite_gpu_bounded_default_2026-05-08_lu3000_pas \
  --baseline-report tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix/suite_report.json \
  --min-fortran-runtime-s 10

The benchmark policy on main is:

  • start from the original Fortran v3 example resolution,
  • only downscale when a case is too expensive for a practical suite run,
  • benchmark JAX CPU and GPU against a frozen CPU-generated Fortran reference root,
  • and never intentionally push a reduced case below about 1s of Fortran wall time unless the original example is already that small.

That avoids the misleading sub-second Fortran rows that came from blind global downscaling, keeps the GPU lane tied to a deterministic reference, and makes the additional example part of the same artifact set as the standard suite.

Production-resolution inputs are generated separately with scripts/create_production_benchmark_inputs.py. When scripts/run_scaled_example_suite.py is pointed at one of those generated inputs/ trees, it detects the sibling manifest.json and launches only bounded_local_ok rows by default. Use --max-run-recommendation bounded_remote, --max-run-recommendation remote_or_cluster_only, or --max-run-recommendation all only on explicitly budgeted remote or cluster lanes.

Current main CPU audit comes from tests/scaled_example_suite_release_cpu_2026-05-08_production_tokamak. Matching main GPU audit comes from tests/scaled_example_suite_gpu_bounded_default_2026-05-08_lu3000_pas.

  • Recorded cases: 39/39
  • Practical status counts: parity_ok=39
  • Strict status counts: parity_ok=39
  • GPU practical status counts: parity_ok=39
  • GPU strict status counts: parity_ok=39
  • CPU output-key coverage: missing_total=0, extra_total=-, audited_cases=39, skipped_cases=0
  • GPU output-key coverage: missing_total=0, extra_total=-, audited_cases=39, skipped_cases=0
  • CPU runtime drift watchlist: not applicable: production-floor reruns are not same-resolution with the older frozen smoke baseline
  • GPU runtime drift watchlist: not applicable: production-floor reruns are not same-resolution with the older frozen smoke baseline
  • Remaining cases: none
  • Additional example: parity_ok on CPU and parity_ok on GPU

Current mismatches:

  • CPU practical mismatches: none
  • CPU strict mismatches: none
  • GPU practical/strict mismatches: none

Runtime columns match the summary plot: cold is jax_runtime_s; warm/logged is jax_runtime_s_warm when available, otherwise jax_logged_elapsed_s. The JAX memory columns match the plot and use profiler active RSS deltas (jax_incremental_max_rss_mb) when present; full process peak RSS remains available as jax_max_rss_mb in the merged JSON reports. The generator emits canonical filtered rows for the plot and the table consistency gate. The benchmark summary JSON records production-resolution floor violations for legacy frozen rows, so the table should be read as a reference-runtime-window comparison until every row has been rerun at the current production floor. README-facing runtime/memory rows are restricted to cases where the SFINCS Fortran v3 reference runtime is at least 10 s. Excluded lower-resolution CI parity/smoke rows: HSX_PASCollisions_DKESTrajectories (0.994s), HSX_PASCollisions_fullTrajectories (2.510s), geometryScheme4_1species_PAS_withEr_DKESTrajectories (1.365s), geometryScheme4_2species_PAS_noEr (0.953s), monoenergetic_geometryScheme1 (0.795s), monoenergetic_geometryScheme11 (0.861s), monoenergetic_geometryScheme5_ASCII (1.052s), monoenergetic_geometryScheme5_netCDF (1.029s), sfincsPaperFigure3_geometryScheme11_PASCollisions_2Species_DKESTrajectories (1.104s), sfincsPaperFigure3_geometryScheme11_PASCollisions_2Species_fullTrajectories (1.706s), tokamak_1species_FPCollisions_noEr (7.897s), tokamak_1species_FPCollisions_withEr_DKESTrajectories (6.958s), tokamak_1species_FPCollisions_withEr_fullTrajectories (6.736s), transportMatrix_geometryScheme11 (0.025s), transportMatrix_geometryScheme2 (0.031s).

Full per-case runtime / memory table:

Case Fortran CPU(s) JAX CPU cold(s) CPU cold x JAX CPU warm/logged(s) CPU warm/logged x JAX GPU cold(s) GPU cold x JAX GPU warm/logged(s) GPU warm/logged x Fortran MB JAX CPU active MB CPU MB x JAX GPU active MB GPU MB x CPU mismatch GPU mismatch CPU print GPU print CPU status GPU status
HSX_FPCollisions_DKESTrajectories 29.664 3.060 0.10x 2.438 0.08x 5.298 0.18x 4.514 0.15x 103.0 303.6 2.95x 370.1 3.59x 0/193 (strict 0/193) 0/193 (strict 0/193) 9/9 9/9 parity_ok parity_ok
HSX_FPCollisions_fullTrajectories 88.504 3.054 0.03x 2.414 0.03x 5.247 0.06x 4.493 0.05x 100.8 314.1 3.12x 374.9 3.72x 0/193 (strict 0/193) 0/193 (strict 0/193) 9/9 9/9 parity_ok parity_ok
additional_examples 120.074 1.733 0.01x 1.063 0.01x 2.633 0.02x 1.898 0.02x 102.1 237.2 2.32x 336.1 3.29x 0/193 (strict 0/193) 0/193 (strict 0/193) 9/9 9/9 parity_ok parity_ok
filteredW7XNetCDF_2species_magneticDrifts_noEr 89.052 2.069 0.02x 1.417 0.02x 2.834 0.03x 2.115 0.02x 103.2 288.3 2.79x 349.3 3.38x 0/193 (strict 0/193) 0/193 (strict 0/193) 9/9 9/9 parity_ok parity_ok
filteredW7XNetCDF_2species_magneticDrifts_withEr 95.440 2.011 0.02x 1.372 0.01x 3.339 0.03x 2.590 0.03x 96.2 318.7 3.31x 355.4 3.69x 0/193 (strict 0/193) 0/193 (strict 0/193) 9/9 9/9 parity_ok parity_ok
filteredW7XNetCDF_2species_noEr 128.508 1.550 0.01x 0.978 0.01x 2.734 0.02x 1.964 0.02x 100.3 266.9 2.66x 343.8 3.43x 0/193 (strict 0/193) 0/193 (strict 0/193) 9/9 9/9 parity_ok parity_ok
geometryScheme4_2species_noEr 139.240 1.733 0.01x 1.112 0.01x 2.888 0.02x 2.105 0.02x 92.2 276.8 3.00x 364.2 3.95x 0/207 (strict 0/207) 0/207 (strict 0/207) 9/9 9/9 parity_ok parity_ok
geometryScheme4_2species_noEr_withPhi1InDKE 293.275 1.936 0.01x 1.353 0.00x 3.340 0.01x 2.596 0.01x 100.6 288.6 2.87x 394.0 3.91x 0/265 (strict 0/265) 0/265 (strict 0/265) 9/9 9/9 parity_ok parity_ok
geometryScheme4_2species_noEr_withQN 146.734 1.769 0.01x 1.140 0.01x 3.132 0.02x 2.402 0.02x 95.1 276.2 2.91x 381.2 4.01x 0/265 (strict 0/265) 0/265 (strict 0/265) 9/9 9/9 parity_ok parity_ok
geometryScheme4_2species_withEr_fullTrajectories 58.053 1.710 0.03x 1.144 0.02x 3.032 0.05x 2.258 0.04x 113.4 284.6 2.51x 359.1 3.17x 0/193 (strict 0/193) 0/193 (strict 0/193) 9/9 9/9 parity_ok parity_ok
geometryScheme4_2species_withEr_fullTrajectories_withQN 211.358 1.889 0.01x 1.310 0.01x 3.087 0.01x 2.314 0.01x 98.8 295.1 2.99x 384.2 3.89x 0/251 (strict 0/251) 0/251 (strict 0/251) 9/9 9/9 parity_ok parity_ok
geometryScheme5_3species_loRes 98.976 1.615 0.02x 1.074 0.01x 3.691 0.04x 2.908 0.03x 129.6 352.8 2.72x 363.3 2.80x 0/193 (strict 0/193) 0/193 (strict 0/193) 9/9 9/9 parity_ok parity_ok
inductiveE_noEr 166.614 1.644 0.01x 1.039 0.01x 2.785 0.02x 1.992 0.01x 99.2 279.7 2.82x 364.8 3.68x 0/207 (strict 0/207) 0/207 (strict 0/207) 9/9 9/9 parity_ok parity_ok
quick_2species_FPCollisions_noEr 166.945 1.531 0.01x 0.983 0.01x 2.938 0.02x 2.200 0.01x 97.1 269.0 2.77x 363.7 3.74x 0/207 (strict 0/207) 0/207 (strict 0/207) 9/9 9/9 parity_ok parity_ok
sfincsPaperFigure3_geometryScheme11_FPCollisions_2Species_DKESTrajectories 76.666 1.653 0.02x 1.110 0.01x 3.188 0.04x 2.391 0.03x 106.7 294.2 2.76x 367.6 3.44x 0/207 (strict 0/207) 0/207 (strict 0/207) 9/9 9/9 parity_ok parity_ok
sfincsPaperFigure3_geometryScheme11_FPCollisions_2Species_fullTrajectories 93.439 1.767 0.02x 1.177 0.01x 3.138 0.03x 2.363 0.03x 94.0 303.6 3.23x 372.2 3.96x 0/207 (strict 0/207) 0/207 (strict 0/207) 9/9 9/9 parity_ok parity_ok
tokamak_1species_FPCollisions_noEr_withPhi1InDKE 41.132 13.276 0.32x 12.527 0.30x 16.744 0.41x 16.744 0.41x 169.1 796.8 4.71x 613.9 3.63x 0/274 (strict 0/274) 0/274 (strict 0/274) 9/9 9/9 parity_ok parity_ok
tokamak_1species_FPCollisions_noEr_withQN 10.952 9.019 0.82x 8.255 0.75x 7.175 0.66x 7.175 0.66x 159.4 800.0 5.02x 469.1 2.94x 0/274 (strict 0/274) 0/274 (strict 0/274) 9/9 9/9 parity_ok parity_ok
tokamak_1species_PASCollisions_noEr 75.566 2.297 0.03x 2.281 0.03x 3.629 0.05x 3.591 0.05x 155.3 336.5 2.17x 1094.1 7.05x 0/212 (strict 0/212) 0/212 (strict 0/212) 9/9 9/9 parity_ok parity_ok
tokamak_1species_PASCollisions_noEr_Nx1 75.533 1.339 0.02x 1.330 0.02x 2.728 0.04x 2.718 0.04x 119.2 278.2 2.33x 1076.7 9.03x 0/212 (strict 0/212) 0/212 (strict 0/212) 9/9 9/9 parity_ok parity_ok
tokamak_1species_PASCollisions_noEr_withQN 75.242 5.147 0.07x 5.147 0.07x 11.444 0.15x 10.019 0.13x 165.0 612.3 3.71x 459.5 2.78x 0/274 (strict 0/274) 0/274 (strict 0/274) 9/9 9/9 parity_ok parity_ok
tokamak_1species_PASCollisions_withEr_fullTrajectories 75.698 7.049 0.09x 6.231 0.08x 14.423 0.19x 13.193 0.17x 248.9 1319.5 5.30x 1572.1 6.32x 0/212 (strict 0/212) 0/212 (strict 0/212) 8/9 8/9 parity_ok parity_ok
tokamak_2species_PASCollisions_noEr 75.362 2.033 0.03x 2.023 0.03x 5.243 0.07x 5.207 0.07x 215.3 393.5 1.83x 1168.7 5.43x 0/212 (strict 0/212) 0/212 (strict 0/212) 9/9 9/9 parity_ok parity_ok
tokamak_2species_PASCollisions_withEr_fullTrajectories 76.530 9.435 0.12x 8.669 0.11x 23.369 0.31x 22.264 0.29x 386.6 1389.9 3.60x 2007.0 5.19x 0/212 (strict 0/212) 0/212 (strict 0/212) 8/9 8/9 parity_ok parity_ok

Largest CPU runtime improvements vs tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix/suite_report.json:

  • tokamak_2species_PASCollisions_noEr: 4.0s -> 2.0s (delta=2.0s)
  • tokamak_1species_PASCollisions_noEr_Nx1: 2.4s -> 1.3s (delta=1.1s)
  • quick_2species_FPCollisions_noEr: 2.1s -> 1.5s (delta=0.6s)
  • sfincsPaperFigure3_geometryScheme11_FPCollisions_2Species_fullTrajectories: 2.3s -> 1.8s (delta=0.5s)
  • sfincsPaperFigure3_geometryScheme11_FPCollisions_2Species_DKESTrajectories: 2.2s -> 1.7s (delta=0.5s)

Largest CPU process peak-RSS improvements vs tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix/suite_report.json:

  • tokamak_2species_PASCollisions_noEr: 2088.6 MB -> 584.8 MB (delta=1503.8 MB)
  • tokamak_1species_PASCollisions_noEr: 612.9 MB -> 527.8 MB (delta=85.1 MB)
  • tokamak_1species_PASCollisions_noEr_Nx1: 520.3 MB -> 464.3 MB (delta=56.0 MB)
  • geometryScheme5_3species_loRes: 569.4 MB -> 540.3 MB (delta=29.1 MB)
  • geometryScheme4_2species_withEr_fullTrajectories_withQN: 512.4 MB -> 486.4 MB (delta=26.0 MB)

Documentation

Build docs locally:

sphinx-build -b html -W docs docs/_build/html

Entry points:

  • docs/index.rst
  • docs/system_equations.rst
  • docs/method.rst
  • docs/normalizations.rst
  • docs/performance.rst
  • docs/parallelism.rst

Testing

pytest -q

License

See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sfincs_jax-1.1.4.tar.gz (27.0 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sfincs_jax-1.1.4-py3-none-any.whl (27.3 MB view details)

Uploaded Python 3

File details

Details for the file sfincs_jax-1.1.4.tar.gz.

File metadata

  • Download URL: sfincs_jax-1.1.4.tar.gz
  • Upload date:
  • Size: 27.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sfincs_jax-1.1.4.tar.gz
Algorithm Hash digest
SHA256 61d92f66bebd19b7ebafbd5765a623491406a60abafa1ecbc796440cbd139099
MD5 01559e2de65537a27edad13f12cab6f9
BLAKE2b-256 0aa114b625fe4244fce59d11846a6312c3fa87b1309dcd575463da23aa3d9193

See more details on using hashes here.

Provenance

The following attestation bundles were made for sfincs_jax-1.1.4.tar.gz:

Publisher: publish-pypi.yml on uwplasma/sfincs_jax

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sfincs_jax-1.1.4-py3-none-any.whl.

File metadata

  • Download URL: sfincs_jax-1.1.4-py3-none-any.whl
  • Upload date:
  • Size: 27.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sfincs_jax-1.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 d348ab0cbdc5e2277671ee2d2e336aa26a71a7b59a03fe45cd6f5186d8c9ffd8
MD5 3a66ce34cf79ad1a9b2f8dbd1eb2079c
BLAKE2b-256 3b02a6da5ba4185be24500564d1d33800bb0f23762e84396600c3c13416981fb

See more details on using hashes here.

Provenance

The following attestation bundles were made for sfincs_jax-1.1.4-py3-none-any.whl:

Publisher: publish-pypi.yml on uwplasma/sfincs_jax

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page