Neoclassical transport solver with CPU/GPU and differentiable JAX workflows

Project description

sfincs_jax

Python versions License

sfincs_jax is a standalone neoclassical transport code for radially local drift-kinetic calculations in stellarator and tokamak geometry. It combines high-fidelity kinetic models, CPU/GPU execution, modern matrix-free numerics, parallel workflows, and optional differentiable solve paths in one codebase.

On the current main branch, the audited reduced example suite runs cleanly on CPU and GPU. A separate production-resolution benchmark tier is now being used for public runtime/memory claims and research-scale workloads; those larger finite-beta/profile-current cases are tracked separately from the fast smoke suite. The default CLI path is tuned for robust explicit solves and practical throughput, while the Python API can opt into differentiable solve paths when gradients matter.

It is designed for:

high-performance runs on CPU/GPU,
research and production transport workflows,
memory-efficient large solves,
end-to-end differentiable workflows.

Installation

Install from PyPI:

pip install sfincs_jax

Install from source:

git clone https://github.com/uwplasma/sfincs_jax.git
cd sfincs_jax
pip install .

After installing from a source checkout, you can run the CLI immediately on the bundled tiny example. The suffix of --out selects the output format: .h5/.hdf5 for the Fortran-compatible HDF5 file, .nc/.netcdf for NetCDF4, and .npz for a fast NumPy archive.

cd sfincs_jax
sfincs_jax write-output \
  --input examples/getting_started/input.namelist \
  --out sfincsOutput.h5 \
  --solver-trace solver_trace.json \
  --geometry-only
sfincs_jax --plot sfincsOutput.h5

This is the fast installation smoke test. It writes sfincsOutput.h5 and then writes a multi-page PDF diagnostics panel next to it as sfincsOutput_summary.pdf. The optional --solver-trace sidecar records the backend, selected solve lane, timing, and output metadata without changing the parity-oriented output file. The same command works for NetCDF and NPZ:

sfincs_jax write-output --input examples/getting_started/input.namelist --out sfincsOutput.nc --geometry-only
sfincs_jax write-output --input examples/getting_started/input.namelist --out sfincsOutput.npz --geometry-only
sfincs_jax --plot sfincsOutput.nc

For larger non-differentiable RHSMode=1 production runs, the CLI can leave the matrix-free Krylov path when a measured policy is safer or faster. Audited CPU 3D full-FP systems auto-select sparse-PC GMRES inside a measured size window when it is faster and lower-memory than dense FP. Bounded CPU tokamak electric-field systems auto-select dense LU only in the validated active-size window where it avoids the slow Krylov/strong/sparse-rescue ladder; this is a runtime win with a higher transient memory footprint, so it is CPU-only and size-capped. Measured GPU tokamak full-FP no-Er/Er production-floor rows auto-select structured x-block sparse-PC GMRES when that is the parity-clean, lower-memory route; this avoids the global dense-velocity sparse-pattern setup that dominated earlier production-floor runs. On the non-differentiable full-FP x-block path, per-x/TZ blocks up to size 3000 use exact sparse LU before falling back to ILU; this measured policy removes the remaining production-floor full-trajectory cliff without changing PAS or autodiff paths. For tokamak full-FP Er x-block solves, the same path is now the default on CPU and GPU. For one-species decks, the Fortran-style preconditioner_species = 0 setting is treated as equivalent to the compact per-species x-block because there is no inter-species coupling to preserve. The production-floor CPU Er rows now run in about 1 s logged / 1.7-1.9 s cold external time with about 0.42-0.44 GB RSS instead of multi-GB dense-assembly setup. On an RTX A4000, the same rows are residual-clean at about 23.3 s for DKES trajectories and 11.0 s for full trajectories, with active RSS deltas near 1.1 GB; the full-trajectory row uses right-preconditioned GMRES with a short restart. Large constrained-PAS profile-current decks also auto-select sparse-PC GMRES when the problem size is in the validated production window. Tokamak PAS no-Er production-floor sparse-PC runs use the measured MMD_ATA SuperLU ordering by default; PAS+Er full-trajectory runs use the lower-fill measured MMD_AT_PLUS_A ordering, and the one-species PAS+Er row caps the default sparse-PC GMRES restart at 40 unless the user explicitly overrides it. Both routes factor only the active Nxi_for_x degrees of freedom, which cuts sparse-factor fill while preserving strict Fortran parity on the audited CPU and RTX A4000 GPU rows. On the public no-Er production rows, this lowers the two-species CPU active RSS to about 0.39 GB, cuts the two-species GPU runtime to about 5.2 s, and cuts the one-species Nx=4 GPU runtime from about 28.8 s to about 3.0 s, all with zero Fortran output mismatches. Explicit sparse-host LU and x-block sparse PC remain available:

sfincs_jax write-output \
  --input /path/to/input.namelist \
  --out sfincsOutput.h5 \
  --solve-method sparse_host_safe \
  --solver-trace solver_trace.json

sfincs_jax write-output \
  --input /path/to/input.namelist \
  --out sfincsOutput.h5 \
  --solve-method xblock_sparse_pc_gmres \
  --solver-trace solver_trace.json

This path keeps the full SFINCS system, builds a conservative sparse pattern, probes only colored structural nonzeros, and first tries host sparse LU. The sparse_host_safe mode falls back only for constrained PAS systems where sparse LU exposes a singular/gauge-sensitive branch; that fallback is labeled as PETSc-compatible minimum-norm output instead of pretending the true residual target was met. RHSMode=1 outputs include linearSolverMethod, linearSolverResidualNorm, linearSolverResidualTarget, linearSolverResidualTargetRatio, linearSolverConverged, linearSolverAccepted, and linearSolverAcceptanceCriterion in the main output file. Sparse-PC runs also write setup/solve/factorization timings and sparse-pattern counters such as linearSolverMatvecs, linearSolverSetupTime, linearSolverSolveTime, linearSolverSparsePCFactorTime, linearSolverSparsePatternNnz, linearSolverSparsePCXBlockPreconditionerXi, and linearSolverSparsePCXBlockAssembledHost. The production benchmark manifest now enforces large research-scale floors: 25 x 51 x 4 x 100 (Ntheta x Nzeta x Nx x Nxi) for 3D cases and 25 x 1 x 4 x 100 for tokamak cases. Public production timing rows target SFINCS Fortran v3 runtimes of at least 10 s; earlier 17 x 21 x 5 x 12 finite-beta/profile-current timings were lower-resolution bring-up checks for this solver lane, not public production baselines.

Runtime and Memory Summary

Runtime and memory comparison against SFINCS Fortran v3

The release benchmark above compares SFINCS Fortran v3, sfincs_jax CPU cold, sfincs_jax CPU warm, sfincs_jax GPU cold, and sfincs_jax GPU warm only for reference-runtime-window rows whose Fortran v3 reference runtime is at least 10 s; the summary JSON records which legacy frozen rows still need a full production-resolution rerun. The left panel shows wall-clock runtime and the right panel shows active solver memory, both on log axes. Fortran memory is process maximum RSS; JAX memory uses profiler RSS deltas over the fixed Python/JAX/XLA runtime baseline, while the full process peak remains in the JSON reports as jax_max_rss_mb. Cold is the first external suite command. Warm runtime uses jax_runtime_s_warm when reports were generated with --jax-repeats >= 2; for the current frozen release reports, it falls back to the CLI jax_logged_elapsed_s field. Cases are ordered by best warm sfincs_jax speedup over the Fortran v3 runtime, so the strongest JAX wins appear first. Regenerate the plot with python examples/publication_figures/generate_fortran_suite_benchmark_summary.py.

Scope note: the full 39-case frozen suite remains the parity and CI smoke audit, but sub-10 s Fortran rows are not shown as public performance comparisons. They must either be rerun from the higher-resolution benchmark tier in benchmarks/production_resolution_inputs_2026-05-04 or remain CI/regression checks only. The production tier enforces 25 x 51 x 4 x 100 3D grids and 25 x 1 x 4 x 100 tokamak grids, including public examples and optional user-supplied production-resolution workloads. The manual GitHub workflow Production Benchmark Inputs validates and uploads the production input tree without running expensive solves; full CPU/GPU/Fortran runtime and memory sweeps should be launched on local, office, or cluster hardware with explicit resource budgets.

Physics in One Page

sfincs_jax solves the radially local, steady, linearized drift-kinetic equation for the non-adiabatic distribution-function perturbation f_s1 on a flux surface. In normalized form the solved kinetic balance is

(parallel streaming + mirror force + E x B drift + magnetic drift
 + energy/pitch-angle drifts - linearized collisions) f_s1 = thermodynamic drives.

The unknown distribution can be coupled to the flux-surface electrostatic potential variation Phi1(theta,zeta) through quasineutrality when requested. The output fluxes, flows, transport matrices, and diagnostics are moments of this solved f_s1. The full equations, normalizations, switches, and source-code mapping are documented in docs/system_equations.rst, docs/physics_models.rst, and docs/method.rst.

Quick Start (CLI)

You can run sfincs_jax from anywhere in your terminal. You do not need to be inside the repository folder.

Run an input file:

sfincs_jax /path/to/input.namelist

Write output explicitly:

sfincs_jax write-output --input /path/to/input.namelist --out /path/to/sfincsOutput.h5

Plot an existing output file:

sfincs_jax --plot /path/to/sfincsOutput.h5

By default this writes /path/to/sfincsOutput_summary.pdf, a multi-page panel with geometry, radial profiles, particle/heat/momentum fluxes, NTV, moments, and transport-matrix diagnostics when those datasets are present. Use sfincs_jax plot-output --input-h5 ... --out custom.pdf to choose a filename.

Override the equilibrium file at the CLI without changing input.namelist:

sfincs_jax write-output \
  --input /path/to/input.namelist \
  --out /path/to/sfincsOutput.h5 \
  --wout-path /path/to/wout.nc

The bare sfincs_jax /path/to/input.namelist form accepts the same --equilibrium-file and --wout-path overrides.

Quick Start (Python)

Read a namelist, run sfincs_jax, write an output file, and inspect results directly in memory:

from pathlib import Path

from sfincs_jax.io import write_sfincs_jax_output_h5

input_namelist = Path("input.namelist")
out_path, results = write_sfincs_jax_output_h5(
    input_namelist=input_namelist,
    output_path=Path("sfincsOutput.h5"),
    return_results=True,
)

print("Wrote:", out_path)
print("Available datasets:", len(results))
print("Example key:", "particleFlux_vm_psiHat" in results)

Set output_path=Path("sfincsOutput.nc") for NetCDF4 or output_path=Path("sfincsOutput.npz") for a fast NumPy archive. The calculation is identical; only the writer changes.

If you need to override the equilibrium file without editing the namelist, pass equilibrium_file=... or the VMEC-friendly alias wout_path=...:

write_sfincs_jax_output_h5(
    input_namelist=input_namelist,
    output_path=Path("sfincsOutput.h5"),
    wout_path=Path("/path/to/wout.nc"),
)

sfincs_jax write-output and the scan utilities use the explicit performance-oriented solve path by default. When calling write_sfincs_jax_output_h5(...) directly, pass differentiable=False for the same fast path or request the implicit/differentiable linear-solve path only when you need gradients:

write_sfincs_jax_output_h5(
    input_namelist=input_namelist,
    output_path=Path("sfincsOutput.h5"),
    differentiable=False,
)

write_sfincs_jax_output_h5(
    input_namelist=input_namelist,
    output_path=Path("sfincsOutput.h5"),
    solve_method="sparse_host",
    differentiable=False,
)

write_sfincs_jax_output_h5(
    input_namelist=input_namelist,
    output_path=Path("sfincsOutput.h5"),
    differentiable=True,
)

Repository examples that map directly onto common first tasks:

run the bundled tiny CLI example: sfincs_jax examples/getting_started/input.namelist
write a tiny tokamak output: python examples/getting_started/write_sfincs_output_tokamak.py
write a tiny VMEC output with wout_path: python examples/getting_started/write_sfincs_output_vmec.py
run a finite-beta vmec_jax equilibrium into convergence-gated sfincs_jax radial profiles of ambipolar E_r and bootstrap current: python examples/vmec_jax_finite_beta/finite_beta_vmec_to_sfincs.py
plot the finite-beta kinetic/angular/root-bracket convergence scan from cached outputs: python examples/vmec_jax_finite_beta/plot_convergence_scan.py
plot an output file: python examples/getting_started/plot_sfincs_output.py
write HDF5/NetCDF/NPZ and plot a PDF panel: python examples/getting_started/write_and_plot_multiple_formats.py
run autodiff examples: python examples/autodiff/autodiff_gradient_nu_n_residual.py
run the optional VMEC/Boozer differentiable geometry handoff: python examples/autodiff/vmec_jax_to_boozer_sfincs_pipeline.py --wout /path/to/wout.nc
benchmark CPU/GPU parallel solves: python examples/performance/benchmark_sharded_solve_scaling.py --backend cpu --devices 1 2 --inner-warmup-solves 1 --sample-timeout-s 300 ...

Parallel CLI controls are now first-class:

# Multi-core CPU host sharding on one node
sfincs_jax --cores 8 --shard-axis auto /path/to/input.namelist

# Parallel transport-matrix RHS solves
sfincs_jax transport-matrix-v3 \
  --input /path/to/input.namelist \
  --transport-workers 4

# High-nu LHD/W7-X campaign pilot on a dual-GPU node
CUDA_VISIBLE_DEVICES=0,1 \
python examples/publication_figures/generate_sfincs_paper_figs.py \
  --case lhd \
  --collision-operators 0 \
  --nuprime-min 17.78279101649707 \
  --nuprime-max 17.78279101649707 \
  --n-points 1 \
  --transport-workers 2 \
  --transport-parallel-backend gpu \
  --transport-sparse-direct-max 30000 \
  --require-residuals \
  --max-transport-residual 1e-6 \
  --max-transport-relative-residual 1e-6 \
  --scan-only

# The current office dual-GPU LHD pilot for that point is residual-clean in
# ~262 s, compared with ~345 s on one GPU and ~569 s on the older implicit path.
# For the first W7-X FP high-nu point, use the bounded one-worker sparse-LU lane
# below: it closes all three RHS residual gates in ~9.7 min on one office GPU
# with sparse-helper factor reuse, compared with ~33.8 min before reuse.

# W7-X FP high-nu residual-clean pilot, intentionally one worker to limit sparse
# LU memory pressure:
CUDA_VISIBLE_DEVICES=0 \
SFINCS_JAX_TRANSPORT_SPARSE_FACTOR_DTYPE=float32 \
python examples/publication_figures/generate_sfincs_paper_figs.py \
  --case w7x \
  --collision-operators 0 \
  --nuprime-min 17.78332923601508 \
  --nuprime-max 17.78332923601508 \
  --n-points 1 \
  --transport-workers 1 \
  --transport-parallel-backend gpu \
  --transport-sparse-direct-max 40000 \
  --transport-maxiter 800 \
  --require-residuals \
  --max-transport-residual 1e-6 \
  --max-transport-relative-residual 1e-6 \
  --scan-only

# To compare candidate preconditioners before widening W7-X high-nu scans,
# isolate single-RHS behavior:
CUDA_VISIBLE_DEVICES=0 \
python examples/performance/benchmark_w7x_high_nu_preconditioners.py \
  --preconditioners auto,fp_tzfft,xmg \
  --which-rhs 2 \
  --sparse-direct-max 40000 \
  --sparse-factor-dtype float32 \
  --maxiter 800 \
  --timeout-s 900

W7-X high-nu sparse-helper factor reuse

The W7-X high-nu figure is generated by python examples/publication_figures/generate_w7x_high_nu_performance.py. The checked run preserves the previous residual-clean transport matrix exactly, reduces the full one-point wall time from about 2028 s to 582 s, and lowers measured peak RSS from about 19.9 GB to 15.3 GB.

# One-node multi-GPU sharded solve (experimental for very large single-RHS cases)
CUDA_VISIBLE_DEVICES=0,1 \
sfincs_jax write-output \
  --input /path/to/input.namelist \
  --shard-axis theta \
  --distributed-gmres auto

# Multi-host JAX distributed bootstrap
sfincs_jax write-output \
  --input /path/to/input.namelist \
  --distributed \
  --process-count 8 \
  --process-id ${RANK} \
  --coordinator-address node0 \
  --coordinator-port 1234

Use -v to have the executable print the active parallel runtime summary (cores, shard axis, transport workers, distributed Krylov mode, and multi-host bootstrap fields) before the solve starts.

Current recommendation:

CPU host sharding is supported and deterministic, but the measured speedup is still case-dependent.
The current sharded RHSMode=1 CPU path uses a wider Schwarz patch rule plus a bounded multilevel residual correction to avoid the worst 4/8-device fragmentation failures seen in earlier releases.
Use one GPU per case or scan point for production throughput today.
Multi-GPU single-case sharding is available for benchmarking and very large runs, but it remains experimental and is not yet the default recommendation.
The sharded-solve benchmark helper supports both --backend cpu and --backend gpu; the GPU path uses CUDA_VISIBLE_DEVICES and disables JAX preallocation in the subprocess, with cuda_malloc_async enabled for the benchmark subprocess allocator, so one-node GPU scaling experiments are more reproducible.
For practical multi-GPU usage today, the strongest measured path is transport-worker parallelism with one worker per GPU on RHSMode=2/3 runs. On the fresh office 2-GPU rerun of examples/performance/transport_parallel_2min.input.namelist, this path measured 351.1s -> 237.7s from 1 -> 2 GPU workers, i.e. 1.48x speedup on a 3-RHS case, essentially at the finite-task ideal of 1.5x.
Multi-GPU single-case sharding remains experimental. Use it for research and benchmarking, not as the default production scaling path.

You can reproduce the recommended multi-GPU transport-worker benchmark with:

python examples/performance/benchmark_transport_parallel_scaling.py \
  --input examples/performance/transport_parallel_2min.input.namelist \
  --backend gpu \
  --workers 1 2

To regenerate only the checked-in figure from the saved JSON payload without rerunning the multi-minute GPU benchmark:

python examples/performance/benchmark_transport_parallel_scaling.py \
  --from-json examples/performance/output/transport_parallel_scaling_gpu.json \
  --out-dir docs/_static/figures/parallel \
  --figure-name transport_parallel_scaling_gpu.png

GPU transport scaling

Compare two outputs:

sfincs_jax compare-h5 --a sfincsOutput_jax.h5 --b sfincsOutput_fortran.h5

Advanced CLI, plotting, and solver options are documented in docs/usage.rst, docs/outputs.rst, and docs/performance_techniques.rst.

Models, Numerics, and Validation

sfincs_jax solves the same class of neoclassical drift-kinetic problems as mature SFINCS workflows, but it is documented and maintained as its own code. In particular:

the public executable favors bounded, performance-oriented solve strategies,
the Python API can switch to differentiable solve paths when end-to-end sensitivities are needed,
CPU runs lean on JIT-cached kernels and selected host sparse factorizations for hard linear branches,
repeated RHSMode=1 output-writing runs reuse prebuilt grids, geometry, and operator state to cut setup cost on large HSX/geometry11 cases,
GPU runs keep operator applications on device, then fall back to accelerator-safe or host rescue paths only when conditioning or memory demands it,
and the documentation maps the governing equations directly onto the source tree.

The main documentation entry points are:

physics and equations: docs/physics_models.rst, docs/system_equations.rst, docs/physics_reference.rst
geometry and numerics: docs/geometry.rst, docs/method.rst, docs/numerics.rst
inputs and outputs: docs/inputs.rst, docs/outputs.rst
parallel and performance workflows: docs/parallelism.rst, docs/performance.rst
examples, applications, and testing: docs/examples.rst, docs/applications.rst, docs/testing.rst
external trust-building comparisons: docs/fortran_comparison.rst

Current Example-Suite Audit

Regenerate this block from the current main working tree with:

python scripts/run_scaled_example_suite.py \
  --examples-root examples/sfincs_examples \
  --resolution-reference-root /Users/rogeriojorge/local/tests/sfincs_original/fortran/version3/examples \
  --reference-results-root tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix \
  --out-root tests/scaled_example_suite_release_cpu_2026-05-08_production_tokamak \
  --scale-factor 1.0 \
  --runtime-target-basis fortran \
  --fortran-min-runtime-s 0.0 \
  --runtime-adjustment-iters 0 \
  --runtime-baseline-report tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix/suite_report.json \
  --jax-profile-marks on
python scripts/generate_readme_fast_branch_audit.py \
  --out-root tests/scaled_example_suite_release_cpu_2026-05-08_production_tokamak \
  --gpu-out-root tests/scaled_example_suite_gpu_bounded_default_2026-05-08_lu3000_pas \
  --baseline-report tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix/suite_report.json \
  --min-fortran-runtime-s 10

The benchmark policy on main is:

start from the original Fortran v3 example resolution,
only downscale when a case is too expensive for a practical suite run,
benchmark JAX CPU and GPU against a frozen CPU-generated Fortran reference root,
and never intentionally push a reduced case below about 1s of Fortran wall time unless the original example is already that small.

That avoids the misleading sub-second Fortran rows that came from blind global downscaling, keeps the GPU lane tied to a deterministic reference, and makes the additional example part of the same artifact set as the standard suite.

Production-resolution inputs are generated separately with scripts/create_production_benchmark_inputs.py. When scripts/run_scaled_example_suite.py is pointed at one of those generated inputs/ trees, it detects the sibling manifest.json and launches only bounded_local_ok rows by default. Use --max-run-recommendation bounded_remote, --max-run-recommendation remote_or_cluster_only, or --max-run-recommendation all only on explicitly budgeted remote or cluster lanes.

Current main CPU audit comes from tests/scaled_example_suite_release_cpu_2026-05-08_production_tokamak. Matching main GPU audit comes from tests/scaled_example_suite_gpu_bounded_default_2026-05-08_lu3000_pas.

Recorded cases: 39/39
Practical status counts: parity_ok=39
Strict status counts: parity_ok=39
GPU practical status counts: parity_ok=39
GPU strict status counts: parity_ok=39
CPU output-key coverage: missing_total=0, extra_total=-, audited_cases=39, skipped_cases=0
GPU output-key coverage: missing_total=0, extra_total=-, audited_cases=39, skipped_cases=0
CPU runtime drift watchlist: not applicable: production-floor reruns are not same-resolution with the older frozen smoke baseline
GPU runtime drift watchlist: not applicable: production-floor reruns are not same-resolution with the older frozen smoke baseline
Remaining cases: none
Additional example: parity_ok on CPU and parity_ok on GPU

Current mismatches:

CPU practical mismatches: none
CPU strict mismatches: none
GPU practical/strict mismatches: none

Runtime columns match the summary plot: cold is jax_runtime_s; warm/logged is jax_runtime_s_warm when available, otherwise jax_logged_elapsed_s. The JAX memory columns match the plot and use profiler active RSS deltas (jax_incremental_max_rss_mb) when present; full process peak RSS remains available as jax_max_rss_mb in the merged JSON reports. The benchmark summary JSON records production-resolution floor violations for legacy frozen rows, so the table should be read as a reference-runtime-window comparison until every row has been rerun at the current production floor. README-facing runtime/memory rows are restricted to cases where the SFINCS Fortran v3 reference runtime is at least 10 s. Excluded lower-resolution CI parity/smoke rows: HSX_PASCollisions_DKESTrajectories (0.994s), HSX_PASCollisions_fullTrajectories (2.510s), geometryScheme4_1species_PAS_withEr_DKESTrajectories (1.365s), geometryScheme4_2species_PAS_noEr (0.953s), monoenergetic_geometryScheme1 (0.795s), monoenergetic_geometryScheme11 (0.861s), monoenergetic_geometryScheme5_ASCII (1.052s), monoenergetic_geometryScheme5_netCDF (1.029s), sfincsPaperFigure3_geometryScheme11_PASCollisions_2Species_DKESTrajectories (1.104s), sfincsPaperFigure3_geometryScheme11_PASCollisions_2Species_fullTrajectories (1.706s), tokamak_1species_FPCollisions_noEr (7.897s), tokamak_1species_FPCollisions_withEr_DKESTrajectories (6.958s), tokamak_1species_FPCollisions_withEr_fullTrajectories (6.736s), transportMatrix_geometryScheme11 (0.025s), transportMatrix_geometryScheme2 (0.031s).

Full per-case runtime / memory table:

Case	Fortran CPU(s)	JAX CPU cold(s)	CPU cold x	JAX CPU warm/logged(s)	CPU warm/logged x	JAX GPU cold(s)	GPU cold x	JAX GPU warm/logged(s)	GPU warm/logged x	Fortran MB	JAX CPU active MB	CPU MB x	JAX GPU active MB	GPU MB x	CPU mismatch	GPU mismatch	CPU print	GPU print	CPU status	GPU status
`HSX_FPCollisions_DKESTrajectories`	29.664	3.060	0.10x	2.438	0.08x	5.298	0.18x	4.514	0.15x	103.0	303.6	2.95x	370.1	3.59x	0/193 (strict 0/193)	0/193 (strict 0/193)	9/9	9/9	parity_ok	parity_ok
`HSX_FPCollisions_fullTrajectories`	88.504	3.054	0.03x	2.414	0.03x	5.247	0.06x	4.493	0.05x	100.8	314.1	3.12x	374.9	3.72x	0/193 (strict 0/193)	0/193 (strict 0/193)	9/9	9/9	parity_ok	parity_ok
`additional_examples`	120.074	1.733	0.01x	1.063	0.01x	2.633	0.02x	1.898	0.02x	102.1	237.2	2.32x	336.1	3.29x	0/193 (strict 0/193)	0/193 (strict 0/193)	9/9	9/9	parity_ok	parity_ok
`filteredW7XNetCDF_2species_magneticDrifts_noEr`	89.052	2.069	0.02x	1.417	0.02x	2.834	0.03x	2.115	0.02x	103.2	288.3	2.79x	349.3	3.38x	0/193 (strict 0/193)	0/193 (strict 0/193)	9/9	9/9	parity_ok	parity_ok
`filteredW7XNetCDF_2species_magneticDrifts_withEr`	95.440	2.011	0.02x	1.372	0.01x	3.339	0.03x	2.590	0.03x	96.2	318.7	3.31x	355.4	3.69x	0/193 (strict 0/193)	0/193 (strict 0/193)	9/9	9/9	parity_ok	parity_ok
`filteredW7XNetCDF_2species_noEr`	128.508	1.550	0.01x	0.978	0.01x	2.734	0.02x	1.964	0.02x	100.3	266.9	2.66x	343.8	3.43x	0/193 (strict 0/193)	0/193 (strict 0/193)	9/9	9/9	parity_ok	parity_ok
`geometryScheme4_2species_noEr`	139.240	1.733	0.01x	1.112	0.01x	2.888	0.02x	2.105	0.02x	92.2	276.8	3.00x	364.2	3.95x	0/207 (strict 0/207)	0/207 (strict 0/207)	9/9	9/9	parity_ok	parity_ok
`geometryScheme4_2species_noEr_withPhi1InDKE`	293.275	1.936	0.01x	1.353	0.00x	3.340	0.01x	2.596	0.01x	100.6	288.6	2.87x	394.0	3.91x	0/265 (strict 0/265)	0/265 (strict 0/265)	9/9	9/9	parity_ok	parity_ok
`geometryScheme4_2species_noEr_withQN`	146.734	1.769	0.01x	1.140	0.01x	3.132	0.02x	2.402	0.02x	95.1	276.2	2.91x	381.2	4.01x	0/265 (strict 0/265)	0/265 (strict 0/265)	9/9	9/9	parity_ok	parity_ok
`geometryScheme4_2species_withEr_fullTrajectories`	58.053	1.710	0.03x	1.144	0.02x	3.032	0.05x	2.258	0.04x	113.4	284.6	2.51x	359.1	3.17x	0/193 (strict 0/193)	0/193 (strict 0/193)	9/9	9/9	parity_ok	parity_ok
`geometryScheme4_2species_withEr_fullTrajectories_withQN`	211.358	1.889	0.01x	1.310	0.01x	3.087	0.01x	2.314	0.01x	98.8	295.1	2.99x	384.2	3.89x	0/251 (strict 0/251)	0/251 (strict 0/251)	9/9	9/9	parity_ok	parity_ok
`geometryScheme5_3species_loRes`	98.976	1.615	0.02x	1.074	0.01x	3.691	0.04x	2.908	0.03x	129.6	352.8	2.72x	363.3	2.80x	0/193 (strict 0/193)	0/193 (strict 0/193)	9/9	9/9	parity_ok	parity_ok
`inductiveE_noEr`	166.614	1.644	0.01x	1.039	0.01x	2.785	0.02x	1.992	0.01x	99.2	279.7	2.82x	364.8	3.68x	0/207 (strict 0/207)	0/207 (strict 0/207)	9/9	9/9	parity_ok	parity_ok
`quick_2species_FPCollisions_noEr`	166.945	1.531	0.01x	0.983	0.01x	2.938	0.02x	2.200	0.01x	97.1	269.0	2.77x	363.7	3.74x	0/207 (strict 0/207)	0/207 (strict 0/207)	9/9	9/9	parity_ok	parity_ok
`sfincsPaperFigure3_geometryScheme11_FPCollisions_2Species_DKESTrajectories`	76.666	1.653	0.02x	1.110	0.01x	3.188	0.04x	2.391	0.03x	106.7	294.2	2.76x	367.6	3.44x	0/207 (strict 0/207)	0/207 (strict 0/207)	9/9	9/9	parity_ok	parity_ok
`sfincsPaperFigure3_geometryScheme11_FPCollisions_2Species_fullTrajectories`	93.439	1.767	0.02x	1.177	0.01x	3.138	0.03x	2.363	0.03x	94.0	303.6	3.23x	372.2	3.96x	0/207 (strict 0/207)	0/207 (strict 0/207)	9/9	9/9	parity_ok	parity_ok
`tokamak_1species_FPCollisions_noEr_withPhi1InDKE`	41.132	13.276	0.32x	12.527	0.30x	16.744	0.41x	16.744	0.41x	169.1	796.8	4.71x	613.9	3.63x	0/274 (strict 0/274)	0/274 (strict 0/274)	9/9	9/9	parity_ok	parity_ok
`tokamak_1species_FPCollisions_noEr_withQN`	10.952	9.019	0.82x	8.255	0.75x	7.175	0.66x	7.175	0.66x	159.4	800.0	5.02x	469.1	2.94x	0/274 (strict 0/274)	0/274 (strict 0/274)	9/9	9/9	parity_ok	parity_ok
`tokamak_1species_PASCollisions_noEr`	75.566	2.297	0.03x	2.281	0.03x	3.629	0.05x	3.591	0.05x	155.3	336.5	2.17x	1094.1	7.05x	0/212 (strict 0/212)	0/212 (strict 0/212)	9/9	9/9	parity_ok	parity_ok
`tokamak_1species_PASCollisions_noEr_Nx1`	75.533	1.339	0.02x	1.330	0.02x	2.728	0.04x	2.718	0.04x	119.2	278.2	2.33x	1076.7	9.03x	0/212 (strict 0/212)	0/212 (strict 0/212)	9/9	9/9	parity_ok	parity_ok
`tokamak_1species_PASCollisions_noEr_withQN`	75.242	5.147	0.07x	5.147	0.07x	11.444	0.15x	10.019	0.13x	165.0	612.3	3.71x	459.5	2.78x	0/274 (strict 0/274)	0/274 (strict 0/274)	9/9	9/9	parity_ok	parity_ok
`tokamak_1species_PASCollisions_withEr_fullTrajectories`	75.698	7.049	0.09x	6.231	0.08x	14.423	0.19x	13.193	0.17x	248.9	1319.5	5.30x	1572.1	6.32x	0/212 (strict 0/212)	0/212 (strict 0/212)	8/9	8/9	parity_ok	parity_ok
`tokamak_2species_PASCollisions_noEr`	75.362	2.033	0.03x	2.023	0.03x	5.243	0.07x	5.207	0.07x	215.3	393.5	1.83x	1168.7	5.43x	0/212 (strict 0/212)	0/212 (strict 0/212)	9/9	9/9	parity_ok	parity_ok
`tokamak_2species_PASCollisions_withEr_fullTrajectories`	76.530	9.435	0.12x	8.669	0.11x	23.369	0.31x	22.264	0.29x	386.6	1389.9	3.60x	2007.0	5.19x	0/212 (strict 0/212)	0/212 (strict 0/212)	8/9	8/9	parity_ok	parity_ok

Largest CPU runtime improvements vs tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix/suite_report.json:

tokamak_2species_PASCollisions_noEr: 4.0s -> 2.0s (delta=2.0s)
tokamak_1species_PASCollisions_noEr_Nx1: 2.4s -> 1.3s (delta=1.1s)
quick_2species_FPCollisions_noEr: 2.1s -> 1.5s (delta=0.6s)
sfincsPaperFigure3_geometryScheme11_FPCollisions_2Species_fullTrajectories: 2.3s -> 1.8s (delta=0.5s)
sfincsPaperFigure3_geometryScheme11_FPCollisions_2Species_DKESTrajectories: 2.2s -> 1.7s (delta=0.5s)

Largest CPU process peak-RSS improvements vs tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix/suite_report.json:

tokamak_2species_PASCollisions_noEr: 2088.6 MB -> 584.8 MB (delta=1503.8 MB)
tokamak_1species_PASCollisions_noEr: 612.9 MB -> 527.8 MB (delta=85.1 MB)
tokamak_1species_PASCollisions_noEr_Nx1: 520.3 MB -> 464.3 MB (delta=56.0 MB)
geometryScheme5_3species_loRes: 569.4 MB -> 540.3 MB (delta=29.1 MB)
geometryScheme4_2species_withEr_fullTrajectories_withQN: 512.4 MB -> 486.4 MB (delta=26.0 MB)

Documentation

Build docs locally:

sphinx-build -b html -W docs docs/_build/html

Entry points:

docs/index.rst
docs/system_equations.rst
docs/method.rst
docs/normalizations.rst
docs/performance.rst
docs/parallelism.rst

Testing

pytest -q

License

See LICENSE.

Project details

Release history Release notifications | RSS feed

1.1.6

Jun 10, 2026

1.1.5

Jun 10, 2026

1.1.4

May 16, 2026

1.1.3

May 15, 2026

This version

1.1.2

May 10, 2026

1.1.1

May 10, 2026

1.1.0

Apr 29, 2026

1.0.7

Apr 26, 2026

1.0.6

Apr 26, 2026

1.0.5

Apr 25, 2026

1.0.4

Apr 24, 2026

1.0.3

Apr 24, 2026

1.0.2

Apr 24, 2026

1.0.1

Apr 21, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sfincs_jax-1.1.2.tar.gz (26.8 MB view details)

Uploaded May 10, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

sfincs_jax-1.1.2-py3-none-any.whl (27.1 MB view details)

Uploaded May 10, 2026 Python 3

File details

Details for the file sfincs_jax-1.1.2.tar.gz.

File metadata

Download URL: sfincs_jax-1.1.2.tar.gz
Upload date: May 10, 2026
Size: 26.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sfincs_jax-1.1.2.tar.gz
Algorithm	Hash digest
SHA256	`4c4cef3e9f0632f17d36c02ce0c2467de99cd0bd72bce63949429ead1f02bb26`
MD5	`5922f25dd2035962624b155d6ccf9c7c`
BLAKE2b-256	`202a17edcddbd75af7ba33242beaed48d4f627b57eced9a2306d20f15a6bb8b5`

See more details on using hashes here.

Provenance

The following attestation bundles were made for sfincs_jax-1.1.2.tar.gz:

Publisher: publish-pypi.yml on uwplasma/sfincs_jax

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sfincs_jax-1.1.2.tar.gz
- Subject digest: 4c4cef3e9f0632f17d36c02ce0c2467de99cd0bd72bce63949429ead1f02bb26
- Sigstore transparency entry: 1495150742
- Sigstore integration time: May 10, 2026
Source repository:
- Permalink: uwplasma/sfincs_jax@5dbe3517400d425010e2ff2aa715fdeb2856ee33
- Branch / Tag: refs/tags/v1.1.2
- Owner: https://github.com/uwplasma
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@5dbe3517400d425010e2ff2aa715fdeb2856ee33
- Trigger Event: push

File details

Details for the file sfincs_jax-1.1.2-py3-none-any.whl.

File metadata

Download URL: sfincs_jax-1.1.2-py3-none-any.whl
Upload date: May 10, 2026
Size: 27.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sfincs_jax-1.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a08c567849ea4acab9420d671fa2614d7038e83f9ad98f4acd6d2f4d163242e5`
MD5	`d6593c1968197ba77f7dc66884eff4a0`
BLAKE2b-256	`2b224e7eeba6a130b18818403505f4c09f90c6319bae95342aa7c4f453b7824b`

See more details on using hashes here.

Provenance

The following attestation bundles were made for sfincs_jax-1.1.2-py3-none-any.whl:

Publisher: publish-pypi.yml on uwplasma/sfincs_jax

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sfincs_jax-1.1.2-py3-none-any.whl
- Subject digest: a08c567849ea4acab9420d671fa2614d7038e83f9ad98f4acd6d2f4d163242e5
- Sigstore transparency entry: 1495150861
- Sigstore integration time: May 10, 2026
Source repository:
- Permalink: uwplasma/sfincs_jax@5dbe3517400d425010e2ff2aa715fdeb2856ee33
- Branch / Tag: refs/tags/v1.1.2
- Owner: https://github.com/uwplasma
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@5dbe3517400d425010e2ff2aa715fdeb2856ee33
- Trigger Event: push

sfincs-jax 1.1.2

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

sfincs_jax

Installation

Runtime and Memory Summary

Physics in One Page

Quick Start (CLI)

Quick Start (Python)

Models, Numerics, and Validation

Current Example-Suite Audit

Documentation

Testing

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance