Neoclassical transport solver with CPU/GPU and differentiable JAX workflows
Project description
sfincs_jax
sfincs_jax is a standalone neoclassical transport code for radially local
drift-kinetic calculations in stellarator and tokamak geometry. It combines
high-fidelity kinetic models, CPU/GPU execution, modern matrix-free numerics,
parallel workflows, and optional differentiable solve paths in one codebase.
On the current main branch, the audited 39-case example suite runs cleanly on
CPU and GPU. Release-facing runtime/memory plots use the canonical
reference-runtime-window rows from that suite, while research-scale performance
claims use the separate production-resolution benchmark tier; larger
finite-beta/profile-current cases remain tracked separately from fast smoke and
parity checks. The default CLI path is tuned for robust explicit solves and
practical throughput, while the Python API can opt into differentiable solve
paths when gradients matter.
Current release claims intentionally exclude production-resolution QI CPU/GPU seed ladders, true differentiable device-QI closure, and single-case multi-GPU strong scaling; those remain bounded or deferred research lanes until their promotion artifacts pass.
The only README figures that are not yet production-quality claims are labeled
as such in place: the QA/QH bootstrap-current profiles use a same-resolution
13 x 13 x 21 x 5 documentation grid with 15 x 15 x 21 x 5 real-space and
13 x 13 x 25 x 6 velocity-space refinement bars, while the full
25 x 39 x 60 x 7 QA/QH RHSMode=1 production-grid convergence and the
true-device QI/GPU lanes remain explicit validation targets. These reduced-grid
figures are useful workflow and normalization checks, not replacements for the
production-resolution gates.
It is designed for:
- high-performance runs on CPU/GPU,
- research and production transport workflows,
- memory-efficient large solves,
- end-to-end differentiable workflows.
Installation
Install from PyPI:
pip install sfincs_jax
Install from source:
git clone https://github.com/uwplasma/sfincs_jax.git
cd sfincs_jax
pip install .
Large public equilibrium fixtures are not stored in the git clone or wheel. They
are fetched from the sfincs-jax-data-v1 GitHub release on first use and cached
under ~/.cache/sfincs_jax/data/2026-05-26-v1 by default. To prefetch them for
CI, offline work, or reproducible examples, run:
python scripts/fetch_equilibria.py
Set SFINCS_JAX_DATA_DIR=/path/to/cache to choose a different cache root, or set
SFINCS_JAX_OFFLINE=1 to require that the release data is already cached.
After installing, run the CLI on an input file. For most users this is the
entire workflow: provide input.namelist, optionally override the geometry file
with --wout-path, and let the default auto policy choose the fastest
validated solve path for the problem.
cd sfincs_jax
sfincs_jax examples/sfincs_examples/quick_2species_FPCollisions_noEr/input.namelist --out sfincsOutput.h5
sfincs_jax --plot sfincsOutput.h5
This writes sfincsOutput.h5 and then creates a multi-page PDF diagnostics panel
next to it as sfincsOutput_summary.pdf. The same command works for NetCDF and
NPZ by changing the output suffix:
sfincs_jax examples/sfincs_examples/quick_2species_FPCollisions_noEr/input.namelist --out sfincsOutput.nc
sfincs_jax examples/sfincs_examples/quick_2species_FPCollisions_noEr/input.namelist --out sfincsOutput.npz
sfincs_jax --plot sfincsOutput.nc
For VMEC geometry, either put the VMEC wout path in the namelist or pass it at
runtime:
sfincs_jax /path/to/input.namelist --wout-path /path/to/wout.nc --out sfincsOutput.h5
Advanced users can still force solver methods and write solver traces for profiling or reproducibility, but these are not required for normal production runs:
sfincs_jax /path/to/input.namelist \
--out sfincsOutput.h5 \
--solver-trace solver_trace.json \
--solve-method xblock_sparse_pc_gmres
RHSMode=1 outputs include linearSolverMethod, linearSolverResidualNorm,
linearSolverResidualTarget, linearSolverResidualTargetRatio,
linearSolverConverged, linearSolverAccepted, and
linearSolverAcceptanceCriterion in the output file so automatic choices remain
auditable after the run.
For large non-autodiff RHSMode=1 full-Fokker-Planck solves without Phi1, the
default auto policy can select the Fortran-reduced sparse-PC GMRES route. This
uses a simplified global preconditioner matrix, keeps the true residual gate on
the full operator, and avoids the slower structured-CSR research path unless the
user explicitly asks for it.
The production benchmark manifest now enforces large research-scale floors:
25 x 51 x 4 x 100 (Ntheta x Nzeta x Nx x Nxi) for 3D cases and
33 x 1 x 12 x 140 for tokamak cases, with the calibrated RHSMode=1
PAS/no-E_r tokamak floor raised further to 89 x 1 x 24 x 300. Public
production timing rows target SFINCS Fortran v3 runtimes of at least 10 s;
earlier 17 x 21 x 5 x 12 finite-beta/profile-current timings were
lower-resolution bring-up checks for this solver lane, not public production
baselines. The plotting scripts still filter by the measured Fortran runtime,
so any generated row that remains below 10 s is validation evidence, not a
public performance row.
Runtime and Memory Summary
The release benchmark above compares SFINCS Fortran v3, sfincs_jax CPU cold,
sfincs_jax CPU warm, sfincs_jax GPU cold, and sfincs_jax GPU warm only for
reference-runtime-window rows whose Fortran v3 reference runtime is at least
10 s; the summary JSON records which previous frozen rows still need a full
production-resolution rerun.
The left panel shows wall-clock runtime and the right panel shows active solver
memory, both on log axes. Fortran memory is process maximum RSS; JAX memory uses
profiler RSS deltas over the fixed Python/JAX/XLA runtime baseline, while the
full process peak remains in the JSON reports as jax_max_rss_mb. Cold is the
first external suite command. Warm runtime
uses jax_runtime_s_warm when reports were generated with --jax-repeats >= 2;
for the current frozen release reports, it falls back to the CLI
jax_logged_elapsed_s field. Cases are ordered by best warm sfincs_jax speedup
over the Fortran v3 runtime, so the strongest JAX wins appear first. The plot
and README runtime/memory table are checked against the same canonical filtered
rows emitted by the benchmark-summary generator. Regenerate the plot with
python examples/publication_figures/generate_fortran_suite_benchmark_summary.py.
Scope note: the full 39-case frozen suite remains the parity and CI smoke audit,
but sub-10 s Fortran rows are not shown as public performance comparisons. They
must either be rerun from the higher-resolution benchmark tier in
benchmarks/production_resolution_inputs_2026-05-04 or remain CI/regression
checks only. The production tier enforces 25 x 51 x 4 x 100 3D grids and
33 x 1 x 12 x 140 tokamak grids, with 89 x 1 x 24 x 300 for RHSMode=1
PAS/no-E_r tokamak rows, including public examples and optional user-supplied
production-resolution workloads. The manual GitHub workflow
Production Benchmark Inputs validates and uploads the production input tree
without running expensive solves; full CPU/GPU/Fortran runtime and memory sweeps
should be launched on local, office, or cluster hardware with explicit
resource budgets.
Latest floor calibration: the checked-in production manifest uses
89 x 1 x 24 x 300 for RHSMode=1 PAS/no-E_r tokamak rows after the smaller
53 x 1 x 16 x 190 calibration proved too short for public GPU-host benchmark
comparisons. Local CPU/GPU calibration runs were strict- or practical-parity
clean at the new floor for the one-species PAS/no-E_r rows, and the bounded
PAS/with-E_r office GPU shard was strict-clean at 33 x 1 x 12 x 140. The
refreshed 3D monoenergetic geometry-11 production row was strict-clean at the
25 x 51 x 4 x 100 manifest floor using the structured tzfft transport
preconditioner before sparse-direct rescue. Regenerate the README plot only
after the remaining CPU/GPU production rows are collected from the checked-in
manifest floor.
Optimization Lane
sfincs_jax also provides optimization-oriented helpers for adding
neoclassical objectives to stellarator design loops. The recommended workflow is
two-tiered: use fast JAX-native proxy objectives inside the optimizer, then
promote accepted designs to full sfincs_jax electric-field scans and kinetic
validation gates before making physics claims.
python examples/optimization/qa_nfp2_sfincs_jax_objectives.py --objective balanced --steps 120
The QA example supports bootstrap, electron-root, flux-selective, and
balanced presets. If the QA proxy does not provide a strong electron-root
candidate, use the QI/QA NFP screen to pick the next kinetic promotion target:
For a VMEC-backed optimization diagnostic, use the checked
vmec_jax/examples/optimization/QA_optimization.py result and inspect the
finite-iota QA equilibrium before promoting it to kinetic sfincs_jax scans:
python examples/optimization/qa_nfp2_bootstrap_current_comparison.py --vmec-jax-root /path/to/vmec_jax
The checked artifact is generated from the real vmec_jax QA optimization
output, with target aspect ratio 5 and target mean iota 0.41. It plots the VMEC
LCFS, |B|, iota profile, and VMEC J.B/sqrt(B.B) current diagnostic. Pass
--comparison-result-dir to overlay a second vmec_jax result, for example a
QA run with a JDotB or RedlBootstrapMismatch objective. The plotted current
is an equilibrium diagnostic, not a kinetic bootstrap-current claim; accepted
equilibria still need completed sfincs_jax scan-er outputs before using
FSABjHatOverRootFSAB2 in a paper or design decision.
python examples/optimization/screen_qi_electron_root_nfp.py --steps 70
The checked screen recommends QI nfp=2 as the first fallback target, because
QA has only low/mid-resolution positive-root evidence so far and QI electron-root
promotion now has a first bounded kinetic artifact. Both scripts write JSON
provenance plus PNG/PDF plots. The proxy layer is differentiable and
finite-difference checked; this does not make the promoted kinetic scan
differentiable. Accepted designs still need completed sfincs_jax scan-er
outputs before high-fidelity SFINCS kinetic gates can be used for bootstrap
current, ambipolar roots, particle/heat/impurity fluxes, residual convergence,
CPU/GPU agreement, and Fortran v3 comparison when applicable. See
docs/optimization.rst.
Concise real-promotion sequence:
python examples/optimization/qa_nfp2_sfincs_jax_objectives.py --objective balanced --steps 120 --out-dir runs/qa_candidate01/proxy --stem candidate01_proxy
python examples/optimization/launch_sfincs_jax_candidate_scan.py --proxy-summary runs/qa_candidate01/proxy/candidate01_proxy.json --input runs/qa_candidate01/input_r0p50.namelist --out-dir runs/qa_candidate01/scan_cpu/r0p50 --er-min -3 --er-max 3 --n-er 7 --jobs 4
python examples/optimization/run_promotion_evidence_campaign.py --input runs/qa_candidate01/input_r0p50.namelist --out-dir runs/qa_candidate01/evidence_r0p50 --values -3 -2 -1 0 1 2 3 --run-cpu --run-gpu --gpu-device 0 --run-fortran --fortran-exe /path/to/sfincs --jobs 4 --impurity-species-index 2 --target-impurity-flux 0.01
The campaign command writes a JSON plan, launches matching CPU/GPU/Fortran scan
lanes, audits each completed scan, and compares the resulting promotion JSON
files. To inspect commands without launching solves, add --dry-run. Fortran
v3 outputs often do not contain JAX linear-residual datasets, so the campaign
allows missing residuals only for the Fortran lane by default; CPU/GPU JAX lanes
still require residual diagnostics.
Equivalent manual commands are:
JAX_PLATFORM_NAME=cpu sfincs_jax scan-er --input runs/qa_candidate01/input_r0p50.namelist --out-dir runs/qa_candidate01/scan_cpu/r0p50 --values -3 -2 -1 0 1 2 3 --compute-solution --skip-existing --jobs 4
python examples/optimization/evaluate_sfincs_jax_promotion_scan.py --scan-dir runs/qa_candidate01/scan_cpu/r0p50 --out-dir runs/qa_candidate01/audit --stem candidate01_r0p50_cpu --require-electron-root
CUDA_VISIBLE_DEVICES=0 JAX_PLATFORM_NAME=gpu sfincs_jax scan-er --input runs/qa_candidate01/input_r0p50.namelist --out-dir runs/qa_candidate01/scan_gpu/r0p50 --values -3 -2 -1 0 1 2 3 --compute-solution --skip-existing --jobs 1
python examples/optimization/evaluate_sfincs_jax_promotion_scan.py --scan-dir runs/qa_candidate01/scan_gpu/r0p50 --out-dir runs/qa_candidate01/audit --stem candidate01_r0p50_gpu --require-electron-root
python examples/optimization/compare_sfincs_jax_promotion_runs.py --cpu runs/qa_candidate01/audit/candidate01_r0p50_cpu.json --gpu runs/qa_candidate01/audit/candidate01_r0p50_gpu.json --out-dir runs/qa_candidate01/audit --stem candidate01_r0p50_comparison
After running a real sfincs_jax scan-er for an accepted candidate, audit the
high-fidelity promotion gate with:
python examples/optimization/evaluate_sfincs_jax_promotion_scan.py --scan-dir /path/to/scan-er-directory
Pass --impurity-species-index only when the scan includes a real impurity
species and the flux-selectivity objective is part of the claim. For two-species
ion/electron electron-root scans, omit it; the backend comparison still checks
the selected root, bootstrap objective, residuals, and CPU/GPU agreement.
To go from a proxy optimization JSON to a reproducible scan command without starting a long solve immediately:
python examples/optimization/launch_sfincs_jax_candidate_scan.py --proxy-summary qa_nfp2_sfincs_jax_optimization_lane.json --input input.namelist --out-dir candidate_scan
CPU/GPU and optional Fortran-v3 promotion summaries can be compared with:
python examples/optimization/compare_sfincs_jax_promotion_runs.py --cpu cpu_promotion.json --gpu gpu_promotion.json --fortran fortran_promotion.json
The checked documentation includes two real promotion artifacts generated from
separate CPU, GPU, and SFINCS Fortran v3 JSON files: a reduced-W7-X PAS/DKES
two-species comparison for shared-model backend/reference agreement, and a
low-resolution finite-beta QA positive-electron-root comparison. The finite-beta
artifact demonstrates the full promotion workflow on a VMEC finite-beta QA
geometry; production-resolution radial/profile convergence remains a separate
claim-specific validation step.
It also now includes a first bounded QI nfp=2 kinetic electron-root promotion
artifact from a two-species VMEC scan at 7 x 7 x 7 x 4. CPU and GPU pass
strict agreement for the positive ambipolar root at E_r = 2.4386009865; the
SFINCS Fortran v3 reference agrees within the documented low-resolution
reference tolerances. This closes the first QI kinetic artifact, not the
production-resolution QI ladder.
The first refined QI nfp=2 rung at 9 x 9 x 11 x 4 also passes
fixed-resolution CPU/GPU/Fortran gates: CPU selected E_r = 2.2834299271, GPU
selected the same root within 4.3e-14, and the Fortran-v3 reference selected
E_r = 2.2834273232 within the documented refined-grid tolerance. The root
drift from the 7 x 7 x 7 x 4 artifact is about 0.155, so this is
persistence evidence, not a convergence claim.
A second bounded rung at 11 x 11 x 13 x 4 now checks the same CPU/GPU/Fortran
contract after fixing a mid-size RHSMode=1 solver-policy cliff. The default
dense full-FP lane covers active sizes up to 8000; this reduced the checked
11 x 11 x 13 x 4 CPU scan from about 326 s to about 23 s, and the office
GPU0 scan also completes in about 23 s. CPU and GPU selected
E_r = 2.2224054815 within 2.5e-13; Fortran-v3 selected
E_r = 2.2224043880, within the documented 2e-6 reference tolerance. The
root still drifts from the 9 x 9 x 11 x 4 rung, so production-resolution QI
remains an explicit research lane.
The QI ladder also has a 13 x 13 x 15 x 4 fixed-resolution CPU/GPU/Fortran
artifact and a 15 x 15 x 17 x 4 CPU/Fortran rung. The 15x rung selects
E_r = 2.2132389239 on CPU and agrees with SFINCS Fortran v3 to 9.2e-7
relative on the selected root, with all residual gates passing. GPU promotion at
this resolution is intentionally not claimed yet: the next route is the new
matrix-free QI-device operator-reuse path, which now exists behind explicit
advanced controls. A bounded one-GPU rerun verifies route activation and local
x-block factor skipping, but it remains fail-closed because the residual misses
the requested target; it is infrastructure evidence, not a public performance
claim.
A checked no-solve rollup now records the QI nfp=2 electron-root ladder from
7x through 15x. The latest root drift is 0.00210, but the rollup remains
deferred because the 15x GPU rung and the full 25 x 51 x 100 x 4
production floor are still open.
The separate finite-beta QA convergence ladder extends the finite-beta QA
artifact to 9 x 9 x 7 x 4 at the central surface and remains explicitly
deferred because it does not meet the production floor.
A follow-up medium-resolution solver-policy probe at 17 x 21 x 12 x 4
validated the older non-dense xblock_sparse_pc_gmres fallback route for the
same two-species finite-beta QA deck: the automatic CPU path converged in about
7 s wall time, matched the written Fortran-v3 output to better than 1.6e-6
relative on the reported current/flux observables, and avoids dense matrix
materialization. The next 21 x 25 x 14 x 4 rung also converged on CPU and GPU
and matched the written Fortran-v3 output to better than 2.7e-6 relative on
the same observables. A newer no-probe structured full-CSR RHSMode=1 route can
also solve the active projected transport system directly on the host, avoiding
the matrix-free pattern probe that made some finite-beta QA/QH runs stall. In
the QS-paper comparison script, the bounded runtime/non-autodiff lane now uses
the fortran_reduced_pc_gmres host route with a guarded native-stack attempt
and robust active-LU fallback. On a QH 13 x 13 x 21 x 5 diagnostic grid this
route completes
three selected surfaces in about 21.9 s total and is within 9.8% of Redl
and 9.7% of archived Fortran v3 on the checked surfaces. That closes the
low-resolution residual-clean diagnostic path, but not the production-resolution
physics parity claim. Default differentiable auto and GPU-native
preconditioning are tracked separately. The full 25 x 51 x 100 x 4
production floor is still documented as a larger algorithmic validation step
rather than a closed convergence claim.
SFINCS_JAX / SFINCS Fortran v3 / Redl Bootstrap-Current Check
The paper-backed bootstrap-current check uses the QA and QH benchmark
configurations from the Zenodo artifact for arXiv:2205.02914. It evaluates the
same Redl analytic formula used in that paper and overlays sfincs_jax
RHSMode=1 kinetic solves on the same VMEC wout and profile contract. The
included cases use wout_new_QA_aScaling.nc and wout_new_QH_aScaling.nc. If the
archived SFINCS Fortran v3 sfincsOutput.h5 files are present in the Zenodo
tree, the script overlays them too; no local Fortran executable is required.
JAX_ENABLE_X64=1 \
python examples/vmec_jax_finite_beta/compare_qs_paper_sfincs_jax_redl.py \
--ntheta 13 --nzeta 13 --nxi 21 --nx 5 \
--solver-tolerance 1e-6 \
--solve-method auto
For a pure sfincs_jax versus Redl plot without the archived Fortran overlay,
add --hide-fortran. The default uses --s-values all, so it evaluates the
same 39 archived radial surfaces as the SFINCS Fortran v3 benchmark. Add
--quick for a three-surface smoke plot, or increase the grid beyond
--ntheta 13 --nzeta 13 --nxi 21 --nx 5 for convergence studies.
For the fast 11-surface educational bootstrap-current profile used in the figures below, run:
JAX_ENABLE_X64=1 \
python examples/vmec_jax_finite_beta/compare_qs_paper_sfincs_jax_redl.py \
--case QA \
--stem qs_paper_qa_same_resolution_11surface \
--s-values 0.1,0.15,0.25,0.3,0.45,0.5,0.6,0.7,0.75,0.85,0.9 \
--ntheta 13 --nzeta 13 --nxi 21 --nx 5 \
--with-errorbars \
--real-ntheta 15 --real-nzeta 15 \
--velocity-nxi 25 --velocity-nx 6 \
--fortran-case-root outputs/qs_paper_fortran_reduced_resolution/QA_Ntheta13_Nzeta13_Nxi21_Nx5 \
--fortran-errorbar-json docs/_static/figures/vmec_jax_finite_beta/qs_paper_qa_same_resolution_11surface_fortran_errorbars.json \
--require-same-resolution \
--solver-tolerance 1e-6 \
--solve-method auto
Use --case QH --stem qs_paper_qh_same_resolution_11surface,
--fortran-case-root outputs/qs_paper_fortran_reduced_resolution/QH_Ntheta13_Nzeta13_Nxi21_Nx5,
and the QH Fortran-errorbar sidecar for the quasi-helical benchmark. If the
local reduced-resolution Fortran sidecar is absent, add --hide-fortran to
generate the same sfincs_jax/Redl educational plot without the Fortran
overlay.
For an apples-to-apples SFINCS_JAX/SFINCS Fortran v3 comparison, use the same-resolution gate:
JAX_ENABLE_X64=1 \
python examples/vmec_jax_finite_beta/compare_qs_paper_sfincs_jax_redl.py \
--case QA --s-values 0.5 \
--match-fortran-resolution \
--require-same-resolution \
--verbose-sfincs \
--solver-tolerance 1e-6 \
--solve-method auto
The gate reads the archived Fortran grid and sets the JAX grid to the matching
Ntheta,Nzeta,Nxi,Nx values before running. If the plotted JAX and Fortran
surfaces do not have the same grid, the script fails before writing a public
figure. JAX error bars come from --with-errorbars refinement probes. Fortran
error bars are plotted only from an explicit --fortran-errorbar-json sidecar,
because the archived Zenodo outputs provide one Fortran solve per surface and
therefore do not by themselves define a convergence uncertainty.
Use --verbose-sfincs or set SFINCS_JAX_EXAMPLE_VERBOSE=1 for production-grid
runs; the script then forwards SFINCS_JAX phase, preconditioner, and Krylov
progress messages instead of appearing silent during long setup phases.
The current apples-to-apples QA/QH check reruns SFINCS Fortran v3 at the same
13 x 13 x 21 x 5 grid used by the fast JAX documentation solve on 11
surfaces, s = 0.10, 0.15, 0.25, 0.30, 0.45, 0.50, 0.60, 0.70, 0.75, 0.85, 0.90. JAX refinement bars come from independent 15 x 15 x 21 x 5
real-space and 13 x 13 x 25 x 6 velocity-space probes. Fortran bars are
one-sided refinement deltas against the archived 25 x 39 x 60 x 7 Fortran v3
outputs. On this same-resolution 11-surface gate, the maximum JAX-vs-Fortran
difference is 1.21e-3 for QA and 3.54e-3 for QH. The largest JAX refinement
bar is 4.21% for QA and 24.43% for QH, so QH remains a visible reduced-grid
convergence stress test rather than a production-resolution claim.
For larger RHSMode=1 reruns, keep --solve-method auto first. Eligible
non-autodiff finite-beta/full-FP cases now select the residual-clean
fortran_reduced_pc_gmres host route automatically. Structured-CSR and
preconditioner research paths remain forceable for debugging, but they are not
the recommended user-facing path unless the JSON/HDF5 diagnostics show that
their true-residual and convergence gates pass for the case being run.
The combined active_multiline_field_split_sparse_coarse residual preconditioner
is implemented and test-covered as an opt-in research path, but real QA/QH
surface probes did not pass the strict true-residual gate, so it is not the
default for these public figures. The direct-tail active-auto ladder now tries
the lower-memory active_fortran_v3_reduced_native_stack candidate first, then
falls back to the robust active_fortran_v3_reduced_lu reference route when the
native stack fails its true-residual preflight. On the full archived
25 x 39 x 60 x 7 QA surface, the native stack built in 9.17 s with a
5.09 GB bounded factor estimate but worsened the one-apply residual, so
auto accepted active LU without trying experimental native/coarse rescue by
default. The latest guarded audit converged the true residual to 7.27e-16
in 354.6 s wall at tol=1e-10 without requiring any solver environment
variables.
The plotted quantity is the flux-surface-averaged bootstrap current projected along the magnetic field,
\langle J\cdot B\rangle .
sfincs_jax reports the dimensionless current
\widehat{J}
= \mathrm{FSABjHat}
= \sum_s Z_s\,\mathrm{FSABFlow}_s,
which is converted to the paper's SI units with the archived SFINCS factor
437695 * 1e20 * e. The archived Fortran v3 outputs use the same FSABjHat
normalization and conversion factor. The Redl side uses
n_e = 4.13e20 (1-s^5) m^-3, T_e = T_i = 12 keV (1-s), and Z_eff = 1.
For QA the Redl geometry uses helicity_n = 0 and
wout_new_QA_aScaling.nc; for QH it uses helicity_n = -1 and
wout_new_QH_aScaling.nc.
The checked whole-radius QA documentation run below is a reduced-grid diagnostic,
13 x 13 x 21 x 5, on the same 39 radial surfaces as the archived Fortran v3
benchmark. The script sets the runtime/non-autodiff solver lane for this
benchmark, so all solves selected fortran_reduced_pc_gmres under auto and
reached the true-residual target. The JAX run completed those 39 surfaces in
232.5 s total versus 696.1 s from the archived Fortran v3 logs. Peak solver
memory in the plot is 109 MB for the JAX factor/CSR estimate versus
12.96 GB of effective total MUMPS factor memory across MPI ranks for the
Fortran v3 run.
The figure remains a mixed-grid diagnostic, not a production same-resolution
parity claim: max QA differences are 23.95% versus Redl and 6.94% versus
SFINCS Fortran v3 on this reduced JAX grid. README-facing same-resolution
claims require the gated command above and refinement/error-bar metadata for
both codes.
The same bounded QH diagnostic is also generated on all 39 archived surfaces.
It is residual-clean and much improved after the VMEC radial-cache fix, but it
is more sensitive to the reduced JAX grid: max QH differences are 15.31%
versus Redl and 18.77% versus SFINCS Fortran v3. The JAX whole-radius scan
completed in 261.1 s versus 655.5 s from the archived Fortran v3 logs, with
the same 109 MB JAX solver-memory estimate versus 12.80 GB Fortran MUMPS
effective total factor memory. This keeps the QH finite-beta
production-resolution convergence lane open; current term-level audits point
away from a simple stale-radius geometry, radial-gradient conversion, or
current-assembly normalization bug, but the production-resolution convergence
gate remains the acceptance criterion.
Physics in One Page
sfincs_jax solves the radially local, steady, linearized drift-kinetic
equation for the non-adiabatic distribution-function perturbation
f_s1 on a flux surface. In normalized form the solved kinetic balance is
(parallel streaming + mirror force + E x B drift + magnetic drift
+ energy/pitch-angle drifts - linearized collisions) f_s1 = thermodynamic drives.
The unknown distribution can be coupled to the flux-surface electrostatic
potential variation Phi1(theta,zeta) through quasineutrality when requested.
The output fluxes, flows, transport matrices, and diagnostics are moments of
this solved f_s1. The full equations, normalizations, switches, and source-code
mapping are documented in docs/system_equations.rst, docs/physics_models.rst,
and docs/method.rst.
Quick Start (CLI)
You can run sfincs_jax from anywhere in your terminal. You do not need to be
inside the repository folder.
Run an input file:
sfincs_jax /path/to/input.namelist
Write output explicitly:
sfincs_jax write-output --input /path/to/input.namelist --out /path/to/sfincsOutput.h5
Plot an existing output file:
sfincs_jax --plot /path/to/sfincsOutput.h5
By default this writes /path/to/sfincsOutput_summary.pdf, a multi-page panel
with geometry, radial profiles, particle/heat/momentum fluxes, NTV, moments, and
transport-matrix diagnostics when those datasets are present. Use
sfincs_jax plot-output --input-h5 ... --out custom.pdf to choose a filename.
Override the equilibrium file at the CLI without changing input.namelist:
sfincs_jax write-output \
--input /path/to/input.namelist \
--out /path/to/sfincsOutput.h5 \
--wout-path /path/to/wout.nc
The bare sfincs_jax /path/to/input.namelist form accepts the same
--equilibrium-file and --wout-path overrides.
Quick Start (Python)
Read a namelist, run sfincs_jax, write an output file, and inspect results directly in memory:
from pathlib import Path
from sfincs_jax.io import write_sfincs_jax_output_h5
input_namelist = Path("input.namelist")
out_path, results = write_sfincs_jax_output_h5(
input_namelist=input_namelist,
output_path=Path("sfincsOutput.h5"),
return_results=True,
)
print("Wrote:", out_path)
print("Available datasets:", len(results))
print("Example key:", "particleFlux_vm_psiHat" in results)
Set output_path=Path("sfincsOutput.nc") for NetCDF4 or
output_path=Path("sfincsOutput.npz") for a fast NumPy archive. The calculation
is identical; only the writer changes.
If you need to override the equilibrium file without editing the namelist, pass
equilibrium_file=... or the VMEC-friendly alias wout_path=...:
write_sfincs_jax_output_h5(
input_namelist=input_namelist,
output_path=Path("sfincsOutput.h5"),
wout_path=Path("/path/to/wout.nc"),
)
sfincs_jax write-output and the scan utilities use solve_method="auto" and
differentiable=False by default. When calling
write_sfincs_jax_output_h5(...) directly, keep those defaults for production
runs or request the implicit/differentiable linear-solve path only when you need
gradients:
write_sfincs_jax_output_h5(
input_namelist=input_namelist,
output_path=Path("sfincsOutput.h5"),
differentiable=False,
)
write_sfincs_jax_output_h5(
input_namelist=input_namelist,
output_path=Path("sfincsOutput.h5"),
solve_method="sparse_host",
differentiable=False,
)
write_sfincs_jax_output_h5(
input_namelist=input_namelist,
output_path=Path("sfincsOutput.h5"),
differentiable=True,
)
Most users should leave --solve-method auto; it selects only promoted
guarded policies. For reproducibility, solver debugging, or structured-CSR
research benchmarks,
solve_method="structured_csr" or solve_method="host_structured_csr" forces
the same analytic full CSR operator on the host without column probing. The
longer aliases structured_full_csr, host_full_csr, and
structured_full_csr_host_gmres refer to the same no-probe host-CSR lane.
For large RHSMode=1 full-FP host production solves, auto may instead choose
fortran_reduced_pc_gmres, which mirrors the simplified global preconditioner
strategy used by SFINCS Fortran v3 while preserving SFINCS-JAX's true-residual
acceptance check.
For full-grid finite-beta QA/QH bootstrap-current diagnostics at
25 x 39 x 60 x 7, auto now reaches the Fortran-reduced direct-tail path
without user environment variables. It tests the lower-memory
active_fortran_v3_reduced_native_stack candidate under the same true-residual
gate and falls back to the high-memory active LU reference route when that gate
fails. The checked QA auto audit converged to 9.00e-13 residual in 343.5 s
wall after rejecting the native stack; earlier checked QA/QH active-LU reference
audits converge to 9.95e-13 and 8.71e-14 residual with a 13.3 GB active LU
factor. A stricter guarded rerun with tol=1e-10 converged to 7.27e-16 in
354.6 s, and native true-coupled rescue remains opt-in because the measured
full-grid native/coarse probes reduce the bad one-apply residual but do not
produce a residual-clean Krylov solve. The remaining research/performance lane
is making the lower-memory native block/coarse factor pass the same residual
gate so the active-LU fallback is no longer needed at full grid.
SFINCS_JAX_RHS1_FULL_CSR_KRYLOV=direct \
SFINCS_JAX_RHS1_FULL_CSR_ACTIVE_DOF=1 \
SFINCS_JAX_RHS1_FULL_CSR_MAX_MB=1024 \
sfincs_jax write-output \
--input /path/to/input.namelist \
--out /path/to/sfincsOutput.h5 \
--solve-method host_structured_csr
SFINCS_JAX_RHS1_FULL_CSR_MAX_MB caps the assembled CSR matrix. If the matrix
would exceed that cap, the structured solve fails closed before the solve.
SFINCS_JAX_RHS1_FULL_CSR_ACTIVE_DOF=1 projects away inactive truncated
pitch-angle rows before the host solve and expands the result back to the full
output vector. For Krylov experiments, SFINCS_JAX_RHS1_FULL_CSR_PRECONDITIONER,
SFINCS_JAX_RHS1_FULL_CSR_PRECONDITIONER_MAX_MB, and
SFINCS_JAX_RHS1_FULL_CSR_XBLOCK_LMAX still control the x-block/coarse residual
preconditioner candidates, but the residual-clean finite-beta QA/QH diagnostic
path is currently the active projected direct solve above. Physical RHSMode=1
host_structured_csr output defaults to this route; shifted operator benchmarks
default to Krylov unless SFINCS_JAX_RHS1_FULL_CSR_KRYLOV=direct is set.
To evaluate a lower-memory iterative alternative, set
SFINCS_JAX_RHS1_FULL_CSR_KRYLOV=gmres or lgmres and
SFINCS_JAX_RHS1_FULL_CSR_PRECONDITIONER=active_low_l_schur. This projected
field-split candidate uses a sparse exact Schur residual equation over the
full-angle low-pitch active variables and the global tail; the low-pitch
cutoff is controlled by
SFINCS_JAX_RHS1_FULL_CSR_ACTIVE_LOW_L_SCHUR_LMAX, and the sparse factor is
bounded by SFINCS_JAX_RHS1_FULL_CSR_PRECONDITIONER_MAX_MB. The older
active_coarse candidate remains available; it uses low-l/angular/tail modal
coarse residual modes. Its default coarse equation is Galerkin;
SFINCS_JAX_RHS1_FULL_CSR_ACTIVE_COARSE_SOLVER=least_squares or
active_coarse_ls enables the residual-minimizing comparison. Explicit
active_overlap_schwarz builds a restricted additive-Schwarz residual
correction over overlapping speed-space patches; control its pitch cutoff and
overlap with SFINCS_JAX_RHS1_FULL_CSR_ACTIVE_SCHWARZ_LMAX and
SFINCS_JAX_RHS1_FULL_CSR_ACTIVE_SCHWARZ_RADIUS. The combined
active_schwarz_low_l_schur path uses that Schwarz correction as the base for
the low-pitch Schur residual equation. Explicit active_xblock and
active_xblock_low_l_schur probes factor active sparse blocks at fixed species
and speed index; control their cutoff with
SFINCS_JAX_RHS1_FULL_CSR_ACTIVE_XBLOCK_LMAX. These are retained as
benchmark/debug routes after the first QA gate showed they are not yet a
promotion path. Generic
SFINCS_JAX_RHS1_FULL_CSR_PRECONDITIONER=active_ilu is also available and tuned with
SFINCS_JAX_RHS1_FULL_CSR_ACTIVE_ILU_DROP_TOL and
SFINCS_JAX_RHS1_FULL_CSR_ACTIVE_ILU_FILL_FACTOR. Treat this as a benchmark
candidate for now: physical finite-beta bootstrap-current outputs should stay on
the active direct route unless the iterative active low-L/Schwarz/xblock/coarse/ILU
residual gate passes for the case being run.
Repository examples that map directly onto common first tasks:
- run the bundled solved CLI example:
sfincs_jax examples/sfincs_examples/quick_2species_FPCollisions_noEr/input.namelist - write a tiny tokamak output:
python examples/getting_started/write_sfincs_output_tokamak.py - write a tiny VMEC output with
wout_path:python examples/getting_started/write_sfincs_output_vmec.py - run a finite-beta
vmec_jaxequilibrium into convergence-gatedsfincs_jaxradial profiles of ambipolarE_rand bootstrap current:python examples/vmec_jax_finite_beta/finite_beta_vmec_to_sfincs.py - plot the finite-beta kinetic/angular/root-bracket convergence scan from cached outputs:
python examples/vmec_jax_finite_beta/plot_convergence_scan.py - plot an output file:
python examples/getting_started/plot_sfincs_output.py - write HDF5/NetCDF/NPZ and plot a PDF panel:
python examples/getting_started/write_and_plot_multiple_formats.py - run autodiff examples:
python examples/autodiff/autodiff_gradient_nu_n_residual.py - run the optional VMEC/Boozer differentiable geometry handoff:
python examples/autodiff/vmec_jax_to_boozer_sfincs_pipeline.py --wout /path/to/wout.nc - benchmark CPU/GPU parallel solves:
python examples/performance/benchmark_sharded_solve_scaling.py --backend cpu --devices 1 2 --inner-warmup-solves 1 --sample-timeout-s 300 ...
Parallel CLI controls are now first-class:
# Multi-core CPU host sharding on one node
sfincs_jax --cores 8 --shard-axis auto /path/to/input.namelist
# Parallel transport-matrix RHS solves
sfincs_jax transport-matrix-v3 \
--input /path/to/input.namelist \
--transport-workers 4
# High-nu LHD/W7-X campaign pilot on a dual-GPU node
CUDA_VISIBLE_DEVICES=0,1 \
python examples/publication_figures/generate_sfincs_paper_figs.py \
--case lhd \
--collision-operators 0 \
--nuprime-min 17.78279101649707 \
--nuprime-max 17.78279101649707 \
--n-points 1 \
--transport-workers 2 \
--transport-parallel-backend gpu \
--transport-sparse-direct-max 30000 \
--require-residuals \
--max-transport-residual 1e-6 \
--max-transport-relative-residual 1e-6 \
--scan-only
# The current office dual-GPU LHD pilot for that point is residual-clean in
# ~262 s, compared with ~345 s on one GPU and ~569 s on the older implicit path.
# For the first W7-X FP high-nu point, use the bounded one-worker sparse-LU lane
# below: it closes all three RHS residual gates in ~9.7 min on one office GPU
# with sparse-helper factor reuse, compared with ~33.8 min before reuse.
# W7-X FP high-nu residual-clean pilot, intentionally one worker to limit sparse
# LU memory pressure:
CUDA_VISIBLE_DEVICES=0 \
SFINCS_JAX_TRANSPORT_SPARSE_FACTOR_DTYPE=float32 \
python examples/publication_figures/generate_sfincs_paper_figs.py \
--case w7x \
--collision-operators 0 \
--nuprime-min 17.78332923601508 \
--nuprime-max 17.78332923601508 \
--n-points 1 \
--transport-workers 1 \
--transport-parallel-backend gpu \
--transport-sparse-direct-max 40000 \
--transport-maxiter 800 \
--require-residuals \
--max-transport-residual 1e-6 \
--max-transport-relative-residual 1e-6 \
--scan-only
# To compare candidate preconditioners before widening W7-X high-nu scans,
# isolate single-RHS behavior:
CUDA_VISIBLE_DEVICES=0 \
python examples/performance/benchmark_w7x_high_nu_preconditioners.py \
--preconditioners auto,fp_tzfft,fp_tzfft_line,fp_tzfft_line_schur,fp_xblock_tz_lu,fp_xblock_tz_lu_schur,xmg \
--which-rhs 2 \
--sparse-direct-max 40000 \
--sparse-factor-dtype float32 \
--maxiter 800 \
--timeout-s 900
The W7-X high-nu figure is generated by
python examples/publication_figures/generate_w7x_high_nu_performance.py.
The checked run preserves the previous residual-clean transport matrix exactly,
reduces the full one-point wall time from about 2028 s to 582 s, and lowers
measured peak RSS from about 19.9 GB to 15.3 GB.
# One-node multi-GPU sharded solve (experimental for very large single-RHS cases)
CUDA_VISIBLE_DEVICES=0,1 \
sfincs_jax write-output \
--input /path/to/input.namelist \
--shard-axis theta \
--distributed-gmres auto
# Multi-host JAX distributed bootstrap
sfincs_jax write-output \
--input /path/to/input.namelist \
--distributed \
--process-count 8 \
--process-id ${RANK} \
--coordinator-address node0 \
--coordinator-port 1234
Use -v to have the executable print the active parallel runtime summary
(cores, shard axis, transport workers, distributed Krylov mode, and multi-host
bootstrap fields) before the solve starts.
Current recommendation:
- CPU host sharding is supported and deterministic, but the measured speedup is still case-dependent.
- The current sharded RHSMode=1 CPU path uses a wider Schwarz patch rule plus a bounded multilevel residual correction to avoid the worst 4/8-device fragmentation failures seen in earlier releases.
- Use one GPU per case or scan point for production throughput today.
- Multi-GPU single-case sharding is available for benchmarking and very large runs, but it remains experimental and is not yet the default recommendation.
- The sharded-solve benchmark helper supports both
--backend cpuand--backend gpu; the GPU path usesCUDA_VISIBLE_DEVICESand disables JAX preallocation in the subprocess, withcuda_malloc_asyncenabled for the benchmark subprocess allocator, so one-node GPU scaling experiments are more reproducible. - For practical multi-GPU usage today, the strongest measured path is
transport-worker parallelism with one worker per GPU on RHSMode=2/3 runs.
On the fresh office 2-GPU rerun of
examples/performance/transport_parallel_2min.input.namelist, this path measured351.1s -> 237.7sfrom1 -> 2GPU workers, i.e.1.48xspeedup on a 3-RHS case, essentially at the finite-task ideal of1.5x. - Multi-GPU single-case sharding remains experimental. Use it for research and benchmarking, not as the default production scaling path.
You can reproduce the recommended multi-GPU transport-worker benchmark with:
python examples/performance/benchmark_transport_parallel_scaling.py \
--input examples/performance/transport_parallel_2min.input.namelist \
--backend gpu \
--workers 1 2
To regenerate only the checked-in figure from the saved JSON payload without rerunning the multi-minute GPU benchmark:
python examples/performance/benchmark_transport_parallel_scaling.py \
--from-json examples/performance/output/transport_parallel_scaling_gpu.json \
--out-dir docs/_static/figures/parallel \
--figure-name transport_parallel_scaling_gpu.png
Compare two outputs:
sfincs_jax compare-h5 --a sfincsOutput_jax.h5 --b sfincsOutput_fortran.h5
Advanced CLI, plotting, and solver options are documented in docs/usage.rst,
docs/outputs.rst, and docs/performance_techniques.rst.
Models, Numerics, and Validation
sfincs_jax solves the same class of neoclassical drift-kinetic problems as mature
SFINCS workflows, but it is documented and maintained as its own code. In particular:
- the public executable favors bounded, performance-oriented solve strategies,
- the Python API can switch to differentiable solve paths when end-to-end sensitivities are needed,
- CPU runs lean on JIT-cached kernels and selected host sparse factorizations for hard linear branches,
- repeated RHSMode=1 output-writing runs reuse prebuilt grids, geometry, and operator state to cut setup cost on large HSX/geometry11 cases,
- GPU runs keep operator applications on device, then fall back to accelerator-safe or host rescue paths only when conditioning or memory demands it,
- and the documentation maps the governing equations directly onto the source tree.
The main documentation entry points are:
- physics and equations:
docs/physics_models.rst,docs/system_equations.rst,docs/physics_reference.rst - geometry and numerics:
docs/geometry.rst,docs/method.rst,docs/numerics.rst - inputs and outputs:
docs/inputs.rst,docs/outputs.rst - parallel and performance workflows:
docs/parallelism.rst,docs/performance.rst - examples, applications, and testing:
docs/examples.rst,docs/applications.rst,docs/testing.rst - external trust-building comparisons:
docs/fortran_comparison.rst
Current Example-Suite Audit
Regenerate this block from the current main working tree with:
python scripts/run_scaled_example_suite.py \
--examples-root examples/sfincs_examples \
--resolution-reference-root /Users/rogeriojorge/local/tests/sfincs_original/fortran/version3/examples \
--reference-results-root tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix \
--out-root tests/scaled_example_suite_release_cpu_2026-05-08_production_tokamak \
--scale-factor 1.0 \
--runtime-target-basis fortran \
--fortran-min-runtime-s 0.0 \
--runtime-adjustment-iters 0 \
--runtime-baseline-report tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix/suite_report.json \
--jax-profile-marks on
python scripts/generate_readme_fast_branch_audit.py \
--out-root tests/scaled_example_suite_release_cpu_2026-05-08_production_tokamak \
--gpu-out-root tests/scaled_example_suite_gpu_bounded_default_2026-05-08_lu3000_pas \
--baseline-report tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix/suite_report.json \
--min-fortran-runtime-s 10
The benchmark policy on main is:
- start from the original Fortran v3 example resolution,
- only downscale when a case is too expensive for a practical suite run,
- benchmark JAX CPU and GPU against a frozen CPU-generated Fortran reference root,
- and never intentionally push a reduced case below about
1sof Fortran wall time unless the original example is already that small.
That avoids the misleading sub-second Fortran rows that came from blind global downscaling, keeps the GPU lane tied to a deterministic reference, and makes the additional example part of the same artifact set as the standard suite.
Production-resolution inputs are generated separately with
scripts/create_production_benchmark_inputs.py. When
scripts/run_scaled_example_suite.py is pointed at one of those generated
inputs/ trees, it detects the sibling manifest.json and launches only
bounded_local_ok rows by default. Use --max-run-recommendation bounded_remote,
--max-run-recommendation remote_or_cluster_only, or
--max-run-recommendation all only on explicitly budgeted remote or cluster
lanes. The Fortran wrapper used for reference generation defaults to one MPI
rank so local parity runs avoid concurrent HDF5 output writes; set
SFINCS_FORTRAN_MPI_NP explicitly only for a Fortran scaling study.
If scripts/run_reduced_upstream_suite.py is used against a generated
production input tree, pass --production-inputs so the runner uses the
manifest decks exactly and does not substitute or promote reduced CI fixtures.
Current main CPU audit comes from tests/scaled_example_suite_release_cpu_2026-05-08_production_tokamak.
Matching main GPU audit comes from tests/scaled_example_suite_gpu_bounded_default_2026-05-08_lu3000_pas.
- Recorded cases:
39/39 - Practical status counts:
parity_ok=39 - Strict status counts:
parity_ok=39 - GPU practical status counts:
parity_ok=39 - GPU strict status counts:
parity_ok=39 - CPU output-key coverage:
missing_total=0, extra_total=-, audited_cases=39, skipped_cases=0 - GPU output-key coverage:
missing_total=0, extra_total=-, audited_cases=39, skipped_cases=0 - CPU runtime drift watchlist: not applicable: production-floor reruns are not same-resolution with the older frozen smoke baseline
- GPU runtime drift watchlist: not applicable: production-floor reruns are not same-resolution with the older frozen smoke baseline
- Remaining cases: none
- Additional example:
parity_okon CPU andparity_okon GPU
Current mismatches:
- CPU practical mismatches: none
- CPU strict mismatches: none
- GPU practical/strict mismatches: none
Runtime columns match the summary plot: cold is jax_runtime_s; warm/logged is jax_runtime_s_warm when available, otherwise jax_logged_elapsed_s. The JAX memory columns match the plot and use profiler active RSS deltas (jax_incremental_max_rss_mb) when present; full process peak RSS remains available as jax_max_rss_mb in the merged JSON reports.
The benchmark summary JSON records production-resolution floor violations for legacy frozen rows, so the table should be read as a reference-runtime-window comparison until every row has been rerun at the current production floor.
README-facing runtime/memory rows are restricted to cases where the SFINCS Fortran v3 reference runtime is at least 10 s. Excluded lower-resolution CI parity/smoke rows: HSX_PASCollisions_DKESTrajectories (0.994s), HSX_PASCollisions_fullTrajectories (2.510s), geometryScheme4_1species_PAS_withEr_DKESTrajectories (1.365s), geometryScheme4_2species_PAS_noEr (0.953s), monoenergetic_geometryScheme1 (0.795s), monoenergetic_geometryScheme11 (0.861s), monoenergetic_geometryScheme5_ASCII (1.052s), monoenergetic_geometryScheme5_netCDF (1.029s), sfincsPaperFigure3_geometryScheme11_PASCollisions_2Species_DKESTrajectories (1.104s), sfincsPaperFigure3_geometryScheme11_PASCollisions_2Species_fullTrajectories (1.706s), tokamak_1species_FPCollisions_noEr (7.897s), tokamak_1species_FPCollisions_withEr_DKESTrajectories (6.958s), tokamak_1species_FPCollisions_withEr_fullTrajectories (6.736s), transportMatrix_geometryScheme11 (0.025s), transportMatrix_geometryScheme2 (0.031s).
Full per-case runtime / memory table:
| Case | Fortran CPU(s) | JAX CPU cold(s) | CPU cold x | JAX CPU warm/logged(s) | CPU warm/logged x | JAX GPU cold(s) | GPU cold x | JAX GPU warm/logged(s) | GPU warm/logged x | Fortran MB | JAX CPU active MB | CPU MB x | JAX GPU active MB | GPU MB x | CPU mismatch | GPU mismatch | CPU print | GPU print | CPU status | GPU status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HSX_FPCollisions_DKESTrajectories |
29.664 | 3.060 | 0.10x | 2.438 | 0.08x | 5.298 | 0.18x | 4.514 | 0.15x | 103.0 | 303.6 | 2.95x | 370.1 | 3.59x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
HSX_FPCollisions_fullTrajectories |
88.504 | 3.054 | 0.03x | 2.414 | 0.03x | 5.247 | 0.06x | 4.493 | 0.05x | 100.8 | 314.1 | 3.12x | 374.9 | 3.72x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
additional_examples |
120.074 | 1.733 | 0.01x | 1.063 | 0.01x | 2.633 | 0.02x | 1.898 | 0.02x | 102.1 | 237.2 | 2.32x | 336.1 | 3.29x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
filteredW7XNetCDF_2species_magneticDrifts_noEr |
89.052 | 2.069 | 0.02x | 1.417 | 0.02x | 2.834 | 0.03x | 2.115 | 0.02x | 103.2 | 288.3 | 2.79x | 349.3 | 3.38x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
filteredW7XNetCDF_2species_magneticDrifts_withEr |
95.440 | 2.011 | 0.02x | 1.372 | 0.01x | 3.339 | 0.03x | 2.590 | 0.03x | 96.2 | 318.7 | 3.31x | 355.4 | 3.69x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
filteredW7XNetCDF_2species_noEr |
128.508 | 1.550 | 0.01x | 0.978 | 0.01x | 2.734 | 0.02x | 1.964 | 0.02x | 100.3 | 266.9 | 2.66x | 343.8 | 3.43x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
geometryScheme4_2species_noEr |
139.240 | 1.733 | 0.01x | 1.112 | 0.01x | 2.888 | 0.02x | 2.105 | 0.02x | 92.2 | 276.8 | 3.00x | 364.2 | 3.95x | 0/207 (strict 0/207) | 0/207 (strict 0/207) | 9/9 | 9/9 | parity_ok | parity_ok |
geometryScheme4_2species_noEr_withPhi1InDKE |
293.275 | 1.936 | 0.01x | 1.353 | 0.00x | 3.340 | 0.01x | 2.596 | 0.01x | 100.6 | 288.6 | 2.87x | 394.0 | 3.91x | 0/265 (strict 0/265) | 0/265 (strict 0/265) | 9/9 | 9/9 | parity_ok | parity_ok |
geometryScheme4_2species_noEr_withQN |
146.734 | 1.769 | 0.01x | 1.140 | 0.01x | 3.132 | 0.02x | 2.402 | 0.02x | 95.1 | 276.2 | 2.91x | 381.2 | 4.01x | 0/265 (strict 0/265) | 0/265 (strict 0/265) | 9/9 | 9/9 | parity_ok | parity_ok |
geometryScheme4_2species_withEr_fullTrajectories |
58.053 | 1.710 | 0.03x | 1.144 | 0.02x | 3.032 | 0.05x | 2.258 | 0.04x | 113.4 | 284.6 | 2.51x | 359.1 | 3.17x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
geometryScheme4_2species_withEr_fullTrajectories_withQN |
211.358 | 1.889 | 0.01x | 1.310 | 0.01x | 3.087 | 0.01x | 2.314 | 0.01x | 98.8 | 295.1 | 2.99x | 384.2 | 3.89x | 0/251 (strict 0/251) | 0/251 (strict 0/251) | 9/9 | 9/9 | parity_ok | parity_ok |
geometryScheme5_3species_loRes |
98.976 | 1.615 | 0.02x | 1.074 | 0.01x | 3.691 | 0.04x | 2.908 | 0.03x | 129.6 | 352.8 | 2.72x | 363.3 | 2.80x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
inductiveE_noEr |
166.614 | 1.644 | 0.01x | 1.039 | 0.01x | 2.785 | 0.02x | 1.992 | 0.01x | 99.2 | 279.7 | 2.82x | 364.8 | 3.68x | 0/207 (strict 0/207) | 0/207 (strict 0/207) | 9/9 | 9/9 | parity_ok | parity_ok |
quick_2species_FPCollisions_noEr |
166.945 | 1.531 | 0.01x | 0.983 | 0.01x | 2.938 | 0.02x | 2.200 | 0.01x | 97.1 | 269.0 | 2.77x | 363.7 | 3.74x | 0/207 (strict 0/207) | 0/207 (strict 0/207) | 9/9 | 9/9 | parity_ok | parity_ok |
sfincsPaperFigure3_geometryScheme11_FPCollisions_2Species_DKESTrajectories |
76.666 | 1.653 | 0.02x | 1.110 | 0.01x | 3.188 | 0.04x | 2.391 | 0.03x | 106.7 | 294.2 | 2.76x | 367.6 | 3.44x | 0/207 (strict 0/207) | 0/207 (strict 0/207) | 9/9 | 9/9 | parity_ok | parity_ok |
sfincsPaperFigure3_geometryScheme11_FPCollisions_2Species_fullTrajectories |
93.439 | 1.767 | 0.02x | 1.177 | 0.01x | 3.138 | 0.03x | 2.363 | 0.03x | 94.0 | 303.6 | 3.23x | 372.2 | 3.96x | 0/207 (strict 0/207) | 0/207 (strict 0/207) | 9/9 | 9/9 | parity_ok | parity_ok |
tokamak_1species_FPCollisions_noEr_withPhi1InDKE |
41.132 | 13.276 | 0.32x | 12.527 | 0.30x | 16.744 | 0.41x | 16.744 | 0.41x | 169.1 | 796.8 | 4.71x | 613.9 | 3.63x | 0/274 (strict 0/274) | 0/274 (strict 0/274) | 9/9 | 9/9 | parity_ok | parity_ok |
tokamak_1species_FPCollisions_noEr_withQN |
10.952 | 9.019 | 0.82x | 8.255 | 0.75x | 7.175 | 0.66x | 7.175 | 0.66x | 159.4 | 800.0 | 5.02x | 469.1 | 2.94x | 0/274 (strict 0/274) | 0/274 (strict 0/274) | 9/9 | 9/9 | parity_ok | parity_ok |
tokamak_1species_PASCollisions_noEr |
75.566 | 2.297 | 0.03x | 2.281 | 0.03x | 3.629 | 0.05x | 3.591 | 0.05x | 155.3 | 336.5 | 2.17x | 1094.1 | 7.05x | 0/212 (strict 0/212) | 0/212 (strict 0/212) | 9/9 | 9/9 | parity_ok | parity_ok |
tokamak_1species_PASCollisions_noEr_Nx1 |
75.533 | 1.339 | 0.02x | 1.330 | 0.02x | 2.728 | 0.04x | 2.718 | 0.04x | 119.2 | 278.2 | 2.33x | 1076.7 | 9.03x | 0/212 (strict 0/212) | 0/212 (strict 0/212) | 9/9 | 9/9 | parity_ok | parity_ok |
tokamak_1species_PASCollisions_noEr_withQN |
75.242 | 5.147 | 0.07x | 5.147 | 0.07x | 11.444 | 0.15x | 10.019 | 0.13x | 165.0 | 612.3 | 3.71x | 459.5 | 2.78x | 0/274 (strict 0/274) | 0/274 (strict 0/274) | 9/9 | 9/9 | parity_ok | parity_ok |
tokamak_1species_PASCollisions_withEr_fullTrajectories |
75.698 | 7.049 | 0.09x | 6.231 | 0.08x | 14.423 | 0.19x | 13.193 | 0.17x | 248.9 | 1319.5 | 5.30x | 1572.1 | 6.32x | 0/212 (strict 0/212) | 0/212 (strict 0/212) | 8/9 | 8/9 | parity_ok | parity_ok |
tokamak_2species_PASCollisions_noEr |
75.362 | 2.033 | 0.03x | 2.023 | 0.03x | 5.243 | 0.07x | 5.207 | 0.07x | 215.3 | 393.5 | 1.83x | 1168.7 | 5.43x | 0/212 (strict 0/212) | 0/212 (strict 0/212) | 9/9 | 9/9 | parity_ok | parity_ok |
tokamak_2species_PASCollisions_withEr_fullTrajectories |
76.530 | 9.435 | 0.12x | 8.669 | 0.11x | 23.369 | 0.31x | 22.264 | 0.29x | 386.6 | 1389.9 | 3.60x | 2007.0 | 5.19x | 0/212 (strict 0/212) | 0/212 (strict 0/212) | 8/9 | 8/9 | parity_ok | parity_ok |
Documentation
Build docs locally:
sphinx-build -b html -W docs docs/_build/html
Entry points:
docs/index.rstdocs/system_equations.rstdocs/method.rstdocs/normalizations.rstdocs/performance.rstdocs/parallelism.rst
Testing
pytest -q
License
See LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sfincs_jax-1.1.6.tar.gz.
File metadata
- Download URL: sfincs_jax-1.1.6.tar.gz
- Upload date:
- Size: 1.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c4bfe14737465c5c7b8e20b96cf7ed3471417c4dd0f775b57195bcbd79b98b22
|
|
| MD5 |
d6f88d59140261d3285f55b872c2c3c9
|
|
| BLAKE2b-256 |
b7044fdc70018a832e988d3fc1c5ff4c583eca1809d895ae58f198eabb104214
|
Provenance
The following attestation bundles were made for sfincs_jax-1.1.6.tar.gz:
Publisher:
publish-pypi.yml on uwplasma/sfincs_jax
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sfincs_jax-1.1.6.tar.gz -
Subject digest:
c4bfe14737465c5c7b8e20b96cf7ed3471417c4dd0f775b57195bcbd79b98b22 - Sigstore transparency entry: 1781856864
- Sigstore integration time:
-
Permalink:
uwplasma/sfincs_jax@a7c2fced802f4d2cabceb273a43ca3432763e21c -
Branch / Tag:
refs/tags/v1.1.6 - Owner: https://github.com/uwplasma
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@a7c2fced802f4d2cabceb273a43ca3432763e21c -
Trigger Event:
push
-
Statement type:
File details
Details for the file sfincs_jax-1.1.6-py3-none-any.whl.
File metadata
- Download URL: sfincs_jax-1.1.6-py3-none-any.whl
- Upload date:
- Size: 1.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
492f6ccd09090c7704d536f52815c6672da32ddd7ebf15539c1df4ea3dd8233c
|
|
| MD5 |
bae2dc3f42630a70bd1760cb1d85426f
|
|
| BLAKE2b-256 |
7be61e16776677f5d1f4c8695141ece78facd761c23203ca82cdb95a7a3ba753
|
Provenance
The following attestation bundles were made for sfincs_jax-1.1.6-py3-none-any.whl:
Publisher:
publish-pypi.yml on uwplasma/sfincs_jax
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sfincs_jax-1.1.6-py3-none-any.whl -
Subject digest:
492f6ccd09090c7704d536f52815c6672da32ddd7ebf15539c1df4ea3dd8233c - Sigstore transparency entry: 1781856948
- Sigstore integration time:
-
Permalink:
uwplasma/sfincs_jax@a7c2fced802f4d2cabceb273a43ca3432763e21c -
Branch / Tag:
refs/tags/v1.1.6 - Owner: https://github.com/uwplasma
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@a7c2fced802f4d2cabceb273a43ca3432763e21c -
Trigger Event:
push
-
Statement type: