Neoclassical transport solver with CPU/GPU and differentiable JAX workflows
Project description
sfincs_jax
sfincs_jax is a standalone neoclassical transport code for radially local
drift-kinetic calculations in stellarator and tokamak geometry. It combines
high-fidelity kinetic models, CPU/GPU execution, modern matrix-free numerics,
parallel workflows, and optional differentiable solve paths in one codebase.
On the current main branch, the audited reduced example suite runs cleanly on
CPU and GPU. A separate production-resolution benchmark tier is now being used
for public runtime/memory claims and research-scale workloads; those larger
finite-beta/profile-current cases are tracked separately from the fast smoke
suite. The default CLI path is tuned for robust explicit solves and practical
throughput, while the Python API can opt into differentiable solve paths when
gradients matter.
It is designed for:
- high-performance runs on CPU/GPU,
- research and production transport workflows,
- memory-efficient large solves,
- end-to-end differentiable workflows.
Installation
Install from PyPI:
pip install sfincs_jax
Install from source:
git clone https://github.com/uwplasma/sfincs_jax.git
cd sfincs_jax
pip install .
After installing from a source checkout, you can run the CLI immediately on the
bundled tiny example. The suffix of --out selects the output format:
.h5/.hdf5 for the Fortran-compatible HDF5 file, .nc/.netcdf for NetCDF4,
and .npz for a fast NumPy archive.
cd sfincs_jax
sfincs_jax write-output \
--input examples/getting_started/input.namelist \
--out sfincsOutput.h5 \
--solver-trace solver_trace.json \
--geometry-only
sfincs_jax --plot sfincsOutput.h5
This is the fast installation smoke test. It writes sfincsOutput.h5 and then
writes a multi-page PDF diagnostics panel next to it as
sfincsOutput_summary.pdf. The optional --solver-trace sidecar records the
backend, selected solve lane, timing, and output metadata without changing the
parity-oriented output file. The same command works for NetCDF and NPZ:
sfincs_jax write-output --input examples/getting_started/input.namelist --out sfincsOutput.nc --geometry-only
sfincs_jax write-output --input examples/getting_started/input.namelist --out sfincsOutput.npz --geometry-only
sfincs_jax --plot sfincsOutput.nc
For larger non-differentiable RHSMode=1 production runs, the CLI can leave the
matrix-free Krylov path when a measured policy is safer or faster. Audited CPU
3D full-FP systems auto-select sparse-PC GMRES inside a measured size window
when it is faster and lower-memory than dense FP. Bounded CPU tokamak
electric-field systems auto-select dense LU only in the validated active-size
window where it avoids the slow Krylov/strong/sparse-rescue ladder; this is a
runtime win with a higher transient memory footprint, so it is CPU-only and
size-capped. Measured GPU tokamak full-FP no-Er/Er production-floor rows
auto-select structured x-block sparse-PC GMRES when that is the parity-clean,
lower-memory route; this avoids the global dense-velocity sparse-pattern setup
that dominated earlier production-floor runs. On the non-differentiable
full-FP x-block path, per-x/TZ blocks up to size 3000 use exact sparse LU
before falling back to ILU; this measured policy removes the remaining
production-floor full-trajectory cliff without changing PAS or autodiff paths.
For tokamak full-FP Er x-block solves, the same path is now the default on CPU
and GPU. For one-species decks, the Fortran-style
preconditioner_species = 0 setting is treated as equivalent to the compact
per-species x-block because there is no inter-species coupling to preserve. The
production-floor CPU Er rows now run in about 1 s logged / 1.7-1.9 s cold
external time with about 0.42-0.44 GB RSS instead of multi-GB dense-assembly
setup. On an RTX A4000, the same rows are residual-clean at about 23.3 s for
DKES trajectories and 11.0 s for full trajectories, with active RSS deltas
near 1.1 GB; the full-trajectory row uses right-preconditioned GMRES with a
short restart.
Large constrained-PAS profile-current decks also auto-select sparse-PC GMRES
when the problem size is in the validated production window. Tokamak PAS no-Er
production-floor sparse-PC runs use the measured MMD_ATA SuperLU ordering by
default; PAS+Er full-trajectory runs use the lower-fill measured
MMD_AT_PLUS_A ordering, and the one-species PAS+Er row caps the default
sparse-PC GMRES restart at 40 unless the user explicitly overrides it. Both
routes factor only the active Nxi_for_x degrees of freedom, which cuts
sparse-factor fill while preserving strict Fortran parity on the audited CPU
and RTX A4000 GPU rows. On the public no-Er production rows,
this lowers the two-species CPU active RSS to about 0.39 GB, cuts the
two-species GPU runtime to about 5.2 s, and cuts the one-species Nx=4 GPU
runtime from about 28.8 s to about 3.0 s, all with zero Fortran output
mismatches. Explicit sparse-host LU and x-block sparse PC remain available:
sfincs_jax write-output \
--input /path/to/input.namelist \
--out sfincsOutput.h5 \
--solve-method sparse_host_safe \
--solver-trace solver_trace.json
sfincs_jax write-output \
--input /path/to/input.namelist \
--out sfincsOutput.h5 \
--solve-method xblock_sparse_pc_gmres \
--solver-trace solver_trace.json
This path keeps the full SFINCS system, builds a conservative sparse pattern,
probes only colored structural nonzeros, and first tries host sparse LU. The
sparse_host_safe mode falls back only for constrained PAS systems where sparse
LU exposes a singular/gauge-sensitive branch; that fallback is labeled as
PETSc-compatible minimum-norm output instead of pretending the true residual
target was met. RHSMode=1 outputs include linearSolverMethod,
linearSolverResidualNorm, linearSolverResidualTarget,
linearSolverResidualTargetRatio, linearSolverConverged,
linearSolverAccepted, and linearSolverAcceptanceCriterion in the main output
file. Sparse-PC runs also write setup/solve/factorization timings and
sparse-pattern counters such as linearSolverMatvecs,
linearSolverSetupTime, linearSolverSolveTime,
linearSolverSparsePCFactorTime, linearSolverSparsePatternNnz,
linearSolverSparsePCXBlockPreconditionerXi, and
linearSolverSparsePCXBlockAssembledHost.
The production benchmark manifest now enforces large research-scale floors:
25 x 51 x 4 x 100 (Ntheta x Nzeta x Nx x Nxi) for 3D cases and
25 x 1 x 4 x 100 for tokamak cases. Public production timing rows target
SFINCS Fortran v3 runtimes of at least 10 s; earlier 17 x 21 x 5 x 12
finite-beta/profile-current timings were lower-resolution bring-up checks for
this solver lane, not public production baselines.
Runtime and Memory Summary
The release benchmark above compares SFINCS Fortran v3, sfincs_jax CPU cold,
sfincs_jax CPU warm, sfincs_jax GPU cold, and sfincs_jax GPU warm only for
reference-runtime-window rows whose Fortran v3 reference runtime is at least
10 s; the summary JSON records which legacy frozen rows still need a full
production-resolution rerun.
The left panel shows wall-clock runtime and the right panel shows active solver
memory, both on log axes. Fortran memory is process maximum RSS; JAX memory uses
profiler RSS deltas over the fixed Python/JAX/XLA runtime baseline, while the
full process peak remains in the JSON reports as jax_max_rss_mb. Cold is the
first external suite command. Warm runtime
uses jax_runtime_s_warm when reports were generated with --jax-repeats >= 2;
for the current frozen release reports, it falls back to the CLI
jax_logged_elapsed_s field. Cases are ordered by best warm sfincs_jax speedup
over the Fortran v3 runtime, so the strongest JAX wins appear first. Regenerate
the plot with
python examples/publication_figures/generate_fortran_suite_benchmark_summary.py.
Scope note: the full 39-case frozen suite remains the parity and CI smoke audit,
but sub-10 s Fortran rows are not shown as public performance comparisons. They
must either be rerun from the higher-resolution benchmark tier in
benchmarks/production_resolution_inputs_2026-05-04 or remain CI/regression
checks only. The production tier enforces 25 x 51 x 4 x 100 3D grids and
25 x 1 x 4 x 100 tokamak grids, including public examples and optional
user-supplied production-resolution workloads. The manual GitHub workflow
Production Benchmark Inputs validates and uploads the production input tree
without running expensive solves; full CPU/GPU/Fortran runtime and memory sweeps
should be launched on local, office, or cluster hardware with explicit
resource budgets.
Physics in One Page
sfincs_jax solves the radially local, steady, linearized drift-kinetic
equation for the non-adiabatic distribution-function perturbation
f_s1 on a flux surface. In normalized form the solved kinetic balance is
(parallel streaming + mirror force + E x B drift + magnetic drift
+ energy/pitch-angle drifts - linearized collisions) f_s1 = thermodynamic drives.
The unknown distribution can be coupled to the flux-surface electrostatic
potential variation Phi1(theta,zeta) through quasineutrality when requested.
The output fluxes, flows, transport matrices, and diagnostics are moments of
this solved f_s1. The full equations, normalizations, switches, and source-code
mapping are documented in docs/system_equations.rst, docs/physics_models.rst,
and docs/method.rst.
Quick Start (CLI)
You can run sfincs_jax from anywhere in your terminal. You do not need to be
inside the repository folder.
Run an input file:
sfincs_jax /path/to/input.namelist
Write output explicitly:
sfincs_jax write-output --input /path/to/input.namelist --out /path/to/sfincsOutput.h5
Plot an existing output file:
sfincs_jax --plot /path/to/sfincsOutput.h5
By default this writes /path/to/sfincsOutput_summary.pdf, a multi-page panel
with geometry, radial profiles, particle/heat/momentum fluxes, NTV, moments, and
transport-matrix diagnostics when those datasets are present. Use
sfincs_jax plot-output --input-h5 ... --out custom.pdf to choose a filename.
Override the equilibrium file at the CLI without changing input.namelist:
sfincs_jax write-output \
--input /path/to/input.namelist \
--out /path/to/sfincsOutput.h5 \
--wout-path /path/to/wout.nc
The bare sfincs_jax /path/to/input.namelist form accepts the same
--equilibrium-file and --wout-path overrides.
Quick Start (Python)
Read a namelist, run sfincs_jax, write an output file, and inspect results directly in memory:
from pathlib import Path
from sfincs_jax.io import write_sfincs_jax_output_h5
input_namelist = Path("input.namelist")
out_path, results = write_sfincs_jax_output_h5(
input_namelist=input_namelist,
output_path=Path("sfincsOutput.h5"),
return_results=True,
)
print("Wrote:", out_path)
print("Available datasets:", len(results))
print("Example key:", "particleFlux_vm_psiHat" in results)
Set output_path=Path("sfincsOutput.nc") for NetCDF4 or
output_path=Path("sfincsOutput.npz") for a fast NumPy archive. The calculation
is identical; only the writer changes.
If you need to override the equilibrium file without editing the namelist, pass
equilibrium_file=... or the VMEC-friendly alias wout_path=...:
write_sfincs_jax_output_h5(
input_namelist=input_namelist,
output_path=Path("sfincsOutput.h5"),
wout_path=Path("/path/to/wout.nc"),
)
sfincs_jax write-output and the scan utilities use the explicit
performance-oriented solve path by default. When calling
write_sfincs_jax_output_h5(...) directly, pass differentiable=False for the
same fast path or request the implicit/differentiable linear-solve path only when
you need gradients:
write_sfincs_jax_output_h5(
input_namelist=input_namelist,
output_path=Path("sfincsOutput.h5"),
differentiable=False,
)
write_sfincs_jax_output_h5(
input_namelist=input_namelist,
output_path=Path("sfincsOutput.h5"),
solve_method="sparse_host",
differentiable=False,
)
write_sfincs_jax_output_h5(
input_namelist=input_namelist,
output_path=Path("sfincsOutput.h5"),
differentiable=True,
)
Repository examples that map directly onto common first tasks:
- run the bundled tiny CLI example:
sfincs_jax examples/getting_started/input.namelist - write a tiny tokamak output:
python examples/getting_started/write_sfincs_output_tokamak.py - write a tiny VMEC output with
wout_path:python examples/getting_started/write_sfincs_output_vmec.py - run a finite-beta
vmec_jaxequilibrium into convergence-gatedsfincs_jaxradial profiles of ambipolarE_rand bootstrap current:python examples/vmec_jax_finite_beta/finite_beta_vmec_to_sfincs.py - plot the finite-beta kinetic/angular/root-bracket convergence scan from cached outputs:
python examples/vmec_jax_finite_beta/plot_convergence_scan.py - plot an output file:
python examples/getting_started/plot_sfincs_output.py - write HDF5/NetCDF/NPZ and plot a PDF panel:
python examples/getting_started/write_and_plot_multiple_formats.py - run autodiff examples:
python examples/autodiff/autodiff_gradient_nu_n_residual.py - run the optional VMEC/Boozer differentiable geometry handoff:
python examples/autodiff/vmec_jax_to_boozer_sfincs_pipeline.py --wout /path/to/wout.nc - benchmark CPU/GPU parallel solves:
python examples/performance/benchmark_sharded_solve_scaling.py --backend cpu --devices 1 2 --inner-warmup-solves 1 --sample-timeout-s 300 ...
Parallel CLI controls are now first-class:
# Multi-core CPU host sharding on one node
sfincs_jax --cores 8 --shard-axis auto /path/to/input.namelist
# Parallel transport-matrix RHS solves
sfincs_jax transport-matrix-v3 \
--input /path/to/input.namelist \
--transport-workers 4
# High-nu LHD/W7-X campaign pilot on a dual-GPU node
CUDA_VISIBLE_DEVICES=0,1 \
python examples/publication_figures/generate_sfincs_paper_figs.py \
--case lhd \
--collision-operators 0 \
--nuprime-min 17.78279101649707 \
--nuprime-max 17.78279101649707 \
--n-points 1 \
--transport-workers 2 \
--transport-parallel-backend gpu \
--transport-sparse-direct-max 30000 \
--require-residuals \
--max-transport-residual 1e-6 \
--max-transport-relative-residual 1e-6 \
--scan-only
# The current office dual-GPU LHD pilot for that point is residual-clean in
# ~262 s, compared with ~345 s on one GPU and ~569 s on the older implicit path.
# For the first W7-X FP high-nu point, use the bounded one-worker sparse-LU lane
# below: it closes all three RHS residual gates in ~9.7 min on one office GPU
# with sparse-helper factor reuse, compared with ~33.8 min before reuse.
# W7-X FP high-nu residual-clean pilot, intentionally one worker to limit sparse
# LU memory pressure:
CUDA_VISIBLE_DEVICES=0 \
SFINCS_JAX_TRANSPORT_SPARSE_FACTOR_DTYPE=float32 \
python examples/publication_figures/generate_sfincs_paper_figs.py \
--case w7x \
--collision-operators 0 \
--nuprime-min 17.78332923601508 \
--nuprime-max 17.78332923601508 \
--n-points 1 \
--transport-workers 1 \
--transport-parallel-backend gpu \
--transport-sparse-direct-max 40000 \
--transport-maxiter 800 \
--require-residuals \
--max-transport-residual 1e-6 \
--max-transport-relative-residual 1e-6 \
--scan-only
# To compare candidate preconditioners before widening W7-X high-nu scans,
# isolate single-RHS behavior:
CUDA_VISIBLE_DEVICES=0 \
python examples/performance/benchmark_w7x_high_nu_preconditioners.py \
--preconditioners auto,fp_tzfft,xmg \
--which-rhs 2 \
--sparse-direct-max 40000 \
--sparse-factor-dtype float32 \
--maxiter 800 \
--timeout-s 900
The W7-X high-nu figure is generated by
python examples/publication_figures/generate_w7x_high_nu_performance.py.
The checked run preserves the previous residual-clean transport matrix exactly,
reduces the full one-point wall time from about 2028 s to 582 s, and lowers
measured peak RSS from about 19.9 GB to 15.3 GB.
# One-node multi-GPU sharded solve (experimental for very large single-RHS cases)
CUDA_VISIBLE_DEVICES=0,1 \
sfincs_jax write-output \
--input /path/to/input.namelist \
--shard-axis theta \
--distributed-gmres auto
# Multi-host JAX distributed bootstrap
sfincs_jax write-output \
--input /path/to/input.namelist \
--distributed \
--process-count 8 \
--process-id ${RANK} \
--coordinator-address node0 \
--coordinator-port 1234
Use -v to have the executable print the active parallel runtime summary
(cores, shard axis, transport workers, distributed Krylov mode, and multi-host
bootstrap fields) before the solve starts.
Current recommendation:
- CPU host sharding is supported and deterministic, but the measured speedup is still case-dependent.
- The current sharded RHSMode=1 CPU path uses a wider Schwarz patch rule plus a bounded multilevel residual correction to avoid the worst 4/8-device fragmentation failures seen in earlier releases.
- Use one GPU per case or scan point for production throughput today.
- Multi-GPU single-case sharding is available for benchmarking and very large runs, but it remains experimental and is not yet the default recommendation.
- The sharded-solve benchmark helper supports both
--backend cpuand--backend gpu; the GPU path usesCUDA_VISIBLE_DEVICESand disables JAX preallocation in the subprocess, withcuda_malloc_asyncenabled for the benchmark subprocess allocator, so one-node GPU scaling experiments are more reproducible. - For practical multi-GPU usage today, the strongest measured path is
transport-worker parallelism with one worker per GPU on RHSMode=2/3 runs.
On the fresh office 2-GPU rerun of
examples/performance/transport_parallel_2min.input.namelist, this path measured351.1s -> 237.7sfrom1 -> 2GPU workers, i.e.1.48xspeedup on a 3-RHS case, essentially at the finite-task ideal of1.5x. - Multi-GPU single-case sharding remains experimental. Use it for research and benchmarking, not as the default production scaling path.
You can reproduce the recommended multi-GPU transport-worker benchmark with:
python examples/performance/benchmark_transport_parallel_scaling.py \
--input examples/performance/transport_parallel_2min.input.namelist \
--backend gpu \
--workers 1 2
To regenerate only the checked-in figure from the saved JSON payload without rerunning the multi-minute GPU benchmark:
python examples/performance/benchmark_transport_parallel_scaling.py \
--from-json examples/performance/output/transport_parallel_scaling_gpu.json \
--out-dir docs/_static/figures/parallel \
--figure-name transport_parallel_scaling_gpu.png
Compare two outputs:
sfincs_jax compare-h5 --a sfincsOutput_jax.h5 --b sfincsOutput_fortran.h5
Advanced CLI, plotting, and solver options are documented in docs/usage.rst,
docs/outputs.rst, and docs/performance_techniques.rst.
Models, Numerics, and Validation
sfincs_jax solves the same class of neoclassical drift-kinetic problems as mature
SFINCS workflows, but it is documented and maintained as its own code. In particular:
- the public executable favors bounded, performance-oriented solve strategies,
- the Python API can switch to differentiable solve paths when end-to-end sensitivities are needed,
- CPU runs lean on JIT-cached kernels and selected host sparse factorizations for hard linear branches,
- repeated RHSMode=1 output-writing runs reuse prebuilt grids, geometry, and operator state to cut setup cost on large HSX/geometry11 cases,
- GPU runs keep operator applications on device, then fall back to accelerator-safe or host rescue paths only when conditioning or memory demands it,
- and the documentation maps the governing equations directly onto the source tree.
The main documentation entry points are:
- physics and equations:
docs/physics_models.rst,docs/system_equations.rst,docs/physics_reference.rst - geometry and numerics:
docs/geometry.rst,docs/method.rst,docs/numerics.rst - inputs and outputs:
docs/inputs.rst,docs/outputs.rst - parallel and performance workflows:
docs/parallelism.rst,docs/performance.rst - examples, applications, and testing:
docs/examples.rst,docs/applications.rst,docs/testing.rst - external trust-building comparisons:
docs/fortran_comparison.rst
Current Example-Suite Audit
Regenerate this block from the current main working tree with:
python scripts/run_scaled_example_suite.py \
--examples-root examples/sfincs_examples \
--resolution-reference-root /Users/rogeriojorge/local/tests/sfincs_original/fortran/version3/examples \
--reference-results-root tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix \
--out-root tests/scaled_example_suite_release_cpu_2026-05-08_production_tokamak \
--scale-factor 1.0 \
--runtime-target-basis fortran \
--fortran-min-runtime-s 0.0 \
--runtime-adjustment-iters 0 \
--runtime-baseline-report tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix/suite_report.json \
--jax-profile-marks on
python scripts/generate_readme_fast_branch_audit.py \
--out-root tests/scaled_example_suite_release_cpu_2026-05-08_production_tokamak \
--gpu-out-root tests/scaled_example_suite_gpu_bounded_default_2026-05-08_lu3000_pas \
--baseline-report tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix/suite_report.json \
--min-fortran-runtime-s 10
The benchmark policy on main is:
- start from the original Fortran v3 example resolution,
- only downscale when a case is too expensive for a practical suite run,
- benchmark JAX CPU and GPU against a frozen CPU-generated Fortran reference root,
- and never intentionally push a reduced case below about
1sof Fortran wall time unless the original example is already that small.
That avoids the misleading sub-second Fortran rows that came from blind global downscaling, keeps the GPU lane tied to a deterministic reference, and makes the additional example part of the same artifact set as the standard suite.
Production-resolution inputs are generated separately with
scripts/create_production_benchmark_inputs.py. When
scripts/run_scaled_example_suite.py is pointed at one of those generated
inputs/ trees, it detects the sibling manifest.json and launches only
bounded_local_ok rows by default. Use --max-run-recommendation bounded_remote,
--max-run-recommendation remote_or_cluster_only, or
--max-run-recommendation all only on explicitly budgeted remote or cluster
lanes.
Current main CPU audit comes from tests/scaled_example_suite_release_cpu_2026-05-08_production_tokamak.
Matching main GPU audit comes from tests/scaled_example_suite_gpu_bounded_default_2026-05-08_lu3000_pas.
- Recorded cases:
39/39 - Practical status counts:
parity_ok=39 - Strict status counts:
parity_ok=39 - GPU practical status counts:
parity_ok=39 - GPU strict status counts:
parity_ok=39 - CPU output-key coverage:
missing_total=0, extra_total=-, audited_cases=39, skipped_cases=0 - GPU output-key coverage:
missing_total=0, extra_total=-, audited_cases=39, skipped_cases=0 - CPU runtime drift watchlist: not applicable: production-floor reruns are not same-resolution with the older frozen smoke baseline
- GPU runtime drift watchlist: not applicable: production-floor reruns are not same-resolution with the older frozen smoke baseline
- Remaining cases: none
- Additional example:
parity_okon CPU andparity_okon GPU
Current mismatches:
- CPU practical mismatches: none
- CPU strict mismatches: none
- GPU practical/strict mismatches: none
Runtime columns match the summary plot: cold is jax_runtime_s; warm/logged is jax_runtime_s_warm when available, otherwise jax_logged_elapsed_s. The JAX memory columns match the plot and use profiler active RSS deltas (jax_incremental_max_rss_mb) when present; full process peak RSS remains available as jax_max_rss_mb in the merged JSON reports.
The benchmark summary JSON records production-resolution floor violations for legacy frozen rows, so the table should be read as a reference-runtime-window comparison until every row has been rerun at the current production floor.
README-facing runtime/memory rows are restricted to cases where the SFINCS Fortran v3 reference runtime is at least 10 s. Excluded lower-resolution CI parity/smoke rows: HSX_PASCollisions_DKESTrajectories (0.994s), HSX_PASCollisions_fullTrajectories (2.510s), geometryScheme4_1species_PAS_withEr_DKESTrajectories (1.365s), geometryScheme4_2species_PAS_noEr (0.953s), monoenergetic_geometryScheme1 (0.795s), monoenergetic_geometryScheme11 (0.861s), monoenergetic_geometryScheme5_ASCII (1.052s), monoenergetic_geometryScheme5_netCDF (1.029s), sfincsPaperFigure3_geometryScheme11_PASCollisions_2Species_DKESTrajectories (1.104s), sfincsPaperFigure3_geometryScheme11_PASCollisions_2Species_fullTrajectories (1.706s), tokamak_1species_FPCollisions_noEr (7.897s), tokamak_1species_FPCollisions_withEr_DKESTrajectories (6.958s), tokamak_1species_FPCollisions_withEr_fullTrajectories (6.736s), transportMatrix_geometryScheme11 (0.025s), transportMatrix_geometryScheme2 (0.031s).
Full per-case runtime / memory table:
| Case | Fortran CPU(s) | JAX CPU cold(s) | CPU cold x | JAX CPU warm/logged(s) | CPU warm/logged x | JAX GPU cold(s) | GPU cold x | JAX GPU warm/logged(s) | GPU warm/logged x | Fortran MB | JAX CPU active MB | CPU MB x | JAX GPU active MB | GPU MB x | CPU mismatch | GPU mismatch | CPU print | GPU print | CPU status | GPU status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HSX_FPCollisions_DKESTrajectories |
29.664 | 3.060 | 0.10x | 2.438 | 0.08x | 5.298 | 0.18x | 4.514 | 0.15x | 103.0 | 303.6 | 2.95x | 370.1 | 3.59x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
HSX_FPCollisions_fullTrajectories |
88.504 | 3.054 | 0.03x | 2.414 | 0.03x | 5.247 | 0.06x | 4.493 | 0.05x | 100.8 | 314.1 | 3.12x | 374.9 | 3.72x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
additional_examples |
120.074 | 1.733 | 0.01x | 1.063 | 0.01x | 2.633 | 0.02x | 1.898 | 0.02x | 102.1 | 237.2 | 2.32x | 336.1 | 3.29x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
filteredW7XNetCDF_2species_magneticDrifts_noEr |
89.052 | 2.069 | 0.02x | 1.417 | 0.02x | 2.834 | 0.03x | 2.115 | 0.02x | 103.2 | 288.3 | 2.79x | 349.3 | 3.38x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
filteredW7XNetCDF_2species_magneticDrifts_withEr |
95.440 | 2.011 | 0.02x | 1.372 | 0.01x | 3.339 | 0.03x | 2.590 | 0.03x | 96.2 | 318.7 | 3.31x | 355.4 | 3.69x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
filteredW7XNetCDF_2species_noEr |
128.508 | 1.550 | 0.01x | 0.978 | 0.01x | 2.734 | 0.02x | 1.964 | 0.02x | 100.3 | 266.9 | 2.66x | 343.8 | 3.43x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
geometryScheme4_2species_noEr |
139.240 | 1.733 | 0.01x | 1.112 | 0.01x | 2.888 | 0.02x | 2.105 | 0.02x | 92.2 | 276.8 | 3.00x | 364.2 | 3.95x | 0/207 (strict 0/207) | 0/207 (strict 0/207) | 9/9 | 9/9 | parity_ok | parity_ok |
geometryScheme4_2species_noEr_withPhi1InDKE |
293.275 | 1.936 | 0.01x | 1.353 | 0.00x | 3.340 | 0.01x | 2.596 | 0.01x | 100.6 | 288.6 | 2.87x | 394.0 | 3.91x | 0/265 (strict 0/265) | 0/265 (strict 0/265) | 9/9 | 9/9 | parity_ok | parity_ok |
geometryScheme4_2species_noEr_withQN |
146.734 | 1.769 | 0.01x | 1.140 | 0.01x | 3.132 | 0.02x | 2.402 | 0.02x | 95.1 | 276.2 | 2.91x | 381.2 | 4.01x | 0/265 (strict 0/265) | 0/265 (strict 0/265) | 9/9 | 9/9 | parity_ok | parity_ok |
geometryScheme4_2species_withEr_fullTrajectories |
58.053 | 1.710 | 0.03x | 1.144 | 0.02x | 3.032 | 0.05x | 2.258 | 0.04x | 113.4 | 284.6 | 2.51x | 359.1 | 3.17x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
geometryScheme4_2species_withEr_fullTrajectories_withQN |
211.358 | 1.889 | 0.01x | 1.310 | 0.01x | 3.087 | 0.01x | 2.314 | 0.01x | 98.8 | 295.1 | 2.99x | 384.2 | 3.89x | 0/251 (strict 0/251) | 0/251 (strict 0/251) | 9/9 | 9/9 | parity_ok | parity_ok |
geometryScheme5_3species_loRes |
98.976 | 1.615 | 0.02x | 1.074 | 0.01x | 3.691 | 0.04x | 2.908 | 0.03x | 129.6 | 352.8 | 2.72x | 363.3 | 2.80x | 0/193 (strict 0/193) | 0/193 (strict 0/193) | 9/9 | 9/9 | parity_ok | parity_ok |
inductiveE_noEr |
166.614 | 1.644 | 0.01x | 1.039 | 0.01x | 2.785 | 0.02x | 1.992 | 0.01x | 99.2 | 279.7 | 2.82x | 364.8 | 3.68x | 0/207 (strict 0/207) | 0/207 (strict 0/207) | 9/9 | 9/9 | parity_ok | parity_ok |
quick_2species_FPCollisions_noEr |
166.945 | 1.531 | 0.01x | 0.983 | 0.01x | 2.938 | 0.02x | 2.200 | 0.01x | 97.1 | 269.0 | 2.77x | 363.7 | 3.74x | 0/207 (strict 0/207) | 0/207 (strict 0/207) | 9/9 | 9/9 | parity_ok | parity_ok |
sfincsPaperFigure3_geometryScheme11_FPCollisions_2Species_DKESTrajectories |
76.666 | 1.653 | 0.02x | 1.110 | 0.01x | 3.188 | 0.04x | 2.391 | 0.03x | 106.7 | 294.2 | 2.76x | 367.6 | 3.44x | 0/207 (strict 0/207) | 0/207 (strict 0/207) | 9/9 | 9/9 | parity_ok | parity_ok |
sfincsPaperFigure3_geometryScheme11_FPCollisions_2Species_fullTrajectories |
93.439 | 1.767 | 0.02x | 1.177 | 0.01x | 3.138 | 0.03x | 2.363 | 0.03x | 94.0 | 303.6 | 3.23x | 372.2 | 3.96x | 0/207 (strict 0/207) | 0/207 (strict 0/207) | 9/9 | 9/9 | parity_ok | parity_ok |
tokamak_1species_FPCollisions_noEr_withPhi1InDKE |
41.132 | 13.276 | 0.32x | 12.527 | 0.30x | 16.744 | 0.41x | 16.744 | 0.41x | 169.1 | 796.8 | 4.71x | 613.9 | 3.63x | 0/274 (strict 0/274) | 0/274 (strict 0/274) | 9/9 | 9/9 | parity_ok | parity_ok |
tokamak_1species_FPCollisions_noEr_withQN |
10.952 | 9.019 | 0.82x | 8.255 | 0.75x | 7.175 | 0.66x | 7.175 | 0.66x | 159.4 | 800.0 | 5.02x | 469.1 | 2.94x | 0/274 (strict 0/274) | 0/274 (strict 0/274) | 9/9 | 9/9 | parity_ok | parity_ok |
tokamak_1species_PASCollisions_noEr |
75.566 | 2.297 | 0.03x | 2.281 | 0.03x | 3.629 | 0.05x | 3.591 | 0.05x | 155.3 | 336.5 | 2.17x | 1094.1 | 7.05x | 0/212 (strict 0/212) | 0/212 (strict 0/212) | 9/9 | 9/9 | parity_ok | parity_ok |
tokamak_1species_PASCollisions_noEr_Nx1 |
75.533 | 1.339 | 0.02x | 1.330 | 0.02x | 2.728 | 0.04x | 2.718 | 0.04x | 119.2 | 278.2 | 2.33x | 1076.7 | 9.03x | 0/212 (strict 0/212) | 0/212 (strict 0/212) | 9/9 | 9/9 | parity_ok | parity_ok |
tokamak_1species_PASCollisions_noEr_withQN |
75.242 | 5.147 | 0.07x | 5.147 | 0.07x | 11.444 | 0.15x | 10.019 | 0.13x | 165.0 | 612.3 | 3.71x | 459.5 | 2.78x | 0/274 (strict 0/274) | 0/274 (strict 0/274) | 9/9 | 9/9 | parity_ok | parity_ok |
tokamak_1species_PASCollisions_withEr_fullTrajectories |
75.698 | 7.049 | 0.09x | 6.231 | 0.08x | 14.423 | 0.19x | 13.193 | 0.17x | 248.9 | 1319.5 | 5.30x | 1572.1 | 6.32x | 0/212 (strict 0/212) | 0/212 (strict 0/212) | 8/9 | 8/9 | parity_ok | parity_ok |
tokamak_2species_PASCollisions_noEr |
75.362 | 2.033 | 0.03x | 2.023 | 0.03x | 5.243 | 0.07x | 5.207 | 0.07x | 215.3 | 393.5 | 1.83x | 1168.7 | 5.43x | 0/212 (strict 0/212) | 0/212 (strict 0/212) | 9/9 | 9/9 | parity_ok | parity_ok |
tokamak_2species_PASCollisions_withEr_fullTrajectories |
76.530 | 9.435 | 0.12x | 8.669 | 0.11x | 23.369 | 0.31x | 22.264 | 0.29x | 386.6 | 1389.9 | 3.60x | 2007.0 | 5.19x | 0/212 (strict 0/212) | 0/212 (strict 0/212) | 8/9 | 8/9 | parity_ok | parity_ok |
Largest CPU runtime improvements vs tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix/suite_report.json:
tokamak_2species_PASCollisions_noEr: 4.0s -> 2.0s (delta=2.0s)tokamak_1species_PASCollisions_noEr_Nx1: 2.4s -> 1.3s (delta=1.1s)quick_2species_FPCollisions_noEr: 2.1s -> 1.5s (delta=0.6s)sfincsPaperFigure3_geometryScheme11_FPCollisions_2Species_fullTrajectories: 2.3s -> 1.8s (delta=0.5s)sfincsPaperFigure3_geometryScheme11_FPCollisions_2Species_DKESTrajectories: 2.2s -> 1.7s (delta=0.5s)
Largest CPU process peak-RSS improvements vs tests/scaled_example_suite_recheck_cpu_frozen_2026-04-23_postkeyfix/suite_report.json:
tokamak_2species_PASCollisions_noEr: 2088.6 MB -> 584.8 MB (delta=1503.8 MB)tokamak_1species_PASCollisions_noEr: 612.9 MB -> 527.8 MB (delta=85.1 MB)tokamak_1species_PASCollisions_noEr_Nx1: 520.3 MB -> 464.3 MB (delta=56.0 MB)geometryScheme5_3species_loRes: 569.4 MB -> 540.3 MB (delta=29.1 MB)geometryScheme4_2species_withEr_fullTrajectories_withQN: 512.4 MB -> 486.4 MB (delta=26.0 MB)
Documentation
Build docs locally:
sphinx-build -b html -W docs docs/_build/html
Entry points:
docs/index.rstdocs/system_equations.rstdocs/method.rstdocs/normalizations.rstdocs/performance.rstdocs/parallelism.rst
Testing
pytest -q
License
See LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sfincs_jax-1.1.2.tar.gz.
File metadata
- Download URL: sfincs_jax-1.1.2.tar.gz
- Upload date:
- Size: 26.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4c4cef3e9f0632f17d36c02ce0c2467de99cd0bd72bce63949429ead1f02bb26
|
|
| MD5 |
5922f25dd2035962624b155d6ccf9c7c
|
|
| BLAKE2b-256 |
202a17edcddbd75af7ba33242beaed48d4f627b57eced9a2306d20f15a6bb8b5
|
Provenance
The following attestation bundles were made for sfincs_jax-1.1.2.tar.gz:
Publisher:
publish-pypi.yml on uwplasma/sfincs_jax
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sfincs_jax-1.1.2.tar.gz -
Subject digest:
4c4cef3e9f0632f17d36c02ce0c2467de99cd0bd72bce63949429ead1f02bb26 - Sigstore transparency entry: 1495150742
- Sigstore integration time:
-
Permalink:
uwplasma/sfincs_jax@5dbe3517400d425010e2ff2aa715fdeb2856ee33 -
Branch / Tag:
refs/tags/v1.1.2 - Owner: https://github.com/uwplasma
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@5dbe3517400d425010e2ff2aa715fdeb2856ee33 -
Trigger Event:
push
-
Statement type:
File details
Details for the file sfincs_jax-1.1.2-py3-none-any.whl.
File metadata
- Download URL: sfincs_jax-1.1.2-py3-none-any.whl
- Upload date:
- Size: 27.1 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a08c567849ea4acab9420d671fa2614d7038e83f9ad98f4acd6d2f4d163242e5
|
|
| MD5 |
d6593c1968197ba77f7dc66884eff4a0
|
|
| BLAKE2b-256 |
2b224e7eeba6a130b18818403505f4c09f90c6319bae95342aa7c4f453b7824b
|
Provenance
The following attestation bundles were made for sfincs_jax-1.1.2-py3-none-any.whl:
Publisher:
publish-pypi.yml on uwplasma/sfincs_jax
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sfincs_jax-1.1.2-py3-none-any.whl -
Subject digest:
a08c567849ea4acab9420d671fa2614d7038e83f9ad98f4acd6d2f4d163242e5 - Sigstore transparency entry: 1495150861
- Sigstore integration time:
-
Permalink:
uwplasma/sfincs_jax@5dbe3517400d425010e2ff2aa715fdeb2856ee33 -
Branch / Tag:
refs/tags/v1.1.2 - Owner: https://github.com/uwplasma
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@5dbe3517400d425010e2ff2aa715fdeb2856ee33 -
Trigger Event:
push
-
Statement type: