NuMPI
Numerical tools for MPI-parallelized code
NuMPI is a collection of numerical tools for MPI-parallelized Python codes. NuMPI presently contains:
- An (incomplete) stub implementation of the mpi4py interface to the MPI libraries. This allows running serial versions of MPI-parallel code without having mpi4py (and hence a full MPI stack) installed.
- Parallel file I/O in numpy's .npy format using MPI I/O.
- MPI-parallel L-BFGS optimizers:
  - l_bfgs — unconstrained, with a strong-Wolfe line search.
  - l_bfgs_bounded — box-constrained (lo <= x <= hi) with optional index pinning, two-loop recursion and projected Armijo backtracking.
  - l_bfgs_projected — a single linear equality constraint <a, x> = target plus optional box bounds.
- An MPI-parallel bound-constrained conjugate gradients algorithm.
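Because the stub mimics mpi4py, code can fall back to it when mpi4py is unavailable. A minimal sketch (assuming the stub mirrors mpi4py's module-level interface, including COMM_WORLD, as stated above):

try:
    from mpi4py import MPI            # full MPI stack available
except ImportError:
    from NuMPI import MPIStub as MPI  # single-rank stand-in, no MPI needed

comm = MPI.COMM_WORLD
print("running on", comm.Get_size(), "rank(s)")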
Installation
python3 -m pip install NuMPI
Development Installation
Clone the repository.
To use the code, install the package in editable mode together with the test dependencies:
pip install -e .[test]
Testing
A development installation is required to run the tests.
We use runtests.
From the main installation directory:
python run-tests.py
If you want to use NuMPI without mpi4py, you can simply run the tests with pytest:
pytest tests/
Testing on the cluster
On NEMO, for example:
msub -q express -l walltime=15:00,nodes=1:ppn=20 NEMO_test_job.sh -m bea
MPI Conventions
All of NuMPI's parallel algorithms operate on distributed arrays: each MPI rank holds a slice of the global data, and scalar quantities (energies, norms, convergence tolerances, Lagrange multipliers) are globally reduced — the same value on every rank. Understanding the split between local and global is essential to using the optimizers correctly; this section spells it out.
Distributed vs. global
| Quantity | Lives where |
|---|---|
| Iterate x, gradient grad, initial guess x0 | local — each rank's own slice |
| Bounds bounds_lo, bounds_hi, zero_mask | local — sliced to match x |
| LinearConstraint.a (weight vector) | local |
| Scalar energy f(x) | global (reduced) |
| LinearConstraint.target (right-hand side) | global (same on every rank) |
| Lagrange multiplier, convergence tolerances gtol, ftol | global |
| callback(x) argument | local — slice of the current iterate |
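NuMPI leaves the domain decomposition to the caller; any scheme works as long as each rank consistently holds the same slice of every "local" quantity. A hypothetical contiguous split could look like this (the helper name and layout are illustrative, not part of NuMPI):

def local_slice(nb_global, comm):
    # Return this rank's (start, stop) indices into a length-nb_global global array.
    step = nb_global // comm.size
    start = comm.rank * step
    stop = nb_global if comm.rank == comm.size - 1 else start + step
    return start, stop

# usage: start, stop = local_slice(n, comm); x0_local = replicated_global_array[start:stop]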
User-supplied callbacks
The solvers call back into user code in a few places; each has a specific contract.
- Objective: fun(x) -> (energy, gradient) (when jac=True), or separate fun(x) -> energy and jac(x) -> gradient. energy must be a globally reduced scalar — all ranks must return the same number. The standard way to do this is to compute a local quantity and reduce it with pnp.sum(...).item() (or equivalent), where pnp is the Reduction(comm) wrapper. Returning a local energy is the single most common MPI mistake: ranks will silently disagree in line-search acceptance tests and the optimisation will diverge or hang. gradient is local — only the current rank's slice.
- callback(x) receives the current local iterate. If the caller needs the global state (for plotting or logging from rank 0), they must gather explicitly (see the sketch after this list).
- hessp(x, d) (used by the conjugate gradients solver) returns a local Hessian-vector product.
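For example, a rank-0 logging callback has to gather the slices itself. A minimal sketch (make_logging_callback is a hypothetical helper, not part of NuMPI; comm is an mpi4py-style communicator):

import numpy as np

def make_logging_callback(comm):
    def callback(x_local):
        pieces = comm.gather(x_local, root=0)  # list of per-rank slices on rank 0, None elsewhere
        if comm.rank == 0:
            x_global = np.concatenate(pieces)
            print("iterate norm:", np.linalg.norm(x_global))
    return callback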
Building distributed inputs
Use NuMPI.Tools.Reduction(comm) to obtain a pnp object whose sum, max,
min, mean and dot methods perform an MPI_Allreduce across the communicator.
When mpi4py is not installed, NuMPI.MPIStub provides the same interface
with a single "rank", so the same code runs serially too.
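For instance, a globally consistent gradient norm can be formed from a local gradient slice (a minimal sketch; grad_local stands for a hypothetical per-rank array):

import numpy as np
grad_norm = np.sqrt(pnp.sum(grad_local * grad_local))  # same value on every rank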
A typical setup with a communicator-provided subdomain looks like:
import numpy as np

from NuMPI.Tools import Reduction
from NuMPI.Optimization import LinearConstraint, l_bfgs_projected

# comm: an mpi4py(-compatible) communicator, e.g. MPI.COMM_WORLD
pnp = Reduction(comm)

# a_local: this rank's slice of the global weight vector, shape matching x
# target: global scalar, same on every rank
lc = LinearConstraint(a_local, target, pnp=pnp)

def fun(x):  # x is the local slice
    # compute the local integrand, then REDUCE for the scalar return
    local_energy = 0.5 * np.sum((x - y_local) ** 2)  # y_local: this rank's reference data
    return pnp.sum(local_energy).item(), (x - y_local)  # gradient stays local

res = l_bfgs_projected(fun, x0_local, lc, jac=True,
                       bounds_lo=0.0, bounds_hi=1.0,
                       comm=comm, gtol=1e-5)
The returned res.x is the local slice of the solution; res.fun,
res.multiplier, and res.max_grad are globally reduced scalars.
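A quick sanity check, under the same assumptions as the example above, is that the linear equality holds globally for the returned slice:

# <a, x> = target is a global statement, so reduce the local contribution first
assert abs(pnp.sum(a_local * res.x) - target) < 1e-8  # tolerance is illustrative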
See NuMPI/Optimization/__init__.py for optimizer-specific notes and
test/Optimization/MPIMinimizationProblems.py::MPI_Quadratic for a
reference implementation of a distributed objective.
Development & Funding
Development of this project is funded by the European Research Council within Starting Grant 757343 and by the Deutsche Forschungsgemeinschaft within project EXC 2193.