
Numerical tools for MPI-parallelized code


NuMPI

NuMPI is a collection of numerical tools for MPI-parallelized Python codes. NuMPI presently contains:

  • An (incomplete) stub implementation of the mpi4py interface to the MPI libraries. This allows running serial versions of MPI parallel code without having mpi4py (and hence a full MPI stack) installed.
  • Parallel file IO in numpy's .npy format using MPI I/O.
  • MPI-parallel L-BFGS optimizers:
    • l_bfgs — unconstrained, with a strong-Wolfe line search.
    • l_bfgs_bounded — box-constrained (lo <= x <= hi) with optional index pinning, two-loop recursion and projected Armijo backtracking.
    • l_bfgs_projected — a single linear equality <a, x> = target plus optional box bounds.
  • An MPI-parallel bound constrained conjugate gradients algorithm.
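The two-loop recursion mentioned for the L-BFGS variants can be sketched in a few lines. This is a generic, serial textbook version in plain Python, not NuMPI's implementation: the names `two_loop_direction`, `s_hist`, `y_hist` and `dot` are hypothetical. It assumes that the inner product is the only operation that would need a global reduction when the vectors are distributed.

```python
def dot(u, v):
    # Serial inner product. In a distributed run this is the one place that
    # would need a global reduction over all ranks (assumption of this sketch).
    return sum(ui * vi for ui, vi in zip(u, v))


def two_loop_direction(grad, s_hist, y_hist, dot=dot):
    """Return the L-BFGS search direction -H * grad.

    s_hist[i] = x_{i+1} - x_i and y_hist[i] = grad_{i+1} - grad_i,
    stored oldest first.
    """
    q = list(grad)
    coeffs = []
    # First loop: newest pair to oldest pair.
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        coeffs.append((alpha, rho))
    # Initial Hessian scaling from the most recent pair.
    if s_hist:
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest pair.
    for (alpha, rho), s, y in zip(reversed(coeffs), s_hist, y_hist):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


# Made-up data for f(x) = x_0^2 + x_1^2, whose Hessian is 2 * identity.
s_hist = [[1.0, 0.0]]
y_hist = [[2.0, 0.0]]
grad = [4.0, 6.0]
direction = two_loop_direction(grad, s_hist, y_hist)
steepest = two_loop_direction(grad, [], [])
```

With one stored pair the sketch already reproduces the Newton direction for this quadratic. With an empty history it falls back to steepest descent.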

Installation

python3 -m pip install NuMPI

Development Installation

Clone the repository.

To use the code, install the current package as editable:

pip install -e .[test]

Testing

Running the tests requires a development installation.

We use runtests to run the tests. From the main installation directory:

python run-tests.py

To test NuMPI without mpi4py, run the tests directly with pytest:

pytest tests/

Testing on the cluster

For example, on NEMO:

msub -q express -l walltime=15:00,nodes=1:ppn=20 NEMO_test_job.sh -m bea

MPI Conventions

All of NuMPI's parallel algorithms operate on distributed arrays: each MPI rank holds a slice of the global data, and scalar quantities (energies, norms, convergence tolerances, Lagrange multipliers) are globally reduced — the same value on every rank. Understanding the split between local and global is essential to using the optimizers correctly; this section spells it out.

Distributed vs. global

Quantity                                              Lives where
Iterate x, gradient grad, initial guess x0            local (each rank's own slice)
Bounds bounds_lo, bounds_hi, zero_mask                local (sliced to match x)
LinearConstraint.a (weight vector)                    local
Scalar energy f(x)                                    global (reduced)
LinearConstraint.target (right-hand side)             global (same on every rank)
Lagrange multiplier, convergence tolerances gtol, ftol   global
callback(x) argument                                  local slice of current iterate
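The local/global split in the table can be illustrated without MPI by simulating the ranks in one process. In the sketch below, `allreduce_sum` is a hypothetical stand-in for a sum reduction across ranks (not a NuMPI or mpi4py function), and the data are made up.

```python
# Hypothetical setup: one global vector split across three simulated "ranks".
global_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
slices = [global_x[0:2], global_x[2:4], global_x[4:6]]


def local_energy(x_local):
    # Each rank only sees its own slice.
    return 0.5 * sum(v * v for v in x_local)


def allreduce_sum(values):
    # Stand-in for a sum reduction: every rank receives the same total.
    total = sum(values)
    return [total for _ in values]


local_energies = [local_energy(s) for s in slices]   # differ per rank
global_energies = allreduce_sum(local_energies)      # identical on all ranks
serial_energy = local_energy(global_x)               # reference value
```

The local energies differ from rank to rank. After the reduction every rank holds the same number, and it matches the serial result.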

User-supplied callbacks

The solvers call back into user code in a few places; each has a specific contract.

  • Objective fun(x) -> (energy, gradient) (when jac=True) or separate fun(x) -> energy and jac(x) -> gradient:

    • energy must be a globally reduced scalar. All ranks must return the same number. The standard way to do this is to compute a local quantity and reduce it with pnp.sum(...).item() (or equivalent), where pnp is the Reduction(comm) wrapper. Returning a local energy is the single most common MPI mistake: ranks will silently disagree in line-search acceptance tests and the optimisation will diverge or hang.
    • gradient is local — only the current rank's slice.
  • callback(x) receives the current local iterate. If the caller needs the global state (for plotting or logging from rank 0), they must gather explicitly.

  • hessp(x, d) (CG) returns a local Hessian-vector product.
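To see why a local energy breaks the line search, consider a simulated two-rank run with a sufficient-decrease (Armijo) test. The sketch assumes the slope <g, d> is globally reduced while the energy comes straight from the user's objective. The helper `armijo_accepts`, the constant c = 0.1 and the data are hypothetical and chosen only to make the effect visible.

```python
def armijo_accepts(f_old, f_new, slope, step, c=0.1):
    # Sufficient-decrease test: f(x + step * d) <= f(x) + c * step * <g, d>
    return f_new <= f_old + c * step * slope


def energy(xs, ys):
    # Local part of 0.5 * sum((x - y)^2)
    return 0.5 * sum((x - y) ** 2 for x, y in zip(xs, ys))


# One entry per simulated rank (made-up data).
y_slices = [[1.0], [-10.0]]
x_slices = [[0.0], [0.0]]
step = 1.0

grads = [[x - y for x, y in zip(xs, ys)] for xs, ys in zip(x_slices, y_slices)]
dirs = [[-g for g in gs] for gs in grads]
# Globally reduced slope <g, d>, identical on every rank.
slope = sum(g * d for gs, ds in zip(grads, dirs) for g, d in zip(gs, ds))

new_slices = [[x + step * d for x, d in zip(xs, ds)]
              for xs, ds in zip(x_slices, dirs)]
f_old_local = [energy(xs, ys) for xs, ys in zip(x_slices, y_slices)]
f_new_local = [energy(xs, ys) for xs, ys in zip(new_slices, y_slices)]

# Wrong: each rank tests its own local energy.
local_decisions = [armijo_accepts(fo, fn, slope, step)
                   for fo, fn in zip(f_old_local, f_new_local)]

# Right: every rank tests the same reduced energy.
f_old, f_new = sum(f_old_local), sum(f_new_local)
global_decisions = [armijo_accepts(f_old, f_new, slope, step)
                    for _ in x_slices]
```

With local energies, rank 0 rejects the step while rank 1 accepts it, so the ranks would take different branches. With the reduced energy both ranks accept.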

Building distributed inputs

Use NuMPI.Tools.Reduction(comm) to obtain a pnp object whose sum, max, min, mean, dot methods perform MPI_Allreduce across the communicator. When mpi4py is not installed, NuMPI.MPIStub provides the same interface with a single "rank", so the same code runs serially too.

A typical setup with a communicator-provided subdomain looks like:

import numpy as np

from NuMPI.Tools import Reduction
from NuMPI.Optimization import LinearConstraint, l_bfgs_projected

# comm: the communicator shared by all ranks
pnp = Reduction(comm)

# a_local: this rank's slice of the global weight vector, shape matching x
# target: global scalar, same on every rank
lc = LinearConstraint(a_local, target, pnp=pnp)

def fun(x):                      # x is the local slice
    # y_local: this rank's slice of the data, same shape as x
    # compute the local energy, then REDUCE it for the scalar return
    local_energy = 0.5 * np.sum((x - y_local) ** 2)
    return pnp.sum(local_energy).item(), (x - y_local)   # gradient stays local

# x0_local: this rank's slice of the initial guess
res = l_bfgs_projected(fun, x0_local, lc, jac=True,
                       bounds_lo=0.0, bounds_hi=1.0,
                       comm=comm, gtol=1e-5)

The returned res.x is the local slice of the solution; res.fun, res.multiplier, and res.max_grad are globally reduced scalars.
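Because the solution comes back as local slices, any check of the constraint <a, x> = target or of the box bounds has to be reduced as well. The sketch below simulates two ranks in plain Python. The slices are made-up values rather than solver output, and `allreduce` is a hypothetical stand-in for a reduction across ranks.

```python
def allreduce(values, op):
    # Stand-in for a reduction: combine one value per rank with `op`.
    return op(values)


# Made-up local slices on two simulated ranks.
a_slices = [[1.0, 1.0], [1.0, 1.0]]      # weight vector a, sliced like x
x_slices = [[0.5, 0.25], [0.75, 0.5]]    # candidate solution, local slices
target = 2.0                             # global scalar, same on every rank
lo, hi = 0.0, 1.0

# Each rank computes a local dot product; the sum over ranks is <a, x>.
local_dots = [sum(a * x for a, x in zip(a_loc, x_loc))
              for a_loc, x_loc in zip(a_slices, x_slices)]
residual = allreduce(local_dots, sum) - target

# Bounds hold globally only if they hold on every rank.
global_min = allreduce([min(x_loc) for x_loc in x_slices], min)
global_max = allreduce([max(x_loc) for x_loc in x_slices], max)
feasible = lo <= global_min and global_max <= hi
```

Checking only one rank's slice would report a residual of -1.25 or -0.75 here, even though the global constraint is satisfied exactly.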

See NuMPI/Optimization/__init__.py for optimizer-specific notes and test/Optimization/MPIMinimizationProblems.py::MPI_Quadratic for a reference implementation of a distributed objective.

Development & Funding

Development of this project is funded by the European Research Council within Starting Grant 757343 and by the Deutsche Forschungsgemeinschaft within project EXC 2193.
