Profiler for MPI Python programs

Project description

Enables:

  1. using cProfile on each launched MPI process.

  2. measuring the communication overhead of your MPI setup.

mpiprof provides two complementary pieces:

  1. A drop-in wrapper for cProfile that you invoke as a module: mpiexec -n 4 python3 -m mpiprof your_script.py arg1 arg2. This writes a separate .pstats file for each rank.

  2. An optional MPIProfiler class that wraps an mpi4py.MPI.Comm and records timing and call-count statistics for MPI calls, along with the call stacks they were made from. Call write_statistics() to write the results to disk.

Installation

  • From PyPI:

    pip install mpiprof
  • From a local checkout of this repository:

    git clone https://github.com/JohannesBuchner/mpiprof.py
    cd mpiprof.py/
    pip install .

Requirements

  • Python 3.8+

  • mpi4py

  • An MPI runtime if you plan to launch under mpiexec/srun/etc.

Usage: per-rank cProfile runner

Run your script under mpiexec (or srun) using the module form:

  • mpiexec -n 4 python3 -m mpiprof your_script.py arg1 arg2

This writes one profile file per rank named mpiprof.<rank>.pstats in the current working directory.

Options:

  • -o / --outfile sets the output path or pattern (default: mpiprof.{rank}.pstats). The literal substring {rank} will be replaced with the MPI rank.
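Since the documentation describes {rank} as a literal substring replacement, the pattern expansion behaves like the following sketch (the exact code in mpiprof may differ):

```python
# Sketch of the -o/--outfile pattern expansion: the literal substring
# "{rank}" is replaced with the MPI rank, so each process gets its own file.
def expand_outfile(pattern: str, rank: int) -> str:
    return pattern.replace("{rank}", str(rank))

print(expand_outfile("mpiprof.{rank}.pstats", 3))        # mpiprof.3.pstats
print(expand_outfile("profiles/run1.{rank}.pstats", 0))  # profiles/run1.0.pstats
```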

Signal handling:

  • mpiprof installs SIGINT and SIGTERM handlers to dump the profile before exiting, so Ctrl-C or a clean termination should still produce output. If the MPI launcher escalates to SIGKILL immediately, no tool can save a profile.

Usage: MPIProfiler for mpi4py

The MPIProfiler wraps a communicator and measures wall-clock time and call counts for common blocking MPI calls. It is intentionally lightweight and safe to leave in production runs (low overhead on the measured calls). Nonblocking calls are counted but not timed precisely (the time recorded is call overhead, not the transfer time).

Example:

from mpi4py import MPI
from mpiprof import MPIProfiler

comm = MPIProfiler(MPI.COMM_WORLD)

# Your MPI code as usual, using comm instead of MPI.COMM_WORLD:
rank = comm.Get_rank()
size = comm.Get_size()
data = rank

# Simple collective
total = comm.allreduce(data, op=MPI.SUM)

# Point-to-point
if rank == 0:
    comm.send(b"hello", dest=1, tag=0)
elif rank == 1:
    msg = comm.recv(source=0, tag=0)

# Write stats at the end (one file per rank)
comm.write_statistics()  # default name: MPIprofile.<rank>.out

Notes:

  • The wrapper exposes common methods (e.g., send, recv, bcast, reduce, allreduce, gather, scatter, barrier) and forwards any other attribute to the underlying communicator. Both the lowercase pickle-based spellings (send) and the uppercase buffer-based spellings (Send) are supported, matching mpi4py's naming idioms.

  • For nonblocking operations (Isend, Irecv), the wrapper records the call count but cannot attribute data transfer time unless you also wrap and time Wait/Waitall. A simple wait wrapper is provided to time individual requests returned by the wrapper.
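The timing mechanism behind such a wrapper can be sketched generically. This is an illustration of the approach, not mpiprof's implementation (MPIProfiler additionally records call stacks and writes a report); the demonstration uses a stand-in object instead of a real communicator:

```python
import time
from collections import defaultdict

class TimedProxy:
    """Forward attribute access to a wrapped object; time listed methods."""

    def __init__(self, target, methods):
        self._target = target
        self._methods = set(methods)
        self.calls = defaultdict(int)      # method name -> call count
        self.seconds = defaultdict(float)  # method name -> total wall time

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if name not in self._methods or not callable(attr):
            return attr  # untimed attributes pass through unchanged

        def timed(*args, **kwargs):
            start = time.perf_counter()
            try:
                return attr(*args, **kwargs)
            finally:
                self.calls[name] += 1
                self.seconds[name] += time.perf_counter() - start
        return timed

# Demonstration on a stand-in object instead of a real communicator:
class FakeComm:
    def allreduce(self, x):
        time.sleep(0.01)  # pretend this is communication
        return x

comm = TimedProxy(FakeComm(), methods=["allreduce"])
comm.allreduce(1)
comm.allreduce(2)
print(comm.calls["allreduce"])          # 2
print(comm.seconds["allreduce"] > 0.0)  # True
```

For blocking calls this measures communication directly; for nonblocking calls the same wrapper only captures the (small) call overhead, which is why the wait calls must be timed as well.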

Rank detection

The runner tries to detect the rank via common environment variables:

  • OMPI_COMM_WORLD_RANK, PMIX_RANK (Open MPI)

  • PMI_RANK (MPICH, Intel MPI)

  • MV2_COMM_WORLD_RANK (MVAPICH2)

  • SLURM_PROCID (scheduler fallback)

  • default: rank 0 if none of these variables is set
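The fallback chain above amounts to checking the environment variables in order, as in this sketch (mpiprof's actual precedence order may differ):

```python
import os

_RANK_VARS = (
    "OMPI_COMM_WORLD_RANK",  # Open MPI
    "PMIX_RANK",             # Open MPI (PMIx)
    "PMI_RANK",              # MPICH, Intel MPI
    "MV2_COMM_WORLD_RANK",   # MVAPICH2
    "SLURM_PROCID",          # scheduler fallback
)

def detect_rank(environ=os.environ) -> int:
    for var in _RANK_VARS:
        value = environ.get(var)
        if value is not None:
            return int(value)
    return 0  # default when launched outside an MPI environment

print(detect_rank({"PMI_RANK": "3"}))  # 3
print(detect_rank({}))                 # 0
```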

Output files

  • Runner: mpiprof.<rank>.pstats (or the pattern you set with -o). You can analyze it with pstats or tools like snakeviz:

    • python3 -m pstats mpiprof.0.pstats

    • snakeviz mpiprof.0.pstats

  • MPIProfiler: MPIprofile.<rank>.out with operation counts and total wall-clock time per operation. Example:

    Function: scatter
    Call stack:
            surveymcmc.py:466
            surveymcmc.py:431 sampler.advance_one_step(False)
            mcaeis.py:107 local_coords = self.comm.scatter(chunks, root=0)
    Number of calls: 2
    Duration During Call: 98.990435s
    Duration Before Call: 0.000014s
    
    Function: scatter
    Call stack:
            surveymcmc.py:466
            surveymcmc.py:448 sampler.advance_one_step()
            mcaeis.py:75 self._init()
            mcaeis.py:55 self.log_probs = self._evaluate_log_probs(self.coords)
            mcaeis.py:64 local_coords = self.comm.scatter(chunks, root=0)
    Number of calls: 39
    Duration During Call: 2.228181s
    Duration Before Call: 0.000025s
    
    Total MPI Time: 388.922329s
    Total Non-MPI Time: 206.503627s

Results are sorted (descending) by duration during call.
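Because the per-rank .pstats files are ordinary cProfile output, they can be loaded and even merged across ranks with the standard pstats module. A self-contained sketch (the profiles are generated in-process here instead of via mpiexec):

```python
import cProfile
import pstats

def work(n):
    return sum(i * i for i in range(n))

# Simulate two ranks by producing two profile files.
for rank in (0, 1):
    prof = cProfile.Profile()
    prof.enable()
    work(50_000)
    prof.disable()
    prof.dump_stats(f"mpiprof.{rank}.pstats")

# Load rank 0's profile, fold in rank 1's, and print the top entries.
stats = pstats.Stats("mpiprof.0.pstats")
stats.add("mpiprof.1.pstats")
stats.sort_stats("cumulative").print_stats(5)
```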

Limitations

  • The runner cannot save profiles if the process is killed by SIGKILL.

  • MPIProfiler’s accounting for nonblocking calls is approximate unless you consistently call wait/waitall on the requests returned by the wrapper’s nonblocking methods.

License

MIT
