Skip to main content

Run experiments and interactively plot results across combinations of user-specified dimensions

Project description

Yuclid

Combinatorially explode your experiments

Yuclid is a tool for orchestrating experiments in N-dimensional irregular spaces of parameters. It collects custom metrics in a single JSON file for easy post-processing. Yuclid builds the Cartesian product of the dimensions you defined, and runs an experiment per point in that space. It also provides a unique way of plotting data (yuclid plot) interactively, browsing slices of the results using the arrow keys.

The geometrical metaphor is that each experiment is a point in a multidimensional discrete space formed by all combinations of user-defined parameters. By specifying extra conditions (see advanced example), some hyper regions can be carved out of the original space.

What kind of experiments?

Anything that can be expressed in a single (pipelined) command that generates one or more numbers. Since programs' outputs are often verbose and the target metric is contained in a single line, metrics can be arbitrarily defined in terms of other commands, e.g., regular expressions (see example).

Here's a list of use-case ideas:

  • Measure the impact of different optimization levels of different compilers on different programs
  • Count cache misses under different memory allocators on different inputs
  • Measure strong scaling parallel programs given different thread affinities
  • Evaluate different compression algorithms on different inputs with different compression levels
  • Organize perf counters alongside custom metrics e.g., max RSS, in a self-contained JSON file
  • Create reproducible artifacts for research software
  • All of the above combined!

Installation

Requires python >= 3.8

Development head:

pip install git+https://github.com/fsossai/yuclid.git

Stable release:

pip install yuclid
  • yuclid run: Run experiments with all combinations of the defined parameters.
  • yuclid plot: Interactively visualizes the results produced by yuclid run.

Configuration for yuclid run

Key sections of yuclid.json:

  • env: Environment variables and constants
  • setup: Commands to run before experiments (global) or for specific parameter combinations (point)
  • trials: The actual experiment commands that generate metrics to collect
  • metrics: How to extract a given metric from the data collected by the trials
  • space: Dimension definitions - all combinations will be explored
  • order: Execution order of parameter combinations

Parameters can be simple lists or objects with name/value pairs. Use ${yuclid.x} in a command to reference the value of dimension x, and ${yuclid.@} for a unique output filename. ${yuclid.x} is an alias for ${yuclid.x.value}.

Minimal Example

Suppose you want to time a compression algorithm on different input sizes and also measure the execution time variance across cores. The dimensions of this experiment, i.e., the space, would be the size of the input, the compression level and the cpuid. Yuclid uses a yuclid.json configuration file to define the space and other experiment parameters. Here's a minimal example that you can immediately run on your linux terminal.

{
  "space": {
    "size": [
      {
        "name": "small",
        "value": "10M"
      },
      {
        "name": "medium",
        "value": "20M"
      },
      {
        "name": "large",
        "value": "50M"
      }
    ],
    "cpuid": [0, 1, 2, 3],
    "compression": [
      {
        "name": "lowest",
        "value": 1
      },
      {
        "name": "highest",
        "value": 9
      }
    ]
  },
  "trials": [
    "time -p taskset -c ${yuclid.cpuid} head -c ${yuclid.size} /dev/urandom | gzip -${yuclid.compression} >/dev/null"
  ],
  "metrics": [
    {
      "name": "time.real",
      "command": "cat ${yuclid.@}.err | grep real | grep -oE '[0-9]+\\.[0-9]+'"
    },
    {
      "name": "time.sys",
      "command": "cat ${yuclid.@}.err | grep sys | grep -oE '[0-9]+\\.[0-9]+'"
    }
  ]
}

To run the experiments, copy the configuration above into yuclid.json and from the same directory run

yuclid run

You can also run a subspace using the selector -s

yuclid run -s size=medium
yuclid run -s cpuid=0,1,2
yuclid run -s size=small,medium cpuid=3,0

The command yuclid run (or yuclid run --inputs yuclid.json) will produce a JSON Lines:

{"size": "small", "cpuid": "0", "compression": "lowest", "time.real": 0.37, "time.sys": 0.05}
{"size": "small", "cpuid": "0", "compression": "highest", "time.real": 0.33, "time.sys": 0.05}
{"size": "small", "cpuid": "1", "compression": "lowest", "time.real": 0.31, "time.sys": 0.05}
{"size": "small", "cpuid": "1", "compression": "highest", "time.real": 0.33, "time.sys": 0.05}
{"size": "small", "cpuid": "2", "compression": "lowest", "time.real": 0.31, "time.sys": 0.05}
{"size": "small", "cpuid": "2", "compression": "highest", "time.real": 0.32, "time.sys": 0.05}
{"size": "small", "cpuid": "3", "compression": "lowest", "time.real": 0.31, "time.sys": 0.05}
{"size": "small", "cpuid": "3", "compression": "highest", "time.real": 0.33, "time.sys": 0.05}
{"size": "medium", "cpuid": "0", "compression": "lowest", "time.real": 0.62, "time.sys": 0.11}
{"size": "medium", "cpuid": "0", "compression": "highest", "time.real": 0.66, "time.sys": 0.1}
{"size": "medium", "cpuid": "1", "compression": "lowest", "time.real": 0.62, "time.sys": 0.11}
{"size": "medium", "cpuid": "1", "compression": "highest", "time.real": 0.66, "time.sys": 0.1}
{"size": "medium", "cpuid": "2", "compression": "lowest", "time.real": 0.64, "time.sys": 0.11}
{"size": "medium", "cpuid": "2", "compression": "highest", "time.real": 0.65, "time.sys": 0.1}
{"size": "medium", "cpuid": "3", "compression": "lowest", "time.real": 0.67, "time.sys": 0.11}
{"size": "medium", "cpuid": "3", "compression": "highest", "time.real": 0.67, "time.sys": 0.11}
{"size": "large", "cpuid": "0", "compression": "lowest", "time.real": 1.59, "time.sys": 0.27}
{"size": "large", "cpuid": "0", "compression": "highest", "time.real": 1.58, "time.sys": 0.26}
{"size": "large", "cpuid": "1", "compression": "lowest", "time.real": 1.59, "time.sys": 0.28}
{"size": "large", "cpuid": "1", "compression": "highest", "time.real": 1.6, "time.sys": 0.27}
{"size": "large", "cpuid": "2", "compression": "lowest", "time.real": 1.54, "time.sys": 0.38}
{"size": "large", "cpuid": "2", "compression": "highest", "time.real": 1.69, "time.sys": 0.26}
{"size": "large", "cpuid": "3", "compression": "lowest", "time.real": 1.54, "time.sys": 0.27}
{"size": "large", "cpuid": "3", "compression": "highest", "time.real": 1.59, "time.sys": 0.27}

These above results can be displayed with yuclid plot, e.g.:

yuclid plot results.json -x compression
yuclid plot results.json -x size -z cpuid

Interact with the plot using arrow keys to move around dimensions and number keys to change the metric!

Or entirely in the terminal with yuclid tplot:

yuclid tplot results.jsonl -x size -z compression -y time.real -A
                                    [1] time.real                               
    ┌──────────────────────────────────────────────────────────────────────────┐
1.59┤###########  @@@@@@@@@@@                             ## highest           │
    │###########  @@@@@@@@@@@                              @@ lowest           │
    │###########  @@@@@@@@@@@                                                  │
1.33┤###########  @@@@@@@@@@@                                                  │
    │###########  @@@@@@@@@@@                                                  │
1.06┤###########  @@@@@@@@@@@                                                  │
    │###########  @@@@@@@@@@@                                                  │
    │###########  @@@@@@@@@@@                                                  │
0.80┤###########  @@@@@@@@@@@ ###########                                      │
    │###########  @@@@@@@@@@@ ###########  @@@@@@@@@@@                         │
0.53┤###########  @@@@@@@@@@@ ###########  @@@@@@@@@@@                         │
    │###########  @@@@@@@@@@@ ###########  @@@@@@@@@@@              @@@@@@@@@@@│
0.27┤###########  @@@@@@@@@@@ ###########  @@@@@@@@@@@ ###########  @@@@@@@@@@@│
    │###########  @@@@@@@@@@@ ###########  @@@@@@@@@@@ ###########  @@@@@@@@@@@│
0.00┤###########  @@@@@@@@@@@ ###########  @@@@@@@@@@@ ###########  @@@@@@@@@@@│
    └───────────┬─────────────────────────┬────────────────────────┬───────────┘
              large                    medium                    small          
time.real                               size

        large   medium  small
highest 1.58    0.66    0.33
lowest  1.59    0.62    0.37

Advanced Example

The following is a template showing how to track metrics of a program compiled with different compilers, running with a different number of threads and customize the input based on how many threads are used.

{
  "env": {
    "root": "/my/path",
    "data_dir": "/path/to/data"
  },
  "setup": {
    "global": [
      "ulimit -s 1048576" // global commands are run before point commands
    ],
    "point": [
      {
        "on": [ "compiler" ], // run the command on these dimensions only.
                              // The entire space is assumed if empty.
        "command": "mkdir -p ${yuclid.compiler}",
        "parallel": [ "compiler" ] // list|true|false: can execute more commands in parallel
                                   // true = all dimensions in `on`.
      },
      {
        "on": [ "compiler" ], // run the command on these dimensions only.
                              // The entire space is assumed if empty.
        "command": "make myprogram.out CXX=${yuclid.compiler} OUTDIR=$root/build/${yuclid.compiler}",
        "parallel": true // equivalent to ["compiler"]
      }
    ]
  },
  "space": {
    "compiler": [ "g++", "clang++" ],
    "threads": [ 1, 2, 3, 4 ],
    // or
    "threads:py": "list(range(1,5))", // python!
    // or
    "nthreads": null, // this forces the user to specify nthreads from CLI
                      // e.g. --select nthreads=1,7,14
    "dataset": [
      {
        "name": "small",
        "value": "${data_dir}/mydatasetA.dat",
        "condition": "yuclid.nthreads == 1"
      },
      {
        "name": "small", // name can be duplicated
        "value": "${data_dir}/mydatasetB.dat",
        "condition": "yuclid.nthreads > 1"
      }
    ]
  },
  "trials": [
    {
      "command": "time -p ${yuclid.compiler}/myprogram.out ${yuclid.dataset}",
      "metrics": [ "time", "something_else" ] // which metrics this command enables
                                              // "condition": "True" can specify extra conditions
    }
  ],
  "metrics": [
    {
      "name": "time",
      // each metric command must generate one or more numbers (separated by space or linebreak)
      // ${yuclid.@} represents a unique trial identifier
      // ${yuclid.@}.out and ${yuclid.@}.err are automatically generated for each trial
      "command": "cat ${yuclid.@}.err | grep real | grep -E '[0-9]+\\.[0-9]+'"
    },
    {
      "name": "something_else",
      "command": "cat ${yuclid.@}.out | grep something"
    }
  ],
  "order": [ "compiler", "dataset", "nthreads" ]  // different nthreads first,
                                                  // then datasets, then compilers
}

Plot API

yuclid plot can be used directly on your pyplot canvas. The command yuclid plot results.json -x size -z cpuid can be emulated in a more customizable script, e.g.:

import yuclid.plot
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# just like the CLI
cli_args = [
  "results.json",
  "-x",
  "size",
  "-z",
  "cpuid"
]
df = yuclid.plot.draw(fig, ax, cli_args)
plt.show()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

yuclid-0.2.3.tar.gz (39.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

yuclid-0.2.3-py3-none-any.whl (38.4 kB view details)

Uploaded Python 3

File details

Details for the file yuclid-0.2.3.tar.gz.

File metadata

  • Download URL: yuclid-0.2.3.tar.gz
  • Upload date:
  • Size: 39.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for yuclid-0.2.3.tar.gz
Algorithm Hash digest
SHA256 1290173ea60040fb868a388fe0f66150ef6baec7a7050ed611cd25f8f6c86933
MD5 d4612b2178cbd75366e7a26a088f7ca5
BLAKE2b-256 a1910931117d7c26c97cc7ca5e4d4b6d05679be5a00beb966ce0829093bad864

See more details on using hashes here.

File details

Details for the file yuclid-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: yuclid-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 38.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for yuclid-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 e4b7419090b1afbcb96f589492de52f6b6af30149673adb07fd40e19f58054e1
MD5 7f4b2fe0ad5df64bf2208a9c98272281
BLAKE2b-256 32c032dad2706e480a9b652ba8832ae22a87783778d15596f87a4584a47bb088

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page