The building blocks of workflows!

Merlin

A brief introduction to Merlin

Merlin is a tool for running machine learning-based workflows. The goal of Merlin is to make it easy to build, run, and process the kinds of large scale HPC workflows needed for cognitive simulation.

At its heart, Merlin is a distributed task queuing system, designed to allow complex HPC workflows to scale to large numbers of simulations (we've done 100 million on the Sierra supercomputer).

Why would you want to run that many simulations? To become your own Big Data generator.

Data sets of this size can be large enough to train deep neural networks that can mimic your HPC application, to be used for such things as design optimization, uncertainty quantification and statistical experimental inference. Merlin's been used to study inertial confinement fusion, extreme ultraviolet light generation, structural mechanics and atomic physics, to name a few.

How does it work?

In essence, Merlin coordinates complex workflows through a persistent external queue server that lives outside of your HPC systems, but that can talk to nodes on your cluster(s). As jobs spin up across your ecosystem, workers on those allocations pull work from a central server, which coordinates the task dependencies for your workflow. Since this coordination is done via direct connections to the workers (i.e. not through a file system), your workflow can scale to very large numbers of workers, which means a very large number of simulations with very little overhead.

Furthermore, since the workers pull their instructions from the central server, you can do a lot of other neat things, like having multiple batch allocations contribute to the same work (think surge computing), or specializing workers to different machines (think CPU workers for your application and GPU workers that train your neural network). Another neat feature is that these workers can add more work back to the central server, which enables a variety of dynamic workflows, such as may be necessary for the intelligent sampling of design spaces or reinforcement learning tasks.
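As a very rough sketch of what that specialization can look like in a spec file (the block layout follows the Merlin docs, but the worker names, arguments, and step names below are made up for illustration):

merlin:
  resources:
    workers:
      # CPU workers that only serve the simulation step's queue
      sim_workers:
        args: -l INFO --concurrency 36
        steps: [run_sim]
      # GPU-equipped workers that only serve the training step's queue
      train_workers:
        args: -l INFO --concurrency 1
        steps: [train_model]

Because each worker group only listens to the queues of its assigned steps, a CPU allocation and a GPU allocation can each launch the pool suited to the hardware they actually have.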

Merlin does all of this by leveraging some key HPC and cloud computing technologies, building off open source components. It uses Maestro to provide an interface for describing workflows, as well as for defining workflow task dependencies. It translates those dependencies into concrete tasks via Celery, which can be configured for a variety of backend technologies (RabbitMQ and Redis are currently supported). Although not a hard dependency, we encourage the use of Flux for interfacing with HPC batch systems, since it can scale to a very large number of jobs.
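To give a feel for what this looks like in practice, here is a rough sketch of a small spec file. The overall shape (a description block plus a study block whose steps carry a cmd and a task_queue) follows the Merlin documentation, but the step names, commands, and queue names are illustrative, so treat the docs as the authoritative reference:

description:
  name: hello_sim
  description: A tiny two-step workflow sketch

study:
  - name: run_sim
    description: Run one (stand-in) simulation
    run:
      cmd: echo "running a simulation"
      task_queue: sim_queue    # workers subscribed to this queue pick up the step's tasks

  - name: post_process
    description: Collect the simulation output once run_sim has finished
    run:
      cmd: echo "post-processing $(run_sim.workspace)"
      depends: [run_sim]
      task_queue: post_queue

When a spec like this is handed to merlin run, each step becomes a set of Celery tasks on its named queue, which is what allows different worker pools to serve different parts of the workflow.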

The integrated system looks a little something like this:

[Figure: a typical Merlin workflow]

In this example, here's how it all works:

  1. The scientist describes her HPC workflow as a Maestro DAG (directed acyclic graph) "spec" file, workflow.yaml.
  2. She then sends it to the persistent server with merlin run workflow.yaml. Merlin translates the file into tasks.
  3. The scientist submits a job request to her HPC center. These jobs ask for workers via the command merlin run-workers workflow.yaml.
  4. Coffee break.
  5. As jobs stand up, they pull work from the queue, making calls to Flux to get the necessary HPC resources (see the batch sketch below).
  6. Later, workers on a different allocation with GPU resources connect to the server and contribute to processing the workload.

The central queue server deals with task dependencies and keeps the workers fed.
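Step 5 above relies on a batch scheduler; in the spec this is typically selected in a batch block. A minimal sketch, assuming Flux is available (the key names follow the Merlin docs, the values are placeholders):

batch:
  type: flux      # hand resource requests for each step to Flux
  nodes: 4        # placeholder node count per worker allocation
  queue: pbatch   # placeholder machine queue/partition
  bank: my_bank   # placeholder accounting bank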

For more details, check out the rest of the documentation.

Need help? merlin@llnl.gov

Quick Start

Note: Merlin supports Python 3.8+.

To install Merlin and its dependencies, run:

$ pip3 install merlin

Create your application config file:

$ merlin config create

Open the newly created config file at ~/.merlin/app.yaml and edit it to point to a RabbitMQ/Redis server. More instructions on this can be found on the Configuration page of Merlin's docs.
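Exactly what goes in that file depends on your servers, but the two main sections are a broker and a results backend. A rough sketch with placeholder hostnames, ports, and password-file paths (the Configuration docs remain the authoritative reference for the field names):

broker:
  name: rabbitmq                      # or redis
  server: my-rabbit-host.example.com  # placeholder hostname
  username: merlin_user
  password: ~/.merlin/rabbit.pass     # file containing the broker password
  vhost: merlin_vhost

results_backend:
  name: redis
  server: my-redis-host.example.com   # placeholder hostname
  port: 6379
  db_num: 0
  password: ~/.merlin/redis.pass      # file containing the backend password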

That's it.

To run something closer to a real use case, namely a demo workflow that includes both simulation and machine learning steps, first generate the example workflow:

$ merlin example feature_demo

Then install the workflow's dependencies:

$ pip install -r feature_demo/requirements.txt

Then process the workflow and create tasks on the server:

$ merlin run feature_demo/feature_demo.yaml

And finally, launch workers that can process those tasks:

$ merlin run-workers feature_demo/feature_demo.yaml

Documentation

Full documentation is available, or run:

$ merlin --help

(or add --help to the end of any sub-command you want to learn more about.)

Code of Conduct

Please note that Merlin has a Code of Conduct. By participating in the Merlin community, you agree to abide by its rules.

License

Merlin is distributed under the terms of the MIT License.

LLNL-CODE-797170
