Run FireWorks workflows in Google Cloud

Borealis

Runs FireWorks workflows on Google Compute Engine (GCE).

See the Borealis repo.

  • Borealis is the git repo name.
  • borealis-fireworks is the PyPI package name.
  • borealis-fireworker.service is the name of the systemd service.
  • fireworker is the recommended process username and home directory name.

Background

You can launch as many Fireworker nodes as you want as Google Compute Engine (GCE) VM instances, and/or run local workers, as long as they can all connect to the LaunchPad server running MongoDB. Metadata parameters and the worker's gce_my_launchpad.yaml file (or my_launchpad.yaml if that file doesn't exist) configure the MongoDB host, port, and DB name. Users can have their own DB names on a shared MongoDB server, and each user can have multiple DB names, each an independent LaunchPad space for workflows and their Fireworker nodes.
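A minimal sketch of the config-file fallback described above, assuming the worker looks in a single directory. `choose_launchpad_file` is a hypothetical helper, not part of the borealis-fireworks API; it only picks which file to read and leaves YAML parsing to the caller.

```python
import os
from typing import Optional

# File names from this README; the lookup order mirrors the documented fallback.
GCE_LAUNCHPAD_FILE = "gce_my_launchpad.yaml"
DEFAULT_LAUNCHPAD_FILE = "my_launchpad.yaml"


def choose_launchpad_file(directory: str = ".") -> Optional[str]:
    """Return the launchpad config path to read, or None if neither file exists.

    Hypothetical helper: prefers gce_my_launchpad.yaml, then falls back to
    my_launchpad.yaml.
    """
    for name in (GCE_LAUNCHPAD_FILE, DEFAULT_LAUNCHPAD_FILE):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None
```

Keeping the GCE-specific file first lets one directory hold a launchpad config for GCE workflows alongside another one for other workflows.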

Workers get Fireworks from the LaunchPad, run them in "rapidfire" mode, and eventually time out and shut themselves down.
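The worker loop can be sketched as follows. This is a hypothetical, stdlib-only simulation, not FireWorks' own rapidfire code: `run_rapidfire`, `fetch_next`, `idle_timeout`, and `quit_soon` are illustrative names (the change log mentions idle_for_waiters and idle_for_rockets settings, which presumably play the role of the idle limit), and the final VM shutdown step is left out.

```python
import time
from typing import Callable, Optional


def run_rapidfire(
    fetch_next: Callable[[], Optional[Callable[[], None]]],
    idle_timeout: float,
    poll_interval: float = 1.0,
    quit_soon: Callable[[], bool] = lambda: False,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical worker loop: run tasks until none arrive for idle_timeout seconds.

    Returns the number of tasks run. The caller would then shut the worker down.
    """
    launched = 0
    idle_since = clock()
    while not quit_soon():  # checked between tasks, never in the middle of one
        task = fetch_next()
        if task is None:
            if clock() - idle_since >= idle_timeout:
                break  # idle too long: time to time out
            sleep(poll_interval)
            continue
        task()
        launched += 1
        idle_since = clock()  # reset the idle timer after each task
    return launched
```

Injecting `clock` and `sleep` keeps the sketch testable without real waiting.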

Workers can run any Firetasks that are loaded on their disk images, but the best fit is to run the DockerTask Firetask. DockerTask pulls task input files from Google Cloud Storage (GCS), runs a payload task as a shell command within a Docker container, and pushes task output files to GCS.

DockerTask parameters include the Docker image to pull, the command shell tokens to run in the Docker container, and its input and output files and directories.
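To make those parameters concrete, here is a sketch that turns an image name, command tokens, and host-to-container directory mounts into an equivalent `docker run` command line. It assumes the inputs were already pulled into host directories; `docker_run_argv` and its `mounts` parameter are hypothetical, and DockerTask itself may talk to the Docker server through its API rather than the CLI.

```python
from typing import Dict, List


def docker_run_argv(image: str, command: List[str], mounts: Dict[str, str]) -> List[str]:
    """Build a hypothetical `docker run` argument list.

    `mounts` maps host paths (where inputs were pulled and outputs will be
    collected) to paths inside the container. Passing the command as a list
    of tokens, not one shell string, avoids quoting surprises.
    """
    if not command or not all(isinstance(tok, str) for tok in command):
        raise ValueError("command must be a non-empty list of string tokens")
    argv = ["docker", "run", "--rm"]  # --rm removes the container when it exits
    for host_path, container_path in sorted(mounts.items()):
        argv += ["-v", "{}:{}".format(host_path, container_path)]
    return argv + [image] + list(command)
```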

Moving inputs and outputs through GCS avoids the need for a shared NFS file service, which costs 10x as much as GCS storage and doesn't scale as well.
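Because files move through GCS rather than a shared file system, each input or output is named by a `gs://bucket/object` URI. A minimal sketch of splitting such a URI into the bucket and object name that GCS client libraries work with, plus a hypothetical scratch-directory layout for the local copy; neither function comes from DockerTask.

```python
import posixpath
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError("not a GCS URI: {!r}".format(uri))
    bucket, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError("missing bucket name in {!r}".format(uri))
    return bucket, blob_name


def local_path_for(uri: str, scratch_dir: str) -> str:
    """Mirror a GCS object under a local scratch directory (hypothetical layout)."""
    bucket, blob_name = split_gcs_uri(uri)
    # Strip leading slashes so join() can't escape the scratch directory root.
    return posixpath.join(scratch_dir, bucket, blob_name.lstrip("/"))
```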

Using a Docker image lets you bundle up the payload task with its entire runtime, e.g. Python version, pip packages, Linux apt packages, and config files. Your workflow can use one or more Docker images, and they're isolated from the Fireworker.

Team Setup

TODO: Install & configure dev tools, create a GCP project, auth stuff, install MongoDB on a GCE VM or set up Google-managed MongoDB, create a Fireworker disk image & image family, ...

Individual Developer Setup

TODO: Install & configure dev tools, make a storage bucket with a globally-unique name, build a Docker image to run, ...

Run

TODO

Change Log

v0.4.0

  • DockerTask:
    • Implement task timeouts.
    • Log the elapsed runtime of the container process.
    • Also timestamp the log filename in the "Pushing outputs to GCS" message and in the local filename (prep for caching output files locally).
    • Raise an exception if a > or >> capture path parameter names a directory.
  • Fireworker:
    • Allow the worker's my_launchpad.yaml file to set the idle_for_waiters and idle_for_rockets parameters. This lets you configure GCE workers via the file in their Disk Image and off-GCE local workers via their local yaml file.
    • Add a quit=soon feature. gce.py can set this metadata attribute to ask Fireworkers to quit gracefully between rockets.
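The v0.4.0 DockerTask changes (task timeouts, logging the elapsed runtime, and rejecting a directory as a capture path) can be illustrated with the standard library's `subprocess.run`. This sketch runs a plain local command rather than a container, and `run_with_timeout` and its `append` flag (standing in for `>>`) are hypothetical. Timing out a `docker run` client process likely leaves the container itself running, so a container-based version would also need to stop the container.

```python
import os
import subprocess
import time
from typing import List, Optional, Tuple


def run_with_timeout(
    command: List[str],
    capture_path: Optional[str] = None,
    append: bool = False,
    timeout: Optional[float] = None,
) -> Tuple[int, float]:
    """Run `command`, optionally capturing stdout to a file (hypothetical helper).

    append=True plays the role of `>>`, False of `>`. Returns (exit code,
    elapsed seconds). Raises subprocess.TimeoutExpired, after killing the
    process, if it runs longer than `timeout` seconds.
    """
    # Mirror the v0.4.0 check: a capture path must not name a directory.
    if capture_path is not None and os.path.isdir(capture_path):
        raise ValueError("capture path names a directory: {}".format(capture_path))

    start = time.monotonic()
    if capture_path is None:
        completed = subprocess.run(command, timeout=timeout)
    else:
        with open(capture_path, "a" if append else "w") as out:
            completed = subprocess.run(command, stdout=out, timeout=timeout)
    elapsed = time.monotonic() - start
    print("Elapsed runtime: {:.1f} s".format(elapsed))  # log the elapsed runtime
    return completed.returncode, elapsed
```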

v0.3.3

  • Timestamp the captured log files so the logs from multiple runs are all kept and ls -l lists them in time order.
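Putting a zero-padded, most-significant-first timestamp in each log filename makes alphabetical order (the order `ls -l` uses by default) match chronological order. The naming scheme below is an assumption for illustration, not necessarily the one borealis-fireworks uses.

```python
import datetime
import os
from typing import Optional


def timestamped_log_name(base: str, now: Optional[datetime.datetime] = None) -> str:
    """Insert a sortable timestamp before the extension (hypothetical scheme).

    For example, 'out.log' at 2020-02-17 15:30:00 becomes 'out.20200217-153000.log'.
    """
    when = now or datetime.datetime.now()
    root, ext = os.path.splitext(base)
    return "{}.{}{}".format(root, when.strftime("%Y%m%d-%H%M%S"), ext)
```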

v0.3.2 - 2020-02-17

  • Add info to the logs.

v0.3.1 - 2020-02-17

  • Python 2 compatibility fixes.
  • Explain the ConnectionError that arises when fireworker can't contact the Docker server.

v0.3.0 - 2020-02-14

  • Move the setup files from borealis/installation/ to borealis/setup/.
  • Add a fireworker --setup option to print the setup path to simplify the steps to copy those files when setting up a server Disk Image.
  • Add a fireworker -l <launchpad_filename> option for compatibility with lpad. The default is back to my_launchpad.yaml.
  • Add a gce -l <launchpad_filename> option, like lpad, to read the db name, username, and password when creating VMs. The default is my_launchpad.yaml.
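A hypothetical `argparse` sketch of the fireworker options described above: `-l` with a my_launchpad.yaml default and a `--setup` flag. It is not the package's actual parser, and the `launchpad_file` destination name is an assumption.

```python
import argparse
from typing import List, Optional


def parse_fireworker_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse fireworker-style options (hypothetical sketch, not the real parser)."""
    parser = argparse.ArgumentParser(prog="fireworker")
    parser.add_argument(
        "-l", dest="launchpad_file", default="my_launchpad.yaml",
        help="LaunchPad config file to read (default: %(default)s)")
    parser.add_argument(
        "--setup", action="store_true",
        help="print the path to the setup files used to build a Disk Image")
    return parser.parse_args(argv)
```

Passing `argv` explicitly keeps the parser testable; with `None`, argparse reads `sys.argv`.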

v0.2.1 - 2020-02-13

  • Bug fix in the gce_my_launchpad.yaml fallback code.

v0.2.0 - 2020-02-13

  • Read launchpad config info from gce_my_launchpad.yaml if possible, falling back to my_launchpad.yaml for compatibility with previous releases. This lets people use one launchpad config file for their GCE workflows and another one for their other workflows.
  • Improve the server installation steps and augment the fireworker --help text to display its directory.

v0.1.1 - 2020-02-13

  • Correct the pip name in startup.sh.
  • Use print() instead of logging in gce.py so the messages aren't filtered by the log level.
  • Refine the installation instructions.

v0.1.0 - 2020-02-10

  • Initial dev build.

Download files

Source Distribution

borealis-fireworks-0.4.0.tar.gz (30.0 kB)

Uploaded Source

Built Distribution

borealis_fireworks-0.4.0-py2.py3-none-any.whl (32.9 kB)

Uploaded Python 2 Python 3

File details

Details for the file borealis-fireworks-0.4.0.tar.gz.

File metadata

  • Download URL: borealis-fireworks-0.4.0.tar.gz
  • Upload date:
  • Size: 30.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.0

File hashes

Hashes for borealis-fireworks-0.4.0.tar.gz
Algorithm Hash digest
SHA256 fd5927f158b6927eaaf20fbe1dcaebe64e563a1ffb0a5cdec1006250287e17bd
MD5 c0ce45a995cc5299dbaedccc8c3934ae
BLAKE2b-256 0c6a9997397e04d899475faf14eda7ad996b39d212a66835c68b5e8140db2b6a

File details

Details for the file borealis_fireworks-0.4.0-py2.py3-none-any.whl.

File metadata

  • Download URL: borealis_fireworks-0.4.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 32.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.0

File hashes

Hashes for borealis_fireworks-0.4.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 6db318b5898936b73a5005fc33b17b5e42ce4dc44aa9ea06bff7c552dff2eefe
MD5 39e722a9f6481738c68b7fadedf08903
BLAKE2b-256 cba0b6ac845293a3fd27c995b83624f5957b9ad51ad6591783faf1f5a0168e84
