Skip to main content

a command executor with caching for data processing pipelines

Project description

Razel

Rust MIT CI

Deno module Python module Rust crate

A command executor with caching. It is:

  • Fast: commands are executed multithreaded and local caching speeds up trial and error development (avoids repeated execution of commands which have been processed before)
  • Scalable: optional remote caching allows sharing results between CI jobs
  • Reliable: commands are executed in a sandbox to detect missing dependencies
  • Easy to use: commands are specified using a high-level TypeScript or Python API and convenience functions/tasks are built-in
  • Made for: data processing pipelines with executables working on files and many dependencies between those

Razel is not the best choice for building software, especially there's no built-in support for compiler setup and header dependencies.

Getting Started

The native input format for Razel is a razel.jsonl file, see the example examples/razel.jsonl. It can be run with razel exec -f examples/razel.jsonl.

The preferred way is to use one of the high-level APIs. Both allow specifying the commands in an object-oriented style and provide a run() function which creates the razel.jsonl file, downloads the native razel binary and uses it to execute the commands.

Paths of inputs files are relative to the workspace (directory of razel.jsonl). Output files are created in <cwd>/razel-out. Additional metadata is written to <cwd>/razel-out/razel-metadata.

TypeScript API

Install Deno to use the TypeScript API. Run the example Deno script:

deno run -A --check examples/deno.ts -- -v

Python API

The Python API requires Python >= 3.8. Install the package and run the example Python script:

pip install --upgrade razel
python examples/python.py -v

Batch file (experimental)

In addition to razel.jsonl, Razel can directly execute a batch file containing commands. Input and output files need to be specified, which is WIP.

Execute the example examples/batch.sh with Razel:

razel exec -f examples/batch.sh

Running in Docker/Podman container

The workspace directory can be mounted into a container:

podman run -t -v $PWD:$PWD -w $PWD denoland/deno deno run -A examples/deno.ts

Building Razel from source

Use rustup to install Rust. Install protobuf-compiler. Then run cargo install --locked razel.

Project Status

Razel is in active development and used in production.

CLI and format of razel.jsonl will likely change, same for output in razel-out/razel-metadata. While Linux is the main development platform, Razel is also tested on Mac and Windows.

Features

Measurements

Razel parses the stdout of executed commands to capture runtime measurements and writes them to razel-out/razel-metadata/log.json and razel-out/razel-metadata/measurements.csv. Currently, the <CTestMeasurement> and <DartMeasurement> tags as used by CTest/CDash are supported:

<CTestMeasurement type="numeric/double" name="score">12.3</CTestMeasurement>
<CTestMeasurement type="text/string" name="result">ok</CTestMeasurement>

Supporting custom formats is planned.

Tags

Tags can be set on commands. Any custom string can be used as tag, a colon should be used for grouping. The tags are added to razel-out/razel-metadata/execution_times.json. Using tags for filtering commands and creating reports is planned.

Tags with razel: prefix are reserved and have special meaning:

  • razel:quiet: don't be verbose if command succeeded
  • razel:verbose: always show verbose output
  • razel:condition: keep running and don't be verbose if command failed
  • razel:no-cache: always execute a command without caching
  • razel:no-remote-cache: don't use remote cache
  • razel:no-sandbox: disable sandbox and also cache - for commands with unspecified input/output files

Conditional execution / Skipping commands

Commands can be skipped based on the execution result of another command. Set the razel:condition tag on a command and use that one as dependency for other commands.

WebAssembly

Razel has a WebAssembly runtime integrated and can directly execute WASM modules using WebAssembly System Interface (WASI).

WebAssembly is a perfect fit to create portable data processing pipelines with Razel. Just a single WebAssembly module is needed to run - and create bit-exact output - on all platforms. WebAssembly execution is slower than native binaries, but startup time might be faster (no process overhead).

Param/Response files

Commands with huge number of arguments might result in command lines which are too long to be executed by the OS. Razel detects those cases and replaces the arguments with a response file. The filename starts with @.

Out of memory (OOM) handling

If a process is killed by the OS, the command and similar ones will be retried with less concurrency to reduce the total memory usage. (Doesn't work in K8s because the whole pod is killed.)

Sandbox

Commands are executed in a temporary directory which contains symlinks to the input files specific to one command. This allows detecting unspecified dependencies which would break caching.

The sandbox is not meant for executing untrusted code.

Local Caching

The local cache is enabled by default and stores information about previously executed commands and output files. The output directory razel-out contains symlinks to files stored in the local cache.

Use razel exec --info to get the default cache directory and --cache-dir (env: RAZEL_CACHE_DIR) to move it.

Remote Caching

Razel supports remote caching compatible to Bazel Remote Execution API. Remote execution is not yet implemented.

Use --remote-cache (env: RAZEL_REMOTE_CACHE) to specify a comma seperated list of remote cache URLs. The first available one will be used. Optionally --remote-cache-threshold (REMOTE_CACHE_THRESHOLD) can be set to only cache commands with outputSize / execTime < threshold [kilobyte / s]. If your remote cache doesn't have unlimited storage capacity, this can drastically speed up execution because quick commands with large output files will no longer be cached, providing more storage for expensive commands.

The following remote cache implementations are tested with Razel:

  • bazel-remote-cache
    • run with podman run -p 9092:9092 buchgr/bazel-remote-cache --max_size 10
    • call razel with RAZEL_REMOTE_CACHE=grpc://localhost:9092
  • nativelink
    • run with instance_name main on port 50051:
      mkdir -p nativelink-config
      curl https://raw.githubusercontent.com/TraceMachina/nativelink/main/nativelink-config/examples/basic_cas.json --output nativelink-config/basic_cas.json
      podman run -p 50051:50051 -v $PWD/nativelink-config:/nativelink-config:ro ghcr.io/tracemachina/nativelink:v0.2.0 /nativelink-config/basic_cas.json
      
    • call razel with RAZEL_REMOTE_CACHE=grpc://localhost:50051/main

Configuration

Use razel exec -h to list the configuration options for execution. Some options can also be set as environment variables and those are loaded from .env files.

The following sources are used in order, overwriting previous values:

  • .env file in current directory or its parents
  • .env.local file in current directory or its parents
  • environment variable
  • command line option

Acknowledgements

The idea to build fast and correct is based on Bazel.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

razel-0.4.1.tar.gz (8.9 kB view hashes)

Uploaded Source

Built Distribution

razel-0.4.1-py3-none-any.whl (9.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page