This package contains the Job-Runner Worker, which is responsible for executing
the scheduled jobs managed by the Job-Runner.
Requirements (depending on your distro, the naming might be a bit different):
Then you should be able to install this package with
pip install job-runner-worker.
If you want to install this package in development mode, clone this repository
and then execute python setup.py develop. In that case, you might also want
to install the testing requirements by executing
pip install -r test-requirements.txt.
See the getting started section in the Job-Runner documentation (in the
job-runner repo) for setting up the whole project.
Example with required settings:
All available settings
- The base URL which will be used to access the API. This should start with
http:// or https://.
- Public-key to access the API.
- Private-key to access the API.
- The number of jobs to run concurrently. Default: 4.
- The log level. Default: 'info'. Valid options are:
- The maximum number of bytes of the log that is sent back to the API. This
is to avoid 413 Request Entity Too Large errors. If the log is larger than
this value, 20% of the allowed size is taken from the top of the log and
the remaining 80% from the bottom. Everything in between is dropped.
Default: 819200 (800 KiB).
- The hostname of the WebSocket Server.
- The port of the WebSocket Server. Default: 5555.
- The path where the scripts that are being executed through the Job-Runner
are temporarily stored. Default: '/tmp'.
- The hostname of the queue broadcaster server.
- The port of the queue broadcaster server. Default: 5556.
- Seconds after which the subscriber reconnects to the publisher
when no data has been received. Default: 300. This is useful when you
are load-balancing the publisher and the load balancer keeps the TCP
connection open on the front-end after the connection on the back-end has
been closed. Because of this, ZMQ doesn’t detect that it is not connected
anymore and jobs get
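The truncation rule described for the maximum log size (max_log_bytes) can be sketched as follows. This is a minimal illustration, not the worker's actual code: the function name truncate_log is hypothetical, and it assumes the log is handled as bytes and that the head and tail are simply joined without a marker in between.

```python
def truncate_log(log: bytes, max_log_bytes: int = 819200) -> bytes:
    """Return the log unchanged if it fits, else keep the top 20% and bottom 80%.

    Hypothetical helper: the percentages and the default come from the
    settings description; everything else is an assumption.
    """
    if len(log) <= max_log_bytes:
        return log

    # 20% of the allowed size is taken from the top of the log.
    head_size = max_log_bytes * 20 // 100
    # The remaining allowed size (about 80%) is taken from the bottom.
    tail_size = max_log_bytes - head_size

    head = log[:head_size]
    # Slice from an explicit start index so tail_size == 0 yields b"".
    tail = log[len(log) - tail_size:]

    # Everything between head and tail is dropped.
    return head + tail


if __name__ == "__main__":
    sample = b"0123456789"
    print(truncate_log(sample, max_log_bytes=5))
```

With these assumptions the result never exceeds max_log_bytes, so the payload sent back to the API stays within the configured limit.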
For starting the worker, you can use the job_runner_worker command:
usage: job_runner_worker [-h] [--config-path CONFIG_PATH]
Job Runner worker (v2.1.0)
-h, --help show this help message and exit
absolute path to config file (default: CONFIG_PATH env
- Roll back the retry on 4xx errors. Instead, recover when an unexpected error
occurs in execute_run, enqueue_actions, or kill_run. This recovers from
the case where a run was claimed by two workers (e.g. when it was sent to
worker A, which did not respond directly, and then to worker B, which
claimed it, after which A claimed it too).
- Make sure a shebang exists in scripts to be run. Use shlex to make
- Retry request 5x when the response is in the 4xx range before raising an
- On ping response, send back the version of the worker and the number of
concurrent jobs. This version requires that you have job-runner>=3.4.0
- Update error message when job does not start to be more verbose and specific.
- Fix the case where, after an exception, the run was marked as completed
but not as started.
- Make sure to only cleanup runs that are assigned to the worker. This version
is dependent on job-runner>=3.0.1.
- Make the worker compatible with the new worker-pool structure.
IMPORTANT: This version is dependent on job-runner>=2.0.0!
- Change SETTINGS_PATH environment variable to CONFIG_PATH for better
- Make sure that when a run already has a log, it is updated (previously it
would hang on the database integrity error).
- Make the worker crash early instead of hanging on errors happening before the
actual job starts, to give the user a visible cue that something went wrong.
- The worker will now terminate gracefully when receiving the TERM signal.
This means that all pending jobs will be completed, but that it will not
accept any new jobs. After finishing the last pending job, the worker will
- Set reconnect_after_inactivity default to 10 minutes. This is 2 x the
JOB_RUNNER_WORKER_PING_INTERVAL default setting in Job-Runner.
- Implement handler for ping action.
- Add and implement reconnect_after_inactivity setting.
- Run scripts by finding their shebang, without the x bit being needed.
- Handle separate run log-output resource. This requires Job-Runner >= v1.3.0.
- Fix killing job-runs. Where v1.0.5 killed child processes, it did
not kill children of children, … This should kill the full tree of
- Freeze requests library version, since 1.0.0 contains backwards-incompatible
- Fix killing job-runs. When the process had sub-processes, only the parent
process was killed and the worker was waiting for the child-processes to
- Add config variable max_log_bytes to limit the amount of log data that
will be sent back to the API (to avoid 413 Request Entity Too Large
- Send pid back to the REST API when a job has been started.
- Kill a job-run when a kill action is received.
- Make sure that the API exactly matches.
- Make the timestamps sent to the REST API timezone aware.
- Deployar related changes.
- Fix encoding issue when writing the file.
- Refactor to make the worker compatible with the 0.7 version of the
- Make it consume runs from the queue broadcaster instead of hitting the REST
interface every x seconds.
- Add retry on error to recover from temporary REST interface errors.
- Merge fixes v0.5.1 and v0.5.2 into v0.6.x version.
- Refactor to make use of separate WebSocket Server.
- Make temporary path for scripts configurable.
- Disable SSL certificate validation.