This package contains the Job-Runner Worker, which is responsible for executing the scheduled jobs managed by the Job-Runner.
Requirements (depending on your distro, the naming might be a bit different):
Then you should be able to install this package with pip install job-runner-worker.
If you want to install this package in development mode, clone this repository and then execute python setup.py develop. In that case, you might also want to install the testing requirements by executing pip install -r test-requirements.txt.
See the getting started section in the Job-Runner documentation (in the job-runner repo) for setting up the whole project.
Example with required settings:
    [job_runner_worker]
    api_base_url=http://domain.of.job.runner/
    api_key=worker1
    secret=verysecret
    script_temp_path=/tmp
    ws_server_hostname=domain.of.websocket.server
    broadcaster_server_hostname=domain.of.broadcast.server
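As an illustration of how a file like the example above can be read, here is a minimal sketch using Python's standard configparser module. It is not the worker's own loading code: the function name load_settings and the REQUIRED_KEYS tuple are hypothetical, and the list of required keys is simply taken from the example.

```python
import configparser

# Same keys as the example above; the values are placeholders.
EXAMPLE_CONFIG = """
[job_runner_worker]
api_base_url=http://domain.of.job.runner/
api_key=worker1
secret=verysecret
script_temp_path=/tmp
ws_server_hostname=domain.of.websocket.server
broadcaster_server_hostname=domain.of.broadcast.server
"""

# Hypothetical: the keys shown in the "required settings" example.
REQUIRED_KEYS = (
    "api_base_url",
    "api_key",
    "secret",
    "script_temp_path",
    "ws_server_hostname",
    "broadcaster_server_hostname",
)


def load_settings(text):
    """Parse config text and return the [job_runner_worker] section as a dict."""
    # interpolation=None keeps '%' characters in secrets from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = dict(parser["job_runner_worker"])

    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    if not section["api_base_url"].startswith(("http://", "https://")):
        raise ValueError("api_base_url should start with http:// or https://")
    return section


settings = load_settings(EXAMPLE_CONFIG)
print(settings["api_base_url"])
```

Validating the file once at start-up, rather than on first use of each value, surfaces a typo in the config before any job is claimed.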
All available settings
- The base URL which will be used to access the API. This should start with http:// or https://.
- Public-key to access the API.
- Private-key to access the API.
- The number of jobs to run concurrently. Default: 4.
The log level. Default: 'info'. Valid options are:
- The maximum number of bytes of the log that is sent back to the API. This is to avoid 413 Request Entity Too Large errors. If the log is larger than this value, 20% of the allowed size is taken from the top of the log and the remaining 80% from the bottom. Everything in between is dropped. Default: 819200 (800 KiB).
- The hostname of the WebSocket Server.
- The port of the WebSocket Server. Default: 5555.
- The path where the scripts that are being executed through the Job-Runner are temporarily stored. Default: '/tmp'.
- The hostname of the queue broadcaster server.
- The port of the queue broadcaster server. Default: 5556.
- Number of seconds without received data after which the subscriber reconnects to the publisher. Default: 300. This is useful when the publisher sits behind a load balancer that keeps the TCP connection open on the front-end after the connection on the back-end has been closed. In that situation ZMQ doesn’t detect that it is no longer connected, and jobs get stuck.
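The log truncation rule described for the maximum log size (20% of the budget from the top, 80% from the bottom) can be sketched as follows. This is only an illustration of the rule as documented, not the worker's implementation: the function name truncate_log is hypothetical, and it assumes the log is handled as a bytes object and that no marker is inserted where the middle was cut.

```python
def truncate_log(log, max_bytes=819200):
    """Return at most max_bytes of log: the first 20% and the last 80% of the budget.

    Assumption: log is a bytes object, so sizes are counted in bytes.
    """
    if len(log) <= max_bytes:
        return log  # small enough, send as-is

    head_size = max_bytes * 20 // 100      # 20% of the budget from the top
    tail_size = max_bytes - head_size      # the remaining 80% from the bottom
    head = log[:head_size]
    tail = log[len(log) - tail_size:]
    return head + tail


# A 20-byte log squeezed into a 10-byte budget keeps 2 bytes from the start
# and 8 bytes from the end.
print(truncate_log(b"abcdefghijklmnopqrst", max_bytes=10))
```

Keeping more of the tail than the head reflects that the end of a job log usually contains the error or exit information.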
For starting the worker, you can use the job_runner_worker command:
    usage: job_runner_worker [-h] [--config-path CONFIG_PATH]

    Job Runner worker (v2.1.0)

    optional arguments:
      -h, --help            show this help message and exit
      --config-path CONFIG_PATH
                            absolute path to config file (default: CONFIG_PATH env
                            variable)
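The help text says the config path defaults to the CONFIG_PATH environment variable. A minimal sketch of that lookup order with argparse is shown below. The function name resolve_config_path is hypothetical, and exiting with an error when neither source provides a path is an assumption, not documented behaviour of the worker.

```python
import argparse
import os


def resolve_config_path(argv=None, environ=None):
    """Return the config path from --config-path, falling back to CONFIG_PATH."""
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser(prog="job_runner_worker")
    parser.add_argument(
        "--config-path",
        default=environ.get("CONFIG_PATH"),
        help="absolute path to config file (default: CONFIG_PATH env variable)",
    )
    args = parser.parse_args(argv)

    if not args.config_path:
        # Assumption: without any path there is nothing sensible to start with.
        parser.error("no --config-path given and CONFIG_PATH is not set")
    return args.config_path


# The command-line flag wins over the environment variable.
print(resolve_config_path(
    ["--config-path", "/etc/job-runner-worker.ini"],
    {"CONFIG_PATH": "/ignored.ini"},
))
```

In a shell this corresponds to either passing --config-path explicitly or exporting CONFIG_PATH before running job_runner_worker.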
- Roll back the retry on 4xx errors. Instead, recover when an unexpected error occurs in execute_run, enqueue_actions, or kill_run. This recovers from the case where a run is claimed by two workers (e.g. it was sent to worker A, which did not respond immediately, then sent to worker B, which claimed it, after which A claimed it too).
- Make sure a shebang exists on scripts to be run. Use shlex to make Popen safer.
- Retry request 5x when the response is in the 4xx range before raising an exception.
- On ping response, send back the version of the worker and the number of concurrent jobs. This version requires that you have job-runner>=3.4.0 running.
- Update error message when job does not start to be more verbose and specific.
- Fix the case where, after an exception, the run was marked as completed but not as started.
- Make sure to only cleanup runs that are assigned to the worker. This version is dependent on job-runner>=3.0.1.
- Make the worker compatible with the new worker-pool structure. IMPORTANT: This version is dependent on job-runner>=2.0.0!
- Change SETTINGS_PATH environment variable to CONFIG_PATH for better naming consistency.
- Make sure that when a run already has a log, it is updated (previously it would hang on the database integrity error).
- Make the worker crash early instead of hanging on errors happening before the actual job starts, to give the user a visible cue that something went wrong.
- The worker will now terminate gracefully when receiving the TERM signal. This means that all pending jobs will be completed, but that it will not accept any new jobs. After finishing the last pending job, the worker will terminate.
- Set reconnect_after_inactivity default to 10 minutes. This is 2 x the JOB_RUNNER_WORKER_PING_INTERVAL default setting in Job-Runner.
- Implement handler for ping action.
- Add and implement reconnect_after_inactivity setting.
- Run scripts by finding their shebang, without needing the x bit.
- Handle separate run log-output resource. This requires Job-Runner >= v1.3.0.
- Fix killing job-runs. Where v1.0.5 killed child processes, it did not kill children of children, … This should kill the full tree of child processes.
- Freeze requests library version, since 1.0.0 contains backwards incompatible changes.
- Fix killing job-runs. When the process had sub-processes, only the parent process was killed and the worker was waiting for the child-processes to complete.
- Add config variable max_log_bytes to limit the amount of log data that will be sent back to the API (to avoid 413 Request Entity Too Large errors).
- Send pid back to the REST API when a job has been started.
- Kill a job-run when a kill action is received.
- Make sure that the API exactly matches.
- Make the datetimes sent to the REST API timezone aware.
- Deployar related changes.
- Fix encoding issue when writing the file.
- Refactor to make the worker compatible with the 0.7 version of the job-runner package.
- Make it consume runs from the queue broadcaster instead of hitting the REST interface every x seconds.
- Add retry on error to recover from temporary REST interface errors.
- Merge fixes v0.5.1 and v0.5.2 into v0.6.x version.
- Refactor to make use of separate WebSocket Server.
- Make temporary path for scripts configurable.
- Disable SSL certificate validation.
- Initial release.