Azkaban CLI

A lightweight Azkaban client providing:

  • A command line interface to run jobs, upload projects, and more.

    $ azkaban upload
    Project my_project successfully uploaded (id: 1, size: 205kB, version: 1).
    Details at https://azkaban.server.url/manager?project=my_project
  • A convenient and extensible way to build project configuration files.

    from azkaban import Job, Project

    project = Project('my_project')
    project.add_job('hello', Job({'type': 'command', 'command': 'echo "hello, azkaban"'}))


Installation

Using pip:

$ pip install azkaban

Command line interface


Once installed, the azkaban executable provides several commands. These are divided into two kinds:

Those which will work out of the box with any standard Azkaban project:

  • azkaban (create | delete) [options]

    Create (or delete) a project on a remote Azkaban server.

  • azkaban run [options] FLOW [JOB ...]

    Launch (asynchronously) an entire workflow or specific jobs in a given workflow. This command will print the corresponding execution’s URL to standard out.

  • azkaban upload [options] ZIP

    Upload an existing project zip archive.

Those which require a configuration file (cf. project configuration files):

  • azkaban build [options]

    Generate a project’s job files and package them in a zip file along with any other project dependencies (e.g. jars, pig scripts). This archive can either be saved to disk or directly uploaded to Azkaban.

  • azkaban info [options]

    View information about all the jobs inside a project, its static dependencies, or a specific job’s options.

Running azkaban --help shows the full list of options available for each command.

URLs and aliases

The previous commands all take a --url option, used to specify where to find the Azkaban server (and which user to connect as).

$ azkaban create -u

To avoid having to input the entire URL every time, it is possible to define aliases in ~/.azkabanrc:

foo =
bar = baruser@
default.alias = foo

We can now interact directly with each of these URLs using the --alias option followed by their corresponding alias. Since we also specified a default alias, it is also possible to omit the option altogether. As a result, the commands below are all equivalent:

$ azkaban create -u
$ azkaban create -a foo
$ azkaban create

Note finally that our session ID is cached on each successful login, so that we won’t have to authenticate on every remote interaction.
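The lookup described above can be pictured with a small stand-in. This is only a sketch of the idea, not the tool's actual code: resolve_url, the example URLs, and the rule that an explicit URL wins over an alias are all assumptions made for illustration.

```python
def resolve_url(aliases, default_alias=None, url=None, alias=None):
    """Pick the server URL from an explicit URL, an alias, or the default alias.

    Hypothetical helper: the precedence used here (URL, then alias, then
    default alias) is an assumption, not documented behavior.
    """
    if url:
        return url
    name = alias or default_alias
    if name is None:
        raise ValueError('no URL, alias, or default alias given')
    if name not in aliases:
        raise ValueError('unknown alias: %s' % name)
    return aliases[name]


# Placeholder values standing in for the entries of ~/.azkabanrc.
aliases = {
    'foo': 'https://foo.example.com',
    'bar': 'baruser@https://bar.example.com',
}

print(resolve_url(aliases, default_alias='foo', alias='bar'))
print(resolve_url(aliases, default_alias='foo'))
```

With a default alias configured, omitting both options falls back to it, which is why the three commands above behave the same.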

Project configuration files

We provide here a framework to define projects, jobs, and workflows from a single Python file.


For medium to large projects, managing the multitude of files required for each workflow quickly becomes tricky. .properties files help, but they still do not provide the flexibility to generate jobs programmatically (e.g. using for loops). Working with raw files also requires us to manually bundle and upload our project to the gateway every time.

Additionally, using a configuration file enables the build and info commands.


We start by creating a file. Let’s call it (the default file name the command line tool will look for), although any name would work. Below is a simple example of how we could define a project with a single job and static file:

from azkaban import Job, Project

project = Project('foo')
project.add_file('/path/to/bar.txt', 'bar.txt')
project.add_job('bar', Job({'type': 'command', 'command': 'cat bar.txt'}))

The add_file method adds a file to the project archive (the optional second argument specifies the destination path inside the zip file). The add_job method triggers the creation of a .job file: the first argument is the file's name, the second is a Job instance (cf. Job options).
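To make the result more concrete, here is a minimal sketch of the kind of archive these two calls describe, written with the standard library only. It does not use the azkaban package: build_archive is a hypothetical helper, the temporary file stands in for /path/to/bar.txt, and the real tool may lay out its archive differently.

```python
import io
import os
import tempfile
import zipfile


def build_archive(files, jobs):
    """Zip static files and one NAME.job entry per job; return the bytes.

    files: dict mapping a local path to its destination path in the archive.
    jobs: dict mapping a job name to a flat dict of options.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as archive:
        for local_path, archive_path in sorted(files.items()):
            archive.write(local_path, archive_path)
        for name, options in sorted(jobs.items()):
            # Azkaban job files are plain key=value property files.
            lines = ['%s=%s' % item for item in sorted(options.items())]
            archive.writestr(name + '.job', '\n'.join(lines) + '\n')
    return buffer.getvalue()


# Temporary stand-in for /path/to/bar.txt so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    local = os.path.join(tmp, 'bar.txt')
    with open(local, 'w') as handle:
        handle.write('hello\n')
    data = build_archive(
        {local: 'bar.txt'},
        {'bar': {'type': 'command', 'command': 'cat bar.txt'}},
    )

with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(sorted(archive.namelist()))
```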

Once we’ve saved our jobs file, simply running the azkaban executable in the same directory will pick it up automatically and activate all commands. Note that we can also specify a custom configuration file location with the -p/--project option.

Job options

The Job class is a light wrapper which allows the creation of .job files using Python dictionaries.

It also provides a convenient way to handle options shared across multiple jobs: the constructor can take in multiple options dictionaries and the last definition of an option (i.e. later in the arguments) will take precedence over earlier ones.

We can use this to efficiently share default options among jobs, for example:

defaults = {'': 'boo', 'retries': 0}

jobs = [
  Job({'type': 'noop'}),
  Job(defaults, {'type': 'noop'}),
  Job(defaults, {'type': 'command', 'command': 'ls'}),
  Job(defaults, {'type': 'command', 'command': 'ls -l', 'retries': 1}),
]

All jobs except the first one will have their property set. Note also that the last job overrides the retries property.
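The precedence rule can be illustrated without the library. The sketch below only mimics the behavior described above (later dictionaries override earlier ones); merge_options is a hypothetical stand-in, not the Job constructor itself.

```python
def merge_options(*option_dicts):
    """Combine option dicts left to right; later values win."""
    merged = {}
    for options in option_dicts:
        merged.update(options)
    return merged


defaults = {'retries': 0}

# The job-specific 'retries' overrides the default one.
print(merge_options(defaults, {'type': 'command', 'command': 'ls -l', 'retries': 1}))
```

Building a fresh dictionary on each call leaves defaults untouched, so it can safely be shared across many jobs.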

Alternatively, if we really don’t want to pass the defaults dictionary around, we can create a new Job subclass to do it for us:

class BooJob(Job):

  def __init__(self, *options):
    super(BooJob, self).__init__(defaults, *options)


Nested options

Nested dictionaries can be used to group options concisely:

# e.g. this job
Job({
  'proxy.user': 'boo',
  'proxy.keytab.location': '/path',
  'param.input': 'foo',
  'param.output': 'bar',
})
# is equivalent to this one
Job({
  'proxy': {'user': 'boo', 'keytab.location': '/path'},
  'param': {'input': 'foo', 'output': 'bar'},
})
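One way to picture this equivalence is a recursive flattening of nested dictionaries into dotted keys. The flatten function below is a hypothetical illustration of that idea, not the library's own implementation.

```python
def flatten(options, prefix=''):
    """Expand nested dicts into dotted keys, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for key, value in options.items():
        full_key = prefix + key
        if isinstance(value, dict):
            # Recurse, extending the prefix with the current key.
            flat.update(flatten(value, full_key + '.'))
        else:
            flat[full_key] = value
    return flat


nested = {
    'proxy': {'user': 'boo', 'keytab.location': '/path'},
    'param': {'input': 'foo', 'output': 'bar'},
}
for key, value in sorted(flatten(nested).items()):
    print('%s=%s' % (key, value))
```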

Merging projects

If you have multiple projects, you can merge them together to create a single project. The merge is done in place on the project the method is called on. The first project will retain its original name.

from azkaban import Job, Project

project1 = Project('foo')
project1.add_file('/path/to/bar.txt', 'bar.txt')
project1.add_job('bar', Job({'type': 'command', 'command': 'cat bar.txt'}))

project2 = Project('qux')
project2.add_file('/path/to/baz.txt', 'baz.txt')
project2.add_job('baz', Job({'type': 'command', 'command': 'cat baz.txt'}))

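# merge project2 into project1 in place (assuming the merge method of this release)
project1.merge(project2)

# project1 will now contain baz.txt and the baz job from project2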

Next steps

Any valid Python code can go inside a jobs configuration file. This includes using loops to add jobs, subclassing the base Job class to better suit a project’s needs (e.g. by implementing the on_add and on_build handlers), etc.
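As an example of generating jobs with a loop, the sketch below builds a chain of command jobs where each one depends on the previous one. It produces plain option dictionaries so that it runs without the library; chained_jobs and the step_N names are hypothetical, and in a real configuration file each pair would be passed to project.add_job(name, Job(options)).

```python
def chained_jobs(commands):
    """Return [(name, options), ...] where each job depends on the previous one."""
    jobs = []
    previous = None
    for index, command in enumerate(commands):
        name = 'step_%d' % index
        options = {'type': 'command', 'command': command}
        if previous is not None:
            options['dependencies'] = previous
        jobs.append((name, options))
        previous = name
    return jobs


for name, options in chained_jobs(['echo extract', 'echo transform', 'echo load']):
    # In a jobs file this line would be: project.add_job(name, Job(options))
    print(name, options.get('dependencies'))
```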

Finally, the info command becomes quite powerful when combined with other Unix tools. Here are a few examples:

  • Counting the number of jobs per type: azkaban info -o type | cut -f 2 | sort | uniq -c
  • Viewing the list of jobs of a certain type, along with their dependencies: azkaban info -o type,dependencies | awk -F '\t' '($2 == "job_type")'
  • Viewing the size of each file in the project: azkaban info -f | xargs -n 1 du -h
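The same post-processing can be done in Python when a shell pipeline becomes unwieldy. The sketch below mirrors the first example (jobs per type) and assumes, as the cut and awk invocations above suggest, that each output line is tab-separated with the job name first and the requested option second. The sample text is made up for illustration.

```python
from collections import Counter


def count_by_type(info_output):
    """Count jobs per type from tab-separated NAME<TAB>TYPE lines."""
    counts = Counter()
    for line in info_output.splitlines():
        fields = line.split('\t')
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts


# Made-up sample of what `azkaban info -o type` might print.
sample = 'bar\tcommand\nbaz\tpig\nqux\tcommand\n'
for job_type, count in sorted(count_by_type(sample).items()):
    print(count, job_type)
```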



Since pig jobs are so common, azkaban comes with an extension to:

  • run pig scripts directly from the command line (and view the output logs from your terminal): azkabanpig. Under the hood, this will package your script along with the appropriately generated job file and upload it to Azkaban. Running azkabanpig --help displays the list of available options (using UDFs, substituting parameters, running several scripts in order, etc.).
  • integrate pig jobs easily into your project configuration via the PigJob class. It accepts a file path (to the pig script) as first constructor argument, optionally followed by job options. It then automatically sets the job type and adds the corresponding script file to the project.
from azkaban import PigJob

project.add_job('baz', PigJob('baz.pig', {'dependencies': 'bar'}))

Download files

azkaban-0.3.5.tar.gz (23.3 kB), source distribution