Azkaban CLI
Lightweight command line interface (CLI) for Azkaban:
- Define jobs from a single python file
- Build projects and upload to Azkaban from the command line
Installation
Using pip:
$ pip install azkaban
Quickstart
We first create a configuration file for our project. Let’s call it jobs.py, although any name would work. Here’s a simple example of how we could define a project with a single job and a static file:
from azkaban import Job, Project

project = Project('foo')
project.add_file('/path/to/bar.txt', 'bar.txt')
project.add_job('bar', Job({'type': 'command', 'command': 'cat bar.txt'}))

if __name__ == '__main__':
    project.main()
The add_file method adds a file to the project archive (the optional second argument specifies the destination path inside the zip file). The add_job method will trigger the creation of a .job file. The first argument is the file’s name, the second a Job instance (cf. Job options).
From the command line we can now run python jobs.py --help to view the list of all available options (build, upload, etc.). E.g. the following command will create the archive foo.zip containing all the project’s jobs and dependency files:
$ python jobs.py build foo.zip
Job options
The Job class is a light wrapper which allows the creation of .job files using python dictionaries.

It also provides a convenient way to handle options shared across multiple jobs: the constructor can take multiple option dictionaries, and the last definition of an option (i.e. the one appearing latest in the arguments) takes precedence over earlier ones. We can use this to efficiently share default options among jobs, for example:
defaults = {'user.to.proxy': 'boo', 'retries': 0}

jobs = [
    Job({'type': 'noop'}),
    Job(defaults, {'type': 'noop'}),
    Job(defaults, {'type': 'command', 'command': 'ls'}),
    Job(defaults, {'type': 'command', 'command': 'ls -l', 'retries': 1}),
]
All jobs except the first one will have their user.to.proxy property set. Note also that the last job overrides the retries property.

Alternatively, if we really don’t want to pass the defaults dictionary around, we can create a new Job subclass to do it for us:
class BooJob(Job):

    def __init__(self, *options):
        super(BooJob, self).__init__(defaults, *options)
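The precedence rule above can be sketched with plain dictionaries. This is a simplified model of what the constructor does with its arguments, not azkaban’s actual internals:

```python
defaults = {'user.to.proxy': 'boo', 'retries': 0}

def merge_options(*options):
    """Merge option dictionaries, later definitions taking precedence."""
    merged = {}
    for opts in options:
        merged.update(opts)
    return merged

opts = merge_options(defaults, {'type': 'command', 'command': 'ls -l', 'retries': 1})
# opts['retries'] == 1 (overridden), opts['user.to.proxy'] == 'boo' (from defaults)
```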
More
Aliases
To avoid having to enter the server’s URL on every upload (or hard-coding it into our project’s configuration file, ugh), we can define aliases in ~/.azkabanrc:
[foo]
url = http://url.to.foo.server:port
[bar]
url = http://url.to.bar.server:port
We can now upload directly to each of these URLs with the shorthand:
$ python jobs.py upload -a foo
This has the added benefit that we won’t have to authenticate on every upload. The session ID is cached and reused for later connections.
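For illustration, an alias file in this format can be parsed with the standard library’s configparser; this is only a sketch of the lookup (inlined here to be self-contained), not necessarily how the CLI implements it:

```python
from configparser import ConfigParser

# Contents mimicking ~/.azkabanrc (inlined so the example is self-contained).
rc_contents = """
[foo]
url = http://url.to.foo.server:port

[bar]
url = http://url.to.bar.server:port
"""

parser = ConfigParser()
parser.read_string(rc_contents)

def resolve_alias(parser, alias):
    """Return the URL registered under the given alias section."""
    return parser.get(alias, 'url')

print(resolve_alias(parser, 'foo'))  # http://url.to.foo.server:port
```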
Nested options
Nested dictionaries can be used to group options concisely:
# e.g. this job
Job({
    'proxy.user': 'boo',
    'proxy.keytab.location': '/path',
    'param.input': 'foo',
    'param.output': 'bar',
})
# is equivalent to this one
Job({
    'proxy': {'user': 'boo', 'keytab.location': '/path'},
    'param': {'input': 'foo', 'output': 'bar'},
})
Pig jobs
Because pig jobs are so common, a PigJob class is provided. It accepts a file path (to the pig script) as first constructor argument, optionally followed by job options; it then automatically sets the job type and adds the corresponding script file to the project.
from azkaban import PigJob
project.add_job('baz', PigJob('/.../baz.pig', {'dependencies': 'bar'}))
Next steps
Any valid python code can go inside the jobs configuration file. This includes using loops to add jobs, subclassing the base Job class to better suit a project’s needs (e.g. by implementing the on_add and on_build handlers), etc.
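For example, a loop can generate one job per date partition. The job names and the load.sh command below are made up for illustration:

```python
# Hypothetical example: build the options for one command job per date.
dates = ['2014-01-01', '2014-01-02', '2014-01-03']
job_options = {
    'load-%s' % date: {'type': 'command', 'command': 'load.sh %s' % date}
    for date in dates
}

# With the project from the Quickstart, these would then be registered via:
# for name, options in job_options.items():
#     project.add_job(name, Job(options))
```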