job processing


Alchemy is a tool that makes it easy to run jobs, such as scientific
experiments, that differ only minimally in configuration. Features include:

- multiprocessor support,
- convenient handling of the standard output and standard error of your jobs,
- automatic handling of the directory structure, so that you don't have to
  take care of it in your job functions.

Alchemy needs two things to get started. First, you define a function that
does the actual work. Second, you write a configuration file (we use YAML for
this) that holds the parametrizations of your experiments. You can then fire
off your experiments with

$ python -c config.yaml

Experiments will typically only differ slightly. For example, you might want to
run a simulation on the moon with a gravitation value of 1.63 and on the earth
with a gravitation value of 9.81.

Say you have a function "rocket" in the module "vehicles" of the package
"machines". You want to run two experiments, one where the rocket starts from
the earth and one where it takes off from the moon. Your rocket has the name
"rockstar" and you want it to be exactly 3 meters long. You would then write
the following YAML file:

function: machines.vehicles.rocket
name: rockstar
length: 3.0
?: [{gravity: 9.81},
    {gravity: 1.63}]

This describes exactly two experiments. The question mark symbol tells alchemy
that you want to run one experiment for each entry of the list. Thus, it
will result in the following dictionaries.

The first one will be for earth:

{'function': 'machines.vehicles.rocket',
'name': 'rockstar',
'length': 3.0,
'gravity': 9.81}

while the second one will be for the moon:

{'function': 'machines.vehicles.rocket',
'name': 'rockstar',
'length': 3.0,
'gravity': 1.63}
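The expansion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Alchemy's actual code: the helper name `expand` is hypothetical, and the configuration is written as a plain dictionary instead of being loaded from the YAML file.

```python
def expand(config):
    """Return one job dictionary per entry of the '?' list (hypothetical helper)."""
    # Everything except the '?' key is shared by all jobs.
    base = {key: value for key, value in config.items() if key != '?'}
    # Without a '?' key there is exactly one job: the base dictionary itself.
    variants = config.get('?', [{}])
    return [{**base, **variant} for variant in variants]


config = {
    'function': 'machines.vehicles.rocket',
    'name': 'rockstar',
    'length': 3.0,
    '?': [{'gravity': 9.81}, {'gravity': 1.63}],
}
jobs = expand(config)  # first entry is the earth job, second the moon job
```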

Alchemy will generate the cross product of all varying values. So if you have
multiple ?'s in your yaml file, all possible values for all ?'s will be
combined with each other.
But how exactly is your function called? Alchemy retrieves the value of the
field 'function' and looks that object up on the PYTHONPATH. It then removes
the 'function' field and feeds the remaining dictionary as keyword arguments
into your function. The code would be roughly something like this:

from machines.vehicles import rocket
rocket(name='rockstar', length=3.0, gravity=1.63)
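A generic version of this lookup could be written with `importlib`. The sketch below is an assumption about how such a dispatch might look, not Alchemy's implementation; `call_job` is a hypothetical name, and `string.capwords` from the standard library stands in for your own function so that the example runs anywhere.

```python
import importlib


def call_job(config):
    """Import the callable named in config['function'] and call it (hypothetical)."""
    kwargs = dict(config)                 # leave the caller's dictionary untouched
    dotted_path = kwargs.pop('function')  # e.g. 'machines.vehicles.rocket'
    module_name, function_name = dotted_path.rsplit('.', 1)
    module = importlib.import_module(module_name)
    function = getattr(module, function_name)
    return function(**kwargs)             # remaining fields become keyword arguments


# Stand-in job: string.capwords(s='rock star')
result = call_job({'function': 'string.capwords', 's': 'rock star'})
```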

If you are interested in creating more sophisticated configuration files,
you can use everything that PyYAML can process; see the PyYAML documentation.

Outputs of experiments

Alchemy uses Python's own `uuid` module to generate a unique identifier
for your experiment. It then creates a directory named after that identifier.
Inside it, a separate subdirectory with an increasing number is created for
each job of your experiment (if you have no varying values, there is exactly
one job).

The id will be printed out when you start alchemy.

Before your function is executed, the Python interpreter will switch into that
directory. Thus, if you use relative paths in your function, you can be sure
that all your output files end up in that directory.

Furthermore, a file `stdout` and a file `stderr` will be created into which the
corresponding streams will be redirected. Also, the actual configuration used
will be saved into `config.yaml` for future reference.
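The following sketch shows one way the directory handling and stream redirection described above could be done with the standard library. It is a simplified illustration under several assumptions: the helper `run_job`, the stand-in `rocket` function, the use of `uuid.uuid4`, and numbering jobs from 0 are all hypothetical, and writing `config.yaml` is left out because it would require PyYAML.

```python
import contextlib
import os
import tempfile
import uuid


def run_job(root, experiment_id, job_number, function, kwargs):
    """Run function(**kwargs) inside root/<experiment_id>/<job_number> (hypothetical)."""
    job_dir = os.path.join(root, experiment_id, str(job_number))
    os.makedirs(job_dir)
    previous_dir = os.getcwd()
    os.chdir(job_dir)  # relative paths in the job now resolve inside job_dir
    try:
        with open('stdout', 'w') as out, open('stderr', 'w') as err:
            with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
                function(**kwargs)
    finally:
        os.chdir(previous_dir)  # always restore the working directory
    return job_dir


def rocket(name, gravity):
    """Stand-in job that prints a message and writes a file via a relative path."""
    print(f'{name} lifts off at gravity {gravity}')
    with open('trajectory.txt', 'w') as handle:
        handle.write('0.0\n')


experiment_id = str(uuid.uuid4())
job_dir = run_job(tempfile.mkdtemp(), experiment_id, 0, rocket,
                  {'name': 'rockstar', 'gravity': 1.63})
```

After the call, `job_dir` contains `stdout`, `stderr` and `trajectory.txt`, even though `rocket` only used relative paths.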

Using multiple processors

Python's `multiprocessing` package is used to take advantage of multiprocessor
systems. You can specify the number of processes to use with the `-p` option.
It defaults to one process.
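As a rough picture of what distributing jobs over processes looks like, here is a small sketch using `multiprocessing.Pool`. It is not taken from Alchemy: `fall_time` is a made-up job function, `run_jobs` is a hypothetical helper, and its `processes` argument merely plays the role of the `-p` option.

```python
import math
import multiprocessing


def fall_time(job):
    """Made-up job: seconds needed to fall job['height'] metres at job['gravity']."""
    return math.sqrt(2 * job['height'] / job['gravity'])


def run_jobs(jobs, processes=1):
    """Run all jobs on a pool of worker processes and collect the results in order."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(fall_time, jobs)


if __name__ == '__main__':
    # The guard is needed on platforms that start workers by re-importing this module.
    jobs = [{'gravity': 9.81, 'height': 10.0},
            {'gravity': 1.63, 'height': 10.0}]
    print(run_jobs(jobs, processes=2))
```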


Currently not supported:

- running on clusters,
- using combinations of varying values other than the cross product.
