
Replacement for Make geared towards processing data rather than compiling code

Produce is an incremental build system for the command line, like Make or redo, but different: it is scriptable in Python and it supports multiple variable parts in file names. This makes it ideal for doing things beyond compiling code, like setting up replicable scientific experiments.

Requirements

  • A Unix-like operating system such as Linux or Mac OS X. Windows Subsystem for Linux may also work.
  • Python 3.6 or higher
  • Git (for downloading Produce)

Installing Produce

Install the latest release using pip:

pip3 install produce

Or get the development version by running the following command in a convenient location:

git clone https://github.com/texttheater/produce

This will create a directory called produce. To update to the latest version of Produce later, you can just go into that directory and run:

git pull

The produce directory contains an executable Python script also called produce. This is all you need to run Produce. Just make sure it is in your PATH, e.g. by copying it to /usr/local/bin or by linking to it from your $HOME/bin directory.

Usage

When invoked, Produce will first look for a file called produce.ini in the current working directory. The format of this file is described below. If you want a quick start, have a look at an example project.

You may also have a look at the PyGrunn 2014 slides for a quick introduction.

Motivation

Produce is a build automation tool. Build automation is useful whenever you have one or several input files from which one or several output files are generated automatically – possibly in multiple steps, so that you have intermediate files.

The classic case for this is compiling C programs, where a simple project might look like this:

[Figure: example dependency chart for compiling a C program]

But build automation is also useful in other areas, such as science. For example, in the Groningen Meaning Bank project, a Natural Language Processing pipeline is combined with corrections from human experts to build a collection of texts with linguistic annotations in a bootstrapping fashion.

In the following simplified setup, processing starts with a text file (en.txt) which is first part-of-speech-tagged (en.pos), then analyzed syntactically (en.syn) by a parser and finally analyzed semantically (en.sem). Each step is first carried out automatically by an NLP tool (*.auto) but then corrections by human annotators (*.corr) are applied to build the main version of the file which then serves as input to further processing. Every time a new human correction is added, parts of the pipeline must be re-run:

[Figure: example dependency chart for running an NLP pipeline]

Or take running machine learning experiments: we have a collection of labeled data, split into a training portion and testing portions. We have various feature sets and want to know which one produces the best model. So we train a separate model based on each feature set and on the training data, and generate corresponding labeled outputs and evaluation reports based on the development test data:

[Figure: example dependency chart for running machine learning experiments]

A number of articles point out that build automation is an invaluable help in setting up experiments in a self-documenting manner, so that they can still be understood, replicated and modified months or years later, by you, your colleagues or other researchers. Many people use Make for this purpose, and so did I, for a while. I specifically liked:

  • The declarative notation. Every step of the workflow is expressed as a rule, listing the target, its direct dependencies and the command to run (the recipe). Together with a good file naming scheme, this almost eliminates the need for documentation.
  • The Unix philosophy. Make is, at its core, a thin wrapper around shell scripts. For orchestrating the steps, you use Make, and for executing them, you use the full power of shell scripts. Each tool does one thing, and does it well. This reliance on shell scripts is something that sets Make apart from specialized build tools such as Ant or A-A-P.
  • The wide availability. Make is installed by default on almost every Unix system, making it ideal for disseminating and exchanging code because the Makefile format is widely known and can be run everywhere.

So, if Make has so many advantages, why yet another build automation tool? There are two reasons:

  • Make’s syntax. Although the basic syntax is extremely simple, as soon as you want to go a little bit beyond what it offers and use more advanced features, things get quite arcane very quickly.
  • Wildcards are quite limited. If you want to match on the name of a specific target to generate its dependencies dynamically, you can only use one wildcard. If your names are a bit more complex than that, you have to resort to black magic like Make’s built-in string manipulation functions that don’t compare favorably to languages like Python or even Perl, or rely on external tools. In either case, your Makefiles become extremely hard to read, bugs slip in easily and the simplicity afforded by the declarative paradigm is largely lost.

Produce is thus designed as a tool that copies Make’s virtues and improves a great deal on its deficiencies by using a still simple, but much more powerful syntax for mapping targets to dependencies. Only the core functionality of Make is mimicked – advanced functions of Make such as built-in rules specific to compiling C programs are not covered. Produce is general-purpose.

Produce is written in Python 3 and scriptable in Python 3. Whenever I write Python below, I mean Python 3.

Build automation: basic requirements

Let’s review the basic functionality we expect of a build automation tool:

  • Allows you to run multiple steps of a workflow with a single command, in the right order.
  • Notices when inputs have changed and runs exactly those steps again that are needed to bring the outputs up to date, no more and no less.

In addition, some build automation tools satisfy the following requirement (Produce currently doesn’t):

  • Intermediate files can be deleted without affecting up-to-dateness – if the outputs are newer than the inputs, the workflow will not be re-run.

Make syntax vs. Produce syntax and a tour of the basic features

When you run the produce command (usually followed by the targets you want built), Produce will look for a file in the current directory, called produce.ini by default. This is the “Producefile”. Let’s introduce Producefile syntax by comparing it to Makefile syntax.

Rules, expansions, escaping and comments

Here is a Makefile for a tiny C project:

# Compile
%.o : %.c
	cc -c $<

# Link
% : %.o
	cc -o $@ $<

And here is the corresponding produce.ini:

# Compile
[%{name}.o]
dep.c = %{name}.c
recipe = cc -c %{c}

# Link
[%{name}]
dep.o = %{name}.o
recipe = cc -o %{target} %{o}

Easy enough, right? Produce syntax is a dialect of the widely known INI syntax, consisting of sections with headings in square brackets, followed by attribute-value pairs separated by =. In Produce’s case, sections represent rules, the section headings are target patterns matching targets to build, and the attribute-value pairs specify the target’s direct dependencies and the recipe that builds it.

Dependencies are typically listed each as one attribute of the form dep.name where name stands for a name you give to the dependency – e.g., its file type. This way, you can refer to it in the recipe using an expansion.

Expansions have the form %{...}. In the target pattern, they are used as wildcards. When the rule is invoked on a specific target, they match any string and assign it to the variable name specified between the curly braces. In attribute values, they are used like variables, expanding to the value associated with the variable name. Besides target matching, values can also be assigned to variable names by attribute-value pairs, as with e.g. dep.c = %{name}.c. Here, c is the variable name; the dep. prefix just tells Produce that this particular value is also a dependency.

If you need a literal percent sign in some attribute value, you need to escape it as %%.

The target variable is automatically available when the rule is invoked, containing the target matched by the target pattern.
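To make the two roles of %{...} concrete, here is a minimal Python sketch. It is not Produce’s code: pattern_to_regex and expand are hypothetical helpers that only handle plain variable names (no Python expressions) plus the %% escape, and translating each wildcard into a greedy named group is an assumption that mirrors the regular expression translation described further down.

```python
import re

# Matches either the %% escape or a %{...} expansion.
TOKEN = re.compile(r"%%|%\{([^}]*)\}")


def pattern_to_regex(pattern):
    """Translate a target pattern such as '%{name}.o' into a compiled regex.

    Every %{var} becomes a named group matching any string; all other
    text is matched literally. (Hypothetical helper for illustration.)
    """
    parts = []
    pos = 0
    for m in TOKEN.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        if m.group(1) is None:
            parts.append("%")  # %% stands for a literal percent sign
        else:
            parts.append("(?P<{}>.*)".format(m.group(1)))
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def expand(value, variables):
    """Replace each %{var} by the variable's value and each %% by '%'."""
    def replace(m):
        if m.group(1) is None:
            return "%"
        return str(variables[m.group(1)])
    return TOKEN.sub(replace, value)


# Invoking the compile rule on the target hello.o:
match = pattern_to_regex("%{name}.o").fullmatch("hello.o")
variables = dict(match.groupdict(), target="hello.o")
print(expand("cc -c %{name}.c", variables))  # cc -c hello.c
```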

Lines starting with # are for comments and ignored.

So far, so good – a readable syntax, I hope, but a bit more verbose than that of Makefiles. What does this added verbosity buy us? We will see in the next subsections.

Named and unnamed dependencies

To see why naming dependencies is a good idea, consider the following Makefile rule:

out/%.pos : out/%.pos.auto out/%.pos.corr
	./src/scripts/apply_corrections $< \
        --corrections out/$*.pos.corr > $@

This could be from the Natural Language Processing project we saw as the second example above: the rule is for making the final pos file from the automatically generated pos.auto file and the pos.corr file with manual corrections, thus it has two direct dependencies, specified on the first line. The recipe refers to the first dependency using the shorthand $<, but there is no such shorthand for other dependencies. So we have to type out the second dependency again in the recipe, taking care to replace the wildcard % with the magic variable $*. This is ugly because it violates the golden principle “Don’t repeat yourself!” If we write something twice in a Makefile, not only is it more work to type, but also if we want to change it later, we have to change it in two places, and there’s a good chance we’ll forget that.

Produce’s named dependencies avoid this problem: once specified, you can refer to every dependency using its name. Here is the Produce rule corresponding to the above Makefile rule:

[out/%{name}.pos]
dep.auto = out/%{name}.pos.auto
dep.corr = out/%{name}.pos.corr
recipe = ./src/scripts/apply_corrections %{auto} --corrections %{corr} > %{target}

Note that you don’t have to name dependencies. Sometimes you don’t need to refer back to them. Here is an example rule that compiles a LaTeX document:

[%{name}.pdf]
deps = %{name}.tex bibliography.bib
recipe =
	pdflatex %{name}
	bibtex %{name}
	pdflatex %{name}
	pdflatex %{name}

The TeX tools are smart enough to fill in the file name extension if we just give them the basename that we got by matching the target. In such cases, it can be more convenient not to name the dependencies and list them all on one line. This is what the deps attribute is for. It is parsed using Python’s shlex.split function – consult the Python documentation for escaping rules and such. You can also mix dep.* attributes and deps in one rule.
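For instance, once expansions are done, a deps value is split as follows. The file names containing spaces are made up purely to show the quoting rules:

```python
import shlex

# The value of a deps attribute after expansions have been done.
deps_value = 'paper.tex bibliography.bib "figures/my plot.pdf" my\\ notes.txt'

# Split shell-style: quotes and backslashes keep names with spaces together.
deps = shlex.split(deps_value)
print(deps)
# ['paper.tex', 'bibliography.bib', 'figures/my plot.pdf', 'my notes.txt']
```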

Note that, as in many INI dialects, attribute values (here: the recipe) can span multiple lines as long as each line after the first is indented. See Whitespace and indentation in values below for details.

Note also that dependency lists can be generated dynamically – see the section on dependency files below.

Multiple wildcards, regular expressions and matching conditions

The ability to use more than one wildcard in target patterns is Produce’s killer feature because not many other build automation tools offer it. The only one I know of so far is plmake. Rake and others do offer full regular expressions, which are strictly more powerful but not as easy to read. Don’t worry, Produce supports regular expressions too, and more; we will come to that. But first consider the following Produce rule, which might stem from the third example project we saw in the introduction, the machine learning one:

[out/%{corpus}.%{portion}.%{fset}.labeled]
dep.model = out/%{corpus}.train.%{fset}.model
dep.input = out/%{corpus}.%{portion}.feat
recipe = wapiti label -m %{model} %{input} > %{target}

Labeled output files here follow a certain naming convention: four parts, separated by periods. The first one specifies the data collection (e.g. a linguistic corpus), the second one the portion of the data that is automatically labeled in this step (either the development portion or the test portion), the third one specifies the feature set used and the fourth one is the extension labeled. For each of the first three parts, we use a wildcard to match it. We can then freely use these three wildcards to specify the dependencies: the model we use for labelling depends on the corpus and on the feature set but not on the portion to label, because the portion used for training the model is always the training portion. The input to labelling is a file containing the data portion to label, together with the extracted features. We assume that this file always contains all features we can extract even if we’re not going to use them in a particular model, so this dependency does not depend on the feature set.

A Makefile rule to achieve something similar would look something like this:

.SECONDEXPANSION:
out/%.labeled : out/$$(subst test,train,$$(subst dev,train,$$*)).model \
                out/$$(basename $$*).feat
	wapiti label -m $< out/$(basename $*).feat > $@

If you are like me, you will find this orders of magnitude less readable than the Produce version. Getting a Makefile rule like this to function properly will certainly make you feel smart, but hopefully it will also make you feel miserable about the brain cycles wasted on getting your head around the bizarre syntax, the double dollars and the second expansion.

A wildcard will match anything. If you need more control over which targets are matched, you can use a Python regular expression between slashes as the target pattern. For example, if we want to make sure that our rule only matches targets where the second part of the filename is either dev or test, we could do it like this:

[/out/(?P<corpus>.*)\.(?P<portion>dev|test)\.(?P<fset>.*)\.labeled/]
dep.model = out/%{corpus}.train.%{fset}.model
dep.input = out/%{corpus}.%{portion}.feat
recipe = wapiti label -m %{model} %{input} > %{target}

The regular expression in this rule’s header is almost precisely what the above header with three wildcards is translated to by Produce internally, with the difference that the subexpression matching the second part is now dev|test rather than .*. We are using a little-known feature of regular expressions here, namely the (?P<...>) syntax that allows us to assign names to subexpressions by which you can refer to the matched part later.

Note that the slashes at the beginning and end are just a signal to Produce to interpret what is in between as a regular expression. You do not have to escape slashes within your regular expression.
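You can try such a header in plain Python before putting it into a Producefile. The sketch below applies re.fullmatch to the expression between the slashes; that Produce requires the whole target name to match is an assumption here, and the corpus and feature-set names are invented.

```python
import re

# The expression between the slashes in the rule header above.
HEADER = r"out/(?P<corpus>.*)\.(?P<portion>dev|test)\.(?P<fset>.*)\.labeled"


def match_target(target):
    """Return the named parts of target, or None if the header does not match."""
    match = re.fullmatch(HEADER, target)
    return match.groupdict() if match else None


print(match_target("out/wsj.dev.words.labeled"))
# {'corpus': 'wsj', 'portion': 'dev', 'fset': 'words'}
print(match_target("out/wsj.train.words.labeled"))  # None
```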

While regular expressions are powerful, they make your Producefile less readable. A better way to write the above rule is by sticking to ordinary wildcards and using a separate matching condition to check for dev|test:

[out/%{corpus}.%{portion}.%{fset}.labeled]
cond = %{portion in ('dev', 'test')}
dep.model = out/%{corpus}.train.%{fset}.model
dep.input = out/%{corpus}.%{portion}.feat
recipe = wapiti label -m %{model} %{input} > %{target}

A matching condition is specified as the cond attribute. We can use any Python expression. It is evaluated only if the target pattern matches the requested target. If it evaluates to a “truthy” value, the rule matches and is used to produce the target. If it evaluates to a “falsy” value, the rule does not match, and Produce moves on, trying to match the next rule in the Producefile.

Note that the Python expression is given as an expansion. At this point we should explain a few fine points:

  1. Whenever we used expansions so far, the variable names inside were actually Python expressions, albeit of a simple kind: single variable names. But as we see now, we can use arbitrary Python expressions. Expansions used as wildcards in the target pattern are an exception, of course: they can only consist of a single variable name.
  2. The variables we use in rules are actually Python variables.
  3. Attribute values are always strings, so if a Python expression is used to generate (part of) an attribute value, not the value of the expression itself is used but whatever its __str__ method returns. Thus, in the above rule, the value of the cond variable is not True or False, but 'True' or 'False'. In order to interpret the value as a Boolean, Produce calls ast.literal_eval on the string. So if the string contains anything other than a literal Python expression, this is an error.
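The three points above can be sketched in a few lines of Python. evaluate_cond is a hypothetical helper, not part of Produce’s API; it only mirrors the sequence described: evaluate the expression, convert the result with str, then read it back with ast.literal_eval.

```python
import ast


def evaluate_cond(expression, variables):
    """Turn a cond expansion into a Boolean (sketch of the steps above).

    1. evaluate the Python expression with the rule's variables,
    2. convert the result to a string, as for any attribute value,
    3. read the string back with ast.literal_eval and test its truth.
    """
    text = str(eval(expression, {}, dict(variables)))
    return bool(ast.literal_eval(text))


print(evaluate_cond("portion in ('dev', 'test')", {"portion": "dev"}))    # True
print(evaluate_cond("portion in ('dev', 'test')", {"portion": "train"}))  # False

# A non-literal string such as 'dev' cannot be read back, so this fails.
try:
    evaluate_cond("portion", {"portion": "dev"})
except ValueError as error:
    print("error:", error)
```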

As an exception to what we said about __str__, if an expansion evaluates to something that is not a string but has an __iter__ method, it will be treated as a sequence and rendered as a whitespace-separated list, with the elements properly shell-quoted and escaped. Note also that parentheses are automatically added around an expansion, so it is very convenient to use generator expressions for expansions. All of this is illustrated in the following rule:

[Whole.txt]
deps = %{'Part {}.txt'.format(i) for i in range(4)}
recipe = cat %{deps} > %{target}
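The rendering rule can be approximated as follows. Using shlex.quote for the shell quoting is an assumption on our part, and render is a hypothetical name; the point is the distinction between strings, other iterables and everything else.

```python
import shlex


def render(value):
    """Approximate how an expansion's value becomes attribute text."""
    if isinstance(value, str):
        return value                     # strings are used as they are
    if hasattr(value, "__iter__"):
        # Non-string iterables: shell-quote each element, join with spaces.
        return " ".join(shlex.quote(str(item)) for item in value)
    return str(value)                    # everything else goes through str()


# The deps expansion of the [Whole.txt] rule, with the parentheses that
# Produce adds turning it into a generator expression.
deps = render(('Part {}.txt'.format(i) for i in range(4)))
print(deps)              # 'Part 0.txt' 'Part 1.txt' 'Part 2.txt' 'Part 3.txt'
print(shlex.split(deps))
```

Because the result is valid shell syntax, splitting it shell-style recovers the four original file names.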

Special targets vs. special attributes

Besides not naming all dependencies, there is another reason why Make’s syntax is too simple for its own good. When some rule needs to have a special property, Make usually requires a “special target” that syntactically looks like a target but is actually a declaration and has no obvious visual connection to the rule(s) it applies to. We have already seen an example of the dreaded .SECONDEXPANSION. Another common special target is .PHONY, marking targets that are just jobs to be run, without producing an output file. For example:

.PHONY: clean
clean:
	rm *.o temp

It would be easier and more logical if the “phoniness” was declared as part of the rule rather than in some external declaration. This is what Produce does. The Produce equivalent of declaring targets phony is to set the type attribute of their rule to task (the default is file). With this, the rule above is written as follows:

[vacuum]
type = task
recipe = rm *.o temp

Note that since it is ungrammatical to “produce a clean”, I invented a naming convention according to which the task that cleans up your project directory is called vacuum because it produces a vacuum. It’s silly, I know.

For other special attributes besides type, see All special attributes at a glance below.

Python expressions and global variables

As we have already seen, Produce’s expansions can contain arbitrary Python expressions. This is not only useful for specifying Boolean matching conditions, but also for string manipulation, in particular for playing with dependencies. This is a pain in Make, because Make implements its own string manipulation language which from today’s perspective (since we have Python) not only reinvents the wheel, but reinvents it poorly, with a rather dangerous syntax. Consider the following (contrived) example from the GNU Make manual where you have a list of dependencies in a global variable and filter them to retain only those ending in .c or .s:

sources := foo.c bar.c baz.s ugh.h
foo: $(sources)
	cc $(filter %.c %.s,$(sources)) -o foo

With Produce, we can just hand the string manipulation to Python, a language we already know and (hopefully) like:

[]
sources = foo.c bar.c baz.s ugh.h

[foo]
deps = %{sources}
recipe = cc %{f for f in sources.split() \
		if f.endswith('.c') or f.endswith('.s')}

This example also introduces the global section, a section headed by [], thus named with the empty string. The attributes here define global variables accessible from all rules. The global section may only appear once and only at the beginning of a Producefile.

Running Produce

Produce is invoked from the command line by the command produce, usually followed by the target(s) to produce. These can be omitted if the Producefile specifies one or more default targets. By default, Produce will look for produce.ini in the current working directory and complain if it does not exist.

A number of options can be used to control Produce’s behavior, as listed in its help message:

usage: produce [-h] [-B | -b] [-d] [-f FILE] [-j JOBS] [-n] [-u PATTERN] [target ...]

positional arguments:
  target                The target(s) to produce - if omitted, default target from Producefile is used

options:
  -h, --help            show this help message and exit
  -B, --always-build    Unconditionally build all specified targets and their dependencies
  -b, --always-build-specified
                        Unconditionally build all specified targets, but treat their dependencies normally (only build if out of date)
  -d, --debug           Print debugging information. Give this option multiple times for more information.
  -f FILE, --file FILE  Use FILE as a Producefile
  -j JOBS, --jobs JOBS  Specifies the number of jobs (recipes) to run simultaneously
  -n, --dry-run         Print status messages, but do not run recipes
  -u PATTERN, --pretend-up-to-date PATTERN
                        Do not rebuild targets matching PATTERN or their dependencies (unless the latter are also depended on by other targets) even if out of date, but make sure that future invocations of Produce will still treat them as out of date by increasing the modification times of their changed dependencies as necessary. PATTERN can be a Produce pattern or a regular expression enclosed in forward slashes, as in rules.

Status and debugging messages

When it starts (re)building a target, Produce will tell you so with a status message in green where the target is indented according to how deep in the dependency graph it is. On successful completion of a target, a similar message with complete is printed. If an error occurs while a target is being built, Produce instead prints an incomplete message in red. The latter indicates controlled shutdown: the recipe has been killed and incomplete outputs have been renamed (see below). If you see a (re)building message but no (in)complete message for some target, something went really wrong – this should never happen. In that case, better check for yourself if any incomplete outputs are still hanging around.

Giving the -d/--debug option one, two or three times will cause Produce to additionally flood your terminal with a few, some more or lots of messages that may be helpful for debugging.

Error handling and aborting

When a recipe fails, i.e. its interpreter returns an exit status other than 0, the corresponding target file (if any) may already have been created or touched, potentially leading the next invocation of Produce to believe that it is up to date, even though it probably doesn’t have the correct contents. Such inconsistencies can lead to users tearing their hair out. In order to avoid this, Produce will, when a recipe fails, make sure that the target file does not stay there. It could just delete it, but that might be unwise because the user might want to inspect the output file of the erroneous recipe for debugging. So, Produce renames the target file by appending a ~ to the filename (a common naming convention for short-lived “backups”).

If multiple recipes are running in parallel and one fails, Produce will kill all of them, do the renaming and abort immediately.

The same is true if Produce receives an interrupt signal. So you can safely abort a production process in your terminal by pressing Ctrl+C.
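The rename-on-failure behavior for a single recipe can be sketched as follows. This is a simplified stand-in, not Produce’s implementation: run_recipe is hypothetical, passing the recipe to the interpreter with -c is an assumption, and killing sibling recipes on failure or on Ctrl+C is left out.

```python
import os
import subprocess


def run_recipe(recipe, target, shell="bash"):
    """Run a recipe; if it fails, move a (possibly partial) target aside.

    Sketch only: no parallel shutdown and no signal handling.
    """
    result = subprocess.run([shell, "-c", recipe])
    if result.returncode != 0:
        if os.path.exists(target):
            # Keep the broken output for inspection, but under a name that
            # the next run will not mistake for an up-to-date target.
            os.replace(target, target + "~")
        raise RuntimeError(
            "recipe for {} failed with exit status {}".format(target, result.returncode))
```

For example, run_recipe("pdflatex paper", "paper.pdf") would leave any paper.pdf written by a failing pdflatex run behind as paper.pdf~.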

How targets are matched against rules

When producing a target, either because asked to by the user or because the target is required by another one, Produce will always work through the Producefile from top to bottom and use the first rule that matches the target. A rule matches a target if both the target pattern matches and the matching condition (if any) subsequently evaluates to true.

Note that unlike most INI dialects, Produce allows for multiple sections with the same heading. It makes sense to have the same target pattern multiple times when there are matching conditions to make subdistinctions.

If no rule matches a target, Produce aborts with an error message.
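This lookup order can be modeled as a simple loop over the rules. Rule and find_rule are hypothetical names used only for this sketch, and conditions are plain Python callables here rather than cond attributes.

```python
import re
from typing import Callable, NamedTuple, Optional


class Rule(NamedTuple):
    name: str
    pattern: "re.Pattern"
    cond: Optional[Callable[[dict], bool]] = None


def find_rule(rules, target):
    """Return the first rule (top to bottom) whose pattern matches the
    whole target and whose condition, if any, holds."""
    for rule in rules:
        match = rule.pattern.fullmatch(target)
        if match is None:
            continue
        variables = dict(match.groupdict(), target=target)
        if rule.cond is None or rule.cond(variables):
            return rule, variables
    raise LookupError("no rule matches target {!r}".format(target))


# Two rules with the same target pattern, told apart by a condition.
LABELED = re.compile(r"out/(?P<corpus>.*)\.(?P<portion>.*)\.(?P<fset>.*)\.labeled")
rules = [
    Rule("label dev/test", LABELED, lambda v: v["portion"] in ("dev", "test")),
    Rule("label anything else", LABELED),
]
print(find_rule(rules, "out/wsj.dev.words.labeled")[0].name)    # label dev/test
print(find_rule(rules, "out/wsj.train.words.labeled")[0].name)  # label anything else
```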

Advanced usage

Whitespace and indentation in values

An attribute value can span multiple lines as long as each line after the first is indented with some whitespace. The recommended indentation is either one tab or four spaces. If you make use of this, it is recommended to leave the first line (after the attribute name and the =) blank so all lines of the value are consistently aligned.

The second line of a value (i.e. the first indented one) determines the kind and amount of whitespace expected to start each subsequent line. This whitespace will not be part of the attribute value. Additional whitespace after the initial amount is, however, preserved. This is important e.g. for Python code and the reason why Produce is no longer using Python’s configparser module.

All whitespace at the very beginning and at the very end of an attribute value will be stripped away.

For example, in the following rule, the recipe spans two lines:

[paper.pdf]
dep.tex = paper.tex
dep.bib = paper.bib
recipe =
    pdflatex paper
    pdflatex paper
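A minimal sketch of these indentation rules, assuming the parser has already split off the text after the = and collected the continuation lines. parse_value is hypothetical, and how blank lines inside a value (as in a prelude) are handled is our assumption.

```python
def parse_value(lines):
    """Join an attribute value from its raw lines.

    lines[0] is the text after 'attribute =', lines[1:] are the
    continuation lines exactly as they appear in the file.
    """
    if len(lines) == 1:
        return lines[0].strip()
    second = lines[1]
    # The whitespace prefix of the second line is what every following
    # line must start with; it is removed, extra whitespace is kept.
    prefix = second[:len(second) - len(second.lstrip())]
    body = [lines[0]]
    for line in lines[1:]:
        if line.strip() == "":
            body.append("")  # blank lines inside a value are kept empty
        elif line.startswith(prefix):
            body.append(line[len(prefix):])
        else:
            raise ValueError("inconsistent indentation: {!r}".format(line))
    # Whitespace at the very beginning and end of the value is stripped.
    return "\n".join(body).strip()


print(parse_value(["", "    pdflatex paper", "    pdflatex paper"]))
```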

The prelude

If you use Python expressions in your recipes, you will often need to import Python modules or define functions to use in these expressions. You can do this by putting the imports, function definitions and other Python code into the special prelude attribute in the global section. For example, put this at the beginning of your Producefile to import the errno, glob and os modules and define a helper function for creating directories:

[]
prelude =
    import errno
    import glob
    import os

    def makedirs(path):
        try:
            os.makedirs(path)
        except OSError as error:
            if error.errno != errno.EEXIST:
                raise
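One way to picture what happens to the prelude: it is run once with exec, and expansions are then evaluated with eval in the same namespace, so everything it imports or defines is in scope. This is a sketch of the idea, not a description of Produce’s internals; keep_suffixes is a hypothetical helper that reproduces the Make filter example from earlier.

```python
# The prelude, as it might be read from the global section
# (keep_suffixes is a hypothetical helper).
prelude = """
import os

def keep_suffixes(names, suffixes):
    return [n for n in names if n.endswith(suffixes)]
"""

namespace = {}
exec(prelude, namespace)                          # run the prelude once
namespace["sources"] = "foo.c bar.c baz.s ugh.h"  # a global variable from []

# Every expansion is then an ordinary expression in that namespace.
print(eval("keep_suffixes(sources.split(), ('.c', '.s'))", namespace))
# ['foo.c', 'bar.c', 'baz.s']
print(eval("os.path.join('models', 'model001')", namespace))
```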

shell: choosing the recipe interpreter

By default, recipes are (after doing expansions) handed to the bash command for execution. If you would rather write your recipe in zsh, perl, python or any other language, that’s no problem. Just specify the interpreter in the shell attribute of the rule.

Running jobs in parallel

Use the -j JOBS command line option to specify the number of jobs Produce runs in parallel. By default, Produce reserves one job slot for each recipe. For recipes that run multiple parallel jobs themselves, it is recommended to specify the number of jobs via the jobs attribute. Produce will then reserve that many job slots for this recipe (but no more than JOBS).

Here is an example where the target b is created by a recipe that runs in parallel:

[a]
deps = b c d
recipe = touch %{target}

[b]
dep.input = input.txt
dep.my_script = ./my_script.sh
jobs = 8
recipe = parallel --gnu -j %{jobs} -k %{my_script} :::: %{input} > %{target}

[c]
dep.my_script = ./my_script.sh
recipe = %{my_script} c > %{target}

[d]
dep.my_script = ./my_script.sh
recipe = %{my_script} d > %{target}

Running produce -j 8 a will run up to 8 jobs in parallel. In this example, the recipes for c and d may run in parallel. The recipe for b will not run in parallel with any other recipe because it uses all 8 job slots.
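The slot accounting described above can be sketched with a condition variable. JobSlots is a hypothetical class, not part of Produce; it only shows why b, asking for all 8 of 8 slots, cannot overlap with c or d.

```python
import threading


class JobSlots:
    """Hypothetical job-slot pool for a given -j value."""

    def __init__(self, total):
        self.total = total
        self.free = total
        self._cond = threading.Condition()

    def acquire(self, jobs=1):
        # A recipe never reserves more slots than -j allows.
        needed = min(jobs, self.total)
        with self._cond:
            # Block until enough slots are free for this recipe.
            self._cond.wait_for(lambda: self.free >= needed)
            self.free -= needed
        return needed

    def release(self, taken):
        with self._cond:
            self.free += taken
            self._cond.notify_all()


slots = JobSlots(8)        # produce -j 8
taken = slots.acquire(8)   # recipe for b (jobs = 8) holds every slot
print(slots.free)          # 0
slots.release(taken)
slots.acquire()            # recipes for c and d take one slot each
slots.acquire()
print(slots.free)          # 6
```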

Dependency files

Sometimes the question of which other files a file depends on is more complex and may change frequently over the lifetime of a project, e.g. in the case of source files that include header files or import modules. In such cases, it would be nice to have the dependencies listed automatically by a script. Produce supports this via the depfile attribute in rules: here, you can specify the name of a dependency file, a text file that contains dependencies, one per line. Produce will read them and add them to the list of dependencies for the matched target. Also, Produce will try to produce the dependency file (i.e. make it up to date) prior to reading it. So you can write another rule that tells Produce how to generate each dependency file, and the rest is automatic.

For example, the following rule might be used to generate a dependency file listing the source file and header files required for compiling a C object. This example uses .d as the extension for dependency files. It runs cc -MM to use the C compiler’s dependency discovery feature and then some shell magic to convert the output from a Makefile rule into a simple dependency list:

[%{name}.d]
dep.c = %{name}.c
recipe =
    cc -MM -I. %{c} | sed -e 's/.*: //' | sed -e 's/^ *//' | \
    perl -pe 's/ (\\\n)?/\n/g' > %{target}

The following rule could then be used to create the actual object file. The depfile attribute makes sure that whenever an included header file changes, the object file will be rebuilt:

[%{name}.o]
dep.src = %{name}.c
depfile = %{name}.d
recipe =
    cc -c -o %{target} %{src}

Note that the .c file will end up in the dependency list twice, once from dep.src and once from the dependency file. This does not matter: Produce is smart enough not to do the same thing twice.
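Both halves of this mechanism are easy to sketch in Python. make_rule_to_deps approximates what the sed/perl pipeline does to cc -MM output, and read_depfile merges a dependency file into an existing list, dropping duplicates while preserving order. Both names are hypothetical, and neither handles file names containing spaces or colons.

```python
def make_rule_to_deps(rule_text):
    """Extract the prerequisites from a Make rule printed by cc -MM."""
    _, _, prereqs = rule_text.partition(":")
    # Drop backslash-newline continuations, then split on whitespace.
    return prereqs.replace("\\\n", " ").split()


def read_depfile(path, deps):
    """Append the dependencies listed in path (one per line) to deps,
    skipping blank lines and names that are already present."""
    merged = list(deps)
    with open(path) as f:
        for line in f:
            name = line.strip()
            if name and name not in merged:
                merged.append(name)
    return merged


cc_output = "hello.o: hello.c hello.h \\\n  util.h\n"
print(make_rule_to_deps(cc_output))  # ['hello.c', 'hello.h', 'util.h']
```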

Warning: dependency files are made up to date even in dry-run mode!

Rules with multiple outputs

Sometimes you have a command that creates multiple files at once because their creation is inherently linked to the same process – it wouldn’t make sense to try and create them in neatly separated steps. Splitting a file up into multiple chunks is such a case:

split -n 4 data.txt

This command creates four files called xaa, xab, xac and xad. It gets complicated when these output files individually are dependencies of further targets, as in this example:

[split_and_zip]
type = task
deps = xaa.zip xab.zip xac.zip xad.zip

[%{name}.zip]
dep.file = %{name}
recipe = zip %{target} %{file}

[%{chunk}]
dep.txt = data.txt
recipe = split -n 4 %{txt}

If we run the task split_and_zip, it will try to create its (indirect) dependencies xaa, xab, xac and xad independently of each other. Each time, the last rule will match, and each time, the exact same recipe will be executed. This is unnecessary work: running it once would be sufficient because it creates all four files in each case. Worse, if we run Produce in parallel, multiple instances of the recipe may run at the same time and corrupt the data.

The solution is to explicitly declare which files a rule produces, other than the target. The outputs attribute serves this purpose. With it, the last rule is rewritten as follows:

[%{chunk}]
outputs = xaa xab xac xad
dep.txt = data.txt
recipe = split -n 4 %{txt}

Additionally, it is good style to add a matching condition to prevent the rule from accidentally matching something that is not one of its outputs:

[%{chunk}]
outputs = xaa xab xac xad
cond = %{target in outputs.split()}
dep.txt = data.txt
recipe = split -n 4 %{txt}
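Why declaring outputs removes the duplicate work can be illustrated with a toy scheduler (hypothetical, not Produce’s): if each recipe invocation is keyed by the full set of files it produces, all four chunk targets map to the same key, so the recipe runs once.

```python
def make_builder(run_recipe):
    """Return a build function that runs each multi-output recipe only once.

    run_recipe receives the sorted list of files an invocation produces;
    it stands in for actually running `split -n 4 data.txt`.
    Not thread-safe: a parallel scheduler would need a lock around `done`.
    """
    done = set()

    def build(target, outputs):
        # Key the invocation by everything it produces, target included.
        produced = frozenset(outputs.split()) | {target}
        if produced in done:
            return False  # an earlier call already created this target
        run_recipe(sorted(produced))
        done.add(produced)
        return True

    return build


calls = []
build = make_builder(calls.append)
for chunk in ["xaa", "xab", "xac", "xad"]:
    build(chunk, "xaa xab xac xad")
print(len(calls))  # 1
```

Without the outputs declaration, each target would form its own key and the recipe would run four times.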

Instead of a single outputs attribute, separate attributes with the out. prefix can be used, and both styles can also be mixed, similar to dep./deps. Here is an example of a rule using the out. style to declare that while producing a .pdf file it will also produce an .aux file:

[%{name}.pdf]
dep.tex = %{name}.tex
out.aux = %{name}.aux
recipe =
    pdflatex %{tex}

“Sideways” dependencies

Suppose there is a target A that has some additional output file B. What if a target C wants to declare a dependency on B? For this to work, there must be a rule matching B. B, of course, is produced when A is produced. So, effectively, in order to produce B, A must be produced. We can express this as a dependency: B depends on A. You can write a rule that will tell Produce to produce A when B is requested:

[B]
dep.a = A

(TODO: What if A is up to date but B does not exist?)

Such a rule only serves to “guide” Produce from B to A. It cannot contain its own recipe. This would not make sense, as it is the rule for A that creates B. If you included a recipe, Produce would complain about a cyclic dependency.

Here is a more concrete example: the rule for paper.pdf produces an additional output paper.aux. Another rule, for paper.info, depends on paper.aux. In order for Produce to be able to satisfy this dependency, paper.aux is declared as depending on paper.pdf.

[paper.info]
dep.aux = paper.aux
recipe = cat %{aux} | ./my_tool > %{target}

[paper.aux]
dep.pdf = paper.pdf

[paper.pdf]
dep.tex = paper.tex
outputs = paper.aux
recipe =
    pdflatex paper
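The three rules form a small dependency graph, and the “sideways” rule is what links paper.aux to paper.pdf. The simplified Python sketch below uses a plain depth-first traversal (not Produce's actual scheduler) to show the order in which targets get made when paper.info is requested; DEPS and build_order are illustrative names. It also raises an error if the graph contains a cycle.

```python
# Dependencies read off the rules above: each target maps to what it needs.
DEPS = {
    "paper.info": ["paper.aux"],
    "paper.aux": ["paper.pdf"],  # the "sideways" guide rule, no recipe
    "paper.pdf": ["paper.tex"],
    "paper.tex": [],             # source file
}

def build_order(target, deps, done=None, visiting=None):
    """Depth-first post-order: every dependency is listed before its dependent."""
    if done is None:
        done = []
    if visiting is None:
        visiting = set()
    if target in done:
        return done
    if target in visiting:
        raise ValueError(f"cyclic dependency involving {target}")
    visiting.add(target)
    for dep in deps.get(target, []):
        build_order(dep, deps, done, visiting)
    visiting.discard(target)
    done.append(target)
    return done

print(build_order("paper.info", DEPS))
# ['paper.tex', 'paper.pdf', 'paper.aux', 'paper.info']
```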

There is one final problem here: after running the recipe for paper.pdf, the modification time of paper.pdf may well be greater than that of paper.aux. Since we declared paper.aux dependent on paper.pdf, this means that paper.aux appears as out of date to Produce even though we just produced it. A simple and effective way to prevent this is to include touch %{outputs} as the last line of any rule with multiple outputs. The last rule above thus becomes:

[paper.pdf]
dep.tex = paper.tex
outputs = paper.aux
recipe =
    pdflatex paper
    touch %{outputs}
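Why touch helps can be seen with a Make-style staleness check. The Python sketch below assumes a target counts as out of date when it is missing or when any dependency has a strictly newer modification time; is_out_of_date and demo are hypothetical names, and the timestamps are set by hand to simulate paper.pdf ending up newer than paper.aux.

```python
import os
import tempfile

def is_out_of_date(target, dependencies):
    """Make-style check: stale if missing or if any dependency is newer."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        pdf = os.path.join(tmp, "paper.pdf")
        aux = os.path.join(tmp, "paper.aux")
        for path in (aux, pdf):
            open(path, "w").close()
        # Simulate paper.pdf ending up newer than paper.aux after pdflatex.
        os.utime(aux, (1000, 1000))
        os.utime(pdf, (1001, 1001))
        before = is_out_of_date(aux, [pdf])  # paper.aux looks stale
        # What `touch %{outputs}` achieves: paper.aux gets a fresh timestamp.
        os.utime(aux, (1002, 1002))
        after = is_out_of_date(aux, [pdf])
    return before, after

print(demo())  # (True, False)
```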

Producing the outputs for all inputs

Suppose you have a number of input files (say inputs/input001.txt to inputs/input100.txt). Each input can be processed to yield an output file (say models/model001 to models/model100) – for example, by the following rule:

[models/model%{num}]
dep.input = inputs/input%{num}.txt
dep.train = bin/train
recipe = ./%{train} %{input} %{target}
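A rule like this matches any target of the form models/model…, binding the variable part to num, which is then reused in the dependency. The Python sketch below models that with a regular expression that has one named group per %{…} variable. It is a hypothetical simplification: pattern_to_regex and match_rule are made-up names, it assumes a variable must match at least one character, and it uses str.format only to illustrate substituting the binding.

```python
import re

def pattern_to_regex(pattern):
    """Turn a target pattern such as 'models/model%{num}' into a compiled
    regex with one named group per %{...} variable."""
    parts = re.split(r"%\{(\w+)\}", pattern)  # literal, name, literal, ...
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)       # literal text
        else:
            regex += f"(?P<{part}>.+)"     # variable part
    return re.compile(regex)

def match_rule(pattern, target):
    m = pattern_to_regex(pattern).fullmatch(target)
    return m.groupdict() if m else None

bindings = match_rule("models/model%{num}", "models/model042")
print(bindings)                                    # {'num': '042'}
print("inputs/input{num}.txt".format(**bindings))  # inputs/input042.txt
```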

Now you would like to automatically produce the model for every input that is there. You can do this by writing a task, i.e., a rule whose target is not a file but just a name that you invoke. The task for the example might look like this:

[all_models]
type = task
deps = %{' '.join('models/{}'.format(i.replace('input', 'model').replace('.txt', ''))
         for i in os.listdir('inputs'))}

This task does not need a recipe because all it does is pull in all the models through its dependencies. The dependencies are specified through an arbitrary Python expression; in this case, it looks at the inputs directory and yields the names of the models corresponding to each input. It uses the os module, which needs to be imported, so let’s add a global section with a prelude to do this. The whole produce.ini then looks like this:

[]
prelude =
    import os

[models/model%{num}]
dep.input = inputs/input%{num}.txt
dep.train = bin/train
recipe = ./%{train} %{input} %{target}

[all_models]
type = task
deps = %{' '.join('models/{}'.format(i.replace('input', 'model').replace('.txt', ''))
         for i in os.listdir('inputs'))}

And to produce all models, all you need to do is tell Produce to produce the all_models task:

$ produce all_models
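To see what the deps expression of all_models is meant to expand to, here is the same name transformation in plain Python, applied to a fixed sample listing instead of os.listdir('inputs') so the result is predictable. model_deps is a hypothetical helper; the final shlex.split call shows how the space-separated result breaks into individual dependencies.

```python
import shlex

def model_deps(listing):
    # Same transformation as the all_models deps expression, applied to a
    # given list of file names rather than os.listdir('inputs').
    return ' '.join('models/{}'.format(i.replace('input', 'model').replace('.txt', ''))
                    for i in listing)

sample = ["input001.txt", "input002.txt", "input003.txt"]
expanded = model_deps(sample)
print(expanded)               # models/model001 models/model002 models/model003
print(shlex.split(expanded))  # ['models/model001', 'models/model002', 'models/model003']
```

Note that every file in inputs is turned into a dependency, so a stray file there (say, a README) would produce a bogus target; adding `if i.endswith('.txt')` to the generator is one way to guard against that.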

All special attributes at a glance

For your reference, here are all the rule attributes that currently have a special meaning to Produce:

In rules

target
When a rule matches a target, this variable is always set to that target, mainly so you can refer to it in the recipe. It is illegal to set the target attribute yourself. Also see Rules, expansions, escaping and comments.
cond
Allows you to specify a matching condition in addition to the target pattern. Typically, it is given as a single expansion with a boolean Python expression. It is expanded immediately after a target matches the rule. The resulting string must be a Python literal. If it is “truthy”, the rule matches and its expansion/execution continues. If it is “falsy”, the rule does not match the target, and Produce proceeds to the next rule, trying to match the target. Also see Multiple wildcards, regular expressions and matching conditions.
dep.*
The asterisk stands for a name chosen by you, which is the actual name of the variable the attribute value will be assigned to. The dep. prefix, not part of the variable name, tells Produce that this is a dependency, i.e. that the target given by the value must be made up to date before the recipe of this rule can be run. Also see Named and unnamed dependencies.
deps
Like dep.*, but allows for specifying multiple unnamed dependencies in one attribute value. The format is roughly a space-separated list. For details, see shlex.split. Also see Named and unnamed dependencies.
depfile
Another way to specify (additional) dependencies: the name of a file from which dependencies are read, one per line. Additionally, Produce will try to make that file up to date prior to reading it. Also see Dependency files.
type
Is either file (default) or task. If file, the target is supposed to be a file that the recipe creates/updates if it runs successfully. If task, the target is an arbitrary name given to some task that the recipe executes. Crucially, task-type targets are always assumed to be out of date, regardless of the possible existence and age of a file with the same name. Also see Special targets vs. special attributes.
recipe
The command(s) to run to build the target, typically a single shell command or a short shell script. Unlike in Make, lines are not run in isolation: after expansions, the whole script is passed to the interpreter at once. This way, you can, e.g., define a shell variable on one line and use it on the next. Also see Rules, expansions, escaping and comments.
shell
See shell: choosing the recipe interpreter
out.*
See Rules with multiple outputs
outputs
See Rules with multiple outputs
jobs
See Running jobs in parallel
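The difference between passing a recipe to the interpreter as a whole and running it line by line, as Make does, can be demonstrated with Python's subprocess module. This sketch uses sh as the interpreter for illustration only (which interpreter Produce actually uses is configured via the shell attribute); run and RECIPE are hypothetical names.

```python
import subprocess

RECIPE = 'stem=paper\necho "building $stem.pdf"\n'

def run(script):
    # stdout=PIPE plus universal_newlines keeps this working on Python 3.6.
    return subprocess.run(["sh", "-c", script], stdout=subprocess.PIPE,
                          universal_newlines=True).stdout

# Whole script at once: $stem is still set when the echo line runs.
whole = run(RECIPE)
# Make-style contrast: each line in a fresh shell, so $stem is empty.
per_line = "".join(run(line) for line in RECIPE.splitlines())

print(whole, end="")     # building paper.pdf
print(per_line, end="")  # building .pdf
```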

In the global section

default
A list (parsed by shlex.split) of default targets that are produced if the user does not specify any targets when calling Produce.
prelude
See The prelude
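Both default and deps are split like a shell command line, so file names containing spaces can be quoted. The Python sketch below shows that splitting together with the fallback rule described for default: targets named on the command line take precedence, otherwise the default list is used. targets_to_produce is a hypothetical helper, not Produce's API.

```python
import shlex

def targets_to_produce(cli_targets, default_attr):
    """Targets given on the command line win; otherwise fall back to the
    'default' attribute of the global section, split with shlex.split."""
    if cli_targets:
        return list(cli_targets)
    return shlex.split(default_attr)

default = "all_models 'report final.pdf'"
print(targets_to_produce([], default))             # ['all_models', 'report final.pdf']
print(targets_to_produce(["paper.pdf"], default))  # ['paper.pdf']
```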

Getting in touch

Produce is being developed by Kilian Evang <%{firstname}@%{lastname}.name>. I would love to hear from you if you find it useful or if you have questions, bug reports, or feature requests.

Acknowledgments

The Produce logo was designed by Valerio Basile.


Download files

Source distribution: produce-0.8.0.tar.gz (56.5 kB)

Built distribution: produce-0.8.0-py3-none-any.whl (27.8 kB, Python 3)
