Stack based automatic testcase generation
Testmachine is a new way of testing. It is a python library, and intended for testing python programs, but the core concept is language agnostic and should be easy to port to other languages.
It’s strongly inspired by Quickcheck and other similar approaches to fuzz testing, but with a twist that essentially turns the idea inside out and makes it much more powerful.
The core concept is that testmachine doesn’t generate data, it generates programs. This is based on ideas from the stateful testing system found in Scalacheck, (as well as my subsequent port of it to Python in hypothesis), but with testmachine it applies equally well to stateful and non-stateful systems alike.
Programs generated by statemachine are defined by composing a number of user provided rules. These rules will normally fall into one of three categories:
In reality all three of these are represented by the same type internally and it will sometimes be convenient to use rules which are hybrids of these (e.g. operations which also do make some assertions, or functions which return a random assertion), but for the most part this separation is convenient.
You define these rules as python functions, and testmachine then takes them and attempts to find a combination which will produce an exception. If it finds one it then does its best to minimize it and compiles the final output into a simple text format that will usually be interpretable as valid python (it’s generated by simple string expansion from patterns. If your values have reprs which are valid python and you don’t do anything too unusual then the output will be valid python. Otherwise it may take some editing but should be readable on its own).
This code should be considered pretty experimental right now. You can use it and it will probably work, but I make no promises about the API being stable. It’s very much still a work in progress as I explore a new idea.
How does it work?
At its core, a Testmachine describes a language of programs that can be executed on a machine with multiple named stacks of values. It generates random programs in an attempt to find one which causes an error.
Each program is a sequence of operations. Each of which may manipulate the stacks in an arbitrary way. If an operation throws an exception, testmachine has now found a bug.
Because of its definition in terms of stack operations it’s very easy to then minimize the program: A program is just a list of operations with no explicit dependencies between them (all data dependency goes via the stacks), so you can just delete instructions from it, remove any instructions which find themselves in an invalid state (e.g. if not enough values are on the stacks they need) and see if the program still fails. If it does, you’ve shrunk the program. We then iterate this process greedily until we can no longer shrink the program further.
Although this does not produce program which is guaranteed to be globally minimal, in practice it generally seems to do extremely well at producing short example programs.
Once this has complete, we have an example program against the stack machine. But this isn’t very readable, so we want to compile it to a simpler python output.
We do this by maintaining a set of shadow stacks with variable names, one for each of stacks in our machine. By tracking which variables are read and written in an operation we convert the stack operations into a set of variable reads and writes. The result is a sequence of variable reads and assignments in SSA form which is equivalent to the stack program. We then do some simple string templating which generates output that looks quite like a python program.
This version of testmachine has been tested using Python series 2.6, 2.7, 3.2, 3.3 and pypy. Builds are checked with travis: