Tools for falsifying hypotheses with random data generation

## Project description

Hypothesis is a library for falsifying its namesake. It is inspired by testing libraries such as QuickCheck; its most direct ancestor is ScalaCheck, from which it acquired its approach to test case minimization. It is not itself a test framework, but it provides decorators for easy integration with test frameworks.

The primary entry point into the library is the `hypothesis.falsify` function.

What does it do?

You give it a predicate and a specification for how to generate arguments to test that predicate, and it tries to find a counterexample: a set of arguments for which the predicate is false.
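The core idea can be illustrated with a deliberately naive sketch: draw random arguments and return the first tuple that makes the predicate false. This is not Hypothesis's implementation. It does no minimization, and `naive_falsify`, `random_int` and `random_int_list` are hypothetical names chosen for illustration.

```python
import random


def naive_falsify(predicate, generators, max_examples=1000, seed=0):
    """Return the first argument tuple for which predicate is false, else None.

    Each element of `generators` is a function that takes a random.Random
    and returns one argument value.
    """
    rng = random.Random(seed)
    for _ in range(max_examples):
        args = tuple(gen(rng) for gen in generators)
        if not predicate(*args):
            return args
    return None  # no counterexample found within the budget


def random_int(rng):
    return rng.randint(-100, 100)


def random_int_list(rng):
    return [random_int(rng) for _ in range(rng.randint(0, 10))]


# A false claim: some generated list will have a sum of at least 100.
print(naive_falsify(lambda xs: sum(xs) < 100, [random_int_list]))
# A true claim: no counterexample exists, so this prints None.
print(naive_falsify(lambda x: x + 1 == 1 + x, [random_int]))
```

Unlike Hypothesis, this returns whatever counterexample happens to be drawn first, so it is usually far from minimal.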

## Basic examples

```
In [1]: from hypothesis import falsify

In [2]: falsify(lambda x,y,z: (x + y) + z == x + (y +z), float,float,float)
Out[2]: (1.0, 1.0, 0.0387906318128606)

In [3]: falsify(lambda x: sum(x) < 100, [int])
Out[3]: ([6, 29, 65],)

In [4]: falsify(lambda x: sum(x) < 100, [int,float])
Out[4]: ([18.0, 82],)

In [5]: falsify(lambda x: "a" not in x, str)
Out[5]: ('a',)

In [6]: falsify(lambda x: "a" not in x, {str})
Out[6]: (set(['a']),)
```

Sometimes we ask it to falsify things that are true:

```
In [7]: falsify(lambda x: x + 1 == 1 + x, int)
Unfalsifiable: Unable to falsify hypothesis <function <lambda> at 0x2efb1b8>
```

Of course, sometimes we ask it to falsify things that are false but whose counterexamples are hard to find:

```
In [8]: falsify(lambda x: x != "I am the very model of a modern major general", str)
Unfalsifiable: Unable to falsify hypothesis <function <lambda> at 0x2efb398>
```

It's not magic: when the search space is large, it won't be able to do much with hard-to-find examples.
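A back-of-the-envelope calculation shows why the example above is hopeless for random search. Assume, purely for illustration, that each character is drawn uniformly from the 95 printable ASCII characters and that the generator already happens to pick the right length; Hypothesis's real string distribution differs, and only the order of magnitude matters here.

```python
import math

target = "I am the very model of a modern major general"
alphabet_size = 95  # assumption: printable ASCII, each character uniform

# log10 of the chance that one random string of len(target) characters
# equals target; working in logarithms avoids float underflow.
log10_chance = -len(target) * math.log10(alphabet_size)

print(len(target), round(log10_chance, 1))  # prints: 45 -89.0
```

A chance of about one in 10^89 per attempt means no realistic number of random draws will hit the string.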

You can also use it to drive tests. I've only tested it with py.test, but it has no specific dependency on it. You just write normal tests that raise exceptions on failure, and it will transform those into randomized tests.

So the following test will pass:

```python
@given(int, int)
def test_int_addition_is_commutative(x, y):
    assert x + y == y + x
```

And the following will fail:

```python
@given(str, str)
def test_str_addition_is_commutative(x, y):
    assert x + y == y + x
```

With an error message something like:

```
x = '0', y = '1'

    @given(str, str)
    def test_str_addition_is_commutative(x, y):
>       assert x + y == y + x
E       assert '01' == '10'
E         - 01
E         + 10
```

## Stateful testing

You can also use hypothesis for a more stateful style of testing, to generate sequences of operations to break your code.

Consider the following broken implementation of a set:

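```python
class BadSet(object):
    def __init__(self):
        self.data = []

    def add(self, arg):
        # Bug: appends even if arg is already present, creating duplicates.
        self.data.append(arg)

    def remove(self, arg):
        # Deletes only the first matching entry, so a duplicate survives.
        for i in range(len(self.data)):
            if self.data[i] == arg:
                del self.data[i]
                break

    def contains(self, arg):
        return arg in self.data
```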

Can we use hypothesis to demonstrate that it’s broken? We can indeed!

We can put together a stateful test as follows:

```python
class BadSetTester(StatefulTest):
    def __init__(self):
        self.target = BadSet()

    @step
    @requires(int)
    def add(self, i):
        self.target.add(i)
        assert self.target.contains(i)

    @step
    @requires(int)
    def remove(self, i):
        self.target.remove(i)
        assert not self.target.contains(i)
```

The `@step` decorator marks a method as a test step. The `@requires` decorator says which argument types the step needs when it is called (you can omit `@requires` if the step takes no arguments).

We can now ask hypothesis for an example of this being broken:

```
In [7]: BadSetTester.breaking_example()
Out[7]: [('add', 1), ('add', 1), ('remove', 1)]
```

What does this mean? It means that if we were to do:

```python
x = BadSetTester()
x.add(1)
x.add(1)
x.remove(1)
```

then we would get an assertion failure. And indeed we would: the two calls to `add` store 1 twice, `remove` deletes only the first copy, and so the assertion that a removed element is no longer in the set fails.
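The same sequence can be replayed by hand. The sketch below repeats the `BadSet` class so it runs on its own and uses a hypothetical `replay` helper that applies `(name, argument)` steps with the same assertions as `BadSetTester`. It is a plain-Python illustration, not how `StatefulTest` runs steps internally.

```python
class BadSet(object):
    """Same broken set as above: add() appends even if arg is present."""

    def __init__(self):
        self.data = []

    def add(self, arg):
        self.data.append(arg)

    def remove(self, arg):
        # Deletes only the first matching entry.
        for i in range(len(self.data)):
            if self.data[i] == arg:
                del self.data[i]
                break

    def contains(self, arg):
        return arg in self.data


def replay(steps):
    """Apply (name, arg) steps, checking the same assertions as BadSetTester.

    Returns the internal list if every check passes.
    """
    target = BadSet()
    for name, arg in steps:
        if name == "add":
            target.add(arg)
            assert target.contains(arg), (name, arg, target.data)
        elif name == "remove":
            target.remove(arg)
            assert not target.contains(arg), (name, arg, target.data)
        else:
            raise ValueError("unknown step %r" % (name,))
    return target.data


print(replay([("add", 1), ("remove", 1)]))  # prints []
try:
    replay([("add", 1), ("add", 1), ("remove", 1)])
except AssertionError as err:
    print("failed:", err)  # prints: failed: ('remove', 1, [1])
```

The failure message shows the leftover `[1]` in the internal list, which is exactly the duplicate that `remove` failed to delete.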

## Under the hood

How does hypothesis work?

The core object of Hypothesis is the `SearchStrategy`. It knows how to explore a state space and has the following operations:

- `produce(size, flags)`: generate a random element of the state space, subject to the flags provided.
- `flags()`: return a set of flags that may be used to control the production of elements.
- `could_have_produced(element)`: say whether it's plausible that this element was produced by this strategy.
- `complexity(element)`: return a float saying roughly how "complex" this element is. No meaning is attached to this except that Hypothesis will try to generate elements of lower complexity.
- `simplify(element)`: return a generator over simplified versions of this element.

These satisfy the following invariants:

- `produce(size, flags)` should produce a distribution with about `size` bits of entropy.
- Any element produced by `produce` must return true when passed to `could_have_produced`.
- Any element for which `could_have_produced` returns true must not throw an exception when passed to `complexity` or `simplify`.
- The expected complexity of `produce(size)` should be monotonically increasing in `size`.
- For every `y` in `simplify(x)`, `complexity(y) <= complexity(x)`.
- `simplify(x)` should return a sequence of unique values.
- There should be no chain `x_1, x_2, …, x_n` with `x_{i+1}` in `simplify(x_i)` and `x_1` in `simplify(x_n)`; that is, repeated simplification can never cycle.
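To make these operations and invariants concrete, here is a minimal sketch of a strategy for integers, written as a standalone class rather than against Hypothesis's real `SearchStrategy` base class. Passing a `random.Random` into `produce`, the empty flag set, and the particular simplification rules are all assumptions of this sketch. The hypothetical `check_invariants` helper spot-checks several of the listed invariants on produced values.

```python
import random


class IntStrategySketch(object):
    """Hypothetical integer strategy illustrating the operations above."""

    def flags(self):
        return frozenset()  # this sketch supports no flags

    def produce(self, size, flags, rng):
        # Uniform over 2**size integers (size >= 1): about `size` bits of
        # entropy. flags matches the documented signature but is ignored.
        return rng.randint(-2 ** (size - 1), 2 ** (size - 1) - 1)

    def could_have_produced(self, element):
        return isinstance(element, int) and not isinstance(element, bool)

    def complexity(self, element):
        return float(abs(element))

    def simplify(self, element):
        # Every candidate either has a smaller absolute value or is the
        # positive version of a negative element, so chains cannot cycle.
        if element == 0:
            return
        half = abs(element) // 2
        candidates = [0]
        if element < 0:
            candidates.append(-element)
        candidates.append(half if element > 0 else -half)
        candidates.append(element - 1 if element > 0 else element + 1)
        seen = set()
        for candidate in candidates:
            if candidate != element and candidate not in seen:
                seen.add(candidate)
                yield candidate


def check_invariants(strategy, sizes=range(1, 16), samples=50, seed=0):
    rng = random.Random(seed)
    for size in sizes:
        for _ in range(samples):
            x = strategy.produce(size, strategy.flags(), rng)
            assert strategy.could_have_produced(x)
            simpler = list(strategy.simplify(x))
            assert len(simpler) == len(set(simpler))  # unique values
            for y in simpler:
                assert strategy.complexity(y) <= strategy.complexity(x)


check_invariants(IntStrategySketch())
```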

These are used to explore the state space. `produce` is called with a range of sizes and flags to generate candidate examples, and those that falsify the hypothesis are kept. The one of lowest complexity is then repeatedly replaced by any simplification of it that still falsifies the hypothesis, until no simplification does. This is taken as the end result.
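The search just described can be sketched in a few lines. This is an illustration under simplifying assumptions, not Hypothesis's code: the "strategy" is three plain functions for lists of non-negative integers, flags are omitted, and `falsify_sketch` and its shrinking rules are hypothetical.

```python
import random


# Hypothetical strategy for lists of non-negative ints, as plain functions.
def produce(size, rng):
    return [rng.randint(0, 2 ** size) for _ in range(rng.randint(0, size))]


def complexity(xs):
    return len(xs) + sum(xs)


def simplify(xs):
    """Yield unique simpler lists: drop one element, or shrink one toward 0.

    Every candidate has strictly lower complexity, so the search ends.
    """
    candidates = [xs[:i] + xs[i + 1:] for i in range(len(xs))]
    for i, x in enumerate(xs):
        if x > 0:
            for smaller in sorted({x // 2, x - 1}):
                candidates.append(xs[:i] + [smaller] + xs[i + 1:])
    seen = set()
    for candidate in candidates:
        if tuple(candidate) not in seen:
            seen.add(tuple(candidate))
            yield candidate


def falsify_sketch(predicate, sizes=range(1, 12), tries_per_size=20, seed=0):
    rng = random.Random(seed)
    # 1. Produce examples at a range of sizes; keep the falsifying ones.
    failures = []
    for size in sizes:
        for _ in range(tries_per_size):
            xs = produce(size, rng)
            if not predicate(xs):
                failures.append(xs)
    if not failures:
        return None
    # 2. Start from the lowest-complexity falsifying example.
    best = min(failures, key=complexity)
    # 3. Move to any simplification that still falsifies, until none does.
    while True:
        for candidate in simplify(best):
            if not predicate(candidate):
                best = candidate
                break
        else:
            return best


print(falsify_sketch(lambda xs: sum(xs) < 100))
```

For `sum(xs) < 100` the loop always ends on a list of positive integers summing to exactly 100: with a larger sum, decrementing an element would still falsify the claim, and any zero could be dropped. Which such list it ends on depends on the random draws.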

`SearchStrategy` objects are produced from a descriptor value (which can be anything) and a `SearchStrategies` object, which has user-definable rules for producing strategies.

So, for example, you can do:

```
In [35]: SearchStrategies().strategy((int,int,[str]))
Out[35]: TupleStrategy((int, int, [str]))
```
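The descriptor lookup is essentially recursive dispatch on the shape of the descriptor. The following sketch shows that idea with plain generator functions instead of strategy objects; `generator_for` and its value ranges are hypothetical, and the real `SearchStrategies` rules are richer and user-extensible. Following the examples at the top of this page, a list descriptor such as `[int, float]` is read as "a list whose elements come from any of these", and `{str}` as "a set of strings".

```python
import random


def generator_for(descriptor):
    """Hypothetical: map a descriptor to a function rng -> random value."""
    if descriptor is int:
        return lambda rng: rng.randint(-100, 100)
    if descriptor is float:
        return lambda rng: rng.uniform(-100.0, 100.0)
    if descriptor is str:
        return lambda rng: "".join(
            rng.choice("abc") for _ in range(rng.randint(0, 5)))
    if isinstance(descriptor, tuple):
        # A tuple descriptor yields a tuple with one value per position.
        parts = [generator_for(d) for d in descriptor]
        return lambda rng: tuple(part(rng) for part in parts)
    if isinstance(descriptor, (list, set)):
        # A list or set descriptor lists the allowed element descriptors.
        options = [generator_for(d) for d in descriptor]
        container = list if isinstance(descriptor, list) else set
        return lambda rng: container(
            rng.choice(options)(rng) for _ in range(rng.randint(0, 5)))
    raise ValueError("no strategy for %r" % (descriptor,))


example = generator_for((int, int, [str]))(random.Random(0))
print(example)
```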

There are some reasonably complicated and subtle things you can do in terms of overriding the defined search strategies. I’m not going to go into them here because it’s all a bit weird and likely to still be in flux.

Warning: this library is still very much in flux, and no current release should be considered stable. It has emerged from the initial hack stage and is probably not too broken, but proceed with caution.

## Testing

This version of Hypothesis has been tested on Python 2.7, 3.2 and 3.3, and on PyPy. Builds are checked with Travis CI.