Hufflepuffs are particularly good finders.
Project description
Hufflepuffs are particularly good finders.
Hufflepuff is a program for exploring large mostly-convex search spaces. The algorithm it uses is basically beam search, except that it’s expected that exploring all successors of any node is too computationally expensive, and so it explores a sample of them (with larger sample sizes for higher-scoring nodes).
You need to provide three things to Hufflepuff:
- The initial state(s) to explore from.
- A program for generating a successor of a given state. (the same state may be given multiple times to get multiple successors)
- A program for assigning a score to any state.
Hufflepuff has been used to:
- Group students into roughly equal teams, given approximate skill levels.
- Assign students to classes, such that class sizes are roughly equal and students’ preferences are taken into account.
- Generate source code for an esoteric programming language that produced a desired output.
In those three cases, the initial states were randomly generated (with virtually no domain knowledge), the successor generation just randomly changed elements in the state, leaving the scoring program as the single place for specifying the desired properties of the solution.
Invocation
With a config file:
$ hufflepuff -c config.cfg
config.cfg:
initial-file = initial.json
initial-binary = cat $initial-file
mutate-binary = ./mutate
score-binary = ./score
beam-width = 5
expand-mantissa = 1.25
expand-multiplier = 3
target = max
num-iterations = 20
parallelism = NUM_CPUS
The config file is a shorthand for providing the args on the command line:
$ hufflepuff --initial-binary=./initial
Search parameters
By default, hufflepuff will attempt to find states with the highest scores: this
is specified by --target=max
. If your score metric is a “cost”, then setting
--target=min
will make it attempt to find states with the lowest scores.
The algorithm followed for searching is:
- Get the states from
--initial-binary
. - For
--num-iterations
iterations: - Generate a new set of states from the current set of states.
* The best-scored state will have
--expand-multiplier
states generated from it. * Following states will have an exponentially decreasing number of states generated from them (rounding down, but there will always be at least one new state for each current state). - Score the new states.
- Throw away all but the best
--beam-width
states.
However, in addition, the best --beam-width
states ever seen are kept, for
output at the end of the search.
Communication
All binaries are expected to use newline-separated JSON values, either read from stdin, or written to stdout, encoded in UTF-8.
In order to fully utilise multiple cores, multiple instances of the mutation and scoring binaries are run, and input is provided simultaneously (so these binaries can be written as naïve single-threaded processes but still scale seamlessly).
If this is not desired (for instance, if there is a shared resource, or if you
want to implement an more sophisticated form of parallelism),
set --parallelism=1
.
Future lines of input may be provided before a given line has finished
processing to aid in this, but this should not be relied upon (don’t block a
line of output on reading the next line of input).
Initial binary
The initial-binary
is run to provide the set of starting values for the
search. For convenience, the substring "$initial-file" is replaced with the
value of the initial-file
argument, so the default behaviour is to read from
that file.
A particularly useful thing to do with this is to store the results of a search
in best.txt
, and then use --initial-file=best.json
to continue the
search from that point.
Mutation binary
The mutate-binary
is run to take values and turn them into slightly
different values (e.g., swap two values in a list, or add/subtract some amount
from a number). Usually this should be done entirely at random, rather than
trying to make a “smart” decision: the search will quickly eliminate really
bad options, and you may need intermediate okay-ish states.
Each line of input should correspond to exactly one line of output (if hufflepuff wants multiple mutations, it will enter the value multiple times).
Scoring binary
The score-binary
is run to provide a number (floating point or integral)
value for a value. The default meaning is that larger numbers are better, but
this is controlled by --target={max,min}
.
Each line of input should correspond to exactly one line of output, and this program should be deterministic (a given value produces the same score every time).
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file hufflepuff-1.4.tar.gz
.
File metadata
- Download URL: hufflepuff-1.4.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.3rc1
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | d206181c2f1eb75d74e689f4f8b47d51ede799a1a22dc3da7bab97126c8d5f4c |
|
MD5 | 170c0b4f87274ffa280aff03ee63a9d8 |
|
BLAKE2b-256 | a17685a1bfce25ae2f845f3a46a77d1e58422aa54ee70e4a6c02b53a3d829cce |
File details
Details for the file hufflepuff-1.4-py3-none-any.whl
.
File metadata
- Download URL: hufflepuff-1.4-py3-none-any.whl
- Upload date:
- Size: 13.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.3rc1
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 16b038144ba3dc881329e0a1a8006772c1813fc135c07de15daff39d347d3f70 |
|
MD5 | ac0e5cde56c8393505e49edad0c6c8ee |
|
BLAKE2b-256 | 6d346b321a9d3ca7a6fbd66b6556c6c1525955d888dfa5e342b8e52aac3b276f |