osprey is an easy-to-use tool for hyperparameter optimization of
machine learning algorithms in Python using scikit-learn (or any
scikit-learn compatible API).

Each osprey experiment combines a dataset, an estimator, a search space
(and engine), cross-validation, and asynchronous serialization for
distributed parallel optimization of model hyperparameters.

Full documentation

Example (with ``mixtape`` models/datasets)


$ cat config.yaml
estimator:
  eval_scope: mixtape
  eval: |
    Pipeline([
        ('featurizer', DihedralFeaturizer(types=['phi', 'psi'])),
        ('cluster', MiniBatchKMeans()),
        ('msm', MarkovStateModel(n_timescales=5, verbose=False)),
    ])

search_space:
  cluster__n_clusters:
    min: 10
    max: 100
    type: int
  featurizer__types:
    choices:
      - ['phi', 'psi']
      - ['phi', 'psi', 'chi1']
    type: enum

cv: 5

dataset_loader:
  name: mdtraj
  params:
    trajectories: ~/local/msmbuilder/Tutorial/XTC/*/*.xtc
    topology: ~/local/msmbuilder/Tutorial/native.pdb
    stride: 1

trials:
  uri: sqlite:///osprey-trials.db

Then run ``osprey worker``. You can also run multiple instances of
``osprey worker`` simultaneously in parallel on a cluster.


$ osprey worker config.yaml
= osprey is a tool for machine learning hyperparameter optimization. =

osprey version: 0.2_10_g18392d9_dirty-py2.7.egg
time: October 27, 2014 10:44 PM
hostname: dn0a230538.sunet
cwd: /private/var/folders/yb/vpt17lxs67vf02qpvgvjrc5m0000gn/T/tmpDgBwlU
pid: 99407

Loading config file: config.yaml...
Loading trials database: sqlite:///osprey-trials.db (table = "trials")...

Loading dataset...
100 elements without labels
Instantiated estimator:
Pipeline(steps=[('featurizer', DihedralFeaturizer(sincos=True, types=['phi', 'psi'])), ('tica', tICA(gamma=0.05, lag_time=1, n_components=4, weighted_transform=False)), ('cluster', MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++',
init_size=None, max_iter=100, max_no_improvement=...toff=1, lag_time=1, n_timescales=5, prior_counts=0,
reversible_type='mle', verbose=False))])
Hyperparameter search space:
featurizer__types (enum) choices = (['phi', 'psi'], ['phi', 'psi', 'chi1'])
cluster__n_clusters (int) 10 <= x <= 100

Beginning iteration 1 / 1
History contains: 0 trials
Choosing next hyperparameters with random...
{'cluster__n_clusters': 20, 'featurizer__types': ['phi', 'psi']}

Fitting 5 folds for each of 1 candidates, totalling 5 fits
[Parallel(n_jobs=1)]: Done 1 jobs | elapsed: 0.3s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 1.8s finished
Success! Model score = 4.080646
(best score so far = 4.080646)

1/1 models fit successfully.
time: October 27, 2014 10:44 PM
elapsed: 4 seconds.
osprey worker exiting.
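
The "Choosing next hyperparameters with random..." step in the log above can be pictured with a small standard-library sketch. This is not osprey's implementation: ``sample_params``, ``run_trials``, ``toy_score`` and the dictionary layout of ``SEARCH_SPACE`` are hypothetical names chosen for illustration, and the toy objective stands in for a real cross-validation score. It only shows how one value per hyperparameter might be drawn from an ``int`` range and an ``enum`` list, scored, and appended to a trial history.

```python
import random

# Search space mirroring the example run: one integer range, one enumeration.
# The dictionary layout is an assumption made for this sketch.
SEARCH_SPACE = {
    "cluster__n_clusters": {"type": "int", "min": 10, "max": 100},
    "featurizer__types": {
        "type": "enum",
        "choices": [["phi", "psi"], ["phi", "psi", "chi1"]],
    },
}


def sample_params(space, rng):
    """Draw one value per hyperparameter, uniformly at random."""
    params = {}
    for name, spec in space.items():
        if spec["type"] == "int":
            # randint is inclusive on both ends, matching 10 <= x <= 100.
            params[name] = rng.randint(spec["min"], spec["max"])
        elif spec["type"] == "enum":
            params[name] = rng.choice(spec["choices"])
        else:
            raise ValueError("unsupported type: %r" % spec["type"])
    return params


def run_trials(space, score_fn, n_iters, seed=0):
    """Sample, score, and record n_iters trials; return the history."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_iters):
        params = sample_params(space, rng)
        history.append({"params": params, "score": score_fn(params)})
    return history


def toy_score(params):
    # Placeholder objective; a real run would return a cross-validation score.
    return -abs(params["cluster__n_clusters"] - 50)


if __name__ == "__main__":
    history = run_trials(SEARCH_SPACE, toy_score, n_iters=5)
    best = max(history, key=lambda trial: trial["score"])
    print(best)
```

The ``step__param`` names follow the scikit-learn convention for addressing a parameter of a named pipeline step, which is why the sampled dictionary can be handed to an estimator unchanged.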

You can dump the database to JSON or CSV with ``osprey dump``.
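
``osprey dump`` is the supported way to export results. For quick inspection, the example's ``sqlite:///osprey-trials.db`` URI points at an ordinary SQLite file in the working directory, so it can also be opened with Python's ``sqlite3`` module. The sketch below is a hedged illustration: ``read_trials`` is a hypothetical helper, the table name ``trials`` is taken from the worker log, and no assumption is made about the columns, which are returned exactly as stored.

```python
import sqlite3


def read_trials(db_path, table="trials"):
    """Return every row of the given table as a list of dicts."""
    # Table names cannot be bound as SQL parameters, so validate first.
    if not table.isidentifier():
        raise ValueError("invalid table name: %r" % table)
    conn = sqlite3.connect(db_path)
    try:
        # sqlite3.Row lets each row be converted to a column-name mapping.
        conn.row_factory = sqlite3.Row
        rows = conn.execute('SELECT * FROM "%s"' % table).fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()
```

Calling ``read_trials("osprey-trials.db")`` after a worker has run would return one dictionary per recorded trial.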



# grab the latest version from github
$ pip install git+git://


# or clone the repo yourself and run ``setup.py``
$ git clone
$ cd osprey && python setup.py install


- ``six``
- ``pyyaml``
- ``numpy``
- ``scikit-learn``
- ``sqlalchemy``
- ``hyperopt`` (recommended, required for ``engine=hyperopt_tpe``)
- ``scipy`` (optional, for testing)
- ``nose`` (optional, for testing)

On Python 2.6, the ``argparse`` and ``importlib`` backports are also
required.

