Reproducible library
The Reproducible Python Library
Keep track of your results.
Ever produced a result for a paper, only to realize a few months later that you could not reproduce it? That you had no idea which version of the code and which parameter values were used back then?
The reproducible library, developed by the Cognitive Neuro-Robotics Unit at the Okinawa Institute of Science and Technology (OIST), aims to provide an easy way to gather and save important information about the context in which a result was computed. This includes details about the OS, the Python version, the time, the git commit, the command-line arguments, hashes of input and output files, and any user-provided data.
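As an illustration of the kind of information listed above, here is a minimal, standard-library-only sketch that gathers a similar context dictionary. It is not how the reproducible library is implemented: the function names `collect_context`, `sha256_of` and `git_commit` and the dictionary keys are hypothetical, and the git lookup simply returns `None` when git or a repository is unavailable.

```python
import datetime
import hashlib
import platform
import subprocess
import sys


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def git_commit(path='.'):
    """Return the current commit hash of the repository at `path`, or None."""
    try:
        out = subprocess.check_output(
            ['git', 'rev-parse', 'HEAD'], cwd=path,
            stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        return None  # git not installed, or `path` is not inside a repository
    return out.decode('ascii').strip()


def collect_context(files=(), user_data=None):
    """Gather a provenance dictionary (hypothetical layout, not the library's)."""
    return {
        'os': platform.platform(),
        'python': sys.version,
        'timestamp': datetime.datetime.now().isoformat(),
        'git_commit': git_commit(),
        'argv': list(sys.argv),
        'files': {p: sha256_of(p) for p in files},
        'data': dict(user_data or {}),
    }
```

For example, `collect_context(files=['results.pickle'], user_data={'seed': 1})` returns a plain dictionary that can then be written to JSON or YAML next to the results.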
Other Python libraries doing just that exist, such as Recipy and Sumatra. And they are good; do try them. Each has its own design philosophy, which proved difficult to interface with some of the workflows of the Cognitive Neuro-Robotics Unit at OIST.
With Reproducible, the goal was to have a small, non-intrusive library allowing precise control over which data is collected and how it is output. In particular, the goal was to have the tracking info sit next to (or better, be directly embedded in) the result files. That makes sending results to collaborators or packaging them for publication straightforward.
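To make the idea of embedding tracking info in a result file concrete, here is a minimal sketch, assuming results are stored with `pickle`: the provenance dictionary is saved in the same file as the results, so the two travel together. The helper names `save_with_provenance` and `load_with_provenance` and the `{'results': ..., 'provenance': ...}` layout are hypothetical and not part of the reproducible API.

```python
import pickle


def save_with_provenance(path, results, provenance):
    """Store results and their provenance together in a single pickle file."""
    with open(path, 'wb') as f:
        pickle.dump({'results': results, 'provenance': provenance}, f)


def load_with_provenance(path):
    """Return the (results, provenance) pair written by save_with_provenance."""
    with open(path, 'rb') as f:
        bundle = pickle.load(f)
    return bundle['results'], bundle['provenance']
```

One design trade-off: a file cannot record its own hash, so a hash of the whole result file still has to live in a separate file next to it.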
The reproducible library is licensed under the LGPL version 3, to allow you to use it alongside code that uses other licenses.
The library is in beta; expect some changes. Python 3.5 or later is officially supported; the code also runs on Python 2.7 and 3.4, but not on 3.3.
Install
```shell
pip install reproducible
```
Instant Tutorial
Say this is your code, which is fully committed using git:
```python
import random
import pickle


def walk(n):
    """A simple random walk generator"""
    steps = [0]
    for i in range(n):
        steps.append(steps[-1] + random.choice([-1, 1]))
    return steps


if __name__ == '__main__':
    random.seed(1)
    results = walk(10)
    with open('results.pickle', 'wb') as f:
        pickle.dump(results, f)
```
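The script above is reproducible only because the random seed is fixed: the same seed and the same `n` give the same walk. A quick check of that property, repeating the `walk` function so the snippet runs on its own (the `run` helper is hypothetical):

```python
import random


def walk(n):
    """A simple random walk generator (same as in the listing above)."""
    steps = [0]
    for _ in range(n):
        steps.append(steps[-1] + random.choice([-1, 1]))
    return steps


def run(seed, n):
    """Seed the global generator, then produce a walk of n steps."""
    random.seed(seed)
    return walk(n)


# Same seed and same length: identical walks within one Python installation.
assert run(1, 10) == run(1, 10)
```

Note that the Python documentation only guarantees that `random.random()` reproduces the same sequence for a given seed across versions; functions such as `random.choice` may change between versions, which is one more reason to record the Python version alongside the seed.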
To add reproducible tracking:
```python
import random
import pickle

import reproducible


def walk(n):
    """A simple random walk generator"""
    steps = [0]
    for i in range(n):
        steps.append(steps[-1] + random.choice([-1, 1]))
    return steps


if __name__ == '__main__':
    # recording repository state
    # here we are okay with running our code with uncommitted changes, but
    # we record a diff of them in the tracked data.
    reproducible.add_repo(path='.', allow_dirty=True, diff=True)

    # recording parameters; this is not necessarily needed, as the code state
    # is recorded, but it is convenient.
    seed = 1
    random.seed(seed)
    reproducible.add_data('seed', seed)
    n = 10
    results = walk(n)
    reproducible.add_data('n', n)

    # recording the hash of the output file
    with open('results.pickle', 'wb') as f:
        pickle.dump(results, f)
    reproducible.add_file('results.pickle', category='output')

    # you can examine the tracked data and add or remove from it at any moment
    # by running `reproducible.data()`: it is a simple dictionary

    # saving the provenance data
    reproducible.save_yaml('results_prov.yaml')
```
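Recording the hash of an output file pays off later: months afterwards, you can check whether a file on disk is still the one the provenance data describes. A minimal sketch, assuming the recorded hex digest has already been read back from the provenance file; the helper name `matches_recorded_hash` is hypothetical, the YAML layout written by the library is not described here, and the default `sha256` algorithm is an assumption (pass whichever algorithm the digest was computed with).

```python
import hashlib


def matches_recorded_hash(path, expected_hex, algorithm='sha256'):
    """Return True if the file at `path` hashes to the recorded hex digest."""
    h = hashlib.new(algorithm)
    with open(path, 'rb') as f:
        # Read in chunks so large result files need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest() == expected_hex.lower()
```

For example, `matches_recorded_hash('results.pickle', expected)` should return `True` as long as `results.pickle` has not been modified since the run that recorded `expected`.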
See also the API Reference.
Hashes for reproducible-0.1.2-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 9b54881fa12a99cc2630f868107325cd2ef34ca0aaf45292b7e14449cd42c9f8 |
| MD5 | be20f5fe7554c7cbd848d6c16b1a8835 |
| BLAKE2b-256 | f300343ede3fe4b0673512f066489eb4b72faa12efb411db127b5b6dee5f8444 |