Track information during code testing and researching.
Project description
votakvot
A simple tool helping to track information, metrics, and files during code development, testing, probing, experimentation and analysis.
The idea
You write Python code, annotate a function, call it.
Votakvot track what function parameters are, its result, git repo status, etc...
Change your code, change parameters, try to rerun the function, experiment.
Then votakvot may load back all information as pandas DataFrame
.
Play with data and find the best combination of function parameters and version of a source code.
Basic usage
Write a function and wrap it with an annotation votakvot.track
:
@votakvot.track()
def my_experiment(one, two):
print(one, two)
return one + two
Then call votakvot.init()
to initialize library internals:
votakvot.init(
path="./my-results", # path, where to store results, "." by default
)
Now any invocation of my_experiment(...)
creates a new unique folder
inside ./my-results. That new subfolder contains a file
votakvot.yaml with:
- globally unique id (uuid4)
- timestamps (created, started, finished, duration)
- function parameters
- function result
- git info (branch, commit, work directory tree-ish)
- system information (machine, user, python version)
- traceback text on exception
- any additional ad-hoc information
Additional information added with votakvot.inform
:
@votakvot.track()
def my_experiment(one, two):
...
votakvot.inform(
any_custom_field="any-value",
other_custom_field=["structured", "data"],
)
...
Please note that any parameter, returned, or informed value is serialized by pyyaml. It supports all standard python types: int, float, complex, bool, str, bytes, list, dict, tuple, set, datetime, None. Also any pickleable python class may be serialized (including namedtuples and dataclasses), however it is not recommended.
Load reports
Content of multiple votakvot.yaml files can be loaded into
pandas.DataFrame
by using function votakvot.load_report()
. It gets
file path as a first argument. A path may be prefixed with a protocol
(ftp://
, ssh://
etc).
Some prefixes (like gs://
or s3://
) may require extra libraries to be installed (see
fsspec protocols
for details).
Also path may contain glob patterns: *
corresponds to any string
without /
, **
corresponds to anything.
By default @votakvot.track()
adds date-time of invocation into a subdirectory name:
{function module} / {function name} / {yy}-{mm}-{dd} / {hour}:{minute}:{second} / {unique uuid}
This allows loading results only for a particular module, function, date or date-time only:
root = "/path/to/directory/with/results"
# load all experiments from `root`
votakvot.load_report(root)
# load all experiments with additinoal fields
votakvot.load_report(root, full=True)
# load all experiemnts for specified function only
votakvot.load_report(f"{root}/my_module/function_name")
# load experiments for a single day 2021-05-20 (any function)
votakvot.load_report(f"{root}/**/21-05-20")
# load exprriments for a particular hour (any function)
votakvot.load_report(f"{root}/**/21-05-20/15:*")
A few dataframes may be merged with `pandas.concat`:
# load results for 3 days
df = pandas.concat(
votakvot.load_report(f"{root}/**/{day}/**")
for day in ["21-05-20", "21-05-21", "21-05-22"]
)
Result DataFrame
can be filtered, sorted, updated, transformed,
plotted, serialized, and analyzed with all power of Pandas. See
pandasttutorial.
Additionally raw information may be obtained with load_trials
function:
# load dict of {id -> votakvot.Trial}
vs = load_trials(root)
print("count", len(vs))
print("ids:", vs.keys())
# print raw content of `votakvot.yaml` files
print("data", [v.data for v in vs.values()])
# print only git related information
print("git commits", [v.meta.git.commit for v in vs.values()])
Metrics
Tracked function may produce metrics:
votakvot.meter(
metric_name="metric value",
)
Metrics are stored as series of csv files and can be loaded to single pandas.DataFrame
:
rep = votakvot.load_report()
tid = rep.loc[0]['tid'] # trial id
votakvot.metrics.load_metrics(tid) # instance of pd.DataFrame
Attached files
A regular file may be created next to votakvot.yaml. Use this to store debug information (traceback, logs), create artifacts or even store intermediate results of computation (see resumable tasks).
@votakvot.track()
def my_experiment(one, two):
...
with votakvot.attach("my-file-name.txt", mode='tw') as f:
f.write("some text ...")
Metadata
Library automatically adds metadata to all generated votakvot.yaml files.
Metadata includes information about python environment, git repo
(commit, branch, and index hash), OS version. You can add extra
metadata by putting values into dictionary votakvot.meta.providers
:
# make copy of all providers
my_proivders = dict(votakvot.meta.providers)
# add information about k8s
my_proivders['k8s.version'] = lambda: subprocess.getoutput("kubectl version")
my_proivders['k8s.cluster_info'] = lambda: subprocess.getoutput("kubectl cluster-info")
# include list of all python libraries
my_proivders['python.pip_freeze'] = lambda: subprocess.getoutput("pip freeze")
# delete deafult medatata for 'git'
my_proivders = {
k: v
for k, v in votakvot.metadata_providers.items()
if not k.startswith("git.")
}
# here 'kubectl' command is invoked, but 'git' does not
votakvot.init(
meta_providers=my_proivders, # use custom set of meta providers
)
Resumable tasks
Some trials may take a lot of time to complete, it is possible to make them resumable. A tracked function may be refactored into an iterable pickleable object. If the program fails (or terminated manually) a pickled object is still left on the disk and votakvot will automatically loaded it during the next trial run.
class my_function(votakvot.resumable_fn):
snapshot_period = 5 # snapshot each 5sec, only in-between `self.loop()` calls
def init(self, one, two):
self.one = one
...
def loop(self):
if ...:
return "result" # non-None value to finish compution
else:
return None # repeat `self.loop()` one more time
# autoresume when there is a snapshot for this id on the filesystem
votakvot.run(
f"resumable_pi/n={n}/seed={s}", # id must be explicitly specified for resumable tasks
my_function,
one=1,
two=2,
)
votakvot-ab
Votakvot comes with basic benchmarking tool votakvot-ab
.
It behaves similar to the well known ab utility,
but instead of making HTTP calls invokes a python callback.
Tool may patch sockets with gevent, allowing to run IO-bounded code with bigger concurrency.
Given file my_module.py
:
import requests
def get_example():
return requests.get("http://example.com/").status_code
Then call the function 1000 times in 200 "threads" (using greenlets):
votakvot-ab --gevent -n1000 -c200 my_module.get_example
Also callback might be a class with custom initialization logic (usefull to create HTTP sessions and connection pools, perform precomputations, etc):
import requests
class HTTPGet:
def __init__(self, url):
self.url = url
self.session = requests.Session() # reuse connections
def __call__(self):
return self.session.get(self.url).status_code
Instance of HTTPGet
created once and then method __call__
invoked 1000 times:
votakvot-ab -g -n1000 -c200 my_module.HTTPGet url=http://example.com
See votakvot-ab --help
for all parameters.
License
Votakvot is released under the Apache 2.0 license (see LICENSE)
It is based on a project developed by Allegro https://github.com/allegro/votakvot
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for votakvot-0.1rc1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8475dabbc38940711502725f2c2e1842dd67d6b605eaee61cdbea875f08bc7c0 |
|
MD5 | 797fccb4cc10390e8c89ce73d55da67d |
|
BLAKE2b-256 | 684829cf0dcfea14432066c9685adacc0eda33f9bdf30c681e6be73a823f1df7 |