A small set of complementary tools for exploratory computational research
CommandGraph
CommandGraph is a small set of complementary tools for exploratory computational research. It provides functionality to simplify the following tasks:
- Routing, validating, and storing command configurations
- Keeping track of command states and executing command dependencies when necessary
- Storing and accessing command outputs
- Generating command-line and web-based user interfaces
The full documentation is available `here <https://masonmcgill.github.io/cmdgraph/>`_.
Design
CommandGraph attempts to provide a minimal, coherent interface based on standard, cross-language technologies, including
- `YAML <http://yaml.org/>`_/`JSON <https://www.json.org/>`_ for configuration authoring,
- `JSON-Schema <http://json-schema.org/>`_ for configuration validation,
- `HDF5-SWMR <http://docs.h5py.org/en/latest/swmr.html>`_ for concurrency-safe array serialization, and
- `REST/HTTP <https://en.wikipedia.org/wiki/Representational_state_transfer>`_ for exploring command outputs.
It should take a few minutes to learn and a few days to rewrite in your favorite programming language.
Installation
.. code-block:: shell

    pip install cmdgraph
`Conda <https://conda.io/docs/>`_ works as well. CommandGraph requires Python ≥3.6.
Commands
To CommandGraph, a Command is an object that writes files to a directory, based on a configuration and/or the outputs of other commands.
Defining commands
A minimal command with input and output looks like this:
.. code-block:: python

    import cmdgraph as cg

    class SayHello(cg.Command):
        output_path = 'greetings/{name}'  # Config fields are substituted automatically
                                          # (though using this shorthand is optional).

        def run(self):
            self.output['message.h5'] = (     # Records provide a concurrency-safe,
                f'Hello, {self.conf.name}!')  # array-friendly view into the filesystem.

    SayHello(name='Sven')()  # This writes "Hello, Sven!" to /greetings/Sven/message.h5,
                             # and writes metadata to /greetings/Sven/_cmd-spec.yaml
                             # and /greetings/Sven/_cmd-status.yaml.
Accessing command metadata
cmd.spec returns a command's specification---its configuration, augmented with a field encoding its type---as a JSON-like object (an arbitrarily nested combination of bool, int, float, str, NoneType, list, and SimpleNamespace instances).
cmd.status returns the command's execution status: “running”, “done”, “stopped”, or “unbegun”.
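As an illustration of what "JSON-like object" means here, the sketch below parses JSON text into nested SimpleNamespace instances using only the standard library. It is not CommandGraph's implementation; the helper name to_namespace and the field values are made up for this example.

.. code-block:: python

    import json
    from types import SimpleNamespace

    def to_namespace(text):
        """Parse JSON text, turning every JSON object into a SimpleNamespace."""
        return json.loads(text, object_hook=lambda d: SimpleNamespace(**d))

    # Hypothetical specification; the field values are invented for illustration.
    spec = to_namespace('{"type": "SayHello", "name": "Sven", "tags": ["a", "b"]}')
    print(spec.name)  # attribute access instead of spec["name"]

Lists, numbers, strings, booleans, and None pass through unchanged; only JSON objects become namespaces.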
.. todo::

    Reimplement cmd.spec.
Executing commands
Calling a command (cmd()) invokes it unconditionally.
require invokes the command if necessary and blocks until it has finished executing. It does nothing if the command's status is "done".
.. code-block:: python

    class WarpCatPictures(cg.Command):
        output_path = 'warped-cats'

        def run(self):
            cats = cg.require(GetCats(source='the-internet'))  # `require` returns the dependent
            self.output['result.png'] = warp_thoroughly(cats)  # command's output record.
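The "run only if necessary" behaviour of require can be pictured with a small standard-library sketch. This is an assumption-laden illustration, not CommandGraph's code: the status file name _status.txt and the function signature are hypothetical, and the sketch does not handle waiting on a command that another process is already running.

.. code-block:: python

    import tempfile
    from pathlib import Path

    def require(run, output_dir):
        """Call run(output_dir) unless a previous call already finished."""
        output_dir = Path(output_dir)
        status_file = output_dir / '_status.txt'  # hypothetical file name
        if status_file.exists() and status_file.read_text() == 'done':
            return output_dir  # nothing to do
        output_dir.mkdir(parents=True, exist_ok=True)
        status_file.write_text('running')
        run(output_dir)
        status_file.write_text('done')
        return output_dir

    calls = []

    def say_hello(out):
        calls.append(out)
        (out / 'message.txt').write_text('Hello, Sven!')

    with tempfile.TemporaryDirectory() as root:
        require(say_hello, Path(root) / 'greetings')
        require(say_hello, Path(root) / 'greetings')  # skipped: already "done"
        print(len(calls))  # 1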
Records
A Record is a concurrency-safe, array-friendly view of a directory. Records support four types of data transactions: reading, writing, appending, and deleting.
Records pointing to directories created by Commands also provide access to command metadata.
Obtaining a record
.. code-block:: python

    record = cg.Record('some/directory/path/')

Since a record is just a view into a directory, constructing it does not perform any filesystem operations. Files and directories are created lazily, so the record's path does not need to exist beforehand.
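The idea of a lazy view can be sketched as follows, using only the standard library. The class LazyView and its write_text method are hypothetical names for this example and say nothing about how Record is implemented.

.. code-block:: python

    import tempfile
    from pathlib import Path

    class LazyView:
        """A view of a directory that touches the filesystem only on write."""

        def __init__(self, path):
            self.path = Path(path)  # stored only; no filesystem access here

        def write_text(self, key, text):
            target = self.path / key
            target.parent.mkdir(parents=True, exist_ok=True)  # create as necessary
            target.write_text(text)

    with tempfile.TemporaryDirectory() as tmp:
        view = LazyView(Path(tmp) / 'not/yet/created')
        existed_before = view.path.exists()  # constructing the view created nothing
        view.write_text('notes/hello.txt', 'hi')
        exists_after = (view.path / 'notes/hello.txt').exists()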
Reading entries
Subscripting a record with a key corresponding to a file returns an array:
.. code-block:: python

    array = record['file/path.h5']
HDF5 (".h5"), JPEG (".jpg"/".jpeg"), PNG (".png"), and bitmap (".bmp") formats are currently supported. Files with other extensions are treated as plain text files. Open a GitHub issue or pull request to request new format support.
Subscripting a record with a key corresponding to a directory returns a subrecord:
.. code-block:: python

    subrecord = record['directory/path/']
Records also have dict-style iteration methods (keys, values, and items). These methods iterate over all entries in the directory corresponding to the record, with the exception of those with names beginning with "_".
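The filtering rule for iteration can be sketched with pathlib. The helper name visible_entries is an assumption for this example; it only shows the rule "skip names beginning with an underscore", not the Record API.

.. code-block:: python

    import tempfile
    from pathlib import Path

    def visible_entries(directory):
        """Return sorted names of entries in `directory` not starting with '_'."""
        return sorted(p.name for p in Path(directory).iterdir()
                      if not p.name.startswith('_'))

    with tempfile.TemporaryDirectory() as root:
        for name in ('message.h5', '_cmd-spec.yaml', '_cmd-status.yaml'):
            (Path(root) / name).touch()
        names = visible_entries(root)
        print(names)  # ['message.h5']

Hiding underscore-prefixed names keeps metadata files such as _cmd-spec.yaml out of ordinary data iteration.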
Writing entries
Subscript-assigning can be used to write an array to a file.
.. code-block:: python

    record['file/path.h5'] = array
Subscript-assigning can also be used to copy the contents of one record into another, replacing the destination's previous contents.

.. code-block:: python

    record['directory/path/'] = another_record
A (possibly nested) dict of array-like objects can also be used to tersely write to multiple files.

.. code-block:: python

    record['beings/animals/'] = {
        'dogs': {'snoopy.h5': snoopy_data},
        'cats': {'garfield.png': garfield_data}}
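One way to think about a nested-dict write is as a set of individual file writes, one per leaf. The sketch below flattens a nested dict into path/value pairs; flatten is a hypothetical helper and the list values stand in for real array data.

.. code-block:: python

    def flatten(tree, prefix=''):
        """Yield (relative_path, leaf) pairs from a nested dict."""
        for key, value in tree.items():
            path = f'{prefix}{key}'
            if isinstance(value, dict):
                yield from flatten(value, path + '/')
            else:
                yield path, value

    tree = {'dogs': {'snoopy.h5': [1, 2]}, 'cats': {'garfield.png': [3, 4]}}
    paths = dict(flatten(tree, 'beings/animals/'))
    print(sorted(paths))
    # ['beings/animals/cats/garfield.png', 'beings/animals/dogs/snoopy.h5']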
Appending to entries
Appending works analogously to writing, and creates files and directories as necessary.
.. code-block:: python

    record.append('file/path.h5', array)
    record.append('directory/path/', another_record)
    record.append('directory/path/', dict_of_arrays)
Deleting entries
Deleting an entry removes files/directories recursively, from the key downward, and deletes empty parent directories, up to record.path. (In other words, deleting performs the inverse of the "create as necessary" operations writing performs.)
.. code-block:: python

    del record['some/path']
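The deletion rule (remove the entry, then prune parent directories that became empty) can be sketched with the standard library. delete_entry is a hypothetical helper, and the sketch assumes the root directory itself is never removed, which the text above does not state explicitly.

.. code-block:: python

    import shutil
    import tempfile
    from pathlib import Path

    def delete_entry(root, key):
        """Remove root/key recursively, then prune empty parents below root."""
        root = Path(root)
        target = root / key
        if target.is_dir():
            shutil.rmtree(target)
        else:
            target.unlink()
        parent = target.parent
        while parent != root and not any(parent.iterdir()):
            parent.rmdir()  # only ever removes directories that are empty
            parent = parent.parent

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / 'a/b').mkdir(parents=True)
        (root / 'a/b/data.txt').write_text('x')
        (root / 'keep.txt').write_text('y')
        delete_entry(root, 'a/b/data.txt')
        remaining = sorted(p.name for p in root.iterdir())
        print(remaining)  # ['keep.txt']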
Accessing command metadata
Records also support reading command metadata (stored in _cmd-spec.yaml and _cmd-status.yaml) via the cmd_spec and cmd_status properties.
Running a data server
Records can also be accessed via HTTP. Currently, only GET operations are supported. Call serve to start a data server allowing clients to access the contents of a directory via a REST API.
.. code-block:: python

    cg.serve('my-data/', port=5555)

The following routes are supported:

- //_entry-names
- //_cmd-info
- //
- //?mode=file
When running the data server on a publicly accessible machine, `SSH tunneling <https://blog.trackets.com/2014/05/17/ssh-tunnel-local-and-remote-port-forwarding-explained-with-examples.html>`_ combined with a `firewall <https://help.ubuntu.com/community/UFW>`_ can be used to prevent public data access.
Configuration management
CommandGraph Commands are Configurable objects, which means they can be constructed from JSON-like objects and support configuration schema specification (to document and validate configuration fields).
Non-command configurable objects can be defined as well, which can be useful when components are shared between multiple commands:
.. code-block:: python

    class Muppet(cg.Configurable):
        ...

    kermit = Muppet(color='green', has_it_easy=False)
Configurable object properties
An object's configuration can be accessed via obj.conf. obj.spec provides its specification: its configuration augmented with a field indicating its type.
.. code-block:: python

    kermit.conf  # => Namespace(color='green', has_it_easy=False)
    kermit.spec  # => Namespace(color='green', has_it_easy=False, type='main/Muppet')
Creating objects from specifications
Objects can be instantiated from specifications using the create function. This can be helpful when instantiating configurable objects within commands.
.. code-block:: python

    class PutOnAShow(cg.Command):
        def run(self):
            muppet = cg.create(self.conf.muppet)
            print(muppet.tell_a_joke())

    PutOnAShow(muppet=load_muppet_spec())()
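Conceptually, creating an object from a specification means looking up the type named in the type field and passing the remaining fields as configuration. The sketch below shows that idea with a plain dict registry; REGISTRY, this stand-in Muppet class, and this create function are assumptions for illustration, not the library's code.

.. code-block:: python

    from types import SimpleNamespace

    class Muppet:
        def __init__(self, **conf):
            self.conf = SimpleNamespace(**conf)

    REGISTRY = {'Muppet': Muppet}  # hypothetical name-to-type bindings

    def create(spec):
        """Instantiate the type named by spec.type with the remaining fields."""
        fields = dict(vars(spec))
        cls = REGISTRY[fields.pop('type')]
        return cls(**fields)

    kermit = create(SimpleNamespace(type='Muppet', color='green', has_it_easy=False))
    print(kermit.conf.color)  # green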
Defining namespaces
By default, the type field in an object's specification is derived from its type's name and module path, which may be volatile over the course of a project's development. This limits the usefulness of stored specifications.
Entering a Namespace can override this default behavior with more stable (and often more readable) bindings:
.. code-block:: python

    with cg.Namespace({'Muppet': a.b.c.Something}):
        a.b.c.Something().spec  # => Namespace(type='Muppet')
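The difference between a module-path-based default and an explicit binding can be sketched as below. The function type_name and the exact default format (module, slash, qualified name) are assumptions made for this example; CommandGraph's actual naming scheme may differ.

.. code-block:: python

    class Muppet:
        pass

    def type_name(cls, bindings):
        """Return the bound name for `cls`, else a module-path-based default."""
        for name, bound in bindings.items():
            if bound is cls:
                return name
        return f'{cls.__module__}/{cls.__qualname__}'  # assumed default format

    stable = type_name(Muppet, {'Muppet': Muppet})
    default = type_name(Muppet, {})
    print(stable)  # Muppet

The default changes whenever the class is moved or renamed, while the bound name stays fixed as long as the binding is kept up to date.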
.. todo::

    Fix the "conflicting meanings of namespace" issue. Maybe types.SimpleNamespace should be dropped in favor of dicts? Maybe cg.Namespace should be called cg.Scope?
Defining schemas
Override a configurable type's Conf class to specify a configuration schema.
Members of Conf are interpreted in the following way:
- The member's name corresponds to the expected property's name.
- A type value specifies the property's expected type.
- A single-element list value specifies the property's default value.
- A str value specifies the property's docstring.
- A tuple value may specify any combination of the above.
Example:
.. code-block:: python

    class Person(cg.Configurable):
        class Conf:
            name = str, 'a long-winded pointer'
            age = int, [0], 'solar rotation count'
            shoe_size = 'European standard as of 2018-08-17'
Defining configuration schemas is completely optional, but it enables configuration validation and provides nice documentation, both in the code, and in CommandGraph-generated web and command-line interfaces.
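The interpretation rules for Conf members can be made concrete with a short sketch that turns each member into a small dict with optional type, default, and doc entries. parse_member and the resulting dict layout are hypothetical; the library may represent schemas differently.

.. code-block:: python

    def parse_member(value):
        """Split a Conf member into type, default, and doc entries."""
        parts = value if isinstance(value, tuple) else (value,)
        schema = {}
        for part in parts:
            if isinstance(part, type):
                schema['type'] = part
            elif isinstance(part, list) and len(part) == 1:
                schema['default'] = part[0]
            elif isinstance(part, str):
                schema['doc'] = part
            else:
                raise TypeError(f'unsupported schema element: {part!r}')
        return schema

    class Conf:
        name = str, 'a long-winded pointer'
        age = int, [0], 'solar rotation count'
        shoe_size = 'European standard as of 2018-08-17'

    # Skip dunder attributes such as __module__ and __doc__.
    schema = {key: parse_member(value)
              for key, value in vars(Conf).items()
              if not key.startswith('_')}
    print(schema['age'])
    # {'type': <class 'int'>, 'default': 0, 'doc': 'solar rotation count'}

Wrapping the default in a single-element list is what lets a default value be told apart from a docstring when the default is itself a string.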
.. todo::

    Make config schemas available as JSON-like objects.

.. todo::

    Expose schemas in the web interface.
Generating a command-line interface
cli generates a command-line interface exposing every function in the current namespace stack.
.. code-block:: python

    # Generates the branching interface
    # `<this-file> {a|b} [<conf>]`.
    with cg.Namespace({'a': DoA, 'b': DoB}):
        cg.cli()
Related packages
- `Luigi <https://luigi.readthedocs.io/en/stable/>`_ focuses on managing large, complex graphs of commands, possibly distributed across multiple machines. From the developers: "Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in."
- `Sacred <https://pypi.org/project/sacred/>`_ focuses on configuration management and random number generator seed control. It's more oriented towards writing scripts than writing APIs. From the developers: "Sacred is a tool to help you configure, organize, log and reproduce experiments."
- `GNU Make <https://www.gnu.org/software/make/>`_. Sometimes it's best to just keep things simple : )