Manage statuses for a large number of data analysis resources, such as files and imports.

hadrosaur — computed resource management

Hadrosaur makes it easy to track the completion status, errors, and logs of large numbers of resources (files, metadata, analytics, database imports, etc.).

You define each kind of resource as a decorated Python function that can create files and save metadata, keyed by an identifier that is unique within a named collection. Later on, you can quickly fetch the status and results of previously computed resources.

This library uses a combination of LevelDB and the file system to track the state of your tasks.

Quick usage tutorial

Install

pip install hadrosaur

Define a resource collection

Import the library and initialize a project using a base directory. Files, metadata, and logs are all stored under this directory.

from hadrosaur import Project

proj = Project('./base_directory')

Define a collection using a decorator around a function. This function's job is to generate a single resource for the collection given a unique ID and some arguments.

The collection should have a unique name, and its function must take these params:

ident — an identifier (unique across the collection) for each computed resource
args — a dictionary of optional arguments
ctx — a Context object which holds some extra data you may find useful during computation:
- ctx.subdir - the path of a directory in which you can store files for this resource
- ctx.logger - a special Python logging instance that will write to a rotating log file stored in the resource directory, with some nice default formatting

import time

@proj.resource('collection_name')
def compute_resource(ident, args, ctx):
    ctx.logger.info("Starting up")
    # Run some things...
    # Maybe save stuff into ctx.subdir...
    time.sleep(1)
    # Return any JSON-serializable data for the resource, such as metadata, run results, filepaths, etc.
    return {'ts': time.time()}
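
As a rough illustration of what a compute function might do with ctx, the sketch below writes a small file into ctx.subdir and returns its path. So that it runs without a project, fake_ctx is a hypothetical stand-in that only mimics the two documented attributes (subdir and logger); it is not hadrosaur's Context class, and the function name, file name, and 'text' argument are made up for the example.

```python
import logging
import os
import tempfile
from types import SimpleNamespace


def compute_word_count(ident, args, ctx):
    """Count the words in args['text'] and save the count under ctx.subdir."""
    ctx.logger.info("Counting words for %s", ident)
    count = len(args.get('text', '').split())
    out_path = os.path.join(ctx.subdir, 'count.txt')
    with open(out_path, 'w') as fd:
        fd.write(str(count))
    # The return value must be JSON-serializable
    return {'count': count, 'path': out_path}


# Hypothetical stand-in for the Context object, for local experimentation only
fake_ctx = SimpleNamespace(
    subdir=tempfile.mkdtemp(),
    logger=logging.getLogger('example'),
)
result = compute_word_count('doc1', {'text': 'three short words'}, fake_ctx)
```

In a real project, the same function would be registered with @proj.resource and called through proj.fetch rather than invoked directly.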

Fetch a resource

Use the proj.fetch(collection_name, ident) method to compute and cache resources in a collection.

Keyword arguments:

args -- an optional dict of extra arguments for the resource compute function
recompute -- force the resource to be re-computed, even if it has already been computed

What happens when you fetch a resource:

If the resource has not yet been computed, the collection's compute function will be run.
If the resource was already computed in the past, then the saved results will get returned instantly (unless recompute=True has been set in the keyword arguments).
If an error is raised in the function, logs will be saved and the resource's status will be set to "error".
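
The three cases above can be modeled with a toy in-memory cache. This is only a sketch of the compute-once behavior, not hadrosaur's implementation: the ToyCache class is hypothetical, it keeps everything in memory instead of on disk, and returning None on failure is a choice made for the toy (how the real library reports a failed fetch to the caller is not covered here).

```python
class ToyCache:
    """Toy model of compute-once semantics; not hadrosaur's implementation."""

    def __init__(self, func):
        self.func = func
        self.results = {}   # ident -> saved result
        self.statuses = {}  # ident -> "completed" or "error"

    def fetch(self, ident, args=None, recompute=False):
        # Return the saved result unless a recompute is forced
        if ident in self.results and not recompute:
            return self.results[ident]
        try:
            result = self.func(ident, args or {})
        except Exception:
            # Record the failure; a real implementation would also save logs
            self.statuses[ident] = 'error'
            self.results.pop(ident, None)
            return None
        self.results[ident] = result
        self.statuses[ident] = 'completed'
        return result


calls = []


def compute(ident, args):
    calls.append(ident)
    return {'ident': ident, 'run': len(calls)}


cache = ToyCache(compute)
first = cache.fetch('a')                  # computed
second = cache.fetch('a')                 # served from the saved result
third = cache.fetch('a', recompute=True)  # computed again
```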

>> proj.fetch('collection_name', 'uniq_ident123', optional_args)
<Resource>

The resource object has the following properties:

resource.result: any JSON-serializable data returned by the resource's compute function
resource.start_time: the Unix epoch time (in milliseconds) at which the resource started being computed
resource.end_time: the Unix epoch time (in milliseconds) at which the resource finished computing (or failed)
resource.status: whether the resource has been computed already ("completed"), is currently being computed ("pending"), has not yet been fetched at all ("unavailable"), or threw a Python error while running the function ("error")
resource.paths: A dictionary of all the filesystem paths associated with your resource, with the following keys:
- 'base': The base directory that holds all data for the resource
- 'error': Path to a Python stack trace of any error that occurred while running the resource's function
- 'log': A line-by-line log file produced by the resource's logger (ctx.logger)
- 'status': Path to the current status ("unavailable", "completed", "pending", "error")
- 'result': Path to a JSON file of serializable data returned by the resource's function
- 'storage': Directory path of additional files written by the resource's function (ctx.subdir)
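
To show how these properties might be combined, the sketch below derives a run time from the two timestamps and re-reads the saved result from disk. The fake object is a hypothetical stand-in carrying only the documented attributes, and the file name result.json is an assumption made for the example; a real resource would come from proj.fetch.

```python
import json
import os
import tempfile
from types import SimpleNamespace


def summarize(resource):
    """Return the status, run time in seconds, and result re-read from disk."""
    # start_time and end_time are documented as milliseconds
    seconds = (resource.end_time - resource.start_time) / 1000
    with open(resource.paths['result']) as fd:
        saved = json.load(fd)
    return {'status': resource.status, 'seconds': seconds, 'saved': saved}


# Hypothetical stand-in resource so the sketch runs on its own
base = tempfile.mkdtemp()
result_path = os.path.join(base, 'result.json')
with open(result_path, 'w') as fd:
    json.dump({'ts': 1}, fd)
fake = SimpleNamespace(
    status='completed',
    start_time=1580919335000,
    end_time=1580919336500,
    paths={'base': base, 'result': result_path},
)
summary = summarize(fake)
```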

Fetch status and information

Fetch stats for a collection

To see status counts for a whole collection, use proj.stats('collection_name'):

> proj.stats('collection_name')
{
  'counts': {
      'total': 100,
      'pending': 75,
      'completed': 20,
      'error': 5,
      'unavailable': 0
  }
}

Use proj.stats() without an argument to fetch the stats for all collections.
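
The counts lend themselves to simple progress reporting. The helper below is a small sketch that assumes the dictionary has the same shape as the example output above; completion_percent is a made-up name, not part of the library.

```python
def completion_percent(stats):
    """Percentage of resources in a collection whose status is 'completed'."""
    counts = stats['counts']
    if counts['total'] == 0:
        return 0.0
    return 100.0 * counts['completed'] / counts['total']


# Same shape as the example proj.stats('collection_name') output
example = {
    'counts': {
        'total': 100,
        'pending': 75,
        'completed': 20,
        'error': 5,
        'unavailable': 0,
    }
}
percent = completion_percent(example)
```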

To get a list of resource IDs for a given status, use proj.fetch_by_status:

> proj.fetch_by_status('collection_name', 'pending')
['1', '2', '3', ...]
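
One possible use of this list is retrying failed resources. The sketch below combines fetch_by_status with the recompute keyword argument described earlier; retry_errors and FakeProject are hypothetical names, and FakeProject exists only so the example runs without a real project. Whether fetch re-raises on failure is not covered here, so real code may need a try/except around the call.

```python
def retry_errors(proj, collection_name):
    """Force a recompute of every resource whose status is 'error'."""
    retried = []
    for ident in proj.fetch_by_status(collection_name, 'error'):
        proj.fetch(collection_name, ident, recompute=True)
        retried.append(ident)
    return retried


class FakeProject:
    """Hypothetical stand-in exposing only the two methods used above."""

    def __init__(self):
        self.statuses = {'1': 'error', '2': 'completed', '3': 'error'}

    def fetch_by_status(self, collection_name, status):
        return [i for i, s in self.statuses.items() if s == status]

    def fetch(self, collection_name, ident, args=None, recompute=False):
        # Pretend the recompute succeeded
        self.statuses[ident] = 'completed'


fake_proj = FakeProject()
retried = retry_errors(fake_proj, 'collection_name')
```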

Fetch info about a single resource

Use proj.status('collection_name', 'resource_id') to see the status of a particular resource.

> proj.status('collection_name', 'resource_id')
"completed"

If an exception was raised during the execution of the function used to compute a resource, then use proj.fetch_error to see the error.

> proj.fetch_error('collection_name', 'resource_id')
"""Traceback (most recent call last):
  File "/home/j/code/hadrosaur/hadrosaur/main.py", line 211, in fetch
    result = func(ident, args, ctx)
  File "/home/j/code/hadrosaur/test/test_general.py", line 26, in throw_something
    raise RuntimeError('This is an error!')
RuntimeError: This is an error!"""
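
When many resources fail, the full stack trace is often more than is needed. As a sketch, the helper below pulls out only the final line of a traceback string like the one returned above; error_summary is a made-up name, and the sample text is a shortened copy of that output.

```python
def error_summary(traceback_text):
    """Return the last non-empty line of a traceback, e.g. 'RuntimeError: ...'."""
    lines = [line for line in traceback_text.splitlines() if line.strip()]
    return lines[-1].strip() if lines else ''


sample = '''Traceback (most recent call last):
  File "main.py", line 211, in fetch
    result = func(ident, args, ctx)
RuntimeError: This is an error!'''
summary = error_summary(sample)
```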

To see the run log (produced by ctx.logger during function execution), use proj.fetch_log.

> proj.fetch_log('collection_name', 'resource_id')
"""
2020-02-05 16:15:35 INFO     output here (test_general.py:25)
2020-02-05 16:15:35 INFO     more output here (test_general.py:25)
"""

Release history

0.4.1: Mar 12, 2020 (this version)
0.4.0: Feb 19, 2020
0.3.2: Feb 7, 2020
0.3.1: Feb 7, 2020
0.3.0: Feb 6, 2020
0.2.0: Feb 6, 2020
0.1.0: Feb 1, 2020
0.0.3: Jan 31, 2020
0.0.2: Jan 31, 2020
0.0.1: Jan 31, 2020
