Cache output of idempotent jobs.

Jobstamps
=========

Make-like caching of idempotent functions for python.

This module provides memoization of long-running functions which have
clearly documented side effects and do not change their result if their
inputs have not changed. It is ideal for tools which analyze text files
to produce some output, such as a source code linter. The result of the
function is stored in a file named after the hash of the function's
arguments.

A separate ``jobstamp`` command line utility is provided for integration
with shell scripts or non-python commands. This utility caches the
standard output, standard error and return code of a command line
invocation. When the utility is run again with the same arguments, the
cached output is printed and the cached return code is returned.
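
The idea behind the command line utility can be sketched in a few lines of
Python. This is an illustration only, not the ``jobstamp`` implementation:
the ``run_cached`` helper, the JSON stamp format and the ``cached`` marker
are all assumptions made for the example, and dependency checking is left out.

```python
import hashlib
import json
import os
import subprocess
import sys
import tempfile


def run_cached(argv, stamp_directory):
    """Run argv once; replay its stdout, stderr and return code afterwards."""
    # Name the stamp file after a hash of the full argument list.
    key = hashlib.sha1(json.dumps(argv).encode("utf-8")).hexdigest()
    stamp_path = os.path.join(stamp_directory, key + ".json")

    if os.path.exists(stamp_path):
        # Same arguments as before: reuse the stored record.
        with open(stamp_path) as stamp_file:
            record = json.load(stamp_file)
        record["cached"] = True
        return record

    completed = subprocess.run(argv, capture_output=True, text=True)
    record = {
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "returncode": completed.returncode,
    }
    with open(stamp_path, "w") as stamp_file:
        json.dump(record, stamp_file)
    record["cached"] = False
    return record


stamp_dir = tempfile.mkdtemp()
command = [sys.executable, "-c", "print('linted 3 files')"]
first = run_cached(command, stamp_dir)   # actually runs the command
second = run_cached(command, stamp_dir)  # replays the stored record
```

The second call never starts a subprocess; it only reads the record written
by the first call.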

Status
------

+----------------------+----------------------+---------------+------------------+-------------+
| Travis CI (Ubuntu)   | AppVeyor (Windows)   | Coverage      | PyPI             | Licence     |
+======================+======================+===============+==================+=============+
| |Travis|             | |AppVeyor|           | |Coveralls|   | |PyPI|\ |PyPI|   | |License|   |
+----------------------+----------------------+---------------+------------------+-------------+

Usage
-----

::

    usage: jobstamp [-h] [--dependencies [PATH [PATH ...]]]
                    [--output-files [PATH [PATH ...]]]
                    [--stamp-directory DIRECTORY] [--use-hashes]

    Cache results from jobs

    optional arguments:
      -h, --help            show this help message and exit
      --dependencies [PATH [PATH ...]]
                            A list of paths which, if more recent than the last
                            time this job was invoked, will cause the job to be
                            re-invoked.
      --output-files [PATH [PATH ...]]
                            A list of expected output paths form this command,
                            which, if they do not exist, will cause the job to
                            be re-invoked.
      --stamp-directory DIRECTORY
                            A directory to store cached results from this
                            command.
                            If a matching invocation is used and the files
                            specified in --dependencies and --output-files are
                            up-to-date, then the cached stdout, stderr and
                            return code is used and the command is not run
                            again.
      --use-hashes          Use hash comparison in order to determine if
                            dependencies have changed since the last invocation
                            of the job. This method is slower, but can
                            withstand files being copied or moved.

API Usage
---------

Python modules can integrate directly with the jobstamp API, which is
exposed as follows:

::

    jobstamp.run(func, *args, **kwargs)

The default signature applies the specified function to the specified
args and kwargs. The result of the function will be cached (so long as
it can be represented in text form and parsed back from its
``__repr__``) in a stamp file in the temporary files directory. The next
time the function is invoked through the ``jobstamp`` wrapper with the
same arguments, the result from the stamp file will be loaded and
returned directly.
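
The round trip through a text representation can be sketched as follows.
This is not the library's code: ``run_with_stamp``, its ``stamp_directory``
keyword and the ``count_words`` job are hypothetical names, and parsing with
``ast.literal_eval`` is an assumption about how a ``repr`` might be read back.

```python
import ast
import hashlib
import os
import tempfile

calls = []  # records every real execution of the job


def count_words(text):
    """Stand-in for a slow, idempotent job."""
    calls.append(text)
    return {"words": len(text.split())}


def run_with_stamp(func, *args, **kwargs):
    """Return func(*args, **kwargs), reusing a stamp file when one exists."""
    stamp_directory = kwargs.pop("stamp_directory")  # hypothetical option name
    description = repr((func.__name__, args, sorted(kwargs.items())))
    key = hashlib.sha1(description.encode("utf-8")).hexdigest()
    stamp_path = os.path.join(stamp_directory, key)

    if os.path.exists(stamp_path):
        with open(stamp_path) as stamp_file:
            # literal_eval only parses plain literals, never arbitrary code.
            return ast.literal_eval(stamp_file.read())

    result = func(*args, **kwargs)
    with open(stamp_path, "w") as stamp_file:
        stamp_file.write(repr(result))
    return result


stamp_dir = tempfile.mkdtemp()
first = run_with_stamp(count_words, "a b c", stamp_directory=stamp_dir)
second = run_with_stamp(count_words, "a b c", stamp_directory=stamp_dir)
```

Only results built from plain literals (numbers, strings, lists, dicts and
so on) survive this round trip, which mirrors the restriction stated above.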

Certain ``kwargs`` have special meanings and will be parsed and removed
from the ``kwargs`` passed to the underlying function. Those are:

- ``jobstamps_dependencies``: A list of files that this function
  depends on to produce its output. If any of these files have been
  updated since the last invocation, the function will be run again.
- ``jobstamps_output_files``: A list of files that this function
  produces as a side effect. If any of these files do not exist, the
  job is run again.
- ``jobstamps_cache_output_directory``: Where to store internal cached
  invocation stamps. Usually this should be specified on a per-domain
  basis to avoid stamp clashes in the global temporary files
  directory.
- ``jobstamps_method``: Either ``jobstamp.HashMethod`` or
  ``jobstamp.MTimeMethod``, defaulting to the latter if left
  unspecified. This option selects how to determine whether a
  dependency is out of date.
  ``jobstamp.MTimeMethod`` uses the file-system modification time to
  determine if a dependency is more recent than the last run of the
  function. ``jobstamp.HashMethod`` uses the SHA1 algorithm to store a
  hash of the file and compares the hash on the next invocation. It is
  slower than ``jobstamp.MTimeMethod`` but handles cases where files
  are copied or otherwise saved and restored between invocations.
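
The difference between the two methods shows up when a file keeps its
contents but gets a new timestamp, as happens when it is copied or restored.
The sketch below uses only the standard library; the helper names are
hypothetical and it does not reproduce how ``jobstamp.MTimeMethod`` or
``jobstamp.HashMethod`` are implemented.

```python
import hashlib
import os
import tempfile


def sha1_of(path):
    """Return the SHA1 hex digest of a file's contents."""
    with open(path, "rb") as dependency:
        return hashlib.sha1(dependency.read()).hexdigest()


def changed_by_mtime(path, last_run_time):
    """Modification-time check: newer timestamp means out of date."""
    return os.path.getmtime(path) > last_run_time


def changed_by_hash(path, recorded_hash):
    """Hash check: different contents means out of date."""
    return sha1_of(path) != recorded_hash


directory = tempfile.mkdtemp()
dependency_path = os.path.join(directory, "input.txt")
with open(dependency_path, "w") as dependency:
    dependency.write("unchanged contents")

# Record both kinds of state, as if the job had just run.
last_run_time = os.path.getmtime(dependency_path)
recorded_hash = sha1_of(dependency_path)

# Simulate a copy or restore: same bytes, newer timestamp.
os.utime(dependency_path, (last_run_time + 60, last_run_time + 60))

mtime_says_changed = changed_by_mtime(dependency_path, last_run_time)
hash_says_changed = changed_by_hash(dependency_path, recorded_hash)
```

Here the modification-time check would trigger a needless re-run, while the
hash check recognises that the contents are unchanged.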

Influential environment variables
---------------------------------

Specify ``JOBSTAMPS_DEBUG`` to see when a job was re-run or a cached
value was used.

Specify ``JOBSTAMPS_ALWAYS_USE_HASHES`` to force any underlying jobstamp
library to use ``jobstamp.HashMethod`` instead of
``jobstamp.MTimeMethod``, even if the user explicitly asked for the
latter. This is useful for CI environments, where files are often copied
or restored between runs and modification times rarely behave the way
one would expect.
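
An override of this kind can be pictured as a small selection step. The
sketch below is an assumption-laden illustration: ``effective_method`` is a
hypothetical helper, the strings ``"hash"`` and ``"mtime"`` stand in for the
two method objects, and it assumes the variable counts as set whenever it is
present, which the text above does not specify.

```python
import os


def effective_method(requested, environ=os.environ):
    """Pick the dependency check, letting the environment force hashing."""
    # Assumption: the variable counts as set whenever it is present at all.
    if "JOBSTAMPS_ALWAYS_USE_HASHES" in environ:
        return "hash"
    return requested


forced = effective_method("mtime", {"JOBSTAMPS_ALWAYS_USE_HASHES": "1"})
default = effective_method("mtime", {})
```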

.. |Travis| image:: https://img.shields.io/travis/polysquare/jobstamps.svg
   :target:
.. |AppVeyor| image:: https://img.shields.io/appveyor/ci/polysquare/jobstamps.svg
   :target:
.. |Coveralls| image:: https://img.shields.io/coveralls/polysquare/jobstamps.svg
   :target:
.. |PyPI| image:: https://img.shields.io/pypi/v/jobstamps.svg
   :target:
.. |PyPI| image:: https://img.shields.io/pypi/pyversions/jobstamps.svg
   :target:
.. |License| image:: https://img.shields.io/github/license/polysquare/jobstamps.svg
   :target:

Source Distribution

jobstamps-0.0.8.tar.gz (12.0 kB view hashes)
