No project description provided
Project description
Lightweight pipelining: using Python functions as pipeline jobs.
Home-page: http://pythonhosted.org/joblib/
Author: Gael Varoquaux
Author-email: gael.varoquaux@normalesup.org
License: BSD
Description: Joblib is a set of tools to provide **lightweight pipelining in
Python**. In particular, joblib offers:
1. transparent disk-caching of the output values and lazy re-evaluation
(memoize pattern)
2. easy simple parallel computing
3. logging and tracing of the execution
Joblib is optimized to be **fast** and **robust** in particular on large
data and has specific optimizations for `numpy` arrays. It is
**BSD-licensed**.
========================= ================================================
**User documentation:** http://pythonhosted.org/joblib
**Download packages:** http://pypi.python.org/pypi/joblib#downloads
**Source code:** http://github.com/joblib/joblib
**Report issues:** http://github.com/joblib/joblib/issues
========================= ================================================
Vision
--------
The vision is to provide tools to easily achieve better performance and
reproducibility when working with long running jobs.
* **Avoid computing twice the same thing**: code is rerun over an
over, for instance when prototyping computational-heavy jobs (as in
scientific development), but hand-crafted solution to alleviate this
issue is error-prone and often leads to unreproducible results
* **Persist to disk transparently**: persisting in an efficient way
arbitrary objects containing large data is hard. Using
joblib's caching mechanism avoids hand-written persistence and
implicitly links the file on disk to the execution context of
the original Python object. As a result, joblib's persistence is
good for resuming an application status or computational job, eg
after a crash.
Joblib strives to address these problems while **leaving your code and
your flow control as unmodified as possible** (no framework, no new
paradigms).
Main features
------------------
1) **Transparent and fast disk-caching of output value:** a memoize or
make-like functionality for Python functions that works well for
arbitrary Python objects, including very large numpy arrays. Separate
persistence and flow-execution logic from domain logic or algorithmic
code by writing the operations as a set of steps with well-defined
inputs and outputs: Python functions. Joblib can save their
computation to disk and rerun it only if necessary::
>>> from joblib import Memory
>>> mem = Memory(cachedir='/tmp/joblib')
>>> import numpy as np
>>> a = np.vander(np.arange(3)).astype(np.float)
>>> square = mem.cache(np.square)
>>> b = square(a) # doctest: +ELLIPSIS
________________________________________________________________________________
[Memory] Calling square...
square(array([[ 0., 0., 1.],
[ 1., 1., 1.],
[ 4., 2., 1.]]))
___________________________________________________________square - 0...s, 0.0min
>>> c = square(a)
>>> # The above call did not trigger an evaluation
2) **Embarrassingly parallel helper:** to make it easy to write readable
parallel code and debug it quickly::
>>> from joblib import Parallel, delayed
>>> from math import sqrt
>>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
3) **Logging/tracing:** The different functionalities will
progressively acquire better logging mechanism to help track what
has been ran, and capture I/O easily. In addition, Joblib will
provide a few I/O primitives, to easily define logging and
display streams, and provide a way of compiling a report.
We want to be able to quickly inspect what has been run.
4) **Fast compressed Persistence**: a replacement for pickle to work
efficiently on Python objects containing large data (
*joblib.dump* & *joblib.load* ).
..
>>> import shutil ; shutil.rmtree('/tmp/joblib/')
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Utilities
Classifier: Topic :: Software Development :: Libraries
Home-page: http://pythonhosted.org/joblib/
Author: Gael Varoquaux
Author-email: gael.varoquaux@normalesup.org
License: BSD
Description: Joblib is a set of tools to provide **lightweight pipelining in
Python**. In particular, joblib offers:
1. transparent disk-caching of the output values and lazy re-evaluation
(memoize pattern)
2. easy simple parallel computing
3. logging and tracing of the execution
Joblib is optimized to be **fast** and **robust** in particular on large
data and has specific optimizations for `numpy` arrays. It is
**BSD-licensed**.
========================= ================================================
**User documentation:** http://pythonhosted.org/joblib
**Download packages:** http://pypi.python.org/pypi/joblib#downloads
**Source code:** http://github.com/joblib/joblib
**Report issues:** http://github.com/joblib/joblib/issues
========================= ================================================
Vision
--------
The vision is to provide tools to easily achieve better performance and
reproducibility when working with long running jobs.
* **Avoid computing twice the same thing**: code is rerun over an
over, for instance when prototyping computational-heavy jobs (as in
scientific development), but hand-crafted solution to alleviate this
issue is error-prone and often leads to unreproducible results
* **Persist to disk transparently**: persisting in an efficient way
arbitrary objects containing large data is hard. Using
joblib's caching mechanism avoids hand-written persistence and
implicitly links the file on disk to the execution context of
the original Python object. As a result, joblib's persistence is
good for resuming an application status or computational job, eg
after a crash.
Joblib strives to address these problems while **leaving your code and
your flow control as unmodified as possible** (no framework, no new
paradigms).
Main features
------------------
1) **Transparent and fast disk-caching of output value:** a memoize or
make-like functionality for Python functions that works well for
arbitrary Python objects, including very large numpy arrays. Separate
persistence and flow-execution logic from domain logic or algorithmic
code by writing the operations as a set of steps with well-defined
inputs and outputs: Python functions. Joblib can save their
computation to disk and rerun it only if necessary::
>>> from joblib import Memory
>>> mem = Memory(cachedir='/tmp/joblib')
>>> import numpy as np
>>> a = np.vander(np.arange(3)).astype(np.float)
>>> square = mem.cache(np.square)
>>> b = square(a) # doctest: +ELLIPSIS
________________________________________________________________________________
[Memory] Calling square...
square(array([[ 0., 0., 1.],
[ 1., 1., 1.],
[ 4., 2., 1.]]))
___________________________________________________________square - 0...s, 0.0min
>>> c = square(a)
>>> # The above call did not trigger an evaluation
2) **Embarrassingly parallel helper:** to make it easy to write readable
parallel code and debug it quickly::
>>> from joblib import Parallel, delayed
>>> from math import sqrt
>>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
3) **Logging/tracing:** The different functionalities will
progressively acquire better logging mechanism to help track what
has been ran, and capture I/O easily. In addition, Joblib will
provide a few I/O primitives, to easily define logging and
display streams, and provide a way of compiling a report.
We want to be able to quickly inspect what has been run.
4) **Fast compressed Persistence**: a replacement for pickle to work
efficiently on Python objects containing large data (
*joblib.dump* & *joblib.load* ).
..
>>> import shutil ; shutil.rmtree('/tmp/joblib/')
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Utilities
Classifier: Topic :: Software Development :: Libraries
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
joblib-0.11a3.tar.gz
(213.0 kB
view hashes)
Built Distribution
joblib-0.11a3-py2.py3-none-any.whl
(177.0 kB
view hashes)
Close
Hashes for joblib-0.11a3-py2.py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 859ea5aede143aec9628786732c40818423d8d59b9aab37728223867abdc2ed7 |
|
MD5 | 9638e21ae8f969af9510bd23bd9be03d |
|
BLAKE2b-256 | c7a73081bd2037c399482211204994dbb9847c8bf0ce76f182c497fe151cb977 |