Parallelize the execution of tasks with pytask.
pytask-parallel
Parallelize the execution of tasks with pytask-parallel, which is a plugin for pytask.
Installation
pytask-parallel is available on PyPI and Anaconda.org. Install it with
$ pip install pytask-parallel
# or
$ conda install -c conda-forge pytask-parallel
By default, the plugin uses loky's robust implementation of the ProcessPoolExecutor.
It is also possible to select the ProcessPoolExecutor or ThreadPoolExecutor from the concurrent.futures module as backends to execute tasks asynchronously.
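The two standard-library backends share the concurrent.futures.Executor interface, which is why one can stand in for the other. The following is a minimal sketch of that idea, not the plugin's actual code: the BACKENDS mapping and the run_all helper are hypothetical names, and loky is left out so that the example needs only the standard library.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical mapping from a backend name to an executor class (illustration only).
BACKENDS = {
    "processes": ProcessPoolExecutor,
    "threads": ThreadPoolExecutor,
}


def square(x):
    """Stand-in for a task; defined at module level so processes could pickle it."""
    return x * x


def run_all(backend, func, inputs, n_workers=2):
    """Submit func(i) for every input and return the results in input order."""
    executor_cls = BACKENDS[backend]
    with executor_cls(max_workers=n_workers) as executor:
        futures = [executor.submit(func, i) for i in inputs]
        return [future.result() for future in futures]


if __name__ == "__main__":
    # Passing "processes" instead selects the process pool through the same interface.
    print(run_all("threads", square, [1, 2, 3]))
```

Because both classes accept max_workers and expose submit, only the lookup key changes when switching backends in this sketch.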
Usage
To parallelize your tasks across many workers, pass an integer greater than 1 or 'auto' to the command-line interface.
$ pytask -n 2
$ pytask --n-workers 2
# Starts os.cpu_count() - 1 workers.
$ pytask -n auto
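As a rough illustration of how the value passed to -n could be turned into a worker count, here is a small sketch. The function resolve_n_workers is a hypothetical helper and not part of pytask-parallel; the lower bound of one worker for 'auto' is an assumption added so the sketch behaves sensibly on single-core machines.

```python
import os


def resolve_n_workers(value):
    """Translate the -n/--n-workers argument into a worker count (hypothetical helper)."""
    if value == "auto":
        # 'auto' starts os.cpu_count() - 1 workers, as described above.
        # Assumption: never go below one worker, also when the CPU count is unknown.
        return max(1, (os.cpu_count() or 1) - 1)
    n_workers = int(value)
    if n_workers < 1:
        raise ValueError("n_workers must be a positive integer or 'auto'")
    return n_workers


if __name__ == "__main__":
    print(resolve_n_workers("2"))
    print(resolve_n_workers("auto"))
```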
Using processes to parallelize the execution of tasks is useful for CPU-bound tasks such as numerical computations. (A task is CPU-bound when computation limits its speed and IO-bound when waiting for input or output does.)
For IO-bound tasks, where the limiting factor is network responses or access to files, you can parallelize via threads.
$ pytask --parallel-backend threads
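To see why threads suit IO-bound work, consider the sketch below. It is independent of pytask: fake_download and run_in_threads are hypothetical names, and time.sleep stands in for waiting on a network response. While one thread sleeps, the others can proceed, so the waits overlap instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name):
    """Stand-in for an IO-bound task: the thread spends its time waiting."""
    time.sleep(0.2)  # simulates waiting for a network response
    return f"{name}: done"


def run_in_threads(names, n_workers):
    """Run fake_download for every name in a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        return list(executor.map(fake_download, names))


if __name__ == "__main__":
    start = time.perf_counter()
    results = run_in_threads(["a", "b", "c", "d"], n_workers=4)
    elapsed = time.perf_counter() - start
    # With four workers the four waits overlap rather than run back to back.
    print(results, f"{elapsed:.2f}s")
```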
You can also set the options in a pyproject.toml.
# This is the default configuration. Note that parallelization is turned off.
[tool.pytask.ini_options]
n_workers = 1
parallel_backend = "loky" # or processes or threads
Some implementation details
Parallelization and Debugging
It is not possible to combine parallelization with debugging. That is why --pdb or --trace deactivate parallelization.
If you parallelize the execution of your tasks using two or more workers, do not use breakpoint() or import pdb; pdb.set_trace() since both will cause exceptions.
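The interaction between the debugging flags and the worker count can be pictured with the sketch below. It is only an illustration: effective_n_workers is a hypothetical function, and treating "deactivated" as falling back to a single worker is an assumption based on the default n_workers = 1.

```python
def effective_n_workers(n_workers, pdb=False, trace=False):
    """Return the worker count after accounting for debugging flags (hypothetical)."""
    if pdb or trace:
        # Assumption: deactivating parallelization means running with one worker.
        return 1
    return n_workers


if __name__ == "__main__":
    print(effective_n_workers(4))
    print(effective_n_workers(4, pdb=True))
```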
Threads and warnings
Capturing warnings is not thread-safe. Therefore, warnings cannot be captured reliably when tasks are parallelized with --parallel-backend threads.
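The underlying issue can be reproduced with the standard library alone. In the sketch below, which assumes a default CPython build, the main thread opens a warnings.catch_warnings block and a different thread emits a warning. Since the capture works by changing process-wide state, the main thread's block records a warning it did not cause. The function names are hypothetical.

```python
import threading
import warnings


def warn_in_worker():
    """Emit a warning from a thread that never asked for it to be captured."""
    warnings.warn("emitted by worker thread", UserWarning)


def capture_in_main_thread():
    """Return the messages recorded by the calling thread's catch_warnings block."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        worker = threading.Thread(target=warn_in_worker)
        worker.start()
        worker.join()
    return [str(w.message) for w in caught]


if __name__ == "__main__":
    # The list contains the worker's warning although the worker did not capture anything.
    print(capture_in_main_thread())
```

With several tasks running in threads at once, each capturing warnings in this way, a warning may be attributed to the wrong task or lost, which is the unreliability noted above.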
Changes
Consult the release notes to find out what is new.
Development
- pytask-parallel does not call the pytask_execute_task_protocol hook specification/entry-point because pytask_execute_task_setup and pytask_execute_task need to be separated from pytask_execute_task_teardown. Thus, plugins which change this hook specification may not interact well with the parallelization.
- There are two PRs for CPython which try to re-enable setting custom reducers, a feature that should work but currently does not. Here are the references.