Script to run command line programs in parallel
Python wrapper to execute command line programs in parallel using multiprocessing.Pool
The basic idea is that one does not have to change much to the original command line to run tasks in parallel.
The first argument is interpreted as the command to execute multiple times, all other arguments are used as the inputs to the command.
hello world: papply echo "Hello World" "Hello Papply"
compressing files with GNU gzip:
papply gzip *.txt
converting images to jpg using ImageMagick's convert:
convert 1.png 1.jpg, convert 2.png 2.jpg, convert 2.png 2.jpg, ...
papply "convert %F %f.jpg" *.png
The following strings can appear in the command string (the first argument to papply). They will be replaced by:
- %F: Full original input
- %d: directory name (no trailing slash)
- %f: file name with extension
- %n: file name without extension
- %e: extension (with leading .)
- %z: empty string
The basic configuration should work for most use-cases. I have not implemented command line options to papply itself on purpose. This is a choice to keep the interface as simple as possible.
If you want to override the default values you can create a papply.ini file. This file will be looked for first in the directory where you execute the command and second in your home directory. The following parameters can be changed:
- num_cores: number of cores to use, default all available cores
- overcommit_factor: how many threads per core to request, default 1
- show_progress: print progress of the running jobs (yes, no, IfLong), default show progress if running a long time
- long_duration: time in seconds to be considered long after which progress is shown, default 60
Any values that are not set explicitly will default to the standard value. There is an example ini file included in the distribution (papply.ini.example).
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.