pytask-r
Run R scripts with pytask.
Installation
pytask-r is available on PyPI and Anaconda.org. Install it with
$ pip install pytask-r
# or
$ conda install -c conda-forge pytask-r
You also need to have R installed and Rscript available on your command line. Test it by typing the following on the command line:
$ Rscript --help
If an error is shown instead of a help page, you can install R with conda, either as regular R or as Microsoft R Open (MRO), using one of the two following commands. (See here for further explanation on Anaconda, R, and MRO.)
$ conda install -c r r-base # For normal R.
$ conda install -c r mro-base # For MRO.
Or install R from the official R Project.
Usage
To create a task which runs an R script, define a task function with the @pytask.mark.r decorator. The script keyword provides the path to the R script, either absolute or relative to the task module.
import pytask

@pytask.mark.r(script="script.r")
@pytask.mark.produces("out.rds")
def task_run_r_script():
    pass
If you are wondering why the function body is empty, know that pytask-r replaces the body with a predefined internal function. See the section on implementation details for more information.
Dependencies and Products
Dependencies and products can be added as with a normal pytask task using the @pytask.mark.depends_on and @pytask.mark.produces decorators, which are explained in this tutorial.
Accessing dependencies and products in the script
To access the paths of dependencies and products in the script, pytask-r stores the information by default in a .json file. The path to this file is passed as a positional argument to the script. Inside the script, you can read the information.
library(jsonlite)
args <- commandArgs(trailingOnly=TRUE)
path_to_json <- args[length(args)]
config <- read_json(path_to_json)
config$produces # Is the path to the output file "../out.csv".
The .json file is stored in a .pytask directory in the same folder as the task.
To parse the JSON file, you need to install jsonlite.
You can also pass any other information to your script by using the @task decorator.
import pytask
from pytask import task

@task(kwargs={"number": 1})
@pytask.mark.r(script="script.r")
@pytask.mark.produces("out.rds")
def task_run_r_script():
    pass
and inside the script use
config$number # Is 1.
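To make the hand-off between pytask-r and the script more concrete, here is a minimal Python sketch of the idea: a dictionary is serialized to a .json file inside a .pytask folder, and the file's path is appended as the last command line argument. The helper write_config, the file name task_example.json, and the exact keys are hypothetical and only mirror the examples above; the real file is generated by pytask-r and its layout may differ.

```python
import json
import tempfile
from pathlib import Path


def write_config(directory: Path, name: str, data: dict) -> Path:
    """Serialize ``data`` to ``<directory>/.pytask/<name>.json`` and return the path."""
    folder = directory / ".pytask"
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}.json"
    path.write_text(json.dumps(data))
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical content; pytask-r decides what is actually stored.
    data = {"produces": "../out.csv", "number": 1}
    config_path = write_config(Path(tmp), "task_example", data)

    # The path goes last, which is why the R script reads args[length(args)].
    command = ["Rscript", "script.r", str(config_path)]

    # Reading the file back gives the same values the R script would see.
    loaded = json.loads(config_path.read_text())
```

In the R script, config$produces and config$number correspond to the two keys written here.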
Debugging
In case a task throws an error, you might want to execute the script independently of pytask. After a failed execution, the report of the task shows the command which executed the R script. It looks roughly like this:
$ Rscript <options> script.r <path-to>/.pytask/task_py_task_example.json
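If you prefer to rerun the failing command from Python rather than from the shell, the following sketch assembles the same argument order as the command above. The helper build_rscript_command and the placeholder paths are assumptions for illustration; copy the real paths from the task report.

```python
import shlex
from typing import Sequence


def build_rscript_command(
    script: str, config_path: str, options: Sequence[str] = ()
) -> list:
    """Assemble ``Rscript <options> <script> <config_path>`` as an argument list."""
    return ["Rscript", *options, script, config_path]


# Placeholder values; take the real ones from the report of the failed task.
command = build_rscript_command(
    "script.r", ".pytask/task_example.json", options=["--vanilla"]
)

# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))

# To actually execute it (requires Rscript on the PATH):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the command as a list avoids quoting problems when a path contains spaces.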
Command Line Arguments
Use the options keyword of the @pytask.mark.r decorator to pass command line arguments to Rscript. See the following example.
@pytask.mark.r(script="script.r", options="--vanilla")
@pytask.mark.produces("out.rds")
def task_run_r_script():
    pass
Repeating tasks with different scripts or inputs
You can also repeat the execution of tasks, meaning executing multiple R scripts or passing different command line arguments to the same R script.
The following task executes two R scripts, script_0.r and script_1.r, which produce different outputs.
for i in range(2):

    @task
    @pytask.mark.r(script=f"script_{i}.r")
    @pytask.mark.produces(f"out_{i}.csv")
    def task_execute_r_script():
        pass
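As a quick check of what the loop generates, the f-strings can be expanded in plain Python without pytask. Since range(2) yields 0 and 1, the sketch below lists the script and product name for each iteration.

```python
# Expand the same f-strings the decorators use, one pair per iteration.
pairs = [(f"script_{i}.r", f"out_{i}.csv") for i in range(2)]

for script, product in pairs:
    print(f"{script} -> {product}")
# script_0.r -> out_0.csv
# script_1.r -> out_1.csv
```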
If you want to pass different inputs to the same R script, pass these arguments with the kwargs keyword of the @task decorator.
for i in range(2):

    @task(kwargs={"i": i})
    @pytask.mark.r(script="script.r")
    @pytask.mark.produces(f"output_{i}.csv")
    def task_execute_r_script():
        pass
and inside the script access the argument i with
library(jsonlite)
args <- commandArgs(trailingOnly=TRUE)
path_to_json <- args[length(args)]
config <- read_json(path_to_json)
config$produces # Is the path to the output file "../output_{i}.csv".
config$i # Is the number.
Serializers
You can also serialize your data with any other tool you like. Besides the default JSON, pytask-r also supports YAML (if PyYAML is installed).
Use the serializer keyword argument of the @pytask.mark.r decorator with
@pytask.mark.r(script="script.r", serializer="yaml")
def task_example():
    ...
And in your R script use
library(yaml)
args <- commandArgs(trailingOnly=TRUE)
config <- read_yaml(args[length(args)])
Note that the yaml package for R needs to be installed.
If you need a custom serializer, you can also provide any callable to serializer which transforms data to a string. Use suffix to set the correct file ending.
Here is a replication of the JSON example.
import json

import pytask

@pytask.mark.r(script="script.r", serializer=json.dumps, suffix=".json")
def task_example():
    ...
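A custom serializer is just a function from data to a string, so it can be written and tested without pytask. The sketch below is one possible variant of the JSON example: it converts values that JSON cannot encode natively, such as pathlib.Path objects, to strings. The function name serialize_with_paths is hypothetical, and whether pytask-r hands over Path objects or plain strings is an assumption here.

```python
import json
from pathlib import Path


def serialize_with_paths(data: dict) -> str:
    """Serialize ``data`` to JSON, converting non-JSON values (e.g. Path) via str()."""
    return json.dumps(data, default=str, indent=2, sort_keys=True)


# Hypothetical input resembling what a task might pass to the script.
example = {"produces": Path("out.rds"), "number": 1}
text = serialize_with_paths(example)
print(text)
```

Such a function could then be passed as serializer=serialize_with_paths together with suffix=".json", in the same way as json.dumps above.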
Configuration
You can influence the default behavior of pytask-r with some configuration values, set in the [tool.pytask.ini_options] table of your pyproject.toml.
r_serializer
Use this option to change the default serializer.
[tool.pytask.ini_options]
r_serializer = "json"
r_suffix
Use this option to set the default suffix of the file which contains serialized paths to dependencies and products and more.
[tool.pytask.ini_options]
r_suffix = ".json"
r_options
Use this option to set default options for each task, separated by whitespace.
[tool.pytask.ini_options]
r_options = ["--vanilla"]
Changes
Consult the release notes to find out about what is new.