
Run R scripts with pytask.


pytask-r


Installation

pytask-r is available on PyPI and Anaconda.org. Install it with

$ pip install pytask-r

# or

$ conda install -c conda-forge pytask-r

You also need to have R installed, with Rscript available on your command line. Test it by running:

$ Rscript --help

If an error is shown instead of a help page, you can install R with conda, choosing either standard R or Microsoft R Open (MRO). Run one of the following two commands. (See here for further explanation on Anaconda, R, and MRO.)

$ conda install -c r r-base     # For normal R.
$ conda install -c r mro-base   # For MRO.

Alternatively, install R from the official R Project.
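If you want to run the same check from Python, for example in a project's setup script, here is a minimal sketch using the standard library's `shutil.which`. The helper name `find_rscript` is hypothetical and not part of pytask-r; it only tells you whether an executable called Rscript is on your PATH, so running `Rscript --help` remains the more thorough test.

```python
import shutil
from typing import Optional


def find_rscript(executable: str = "Rscript") -> Optional[str]:
    """Return the full path of ``executable`` if it is on PATH, else None.

    Hypothetical helper; not part of pytask-r.
    """
    # shutil.which searches PATH like the shell does and honours PATHEXT on
    # Windows, so "Rscript" also finds "Rscript.exe".
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_rscript()
    if path is None:
        print("Rscript not found on PATH; install R as described above.")
    else:
        print(f"Found Rscript at {path}")
```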

Usage

Just like normal task functions that execute Python code, tasks that execute R scripts are defined with Python functions. The difference is that the function body contains no logic; instead, the decorators tell pytask how to handle the task.

Here is an example that runs script.r.

import pytask


@pytask.mark.r
@pytask.mark.depends_on("script.r")
@pytask.mark.produces("out.rds")
def task_run_r_script():
    pass

Note that you need to apply the @pytask.mark.r marker so that pytask-r handles the task.

If you are wondering why the function body is empty: pytask-r replaces it with a predefined internal function. See the section on implementation details for more information.

Multiple dependencies and products

What happens if a task has more dependencies? If you pass them as a list, the R script to be executed must be the first element of the list.

@pytask.mark.r
@pytask.mark.depends_on(["script.r", "input.rds"])
@pytask.mark.produces("out.rds")
def task_run_r_script():
    pass

If you use a dictionary to pass dependencies to the task, pytask-r first looks for the script under the "source" key and, if that key is missing, under the key 0.

@pytask.mark.r
@pytask.mark.depends_on({"source": "script.r", "input": "input.rds"})
def task_run_r_script():
    pass


# or


@pytask.mark.r
@pytask.mark.depends_on({0: "script.r", "input": "input.rds"})
def task_run_r_script():
    pass


# or use two decorators if you do not want to assign a name to the input.


@pytask.mark.r
@pytask.mark.depends_on({"source": "script.r"})
@pytask.mark.depends_on("input.rds")
def task_run_r_script():
    pass
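The lookup rules above can be summarized in a small sketch. The helper `find_r_script` is hypothetical and simplified; it is not pytask-r's internal code, and it assumes dependencies arrive as a single path, a list, or a dictionary. Checking the configurable source key before the key 0 means a named script wins over an unnamed entry stored under 0.

```python
from pathlib import Path
from typing import Any, Dict, List, Union

Dependencies = Union[str, Path, List[Any], Dict[Any, Any]]


def find_r_script(depends_on: Dependencies, source_key: str = "source") -> Any:
    """Pick the R script out of a task's dependencies (hypothetical helper).

    A single dependency is the script, in a list the script comes first, and
    in a dict it is looked up under ``source_key`` and then under the key 0.
    """
    if isinstance(depends_on, dict):
        if source_key in depends_on:
            return depends_on[source_key]
        if 0 in depends_on:
            return depends_on[0]
        raise KeyError(f"No R script found under {source_key!r} or 0.")
    if isinstance(depends_on, (list, tuple)):
        # The script must be the first element.
        return depends_on[0]
    # A single path is the script itself.
    return depends_on
```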

Command Line Arguments

The @pytask.mark.r decorator can also be used to pass command line arguments to Rscript. See the following example.

@pytask.mark.r("value")
@pytask.mark.depends_on("script.r")
@pytask.mark.produces("out.rds")
def task_run_r_script():
    pass

In script.r, you can then retrieve the value with

args <- commandArgs(trailingOnly=TRUE)
arg <- args[1]  # holds "value"
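To see how the marker's value reaches R, it helps to spell out the command. The following sketch assumes, based on the wrapper described under Implementation Details, that the arguments are appended after the script path. `build_rscript_command` is a hypothetical helper, and accepting a list of several values is an assumption, not documented plugin behavior. In R, `commandArgs(trailingOnly=TRUE)` returns only the arguments that follow the script, which is why `args[1]` holds "value".

```python
import shlex
from typing import Iterable, List, Union

Arg = Union[str, int, float]


def build_rscript_command(script: str, args: Union[Arg, Iterable[Arg]] = ()) -> List[str]:
    """Build the argument list for running ``script`` with Rscript.

    Hypothetical helper. ``args`` may be a single value, as in
    ``@pytask.mark.r("value")``, or a sequence of values (an assumption).
    Every value is converted to a string because command line arguments are
    always strings.
    """
    if isinstance(args, (str, int, float)):
        args = [args]
    return ["Rscript", script, *(str(arg) for arg in args)]


command = build_rscript_command("script.r", "value")
print(shlex.join(command))  # Rscript script.r value
```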

Parametrization

You can also parametrize the execution of scripts, that is, execute multiple R scripts or pass different command line arguments to the same R script.

The following task executes two R scripts which produce different outputs.

import pytask

from src.config import BLD, SRC


@pytask.mark.r
@pytask.mark.parametrize(
    "depends_on, produces",
    [(SRC / "script_1.r", BLD / "1.rds"), (SRC / "script_2.r", BLD / "2.rds")],
)
def task_execute_r_script():
    pass

And the R script includes something like

args <- commandArgs(trailingOnly=TRUE)
produces <- args[1]  # holds the path

If you want to pass different command line arguments to the same R script, you have to include the @pytask.mark.r decorator in the parametrization by listing "r" among the argument names, just as you do for @pytask.mark.depends_on and @pytask.mark.produces.

import pytask

from src.config import BLD


@pytask.mark.depends_on("script.r")
@pytask.mark.parametrize(
    "produces, r",
    [(BLD / "output_1.rds", "1"), (BLD / "output_2.rds", "2")],
)
def task_execute_r_script():
    pass
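Because the list passed to `@pytask.mark.parametrize` is ordinary Python, you can also generate it instead of writing every pair by hand. This sketch defines a stand-in `BLD` directory (in the example above it comes from `src.config`) and a hypothetical helper `make_r_parametrization`; each run gets its own output file and its run number as a string argument.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical build directory; in the example above BLD comes from src.config.
BLD = Path("bld")


def make_r_parametrization(n_runs: int) -> List[Tuple[Path, str]]:
    """Return (produces, r) pairs for ``n_runs`` runs of the same R script.

    Each run writes its own output file and receives its run number,
    converted to a string, as the command line argument.
    """
    return [(BLD / f"output_{i}.rds", str(i)) for i in range(1, n_runs + 1)]


print(make_r_parametrization(2))
```

Passing `make_r_parametrization(2)` as the second argument of `@pytask.mark.parametrize("produces, r", ...)` would reproduce the two tasks above, up to the location of the build directory.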

Configuration

If you want to change the name of the key that identifies the R script, change the following default configuration in your pytask configuration file.

r_source_key = source
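The following sketch shows how such a setting could be read, assuming an INI-style configuration file with a `[pytask]` section (for example `pytask.ini`). It uses only `configparser` and is not how pytask itself loads its configuration; `read_r_source_key` is a hypothetical helper that falls back to "source" when the option is absent.

```python
import configparser

# Default named in the configuration example above.
DEFAULT_R_SOURCE_KEY = "source"


def read_r_source_key(config_text: str) -> str:
    """Return ``r_source_key`` from the [pytask] section, or the default.

    Hypothetical helper; assumes an INI-style file with a [pytask] section.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback is returned if either the section or the option is missing.
    return parser.get("pytask", "r_source_key", fallback=DEFAULT_R_SOURCE_KEY)


example = """
[pytask]
r_source_key = script
"""
print(read_r_source_key(example))  # script
```

With `r_source_key = script`, the dictionary examples in the section on multiple dependencies would use "script" instead of "source" as the key for the R script.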

Implementation Details

The plugin is a convenient wrapper around

import subprocess

subprocess.run(["Rscript", "script.r"], check=True)

which you can always fall back on when the plugin does not deliver the functionality you need.
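If you do fall back on `subprocess`, a slightly fuller version of the wrapper might also confirm that the products were written. This is a minimal sketch, not pytask-r's implementation; `run_r_script` and its parameters are hypothetical. With `check=True`, a non-zero exit status from Rscript raises `subprocess.CalledProcessError`, and a missing Rscript executable raises `FileNotFoundError`.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_r_script(
    script: str,
    args: Sequence[str] = (),
    produces: Sequence[str] = (),
    executable: str = "Rscript",
) -> None:
    """Run ``script`` with Rscript and verify that the expected outputs exist.

    Hypothetical helper. Raises CalledProcessError on a non-zero exit status
    and FileNotFoundError if the executable or a product is missing.
    """
    # Arguments after the script path are what commandArgs(trailingOnly=TRUE)
    # returns inside the R script.
    subprocess.run([executable, script, *args], check=True)
    missing = [path for path in produces if not Path(path).exists()]
    if missing:
        raise FileNotFoundError(f"R script did not create: {missing}")
```

For the first example in this README, you would call `run_r_script("script.r", produces=["out.rds"])` from an ordinary Python task.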

It is not possible to enter a post-mortem debugger when an error happens in the R script, nor to enter the debugger when the script starts. If a solution exists, hints as well as contributions are highly appreciated.

Changes

Consult the release notes to find out about what is new.



Download files


Source Distribution

pytask_r-0.1.0.tar.gz (11.7 kB view details)

Uploaded Source

Built Distribution


pytask_r-0.1.0-py3-none-any.whl (8.1 kB view details)

Uploaded Python 3

File details

Details for the file pytask_r-0.1.0.tar.gz.

File metadata

  • Download URL: pytask_r-0.1.0.tar.gz
  • Size: 11.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6

File hashes

Hashes for pytask_r-0.1.0.tar.gz
Algorithm Hash digest
SHA256 9ba225931038050f9ba89376c60844f036b1782b05ee4fd6fda8d1343902cdd6
MD5 fd598eef6e9cde52c4c3c398f4b885af
BLAKE2b-256 a50c872cf4dd88a518b35c44d36997e5508ef77840e82a211c28590cb0980d9d


File details

Details for the file pytask_r-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pytask_r-0.1.0-py3-none-any.whl
  • Size: 8.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6

File hashes

Hashes for pytask_r-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 212d745fae5c018a0b95e393a3cb6f0a2bb76781cdcee29029f3ea7faeb97811
MD5 c4b854b34c97e1081993d1300a9cd50b
BLAKE2b-256 649a668fef11468bed50a66389f8abc06bb2ffbf55ea3376a0ef851b6f877941

