
Python SDK and CLI for the Renku platform.


A Python library for the Renku collaborative data science platform. It includes a CLI and SDK for end-users as well as a service backend. It provides functionality for the creation and management of projects and datasets, and simple utilities to capture data provenance while performing analysis tasks.

NOTE:

renku-python is the Python library and core service for Renku. It does not start the Renku platform itself; for that, refer to the Renku docs on running the platform.

Renku for Users

Installation

Renku releases and development versions are available from PyPI. You can install them using any tool that can handle PyPI packages. We recommend pipx.

Prerequisites

Renku depends on Git under the hood, so make sure that you have Git installed on your system.

Renku also offers support to store large files in Git LFS, which is used by default and should be installed on your system. If you do not wish to use Git LFS, you can run Renku commands with the -S flag, as in renku -S <command>. More information on Git LFS usage in renku can be found in the Data in Renku section of the docs.

Renku uses CWL to execute recorded workflows when calling renku update or renku rerun. CWL depends on Node.js to execute the workflows, so Node.js must be installed if you want to use those features.

For development of the service, Docker is recommended.
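As a quick sanity check before installing, the prerequisites above can be probed from Python. This is a minimal sketch using only the standard library; the executable names (git, git-lfs, node, docker) are assumptions about how these tools appear on your PATH, and the helper names check_prerequisites and report are made up for illustration.

```python
import shutil

# Executable names are assumptions; adjust them if your system uses others
# (for example, some distributions ship Node.js as "nodejs").
PREREQUISITES = {
    "git": "required",
    "git-lfs": "used by default; skip with `renku -S <command>`",
    "node": "needed for `renku update` and `renku rerun`",
    "docker": "recommended for service development only",
}


def check_prerequisites(tools=PREREQUISITES, which=shutil.which):
    """Map each tool name to its resolved path, or None when it is not on PATH."""
    return {name: which(name) for name in tools}


def report(found, tools=PREREQUISITES):
    """Format one line per tool, marking the missing ones."""
    lines = []
    for name, path in found.items():
        status = path if path else "MISSING"
        lines.append(f"{name:8} {status}  ({tools[name]})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report(check_prerequisites()))
```

A tool reported as MISSING is only a problem if you need the feature listed next to it.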

pipx

First, install pipx and make sure that the $PATH is correctly configured.

$ python3 -m pip install --user pipx
$ python3 -m pipx ensurepath

Once pipx is installed, use the following command to install renku.

$ pipx install renku
$ which renku
~/.local/bin/renku

pipx installs Renku into its own virtual environment, making sure that it does not pollute any other packages or versions that you may have already installed.

To install a development release:

$ pipx install --pip-args=--pre renku

pip

$ pip install renku

The latest development versions are available on PyPI or from the Git repository:

$ pip install --pre renku
# - OR -
$ pip install -e git+https://github.com/SwissDataScienceCenter/renku-python.git#egg=renku

Use the following installation steps, based on your operating system and preferences, if you would like to work with the command line interface and do not need the Python library to be importable.

Windows

Renku can be run using the Windows Subsystem for Linux (WSL). To install the WSL, please follow the official instructions.

We recommend you use the Ubuntu 20.04 image in the WSL when you get to that step of the installation.

Once WSL is installed, launch the WSL terminal and install the packages required by Renku with:

$ sudo apt-get update && sudo apt-get install git python3 python3-pip python3-venv pipx

Since the version of Git LFS that Ubuntu provides by default is older and known to have some bugs when cloning repositories, we recommend you manually install the newest version by following these instructions.

Once all the requirements are installed, you can install Renku normally by running:

$ pipx install renku
$ pipx ensurepath

After this, Renku is ready to use. You can access your Windows files through the various mount points in /mnt/ and you can execute Windows executables (e.g. *.exe) as usual directly from the WSL (so renku run myexecutable.exe will work as expected).

Docker

The containerized version of the CLI can be launched using the docker command.

$ docker run -it -v "$PWD":"$PWD" -w="$PWD" renku/renku-python renku

It makes sure your current directory is mounted to the same place in the container.
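If you want to launch the containerized CLI from a script, the same command can be assembled in Python. The following is a sketch under stated assumptions: the helper name docker_renku_argv is hypothetical, and it only mirrors the -it, -v and -w options shown above.

```python
import os
import shlex

IMAGE = "renku/renku-python"  # image name taken from the command above


def docker_renku_argv(renku_args, cwd=None):
    """Build the argv for running the Renku CLI in a container.

    The directory is mounted at the same path inside the container and
    used as the working directory, mirroring the -v and -w options above.
    """
    cwd = os.path.abspath(cwd or os.getcwd())
    return [
        "docker", "run", "-it",
        "-v", f"{cwd}:{cwd}",
        "-w", cwd,
        IMAGE, "renku", *renku_args,
    ]


if __name__ == "__main__":
    # Only print the command; nothing is executed here.
    print(shlex.join(docker_renku_argv(["--help"])))
```

To actually run it, pass the returned list to subprocess.run(argv, check=True) on a machine with a working Docker installation.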

CLI Example

Initialize a Renku project:

$ mkdir -p ~/temp/my-renku-project
$ cd ~/temp/my-renku-project
$ renku init

Create a dataset and add data to it:

$ renku dataset create my-dataset
$ renku dataset add my-dataset https://raw.githubusercontent.com/SwissDataScienceCenter/renku-python/master/README.rst

Run an analysis:

$ renku run --name my-workflow -- wc < data/my-dataset/README.rst > wc_readme

Trace the data provenance:

$ renku workflow visualize wc_readme
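The steps above can also be collected into a small driver script. This is only a sketch: it shells out to the same CLI commands rather than using any Python API of Renku, and the helper name run_steps and its dry_run switch are made up for illustration.

```python
import subprocess

README_URL = (
    "https://raw.githubusercontent.com/SwissDataScienceCenter/"
    "renku-python/master/README.rst"
)

# The same commands as in the example above, in order.
STEPS = [
    "renku init",
    "renku dataset create my-dataset",
    f"renku dataset add my-dataset {README_URL}",
    "renku run --name my-workflow -- wc < data/my-dataset/README.rst > wc_readme",
    "renku workflow visualize wc_readme",
]


def run_steps(steps=STEPS, cwd=None, dry_run=True, runner=subprocess.run):
    """Run each step through the shell, stopping at the first failure.

    shell=True is needed because the `renku run` step relies on the shell
    redirections `<` and `>`. With dry_run=True nothing is executed.
    """
    executed = []
    for step in steps:
        if not dry_run:
            runner(step, shell=True, check=True, cwd=cwd)
        executed.append(step)
    return executed


if __name__ == "__main__":
    for step in run_steps():  # dry run: only list the commands
        print("$", step)
```

Call run_steps(dry_run=False, cwd=...) from inside an empty project directory to execute the commands for real.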

These are the basics, but there is much more that Renku allows you to do with your data analysis workflows. The full documentation will soon be available at: https://renku-python.readthedocs.io/

Renku as a Service

This repository includes a renku-core RPC service written as a Flask application that provides (almost) all of the functionality of the Renku CLI. This is used to provide one of the backends for the RenkuLab web UI. The service can be deployed in production as a Helm chart (see helm-chart).

Deploying locally

To test the service functionality you can deploy it quickly and easily using docker-compose up (see docker-compose, https://pypi.org/project/docker-compose/). Make sure to make a copy of the renku/service/.env-example file and configure it to your needs. The setup here is to expose the service behind a traefik reverse proxy to mimic an actual production deployment. You can access the proxied endpoints at http://localhost/api. The service itself is exposed on port 8080, so its endpoints are available directly under http://localhost:8080.
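To confirm that both the proxied and the direct endpoint respond after docker-compose up, a small probe can help. This is a minimal sketch using only the standard library; it assumes nothing about the service's routes and only reports whether something answers at the two base URLs mentioned above. The helper names probe and describe are hypothetical.

```python
import urllib.error
import urllib.request

# Base URLs taken from the paragraph above.
ENDPOINTS = {
    "proxied": "http://localhost/api",
    "direct": "http://localhost:8080",
}


def probe(url, timeout=3.0):
    """Return the HTTP status code for url, or None if nothing answers.

    Any status code, including an error such as 404, means something is
    listening; None means the connection itself failed.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except OSError:  # URLError, timeouts and refused connections
        return None


def describe(name, status):
    """Format a one-line summary for a probed endpoint."""
    state = f"HTTP {status}" if status is not None else "unreachable"
    return f"{name}: {state}"


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        print(describe(name, probe(url)))
```

An "unreachable" result for the proxied URL but not the direct one would point at the reverse proxy rather than the service itself.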

API Documentation

The renku core service implements the API documentation as an OpenAPI 3.0.x spec. You can retrieve the yaml of the specification itself with

$ renku service apispec
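To keep a copy of the specification next to your code, the command's output can be captured from Python. This is a sketch, assuming renku is on your PATH and that the command prints the YAML to standard output; the helper name save_apispec and the default file name are made up for illustration.

```python
import subprocess
from pathlib import Path


def save_apispec(path="apispec.yaml", runner=subprocess.run):
    """Write the output of `renku service apispec` to path and return it.

    The command is the one shown above; the output file name is arbitrary.
    """
    result = runner(
        ["renku", "service", "apispec"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with a non-zero status
    )
    target = Path(path)
    target.write_text(result.stdout)
    return target
```

Calling save_apispec() with no arguments writes apispec.yaml into the current directory.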

If deploying the service locally with docker-compose you can find the swagger-UI under localhost/api/swagger. To send the proper authorization headers to the service endpoints, click the Authorize button and enter a valid JWT token and a gitlab token with read/write repository scopes. The JWT token can be obtained by logging in to a renku instance with renku login and retrieving it from your local renku configuration.

In a live deployment, the swagger documentation is available under https://<renku-endpoint>/swagger. You can authorize the API by first logging into renku normally, then going to the swagger page, clicking Authorize and picking the oidc (OAuth2, authorization_code) option. Leave the client_id as swagger and the client_secret empty, select all scopes and click Authorize. You should now be logged in and you can send requests using the Try it out buttons on individual requests.

Developing Renku

For testing the functionality from source it is convenient to install renku in editable mode using pipx. Clone the repository and then do:

$ pipx install \
    --editable \
    <path-to-renku-python>[all] \
    renku

This will install all the extras for testing and debugging.

If you already use pyenv to manage different python versions, you may be interested in installing pyenv-virtualenv to create an ad-hoc virtual environment for developing renku.

Once you have created and activated a virtual environment for renku-python, you can use the usual pip commands to install the required dependencies.

$ pip install -e ".[all]"  # the quotes keep zsh from treating the brackets as a glob pattern

Service

Developing the service and testing its APIs can be done with docker compose (see “Deploying Locally” above). To enable live reloading of the code, set the environment variable DEBUG_MODE=true either in your shell or in the .env file. Note that in this case the local directory is mounted in the docker container and renku is re-installed so it may take a few minutes before the container is ready.

If you have a full RenkuLab deployment at your disposal, you can use telepresence v1 to develop and debug locally. Just run the start-telepresence.sh script and follow the instructions. You can also attach a remote debugger using the “remote attach” method described later. Mind that the script doesn’t work with telepresence v2.

Running tests

We use pytest for running tests. You can use our run-tests.sh script to run a specific set of tests.

$ ./run-tests.sh -h

We lint the files using black and isort.

Using External Debuggers

Local Machine

To run renku via e.g. the Visual Studio Code debugger, you need to run it via the python executable in whatever virtual environment was used to install renku. If there is a package needed for the debugger, you need to inject it into the virtual environment first, e.g.:

$ pipx inject renku ptvsd

Finally, run renku via the debugger:

$ ~/.local/pipx/venvs/renku/bin/python -m ptvsd --host localhost --wait -m renku.cli <command>

If using Visual Studio Code, you may also want to set pathMappings in the Remote Attach configuration so that it will find your source code, e.g.

{
    "name": "Python: Remote Attach",
    "type": "python",
    "request": "attach",
    "port": 5678,
    "host": "localhost",
    "pathMappings": [
        {
            "localRoot": "<path-to-renku-python-source-code>",
            "remoteRoot": "<path-to-renku-python-source-code>"
        }
    ]
}
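If you switch between several checkouts, the entry above can be generated instead of edited by hand. This is a minimal sketch: the helper name remote_attach_config is hypothetical, and it assumes, as in the example above, that localRoot and remoteRoot point at the same source directory.

```python
import json


def remote_attach_config(source_root, host="localhost", port=5678):
    """Build a VS Code "Remote Attach" entry like the one shown above.

    localRoot and remoteRoot are set to the same path, matching the
    example, where both placeholders refer to the same source checkout.
    """
    return {
        "name": "Python: Remote Attach",
        "type": "python",
        "request": "attach",
        "port": port,
        "host": host,
        "pathMappings": [
            {"localRoot": source_root, "remoteRoot": source_root}
        ],
    }


if __name__ == "__main__":
    # "/path/to/renku-python" is a placeholder for your own checkout.
    print(json.dumps(remote_attach_config("/path/to/renku-python"), indent=4))
```

The printed object belongs in the "configurations" list of .vscode/launch.json.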

Kubernetes

To debug a running renku-core service in a Kubernetes cluster, the service has to be deployed with the core.debug flag set to true, like:

core:
  debug: true

Also, if you want to be able to modify the files remotely, you need to change the security context on the deployment.yaml file for the renku-core component from runAsUser: 1000 to runAsGroup: 2000.

Then install the Kubernetes extension and configure your local kubectl with the credentials needed for your cluster.

Add a .vscode/settings.json in the renku-python project root and set the following two values:

{
    "vs-kubernetes": {
        "vs-kubernetes.python-autodetect-remote-root": true,
        "vs-kubernetes.python-remote-root": "/code/renku"
    }
}

You might also need to run the Kubernetes: Use Namespace commandlet in VSCode to pick the correct Kubernetes namespace.

Once this is done, go to the Kubernetes tab in VSCode, right-click on your cluster -> Workloads -> Pods -> -renku-core- entry (not the -renku-core-redis- one) and pick Debug (attach), select core and python and you should be good to go.

You can also select Attach Visual Studio Code in the context menu to open a new instance of VSCode with write access to the source code in the remote pod.

