Python SDK and CLI for the Renku platform.
A Python library for the Renku collaborative data science platform. It includes a CLI and SDK for end-users as well as a service backend. It provides functionality for the creation and management of projects and datasets, and simple utilities to capture data provenance while performing analysis tasks.
NOTE: renku-python is the Python library and core service for Renku. It does not start the Renku platform itself; for that, refer to the Renku docs on running the platform.
Renku for Users
Installation
Renku releases and development versions are available from PyPI. You can install them using any tool that knows how to handle PyPI packages. Our recommendation is to use pipx.
Prerequisites
Renku depends on Git under the hood, so make sure that you have Git installed on your system.
Renku also offers support to store large files in Git LFS, which is used by default and should be installed on your system. If you do not wish to use Git LFS, you can run Renku commands with the -S flag, as in renku -S <command>. More information on Git LFS usage in renku can be found in the Data in Renku section of the docs.
Renku uses CWL to execute recorded workflows when calling renku update or renku rerun. CWL depends on Node.js to execute the workflows, so installing Node.js is required if you want to use those features.
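As a quick way to check the prerequisites listed above, the following minimal Python sketch looks each tool up on the PATH. The executable names git-lfs and node are assumptions (some distributions install Node.js as nodejs), and the helper names are hypothetical; nothing here is part of Renku itself.

```python
import shutil

# Tool name -> whether the text above treats it as mandatory.
# Executable names are assumptions: "git-lfs" and "node" are the usual
# names, but some distributions install Node.js as "nodejs".
PREREQUISITES = {"git": True, "git-lfs": False, "node": False}


def find_prerequisites(tools=PREREQUISITES):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}


def missing_required(found, tools=PREREQUISITES):
    """Return the sorted names of mandatory tools that were not found."""
    return sorted(
        name for name, required in tools.items() if required and not found.get(name)
    )


if __name__ == "__main__":
    found = find_prerequisites()
    for name, path in found.items():
        print(f"{name}: {path or 'not found'}")
    for name in missing_required(found):
        print(f"missing required tool: {name}")
```

A missing git-lfs or node is only reported, not treated as an error, since Git LFS can be bypassed with the -S flag and Node.js is only needed for renku update and renku rerun.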
For development of the service, Docker is recommended.
pipx
First, install pipx and make sure that the $PATH is correctly configured.
$ python3 -m pip install --user pipx
$ python3 -m pipx ensurepath
Once pipx is installed, use the following commands to install renku and confirm where it was placed.
$ pipx install renku
$ which renku
~/.local/bin/renku
pipx installs Renku into its own virtual environment, making sure that it does not pollute any other packages or versions that you may have already installed.
To install a development release:
$ pipx install --pip-args=--pre renku
pip
$ pip install renku
The latest development versions are available on PyPI or from the Git repository:
$ pip install --pre renku
# - OR -
$ pip install -e git+https://github.com/SwissDataScienceCenter/renku-python.git#egg=renku
Use the following installation steps, based on your operating system and preferences, if you would like to work with the command line interface and do not need the Python library to be importable.
Windows
Renku can be run using the Windows Subsystem for Linux (WSL). To install the WSL, please follow the official instructions.
We recommend you use the Ubuntu 20.04 image in the WSL when you get to that step of the installation.
Once WSL is installed, launch the WSL terminal and install the packages required by Renku with:
$ sudo apt-get update && sudo apt-get install git python3 python3-pip python3-venv pipx
Because Ubuntu ships an older version of Git LFS by default, which is known to have some bugs when cloning repositories, we recommend you manually install the newest version by following these instructions.
Once all the requirements are installed, you can install Renku normally by running:
$ pipx install renku
$ pipx ensurepath
After this, Renku is ready to use. You can access your Windows drives through the various mount points in /mnt/ and you can execute Windows executables (e.g. *.exe) as usual directly from the WSL (so renku run myexecutable.exe will work as expected).
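To illustrate the /mnt/ mount points, here is a small Python sketch that translates an absolute Windows path into the location where WSL would expose it. It assumes the default WSL automount layout, in which drive C: appears at /mnt/c; the function name is hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl(path):
    """Translate an absolute Windows drive path to its default WSL mount path."""
    win = PureWindowsPath(path)
    drive = win.drive
    # Only plain drive-letter paths (e.g. "C:\\...") are handled; UNC shares
    # and relative paths have no fixed location under /mnt/.
    if not win.is_absolute() or len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"not an absolute drive-letter path: {path!r}")
    # parts[0] is the drive plus root (e.g. "C:\\"); the rest are path segments.
    return str(PurePosixPath("/mnt", drive[0].lower(), *win.parts[1:]))


if __name__ == "__main__":
    print(windows_to_wsl(r"C:\Users\me\data.csv"))
```

With the default layout, a file at C:\Users\me\data.csv would be reachable from the WSL terminal as /mnt/c/Users/me/data.csv.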
Docker
The containerized version of the CLI can be launched using the following Docker command.
$ docker run -it -v "$PWD":"$PWD" -w="$PWD" renku/renku-python renku
It makes sure your current directory is mounted to the same place in the container.
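If you want to call the containerized CLI from a script, the Docker command above can be assembled as an argument list. This is only a sketch: the function name is hypothetical, and it simply reproduces the flags shown above (mounting the working directory at the same path inside the container and using it as the container's working directory).

```python
import os
import subprocess


def docker_renku_argv(renku_args, cwd=None, image="renku/renku-python"):
    """Build the `docker run` argument list for running renku in a container."""
    cwd = os.path.abspath(cwd or os.getcwd())
    return [
        "docker", "run", "-it",
        "-v", f"{cwd}:{cwd}",  # mount the directory at the same path in the container
        "-w", cwd,             # and make it the container's working directory
        image, "renku", *renku_args,
    ]


if __name__ == "__main__":
    argv = docker_renku_argv(["init"])
    print(" ".join(argv))
    # Uncomment to actually start the container (requires Docker):
    # subprocess.run(argv, check=True)
```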
CLI Example
Initialize a Renku project:
$ mkdir -p ~/temp/my-renku-project
$ cd ~/temp/my-renku-project
$ renku init
Create a dataset and add data to it:
$ renku dataset create my-dataset
$ renku dataset add my-dataset https://raw.githubusercontent.com/SwissDataScienceCenter/renku-python/master/README.rst
Run an analysis:
$ renku run wc < data/my-dataset/README.rst > wc_readme
Trace the data provenance:
$ renku log wc_readme
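The same walkthrough can be scripted. The sketch below drives the CLI commands shown above through subprocess rather than through any Renku Python API; the helper names are hypothetical, and the shell redirections of the renku run step are emulated by passing file handles as stdin and stdout, which is assumed to behave like the shell form. By default it only lists the commands (dry run).

```python
import subprocess
from pathlib import Path

DATASET_URL = (
    "https://raw.githubusercontent.com/SwissDataScienceCenter/"
    "renku-python/master/README.rst"
)


def cli_example_steps():
    """Return the CLI example as (argv, stdin_path, stdout_path) tuples."""
    return [
        (["renku", "init"], None, None),
        (["renku", "dataset", "create", "my-dataset"], None, None),
        (["renku", "dataset", "add", "my-dataset", DATASET_URL], None, None),
        # Equivalent of: renku run wc < data/my-dataset/README.rst > wc_readme
        (["renku", "run", "wc"], "data/my-dataset/README.rst", "wc_readme"),
        (["renku", "log", "wc_readme"], None, None),
    ]


def run_steps(project_dir, dry_run=True):
    """Run (or, with dry_run=True, only list) the example steps in project_dir."""
    project_dir = Path(project_dir)
    executed = []
    for argv, stdin_path, stdout_path in cli_example_steps():
        executed.append(argv)
        if dry_run:
            continue
        project_dir.mkdir(parents=True, exist_ok=True)
        stdin = open(project_dir / stdin_path, "rb") if stdin_path else None
        stdout = open(project_dir / stdout_path, "wb") if stdout_path else None
        try:
            subprocess.run(argv, cwd=project_dir, stdin=stdin, stdout=stdout, check=True)
        finally:
            for handle in (stdin, stdout):
                if handle is not None:
                    handle.close()
    return executed


if __name__ == "__main__":
    for argv in run_steps(Path.home() / "temp" / "my-renku-project"):
        print(" ".join(argv))
```

Passing dry_run=False would execute the steps and requires renku, Git and network access.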
These are the basics, but there is much more that Renku allows you to do with your data analysis workflows. The full documentation will soon be available at: https://renku-python.readthedocs.io/
Renku as a Service
This repository includes a renku-core RPC service written as a Flask application that provides (almost) all of the functionality of the Renku CLI. This is used to provide one of the backends for the RenkuLab web UI. The service can be deployed in production as a Helm chart (see helm-chart).
Deploying locally
To test the service functionality you can deploy it quickly and easily using docker-compose up. Make sure to make a copy of the renku/service/.env-example file and configure it to your needs. The setup here is to expose the service behind a traefik reverse proxy to mimic an actual production deployment. You can access the proxied endpoints at http://localhost/api. The service itself is exposed on port 8080 so its endpoints are available directly under http://localhost:8080.
API Documentation
The renku core service implements the API documentation as an OpenAPI 3.0.x spec. You can retrieve the YAML of the specification itself with:
$ renku service apispec
If deploying the service locally with docker-compose you can find the swagger-UI under localhost/api/swagger. To send the proper authorization headers to the service endpoints, click the Authorize button and enter a valid JWT token and a gitlab token with read/write repository scopes. The JWT token can be obtained by logging in to a renku instance with renku login and retrieving it from your local renku configuration.
In a live deployment, the swagger documentation is available under https://<renku-endpoint>/swagger. You can authorize the API by first logging into renku normally, then going to the swagger page, clicking Authorize and picking the oidc (OAuth2, authorization_code) option. Leave the client_id as swagger and the client_secret empty, select all scopes and click Authorize. You should now be logged in and you can send requests using the Try it out buttons on individual requests.
Developing Renku
For testing the functionality from source it is convenient to install renku in editable mode using pipx. Clone the repository and then do:
$ pipx install \
    --editable \
    <path-to-renku-python>[all] \
    renku
This will install all the extras for testing and debugging.
Service
Developing the service and testing its APIs can be done with docker compose (see “Deploying Locally” above). To enable live reloading of the code, set the environment variable DEBUG_MODE=true either in your shell or in the .env file. Note that in this case the local directory is mounted in the docker container and renku is re-installed so it may take a few minutes before the container is ready.
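As an illustration of the .env approach, the following Python sketch sets DEBUG_MODE=true in the text of an env file, replacing an existing assignment or appending one. The helper name is hypothetical, and it assumes the file uses plain KEY=VALUE lines.

```python
def set_env_value(env_text, key, value):
    """Return env_text with `key` set to `value`, appending it if absent."""
    lines = env_text.splitlines()
    updated = False
    for index, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        # Commented-out lines such as "# DEBUG_MODE=false" do not match.
        if sep and name.strip() == key:
            lines[index] = f"{key}={value}"
            updated = True
    if not updated:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = "SOME_SETTING=1\nDEBUG_MODE=false\n"
    print(set_env_value(example, "DEBUG_MODE", "true"), end="")
```

Reading your .env file, passing its contents through this function and writing the result back would enable live reloading the next time the containers start.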
Running tests
To run tests locally with a specific version of Python:
$ pyenv install 3.7.5rc1
$ pipenv --python ~/.pyenv/versions/3.7.5rc1/bin/python install
$ pipenv run tests
To recreate the environment with a different version of Python, run the following commands:
$ pipenv --rm
$ pyenv install 3.6.9
$ pipenv --python ~/.pyenv/versions/3.6.9/bin/python install
$ pipenv run tests
Using External Debuggers
Local Machine
To run renku via e.g. the Visual Studio Code debugger, you need to run it via the Python executable in whatever virtual environment was used to install renku. If the debugger needs a package, you need to inject it into the virtual environment first, e.g.:
$ pipx inject renku ptvsd
Finally, run renku via the debugger:
$ ~/.local/pipx/venvs/renku/bin/python -m ptvsd --host localhost --wait -m renku.cli <command>
If using Visual Studio Code, you may also want to set the pathMappings of the Remote Attach configuration so that it will find your source code, e.g.
{
    "name": "Python: Remote Attach",
    "type": "python",
    "request": "attach",
    "port": 5678,
    "host": "localhost",
    "pathMappings": [
        {
            "localRoot": "<path-to-renku-python-source-code>",
            "remoteRoot": "<path-to-renku-python-source-code>"
        }
    ]
},
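If you switch between several checkouts, the Remote Attach entry can be generated rather than edited by hand. The sketch below builds the same configuration as a Python dictionary and prints it as JSON; the function name is hypothetical, and the defaults simply mirror the values shown above.

```python
import json


def remote_attach_config(source_root, host="localhost", port=5678):
    """Build a VS Code "Remote Attach" entry mapping source_root onto itself."""
    return {
        "name": "Python: Remote Attach",
        "type": "python",
        "request": "attach",
        "port": port,
        "host": host,
        "pathMappings": [
            # Local and remote roots are identical because renku runs on the
            # same machine, straight from the checked-out source tree.
            {"localRoot": source_root, "remoteRoot": source_root}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(remote_attach_config("/path/to/renku-python"), indent=4))
```

The printed object can be pasted into the configurations list of .vscode/launch.json.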
Kubernetes
To debug a running renku-core service in a Kubernetes cluster, the service has to be deployed with the core.debug flag set to true, like:
core:
  debug: true
Then install the Kubernetes extension and configure your local kubectl with the credentials needed for your cluster.
Add a .vscode/settings.json in the renku-python project root and set the following two values:
{
    "vs-kubernetes": {
        "vs-kubernetes.python-autodetect-remote-root": true,
        "vs-kubernetes.python-remote-root": "/code/renku"
    }
}
You might also need to run the Kubernetes: Use Namespace commandlet in VSCode to pick the correct Kubernetes namespace.
Once this is done, go to the Kubernetes tab in VSCode, right-click on your cluster -> Workloads -> Pods -> -renku-core- entry (not the -renku-core-redis- one) and pick Debug (attach), select core and python and you should be good to go.
You can also select Attach Visual Studio Code in the context menu to open a new instance of VSCode with write access to the source code in the remote pod.
Changes
0.16.4 (2022-01-25)
This is a backport release that contains changes necessary for 1.0.0 multi-service compatibility.
Bug Fixes
core: Fix different versions of core-service using same Job queues
0.16.3 (2022-01-21)
This is a backport release that contains changes necessary for 1.0.0 multi-service compatibility.
Bug Fixes
core: Fix for git requiring a merge strategy to be set
Features
core: Dataverse subject field support added
service: 1.0.0 style /version endpoint added
service: Made all endpoints versioned (v0.9)
0.16.2 (2021-10-05)
Bug Fixes
core: Pin pyshacl version to 0.17.0.post1
0.16.1 (2021-09-13)
Bug Fixes
core: Update to rdflib 6 and remove rdflib-jsonld which was not installable with setuptools>58.0.2
0.16.0 (2021-07-08)
Bug Fixes
cli: Fix Git LFS autocommit hook not committing new pointer files (#2139) (dca5aa4)
cli: prevent --template-ref from being set without --template-source in renku init (#2146) (e687b08)
core: add url validator utility function to fix an issue with URLs containing trailing slashes (#2050) (89f1c90)
core: fix checking out template repository by revision (#2189) (2a69aa2)
core: fix CWL to work with filenames with spaces (#2187) (634f2b3)
core: fix zenodo dataset import for datasets with schema:image set (#2142) (06d4969)
core: fix duplicate project version in flattened JSON-LD (#2087) (e28e308)
service: fix management jobs running into timeouts (#2127) (ab7ca08)
Features
0.15.1 (2021-05-20)
Bug Fixes
0.15.0 (2021-05-17)
Bug Fixes
Features
cli: improve feedback around files being overwritten by renku init and add --initial-branch flag (#1997) (50bb67b)
cli: add JSON output format to ‘renku dataset ls’ and ‘renku dataset ls-files’ (#2084) (514f13b)
cli: add OLOS export and improve import/export provider logic (#1857) (779c481)
cli: detect filename from content-disposition header when downloading (#2020) (c79ea14)
core: add default value to all Run parameters (#2057) (3a0321d)
core: adds node-js detection for rerun/update (#2002) (8b9e801)
core: add renku login command to authenticate with a renku deployment (#1864) (7f3039f)
dataset: add support to dataset update for detecting changes to local files (#2049) (71befe0)
service: pass gitlab token to core-service (#2062) (63c2675)
workflow: add naming metadata for command parameters (#2071) (b1e7a9b)
service: add delayed write operations, ie. porcelain and better cache management (#1957) (a05b615)
0.14.2 (2021-04-16)
Highlights
Ability to update local project from its template and to update the Dockerfile to install the current version of renku-python using renku migrate.
Support for Unicode paths in renku run (including emojis).
Bug Fixes
Features
0.14.1 (2021-03-24)
Bug Fixes
core: Add error handling if push of temporary branch fails (#1979) (f8d7285)
core: fix template update if same filename was added locally (#1974) (5b47ddc)
core: fixes save and push to correctly handle merge conflicts (#1925) (fdac171)
service: sync service cache with remote before operations to prevent cache getting out of sync (#1972) (34ec5d6)
Features
0.14.0 (2021-03-05)
Bug Fixes
core: call git commands for batches of files to prevent hitting argument length limits (#1893) (deaf055)
dataset: change renku dataset import to move temporary files and become more resilient to errors (#1894) (279407e)
service: correctly address HTTP server errors (#1872) (2fd5052)
service: correctly handle ref on project.clone (#1888) (7f30404)
service: use project_id as part of project filesystem path (#1754) (391a14a)
Features
cli: add renku storage migrate command to migrate git files to lfs (#1869) (bed1358)
cli: add service component management commands (#1867) (928baf9)
core: exclude renku metadata from being added to git lfs (#1898) (8046edb)
core: add oauth authentication for KG access (#1881) (a568d31)
dataset: improve naming for imported datasets (#1900) (9beb654)
service: add helm 3 values schema to chart (#1835) (57f6aee)
service: add support for adding images to datasets (#1850) (c3caafd)
0.13.0 (2021-01-29)
Bug Fixes
Features
0.12.3 (2021-01-05)
Bug Fixes
0.12.2 (2020-12-02)
Bug Fixes
Features
0.12.1 (2020-11-16)
Bug Fixes
Features
0.12.0 (2020-11-03)
Bug Fixes
core: fix bug where remote_cache caused project ids to leak (#1618) (3ef04fb)
core: fix graph building for nodes with same subpath (#1625) (7cae9be)
core: fix importing a dataset referenced from non-existent projects (#1574) (92b8bf8)
core: fix old dataset migration and activity dataset outputs (#1603) (a5339e2)
core: fix project migration getting overwritten with old metadata (#1581) (c5a5960)
core: fix update creating a commit when showing help (#1627) (529e582)
core: fixes git encoding of paths with unicode characters (#1538) (053dac9)
core: make Run migration ids unique by relative path instead of absolute (#1573) (cf96310)
dataset: broken directory hierarchy after renku dataset imports (#1576) (9dcffce)
dataset: error when adding same file multiple times (#1639) (05bfde7)
dataset: explicit failure when cannot pull LFS objects (#1590) (3b05816)
dataset: invalid generated name in migration (#1593) (89b2e43)
dataset: update local files metadata when overwriting (#1582) (59eaf25)
service: dataset rm endpoint supports new core API (#1622) (e71916e)
service: raise exception on uninitialized projects (#1624) (a2025c3)
Features
0.11.6 (2020-10-16)
Bug Fixes
0.11.5 (2020-10-13)
Bug Fixes
core: fix importing a dataset referenced from non-existent projects (#1574) (4bb13ef)
core: fixes git encoding of paths with unicode characters (#1538) (9790707)
dataset: fix broken directory hierarchy after renku dataset imports (#1576) (41e3e72)
dataset: abort importing a dataset when cannot pull LFS objects (#1590) (9877a98)
dataset: fix invalid dataset name after migration (#1593) (c7ec249)
dataset: update dataset files metadata when adding and overwriting local files (#1582) (0a23e82)
0.11.4 (2020-10-05)
Bug Fixes
0.11.3 (2020-09-29)
Bug Fixes
core: make Run migration ids unique by relative path instead of absolute (686b9f9)
0.11.2 (2020-09-24)
Bug Fixes
Features
cli: show existing paths when initializing non-empty dir (#1535) (07c559f)
core: follow URL redirections for dataset files (#1516) (5a37b3c)
service: add additional template parameters (#1469) (6372a32)
service: adds additional fields to datasets listings (#1508) (f8a395f)
service: adds project details and renku operation on jobs endpoint (#1492) (6b3fafd)
service: execute read operations via git remote (#1488) (84a0eb3)
0.11.1 (2020-08-18)
Bug Fixes
0.11.0 (2020-08-14)
Bug Fixes
cli: disable version check in githook calls (#1300) (5132db3)
core: Only update project metadata if any migrations were executed (#1308) (1056a03)
service: adds more custom logging and imp. except handling (#1435) (6c3adb5)
service: fixes handlers for internal loggers (#1433) (a312f7c)
service: move project_id to query string on migrations check (#1367) (0f89726)
Features
cli: Show detailed commands for renku log output (#1345) (19fb819)
core: disabling of inputs/outputs auto-detection (#1406) (3245ca0)
core: Move workflow serialisation over to calamus (#1386) (f0fbc49)
service: added endpoints to execute all migrations on a project (#1322) (aca8cc2)
service: adds endpoint for explicit migrations check (#1326) (146b1a7)
service: adds source and destination versions to migrations check (#1372) (ea76b48)
service: adds endpoints for dataset remove (#1383) (289e4b9)
service: adds endpoints for unlinking files from a dataset (#1314) (1b78b16)
service: create new projects from templates (#1287) (552f85c), closes #862
0.10.5 (2020-07-16)
Bug Fixes
0.10.4 (2020-05-18)
Bug Fixes
Features
cli: Adds warning messages for LFS, fix output redirection (#1199) (31969f5)
core: Adds lfs file size limit and lfs ignore file (#1210) (1f3c81c)
core: git hook to avoid committing large files (#1238) (e8f1a8b)
core: renku doctor check for lfs migrate info (#1234) (480da06)
dataset: fail early when external storage not installed (#1239) (e6ea6da)
core: project clone API support for revision checkout (#1208) (74116e9)
dataset: no failure when adding ignored files (#1213) (b1e275f)
0.10.3 (2020-04-22)
Bug Fixes
Features
0.10.1 (2020-03-31)
Bug Fixes
Features
0.10.0 (2020-03-25)
This release brings about several important Dataset features:
importing renku datasets (#838)
working with data external to the repository (#974)
editing dataset metadata (#1111)
Please see the Dataset documentation for details.
Additional features were implemented for the backend service to facilitate a smoother user experience for dataset file manipulation.
IMPORTANT: starting with this version, a new metadata migration mechanism is in place (#1003). Renku commands will insist on migrating a project immediately if the metadata is found to be outdated.
Bug Fixes
Features
0.9.1 (2020-02-24)
Bug Fixes
Features
0.9.0 (2020-02-07)
Bug Fixes
adds git user check before running renku init (#892) (2e52dff)
Deletes temporary branch after renku init --force (#887) (eac0463)
Fixes JSON-LD translation and related issues (#846) (65e5469)
Fixes renku update workflow failure handling and renku status error handling (#888) (3879124)
Fixes sameAs property to follow schema.org spec (#944) (291380e)
Features
0.8.0 (2019-11-21)
Bug Fixes
Features
0.7.0 (2019-10-15)
Bug Fixes
0.6.1 (2019-10-10)
Bug Fixes
Features
0.6.0 (2019-09-18)
Bug Fixes
adds _label and commit data to imported dataset files, single commit for imports (#651) (75ce369)
always add commit to dataset if possible (#648) (7659bc8), closes #646
cleanup needed for integration tests on py35 (#653) (fdd7215)
fixed serialization of datetime to iso format (#629) (693d59d)
hide image, pull, runner, show, workon and deactivate commands (#672) (a3e9998)
Removes unnecessary call to git lfs with no paths (#658) (e32d48b)
use latest_html for version check (#647) (c6b0309), closes #641
zenodo export failing with relative paths (d40967c)
Features
0.5.2 (2019-07-26)
Bug Fixes
Features
0.5.1 (2019-07-12)
Bug Fixes
ensure external storage is handled correctly (#592) (7938ac4)
cli: allow renku run with many inputs (f60783e), closes #552
modify json-ld for datasets (#534) (ab6a719), closes #525 #526
refactored tests and docs to align with updated pydocstyle (#586) (6f981c8)
cli: add check of missing references (9a373da)
cli: fail when removing non existing dataset (dd728db)
status: fix renku status output when not in root folder (#564) (873270d), closes #551
datasets: strip query string from data filenames (450898b)
cli: remove dataset aliases (6206e62)
cwl: detect script as input parameter (e23b75a), closes #495
deps: updated dependencies (691644d)
Features
added support for working on dirty repo (ae67be7)
0.5.0 (2019-03-28)
Bug Fixes
Features
api: list datasets from a commit (04a9fe9)
cli: add dataset rm command (a70c7ce)
cli: add rm command (cf0f502)
cli: configurable format of dataset output (d37abf3)
dataset: add existing file from current repo (575686b), closes #99
datasets: added ls-files command (ccc4f59)
models: reference context for relative paths (5d1e8e7), closes #452
add JSON-LD output format for datasets (c755d7b), closes #426
generate Makefile with log --format Makefile (1e440ce)
v0.4.0
(released 2019-03-05)
Adds renku mv command which updates dataset metadata, .gitattributes and symlinks.
Pulls LFS objects from submodules correctly.
Adds listing of datasets.
Adds reduced dot format for renku log.
Adds doctor command to check missing files in datasets.
Moves dataset metadata to .renku/datasets and adds migrate datasets command and uses UUID for metadata path.
Gets git attrs for files to prevent duplicates in .gitattributes.
Fixes renku show outputs for directories.
Runs Git LFS checkout in a worktree and lazily pulls necessary LFS files before running commands.
Asks user before overriding an existing file using renku init or renku runner template.
Fixes renku init --force in an empty dir.
Renames CommitMixin._location to _project.
Addresses issue with commits editing multiple CWL files.
Exports merge commits for full lineage.
Exports path and parent directories.
Adds an automatic check for the latest version.
Simplifies issue submission from traceback to GitHub or Sentry. Requires SENTRY_DSN variable to be set and sentry-sdk package to be installed before sending any data.
Removes outputs before run.
Allows update of directories.
Improves readability of the status message.
Checks ignored path when added to a dataset.
Adds API method for finding ignored paths.
Uses branches for init --force.
Fixes CVE-2017-18342.
Fixes regex for parsing Git remote URLs.
Handles --isolation option using git worktree.
Renames client.git to client.repo.
Supports python -m renku.
Allows ‘.’ and ‘-’ in repo path.
v0.3.3
(released 2018-12-07)
Fixes generated Homebrew formula.
Renames renku pull path to renku storage pull with deprecation warning.
v0.3.2
(released 2018-11-29)
Fixes display of workflows in renku log.
v0.3.1
(released 2018-11-29)
Fixes issues with parsing remote Git URLs.
v0.3.0
(released 2018-11-26)
Adds JSON-LD context to objects extracted from the Git repository (see renku show context --list).
Uses PROV-O and WFPROV as provenance vocabularies and generates “stable” object identifiers (@id) for RDF and JSON-LD output formats.
Refactors the log output to allow linking files and directories.
Adds support for aliasing tools and workflows.
Adds option to install shell completion (renku --install-completion).
Fixes initialization of Git submodules.
Uses relative submodule paths when appropriate.
Simplifies external storage configuration.
v0.2.0
(released 2018-09-25)
Refactored version using Git and Common Workflow Language.
v0.1.0
(released 2017-09-06)
Initial public release as Renga.
File details
Details for the file renku-0.16.4.tar.gz.
File metadata
- Download URL: renku-0.16.4.tar.gz
- Upload date:
- Size: 1.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 67aec3745205a968f03f3089672059c57fdd7fd21c1c3803d480c8ffde902c0b
MD5 | c6a0ee7b4fa02c935ec2e707523d9029
BLAKE2b-256 | 478e4e9d4e78f9463949e6eda60ec198c5fdee1b9fb555d8cb85e7d2d276d993
File details
Details for the file renku-0.16.4-py2.py3-none-any.whl.
File metadata
- Download URL: renku-0.16.4-py2.py3-none-any.whl
- Upload date:
- Size: 2.0 MB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8fbf6b7282f84ef3707e68aabbfc78912998874a9d40b8a474d5a7fe27def6db
MD5 | b24e93d120c255615e1e346e0ef47cc3
BLAKE2b-256 | a25ab339145190c6c60f9c80b95f4985f3e49b3c7f279f8aa08b3665546e1680