Python SDK and CLI for the Renku platform.

A Python library for the Renku collaborative data science platform. It includes a CLI and SDK for end-users as well as a service backend. It provides functionality for the creation and management of projects and datasets, and simple utilities to capture data provenance while performing analysis tasks.

NOTE:

renku-python is the Python library and core service for Renku. It does not start the Renku platform itself; for that, refer to the Renku docs on running the platform.

Renku for Users

Installation

Renku releases and development versions are available from PyPI. You can install it using any tool that knows how to handle PyPI packages. Our recommendation is to use pipx.

pipx

First, install pipx and make sure that the $PATH is correctly configured.

$ python3 -m pip install --user pipx
$ pipx ensurepath

Once pipx is installed, use the following command to install renku.

$ pipx install renku
$ which renku
~/.local/bin/renku

pipx installs renku into its own virtual environment, making sure that it does not pollute any other packages or versions that you may have already installed.
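If `which renku` prints nothing, the pipx binary directory is probably not on your $PATH yet. A minimal sketch of a check you could add to a setup script follows; the function name `check_renku_on_path` is made up for illustration and is not part of renku or pipx.

```shell
# Hypothetical helper (not part of renku or pipx): report whether the
# renku command can be resolved from the current shell.
check_renku_on_path() {
    if command -v renku >/dev/null 2>&1; then
        echo "renku found at: $(command -v renku)"
    else
        echo "renku not found on PATH; run 'pipx ensurepath' and restart your shell" >&2
        return 1
    fi
}
```

Call `check_renku_on_path` after installation; a non-zero exit status means the command could not be resolved.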

To install a development release:

$ pipx install --pip-args=--pre renku

pip

$ pip install renku

The latest development versions are available on PyPI or from the Git repository:

$ pip install --pre renku
# - OR -
$ pip install -e git+https://github.com/SwissDataScienceCenter/renku-python.git#egg=renku

Use the following installation steps, based on your operating system and preferences, if you only want to work with the command line interface and do not need the Python library to be importable.

Docker

The containerized version of the CLI can be launched using the following Docker command.

$ docker run -it -v "$PWD":"$PWD" -w="$PWD" renku/renku-python renku

It makes sure your current directory is mounted to the same place in the container.
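If you use the containerized CLI often, you may prefer to wrap the command above in a shell function. This is only a sketch: the name `renku_docker` is hypothetical, and it assumes that any arguments placed after `renku` are passed through to the CLI inside the container.

```shell
# Hypothetical wrapper: run the containerized renku CLI with the current
# directory mounted at the same path inside the container.
# Any arguments are forwarded to the renku command in the container.
renku_docker() {
    docker run -it -v "$PWD":"$PWD" -w="$PWD" renku/renku-python renku "$@"
}
```

With this function defined (for example in your shell profile), `renku_docker init` would run `renku init` in the container against the current directory.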

CLI Example

Initialize a renku project:

$ mkdir -p ~/temp/my-renku-project
$ cd ~/temp/my-renku-project
$ renku init

Create a dataset and add data to it:

$ renku dataset create my-dataset
$ renku dataset add my-dataset https://raw.githubusercontent.com/SwissDataScienceCenter/renku-python/master/README.rst

Run an analysis:

$ renku run wc < data/my-dataset/README.rst > wc_readme

Trace the data provenance:

$ renku log wc_readme

These are the basics, but there is much more that Renku allows you to do with your data analysis workflows. The full documentation will soon be available at: https://renku-python.readthedocs.io/
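The steps from the CLI example can also be collected into a single script. The sketch below uses only the commands shown above; the function name `renku_quickstart` is hypothetical. The body runs in a subshell so that the `cd` does not affect your current shell, and it stops at the first command that fails.

```shell
# Hypothetical helper: run the CLI example end to end in the given directory.
# The parentheses make the body a subshell, so 'cd' and 'exit' stay local.
renku_quickstart() (
    project_dir="$1"
    mkdir -p "$project_dir" || exit 1
    cd "$project_dir" || exit 1

    # Initialize the project, then create a dataset and add a file to it.
    renku init || exit 1
    renku dataset create my-dataset || exit 1
    renku dataset add my-dataset \
        https://raw.githubusercontent.com/SwissDataScienceCenter/renku-python/master/README.rst || exit 1

    # Run the analysis, then show the provenance of its output.
    renku run wc < data/my-dataset/README.rst > wc_readme || exit 1
    renku log wc_readme
)
```

For example, `renku_quickstart ~/temp/my-renku-project` reproduces the walkthrough above.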

Renku as a Service

This repository includes a renku-core RPC service written as a Flask application that provides (almost) all of the functionality of the Renku CLI. This is used to provide one of the backends for the RenkuLab web UI. The service can be deployed in production as a Helm chart (see helm-chart).

Developing Renku

For testing the functionality from source it is convenient to install renku in editable mode using pipx. Clone the repository and then do:

$ pipx install \
    --editable \
    <path-to-renku-python>[all] \
    renku

This will install all the extras for testing and debugging.

Running tests

To run tests locally with a specific version of Python:

$ pyenv install 3.7.5rc1
$ pipenv --python ~/.pyenv/versions/3.7.5rc1/bin/python install
$ pipenv run tests

To recreate the environment with a different version of Python, use the following commands:

$ pipenv --rm
$ pyenv install 3.6.9
$ pipenv --python ~/.pyenv/versions/3.6.9/bin/python install
$ pipenv run tests
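If you switch Python versions often, the commands above can be wrapped in a small function. This is a sketch, not part of the repository: the name `run_tests_with` is hypothetical, it assumes pyenv keeps its versions under `~/.pyenv/versions`, and it uses the `--skip-existing` flag of `pyenv install` so that an already installed version is not rebuilt.

```shell
# Hypothetical helper: recreate the pipenv environment for the given
# Python version and run the test suite with it.
run_tests_with() {
    py_version="$1"
    # Install the interpreter unless pyenv already has it.
    pyenv install --skip-existing "$py_version" || return 1
    # Remove the previous virtual environment; ignore the error if none exists.
    pipenv --rm 2>/dev/null || true
    pipenv --python "$HOME/.pyenv/versions/$py_version/bin/python" install || return 1
    pipenv run tests
}
```

For example, `run_tests_with 3.6.9` corresponds to the second sequence of commands above.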

Using External Debuggers

To run renku via an external debugger, e.g. the Visual Studio Code debugger, you need to run it via the python executable in whatever virtual environment was used to install renku. If the debugger needs an additional package, you need to inject it into the virtual environment first, e.g.:

$ pipx inject renku ptvsd

Finally, run renku via the debugger:

$ ~/.local/pipx/venvs/renku/bin/python -m ptvsd --host localhost --wait -m renku.cli <command>

If using Visual Studio Code, you may also want to set the Remote Attach configuration PathMappings so that it will find your source code, e.g.

{
    "name": "Python: Remote Attach",
    "type": "python",
    "request": "attach",
    "port": 5678,
    "host": "localhost",
    "pathMappings": [
        {
            "localRoot": "<path-to-renku-python-source-code>",
            "remoteRoot": "<path-to-renku-python-source-code>"
        }
    ]
},

Changes

0.10.5 (2020-07-16)

Bug Fixes

  • core: Pin dependencies to prevent downstream dependency updates from breaking renku. Fix pyshacl dependency. (#785) (30beedd)

  • core: Fixes SoftwareAgent person context. (#1323) (fa62f58)

0.10.4 (2020-05-18)

Bug Fixes

  • dataset: update default behaviour and messaging on dataset unlink (#1275) (98d6728)

  • dataset: correct url in different domain (#1211) (49e8b8b)

Features

  • cli: Adds warning messages for LFS, fix output redirection (#1199) (31969f5)

  • core: Adds lfs file size limit and lfs ignore file (#1210) (1f3c81c)

  • core: Adds renku storage clean command (#1235) (7029400)

  • core: git hook to avoid committing large files (#1238) (e8f1a8b)

  • core: renku doctor check for lfs migrate info (#1234) (480da06)

  • dataset: fail early when external storage not installed (#1239) (e6ea6da)

  • core: project clone API support for revision checkout (#1208) (74116e9)

  • service: protected branches support (#1222) (8405ce5)

  • dataset: doi variations for import (#1216) (0f329dd)

  • dataset: keywords in metadata (#1209) (f98a800)

  • dataset: no failure when adding ignored files (#1213) (b1e275f)

  • service: read template manifest (#1254) (7eac85b)

0.10.3 (2020-04-22)

Bug Fixes

Features

0.10.1 (2020-03-31)

Bug Fixes

Features

0.10.0 (2020-03-25)

This release brings about several important Dataset features:

  • importing renku datasets (#838)

  • working with data external to the repository (#974)

  • editing dataset metadata (#1111)

Please see the Dataset documentation for details.

Additional features were implemented for the backend service to facilitate a smoother user experience for dataset file manipulation.

IMPORTANT: starting with this version, a new metadata migration mechanism is in place (#1003). Renku commands will insist on migrating a project immediately if the metadata is found to be outdated.

Bug Fixes

  • cli: consistently show correct contexts (#1096) (b333f0f)

  • dataset: --no-external-storage flag not working (#1130) (c183e97)

  • dataset: commit only updated dataset files (#1116) (d9739df)

  • datasets: fixed importing large amount of small files (#1119) (8d61473)

  • datasets: raises correct error message on import of protected dataset (#1112) (e579904)

Features

0.9.1 (2020-02-24)

Bug Fixes

Features

0.9.0 (2020-02-07)

Bug Fixes

  • adds git user check before running renku init (#892) (2e52dff)

  • adds sorting to file listing (#960) (bcf6bcd)

  • avoid empty commits when adding files (#842) (8533a7a)

  • Fixes dataset naming (#898) (418deb3)

  • Deletes temporary branch after renku init --force (#887) (eac0463)

  • enforces label on SoftwareAgent (#869) (71badda)

  • Fixes JSON-LD translation and related issues (#846) (65e5469)

  • Fixes renku run error message handling (#961) (81d31ff)

  • Fixes renku update workflow failure handling and renku status error handling (#888) (3879124)

  • Fixes sameAs property to follow schema.org spec (#944) (291380e)

  • handle missing renku directory (#989) (f938be9)

  • resolves symlinks when pulling LFS (#981) (68bd8f5)

  • serializes all zenodo metadata (#941) (787978a)

  • Fixes various bugs in dataset import (#882) (be28bf5)

Features

0.8.0 (2019-11-21)

Bug Fixes

  • addressed CI problems with git submodules (#783) (0d3eeb7)

  • adds simple check on empty filename (#786) (8cd061b)

  • ensure all Person instances have valid ids (4f80efc), closes #812

  • Fixes jsonld issue when importing from dataverse (#759) (ffe36c6)

  • fixes nested type scoped handling if a class only has a single class (#804) (16d03b6)

  • ignore deleted paths in generated entities (86fedaf), closes #806

  • integration tests (#831) (a4ad7f9)

  • make Creator a subclass of Person (ac9bac3), closes #793

  • Redesign scoped context in jsonld (#750) (2b1948d)

Features

0.7.0 (2019-10-15)

Bug Fixes

  • use UI-resolved project path as project ID (#701) (dfcc9e6)

0.6.1 (2019-10-10)

Bug Fixes

  • add .renku/tmp to default .gitignore (#728) (6212148)

  • dataset import causes renku exception due to duplicate LocalClient (#724) (89411b0)

  • delete new dataset ref if file add fails (#729) (2dea711)

  • fixes bug with deleted files not getting committed (#741) (5de4b6f)

  • force current project for entities (#707) (538ef07)

  • integration tests for #681 (#747) (b08435d)

  • use commit author for project creator (#715) (1a40ebe), closes #713

  • zenodo dataset import error (f1d623a)

Features

0.6.0 (2019-09-18)

Bug Fixes

  • adds _label and commit data to imported dataset files, single commit for imports (#651) (75ce369)

  • always add commit to dataset if possible (#648) (7659bc8), closes #646

  • cleanup needed for integration tests on py35 (#653) (fdd7215)

  • fixed serialization of datetime to iso format (#629) (693d59d)

  • fixes broken integration test (#649) (04eba66)

  • hide image, pull, runner, show, workon and deactivate commands (#672) (a3e9998)

  • integration tests fixed (#685) (f0ea8f0)

  • migration of old datasets (#639) (4d4d7d2)

  • migration timezones (#683) (58c2de4)

  • Removes unnecessary call to git lfs with no paths (#658) (e32d48b)

  • renku home directory overwrite in tests (#657) (90e1c48)

  • upload metadata before actual files (#652) (95ed468)

  • use latest_html for version check (#647) (c6b0309), closes #641

  • user-related metadata (#655) (44183e6)

  • zenodo export failing with relative paths (d40967c)

Features

0.5.2 (2019-07-26)

Bug Fixes

  • safe_path check always operates on str (#603) (7c1c34e)

Features

0.5.1 (2019-07-12)

Bug Fixes

  • ensure external storage is handled correctly (#592) (7938ac4)

  • only check local repo for lfs filter (#575) (a64dc79)

  • cli: allow renku run with many inputs (f60783e), closes #552

  • added check for overwriting datasets (#541) (8c697fb)

  • escape whitespaces in notebook name (#584) (0542fcc)

  • modify json-ld for datasets (#534) (ab6a719), closes #525 #526

  • refactored tests and docs to align with updated pydocstyle (#586) (6f981c8)

  • cli: add check of missing references (9a373da)

  • cli: fail when removing non existing dataset (dd728db)

  • status: fix renku status output when not in root folder (#564) (873270d), closes #551

  • added dependencies for SSL support (#565) (4fa0fed)

  • datasets: strip query string from data filenames (450898b)

  • fixed serialization of creators (#550) (6a9173c)

  • updated docs (#539) (ff9a67c)

  • cli: remove dataset aliases (6206e62)

  • cwl: detect script as input parameter (e23b75a), closes #495

  • deps: updated dependencies (691644d)

Features

0.5.0 (2019-03-28)

Bug Fixes

  • api: make methods lock free (1f63964), closes #486

  • use safe_load for parsing yaml (5383d1e), closes #464

  • datasets: link flag on dataset add (eae30f4)

Features

  • api: list datasets from a commit (04a9fe9)

  • cli: add dataset rm command (a70c7ce)

  • cli: add rm command (cf0f502)

  • cli: configurable format of dataset output (d37abf3)

  • dataset: add existing file from current repo (575686b), closes #99

  • datasets: added ls-files command (ccc4f59)

  • models: reference context for relative paths (5d1e8e7), closes #452

  • add JSON-LD output format for datasets (c755d7b), closes #426

  • generate Makefile with log --format Makefile (1e440ce)

v0.4.0

(released 2019-03-05)

  • Adds renku mv command which updates dataset metadata, .gitattributes and symlinks.

  • Pulls LFS objects from submodules correctly.

  • Adds listing of datasets.

  • Adds reduced dot format for renku log.

  • Adds doctor command to check missing files in datasets.

  • Moves dataset metadata to .renku/datasets and adds migrate datasets command and uses UUID for metadata path.

  • Gets git attrs for files to prevent duplicates in .gitattributes.

  • Fixes renku show outputs for directories.

  • Runs Git LFS checkout in a worktree and lazily pulls necessary LFS files before running commands.

  • Asks user before overriding an existing file using renku init or renku runner template.

  • Fixes renku init --force in an empty dir.

  • Renames CommitMixin._location to _project.

  • Addresses issue with commits editing multiple CWL files.

  • Exports merge commits for full lineage.

  • Exports path and parent directories.

  • Adds an automatic check for the latest version.

  • Simplifies issue submission from traceback to GitHub or Sentry. Requires SENTRY_DSN variable to be set and sentry-sdk package to be installed before sending any data.

  • Removes outputs before run.

  • Allows update of directories.

  • Improves readability of the status message.

  • Checks ignored path when added to a dataset.

  • Adds API method for finding ignored paths.

  • Uses branches for init --force.

  • Fixes CVE-2017-18342.

  • Fixes regex for parsing Git remote URLs.

  • Handles --isolation option using git worktree.

  • Renames client.git to client.repo.

  • Supports python -m renku.

  • Allows ‘.’ and ‘-’ in repo path.

v0.3.3

(released 2018-12-07)

  • Fixes generated Homebrew formula.

  • Renames renku pull path to renku storage pull with deprecation warning.

v0.3.2

(released 2018-11-29)

  • Fixes display of workflows in renku log.

v0.3.1

(released 2018-11-29)

  • Fixes issues with parsing remote Git URLs.

v0.3.0

(released 2018-11-26)

  • Adds JSON-LD context to objects extracted from the Git repository (see renku show context --list).

  • Uses PROV-O and WFPROV as provenance vocabularies and generates “stable” object identifiers (@id) for RDF and JSON-LD output formats.

  • Refactors the log output to allow linking files and directories.

  • Adds support for aliasing tools and workflows.

  • Adds option to install shell completion (renku --install-completion).

  • Fixes initialization of Git submodules.

  • Uses relative submodule paths when appropriate.

  • Simplifies external storage configuration.

v0.2.0

(released 2018-09-25)

  • Refactored version using Git and Common Workflow Language.

v0.1.0

(released 2017-09-06)

  • Initial public release as Renga.
