
What is next in DataLad


DataLad NEXT extension


This DataLad extension can be thought of as a staging area for additional functionality, or for improved performance and user experience. Unlike other topical or more experimental extensions, the focus here is on functionality with broad applicability. This extension is a suitable dependency for other software packages that intend to build on this improved set of functionality.

Installation

# create and enter a new virtual environment (optional)
$ virtualenv --python=python3 ~/env/dl-next
$ . ~/env/dl-next/bin/activate
# install from PyPI
$ python -m pip install datalad-next

How to use

Additional commands provided by this extension are immediately available after installation. However, in order to fully benefit from all improvements, the extension has to be enabled for auto-loading by executing:

git config --global --add datalad.extensions.load next

Doing so also enables the extension to alter the behavior of the core DataLad package and its commands.
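Whether auto-loading has been configured can be checked by reading the same Git configuration key back. The following is a minimal sketch, assuming `git` is on the PATH and only the global scope is of interest; the function names are made up for illustration and are not part of DataLad.

```python
import subprocess


def autoloaded_extensions(config_values):
    """Normalize raw `datalad.extensions.load` values into a list of names."""
    return [v.strip() for v in config_values if v.strip()]


def read_global_extension_config():
    """Return all global values of `datalad.extensions.load` (may be empty)."""
    try:
        result = subprocess.run(
            ["git", "config", "--global", "--get-all",
             "datalad.extensions.load"],
            capture_output=True,
            text=True,
            check=False,  # git exits non-zero when the key is unset
        )
    except FileNotFoundError:
        # git is not installed or not on PATH
        return []
    return autoloaded_extensions(result.stdout.splitlines())


def is_next_enabled(config_values):
    """True if `next` is among the extensions configured for auto-loading."""
    return "next" in autoloaded_extensions(config_values)


if __name__ == "__main__":
    enabled = is_next_enabled(read_global_extension_config())
    print("next auto-loading enabled:", enabled)
```

The sketch only inspects the global Git configuration; settings made in other configuration scopes are not considered.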

Summary of functionality provided by this extension

  • A replacement sub-system for credential handling that is able to handle arbitrary properties for annotating a secret, and facilitates determining suitable credentials while minimizing avoidable user interaction, without compromising configurability. A convenience method is provided that implements a standard workflow for obtaining a credential.
  • A user-facing credentials command to set, remove, and query credentials.
  • The create-sibling-... commands for the platforms GitHub, GIN, GOGS, Gitea are equipped with improved credential handling that, for example, only stores entered credentials after they were confirmed to work, or auto-selects the most recently used, matching credentials, when none are specified.
  • A create-sibling-webdav command for hosting datasets on a WebDAV server via a sibling tandem for Git history and file storage. Datasets hosted on WebDAV in this fashion are cloneable with datalad-clone. A full annex setup for storing complete datasets with historical file content versions, and an additional mode for depositing single-version dataset snapshots are supported. The latter enables convenient collaboration with audiences that are not using DataLad, because all files are browsable via a WebDAV server's point-and-click user interface.
  • Enhance datalad-push to automatically export files to git-annex special remotes configured with exporttree=yes.
  • Speed up datalad-push when processing non-git special remotes. This particularly benefits less efficient hosting scenarios like WebDAV.
  • Enhance datalad-siblings enable (AnnexRepo.enable_remote()) to automatically deploy credentials for git-annex special remotes that require them.
  • git-remote-datalad-annex is a Git remote helper to push/fetch to any location accessible by any git-annex special remote.
  • git-annex-backend-XDLRA (originally available from the mihextras extension) is a custom external git-annex backend used by git-remote-datalad-annex. A base class to facilitate development of external backends in Python is also provided.
  • Enhance datalad-configuration to support getting configuration from "global" scope without a dataset being present.
  • New modular framework for URL operations. This framework directly supports operation on http(s), ssh, and file URLs, and can be extended with custom functionality for additional protocols or even interaction with specific individual servers. The basic operations download, upload, delete, and stat are recognized, and can be implemented. The framework offers uniform progress reporting and simultaneous content hash computation. This framework is meant to replace and extend the downloader/provider framework in the DataLad core package. In contrast to its predecessor, it is integrated with the new credential framework, and supports operations beyond downloading.
  • git-annex-remote-uncurl is a special remote that exposes the new URL operations framework via git-annex. It provides flexible means to compose and rewrite URLs (e.g., to compensate for storage infrastructure changes) without having to modify individual URLs recorded in datasets. It enables seamless transitions between any services and protocols supported by the framework. This special remote can replace the datalad special remote provided by the DataLad core package.
  • A download command is provided as a front-end for the new modular URL operations framework.
  • A python-requests compatible authentication handler (DataladAuth) that interfaces with DataLad's credential system.
  • Boosted throughput of DataLad's runner component for command execution.
  • Substantially more comprehensive replacement for DataLad's constraints system for type conversion and parameter validation.
  • Windows and Mac client support for RIA store access.
  • A next-status command that is A LOT faster than status, and offers a mono recursion mode that shows modifications of nested dataset hierarchies relative to the state of the root dataset. Requires Git v2.31 (or later).
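The idea of rewriting URLs centrally, rather than editing each URL recorded in a dataset, can be illustrated with a small standalone sketch. It uses a regular expression with named groups and a format template. The pattern, the template, the host names, and the function name are assumptions chosen for illustration; this is not the configuration syntax or implementation of git-annex-remote-uncurl.

```python
import re


def rewrite_url(url, match_expr, template):
    """Rewrite `url` if it matches `match_expr`; otherwise return it unchanged.

    Named groups captured by the regular expression become the fields
    available to the format template.
    """
    match = re.match(match_expr, url)
    if match is None:
        return url
    return template.format(**match.groupdict())


# Hypothetical scenario: data moved from an HTTP server to an SSH host
OLD_LAYOUT = r"https://old\.example\.org/data/(?P<dataset>[^/]+)/(?P<path>.+)"
NEW_LAYOUT = "ssh://storage.example.org/archive/{dataset}/{path}"

print(rewrite_url(
    "https://old.example.org/data/ds001/sub-01/anat.nii.gz",
    OLD_LAYOUT,
    NEW_LAYOUT,
))
```

Because the recorded URL stays untouched and only the pair of match expression and template changes, a later move to yet another service would need only a new template.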

Summary of additional features for DataLad extension development

  • Framework for uniform command parameter validation. Regardless of the used API (Python, CLI, or GUI), command parameters are uniformly validated. This facilitates a stricter separation of parameter specification (and validation) from the actual implementation of a command. The latter can now focus on a command's logic only, while the former enables more uniform and more comprehensive validation and error reporting. Beyond per-parameter validation and type-conversion also inter-parameter dependency validation and value transformations are supported.
  • Improved composition of importable functionality. Key components for commands, annexremotes, datasets, etc. are collected in topical top-level modules that provide "all" necessary pieces in a single place.
  • webdav_server fixture that automatically deploys a local WebDAV server.
  • Utilities for HTTP handling:
    • probe_url() discovers redirects and authentication requirements for an HTTP URL
    • get_auth_realm() returns a label for an authentication realm that can be used to query for matching credentials
  • Utilities for special remote credential management:
    • get_specialremote_credential_properties() inspects a special remote and returns properties for querying a credential store for matching credentials
    • update_specialremote_credential() updates a credential in a store after successful use
    • get_specialremote_credential_envpatch() returns a suitable environment "patch" from a credential for a particular special remote type
  • Helper for runtime-patching other datalad code (datalad_next.utils.patch)
  • Base class for implementing custom git-annex backends.
  • A set of pytest fixtures to:
    • check that no global configuration side-effects are left behind by a test
    • check that no secrets are left behind by a test
    • provide a temporary configuration that is isolated from a user environment and from other tests
    • provide a temporary secret store that is isolated from a user environment and from other tests
    • provide a temporary credential manager to perform credential deployment and manipulation isolated from a user environment and from other tests
  • An iter_subproc() helper that enables communication with subprocesses via input/output iterables.
  • A shell context manager that enables interaction with (remote) shells, including support for input/output iterables for each shell-command execution within the context.
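The separation of parameter validation from command logic described in the first item of this list can be sketched in plain Python. Everything in the following block (`ParameterValidator`, `ensure_int`, `ensure_choice`, and the example command) is hypothetical and written only to show the pattern of per-parameter constraints plus inter-parameter checks; it is not the API of the constraint system shipped with this extension.

```python
def ensure_int(value):
    """Per-parameter constraint: convert to int or raise ValueError."""
    return int(value)


def ensure_choice(*choices):
    """Build a constraint that accepts only one of the given values."""
    def _check(value):
        if value not in choices:
            raise ValueError(f"{value!r} is not one of {choices}")
        return value
    return _check


class ParameterValidator:
    """Validate each parameter on its own, then run joint checks."""

    def __init__(self, constraints, joint_checks=()):
        self.constraints = constraints
        self.joint_checks = joint_checks

    def __call__(self, **kwargs):
        validated, errors = {}, {}
        for name, value in kwargs.items():
            constraint = self.constraints.get(name)
            try:
                validated[name] = constraint(value) if constraint else value
            except ValueError as exc:
                errors[name] = str(exc)
        if not errors:
            # joint checks only make sense on fully converted values
            for check in self.joint_checks:
                try:
                    check(validated)
                except ValueError as exc:
                    errors[check.__name__] = str(exc)
        if errors:
            raise ValueError(f"invalid parameters: {errors}")
        return validated


def jobs_require_parallel_mode(params):
    """Inter-parameter dependency: more than one job needs parallel mode."""
    if params.get("jobs", 1) > 1 and params.get("mode") != "parallel":
        raise ValueError("jobs > 1 requires mode='parallel'")


validate = ParameterValidator(
    constraints={"jobs": ensure_int,
                 "mode": ensure_choice("serial", "parallel")},
    joint_checks=[jobs_require_parallel_mode],
)


def run_command(**kwargs):
    """The command body only sees validated, type-converted parameters."""
    params = validate(**kwargs)
    return f"running {params['jobs']} job(s) in {params['mode']} mode"


print(run_command(jobs="4", mode="parallel"))
```

Since validation happens before the command body runs, a string such as `"4"` coming from a command line and an integer `4` coming from Python code reach the command logic in the same form.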

Patching the DataLad core package

Some of the features described above rely on a modification of the DataLad core package itself, rather than coming in the form of additional commands. Loading this extension causes a range of patches to be applied to the datalad package to enable them. A comprehensive description of the current set of patches is available at http://docs.datalad.org/projects/next/en/latest/#datalad-patches
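For readers unfamiliar with runtime patching, the general technique amounts to replacing an attribute of an already imported object while keeping a reference to the original. The block below is a generic illustration using only the standard language features; `patch_attribute`, `Reporter`, and `verbose_status` are hypothetical names, and the sketch does not reproduce the helper provided by this extension.

```python
def patch_attribute(obj, attr_name, replacement):
    """Replace `obj.attr_name` with `replacement` and return the original."""
    original = getattr(obj, attr_name)
    setattr(obj, attr_name, replacement)
    return original


class Reporter:
    """Stand-in for a class that lives in another package."""

    def status(self):
        return "ok"


def verbose_status(self):
    """Replacement that wraps the original behavior instead of copying it."""
    return "status: " + original_status(self)


# Apply the patch once; all existing and future instances are affected
original_status = patch_attribute(Reporter, "status", verbose_status)

print(Reporter().status())
```

Keeping the original around makes it possible to delegate to it, and to restore it later with another call to `patch_attribute`.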

Developing with DataLad NEXT

This extension package moves fast in comparison to the core package. Nevertheless, attention is paid to API stability, adequate semantic versioning, and informative changelogs.

Public vs internal API

Anything that can be imported directly from any of the sub-packages in datalad_next is considered to be part of the public API. Changes to this API determine the versioning, and development is done with the aim to keep this API as stable as possible. This includes signatures and return value behavior.

As an example: from datalad_next.runners import iter_git_subproc imports a part of the public API, but from datalad_next.runners.git import iter_git_subproc does not.

Use of the internal API

Developers can obviously use parts of the non-public API. However, this should only be done with the understanding that these components may change from one release to another, with no guarantee of transition periods, deprecation warnings, etc.

Developers are advised to never reuse any components with names starting with _ (underscore). Their use should be limited to their individual subpackage.
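The rules above can be approximated with a small name-based check, for example as part of a code review script. This is a sketch under the assumption that the depth of the module path alone is a good enough indicator; `api_level` is a hypothetical helper, and it does not verify that a name can actually be imported from the given module.

```python
def api_level(module_path, name):
    """Classify an import as 'public', 'internal', or 'private'.

    - names or modules starting with an underscore are 'private'
    - imports directly from a sub-package (datalad_next.<pkg>) are 'public'
    - everything else is treated as 'internal'
    """
    parts = module_path.split(".")
    if parts[0] != "datalad_next":
        raise ValueError(f"{module_path!r} is not part of datalad_next")
    if name.startswith("_") or any(p.startswith("_") for p in parts):
        return "private"
    if len(parts) == 2:
        return "public"
    # deeper modules (and, conservatively, the bare top-level package)
    return "internal"


print(api_level("datalad_next.runners", "iter_git_subproc"))
print(api_level("datalad_next.runners.git", "iter_git_subproc"))
```

With the two imports from the example above, the first call reports `public` and the second reports `internal`.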

Acknowledgements

This DataLad extension was developed with funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under grant SFB 1451 (431549029, INF project).

Contributors

  • Michael Hanke
  • catetrai
  • Chris Markiewicz
  • Michał Szczepanik
  • Stephan Heunis
  • Benjamin Poldrack
  • Yaroslav Halchenko
  • Christian Mönch
  • Adina Wagner
  • John T. Wodder II

