DataLad extension for code execution in get commands

CAUTION: Work-in-Progress!

This DataLad extension provides facilities to register arbitrary commands for files in git-annex, which are then executed if datalad get is called on those files (and they are not yet present).

How do I use this?

This extension provides a new high-level datalad command called getexec which can be used to register commands on files.

In the following, we assume that the extension is installed and that we are inside a DataLad dataset.

As a simple example, we can register a command that writes "Hello World!" into a text file called "test.txt":

datalad getexec --path test.txt -- 'bash' '-c' 'printf "Hello World!" > "$1"' 'test-cmd'

As a result, the file "test.txt" now exists with its expected content. Since we told git-annex that this file can be recreated with the specified bash call, we can safely drop the file and then have it recreated automatically:

datalad drop test.txt
datalad get test.txt

Since our registered program might depend on other annexed files, we can specify those dependencies as well:

datalad getexec --path depends-on-test.txt -i test.txt -- 'bash' '-c' '(cat test.txt; printf "\nMore Text.") > "$1"' 'test-cmd'

This way, if datalad get is called on "depends-on-test.txt", git-annex will make sure that "test.txt" is present before executing the registered command. Therefore, the following will work:

datalad drop test.txt
datalad drop depends-on-test.txt
datalad get depends-on-test.txt

There are some limitations to what commands can be registered. First of all, no shell interpretation happens; the command is essentially passed verbatim to Python's subprocess.run. This is why the examples above look a bit more complex, with the explicit call to bash. In those examples, each quoted part after -- becomes one element in the list passed to subprocess.run. In practice, it is a good idea to externalize the command into, e.g., a shell script and pass a single argument in the getexec call.

Second, the command is expected to always produce a single output file, the location of which is passed as the first (and only) argument to the command. This is the $1 in the bash calls above.
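To make the first two limitations concrete, here is a minimal sketch of how the argument list from the first example could reach subprocess.run. The helper name run_registered is hypothetical, and appending the output path as the last list element is an assumption about the extension's internals, not something taken from its source. With bash -c, the first argument after the script string becomes $0, which is why the placeholder 'test-cmd' is needed for the output path to arrive as $1.

```python
import subprocess
import tempfile
from pathlib import Path


def run_registered(cmd: list[str], output: Path) -> None:
    """Run `cmd` without a shell, passing the output path as the only extra argument."""
    # Assumption: the output location is appended as the final list element.
    # No shell is involved, so redirections only work inside an explicit `bash -c`.
    subprocess.run([*cmd, str(output)], check=True)


# The quoted parts after `--` in the first example, one list element each.
# For `bash -c`, "test-cmd" becomes $0 and the appended path becomes $1.
cmd = ["bash", "-c", 'printf "Hello World!" > "$1"', "test-cmd"]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "test.txt"
    run_registered(cmd, target)
    content = target.read_text()

print(content)
```

A single string such as 'printf "Hello World!" > test.txt' would not work here, because without a shell the whole string would be treated as the name of a program to execute.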

Lastly, since the command is executed in the context of a get, the resulting file is always expected to remain the same. This means that two consecutive calls to the command need to produce files with identical checksums; otherwise git-annex will complain. Essentially, the command is expected to behave somewhat like a pure function. If this does not fit your use case, you are probably looking for DataLad's built-in run and rerun commands.
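Before registering a command, it can help to check that it really behaves like a pure function. The following is a rough sketch of such a check, not part of the extension: the helper names output_digest and is_reproducible are hypothetical, SHA-256 is used only for illustration (the checksum git-annex actually compares depends on the key backend of the dataset), and the output path is again assumed to be appended as the last argument.

```python
import hashlib
import subprocess
import tempfile
from pathlib import Path


def output_digest(cmd: list[str]) -> str:
    """Run `cmd` once in a scratch directory and return the SHA-256 of its output file."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "output"
        # Assumption: the output path is passed as the command's only extra argument.
        subprocess.run([*cmd, str(target)], check=True)
        return hashlib.sha256(target.read_bytes()).hexdigest()


def is_reproducible(cmd: list[str]) -> bool:
    """Return True if two consecutive runs produce byte-identical output."""
    return output_digest(cmd) == output_digest(cmd)


cmd = ["bash", "-c", 'printf "Hello World!" > "$1"', "test-cmd"]
print(is_reproducible(cmd))
```

Two matching runs are a necessary condition, not a guarantee: a command that embeds the current date or a hostname, for example, could pass this check today and still fail on a later get.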

How does it work?

This extension works by implementing a new git-annex special remote which somewhat abuses the URL handling of git-annex. The special remote takes responsibility for all URLs with a scheme of "getexec:"; encoded inside these URLs is the information necessary to re-execute the registered command. The DataLad part of the extension simply takes the user input, generates a matching URL for it, and then registers the URL with git-annex.
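The idea of packing a command into a URL can be illustrated with a small round-trip sketch. The encoding shown here (compact JSON, then URL-safe base64) and the function names encode_spec and decode_spec are purely hypothetical; only the "getexec:" scheme comes from the description above, and the extension's actual URL layout may well differ.

```python
import base64
import json

SCHEME = "getexec:"  # the scheme named in the text above


def encode_spec(cmd: list[str], inputs: list[str]) -> str:
    """Pack a command and its input files into a single URL (hypothetical layout)."""
    payload = json.dumps({"cmd": cmd, "inputs": inputs}, separators=(",", ":"))
    return SCHEME + base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_spec(url: str) -> dict:
    """Recover the command and inputs from a URL produced by encode_spec."""
    if not url.startswith(SCHEME):
        raise ValueError(f"not a {SCHEME} URL: {url!r}")
    raw = base64.urlsafe_b64decode(url[len(SCHEME):])
    return json.loads(raw)


cmd = ["bash", "-c", '(cat test.txt; printf "\\nMore Text.") > "$1"', "test-cmd"]
url = encode_spec(cmd, inputs=["test.txt"])
spec = decode_spec(url)
print(url)
print(spec["cmd"] == cmd, spec["inputs"])
```

Whatever the real encoding is, the design point is the same: the URL is self-contained, so the special remote needs no state beyond what git-annex already records for the file.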

When getting a file that is not currently present, git-annex will do its usual thing to determine where to fetch the data from. If git-annex determines that the special remote of this extension should provide the data, the registered command is rerun.
