DataLad extension for code execution in get commands
Project description
DataLad extension for code execution in get commands
CAUTION: Work-in-Progress!
This DataLad extension provides facilities to register arbitrary commands for files in git-annex,
which are then executed if datalad get
is called on those files (and they are not yet present).
How do I use this?
This extension provides a new high-level datalad command called getexec
which can be used to register commands on files.
In the following we will assume that we have the extension installed and are inside a DataLad dataset.
As a simple example, we can register a command that writes "Hello World!" into a text file called "test.txt":
datalad getexec --path test.txt -- 'bash' '-c' 'printf "Hello World!" > "$1"' 'test-cmd'
As a result of this, we now have the file "test.txt" with it's expected content. Since we told git-annex that we can recreate this file with the specified bash call, we can now safely drop the file and then automatically get it recreated:
datalad drop test.txt
datalad get test.txt
Since our registered program might depend on some other annex'ed files we can specify those dependencies as well:
datalad getexec --path depends-on-test.txt -i test.txt -- 'bash' '-c' '(cat test.txt; printf "\nMore Text.") > "$1"' 'test-cmd'
This way,
if datalad get
is called on "depends-on-test.txt" git-annex will make sure,
that "test.txt" is present before executing the registered command.
Therefore,
the following will work:
datalad drop test.txt
datalad drop depends-on-test.txt
datalad get depends-on-test.txt
There are some limitations to what commands can be registered.
First of all,
there is no shell interpretation happening;
the command is essentially passed verbatim to python's subprocess.run
.
This is why the examples above look a bit more complex with the call to bash.
In the above examples,
each quoted part after --
becomes one element in the list passed to subprocess.run
.
In practice,
it would be a good idea to externalize the command into e.g. a shell script
and have a single argument in the getexec
call.
Second,
the command is expected to always produce a single output file,
the location of which is passed as the first (and only) argument to the command.
This is the $1
in the bash calls above.
Lastly,
since the command is executed in the context of a get
,
the resulting file is always expected to remain the same.
This means that two consecutive calls to the command need to produce files with identical checksums,
otherwise git-annex will complain.
Essentially,
the command is expected to behave somewhat like a pure function.
If this does not fit your use-case you are probably looking for DataLad's builtin run
and rerun
.
How does it work?
This extension works by implementing a new git-annex special remote which kind of abuses the URL handling of git-annex. The special remote takes responsibility of all URLs with a scheme of "getexec:"; encoded inside of these URLs then is the necessary information to re-execute the registered command. The DataLad part of the extension simply takes the user input, generates a matching URL for it and then registers the URL with git-annex.
When get
ing a file that is not currently present,
git-annex will do it's usual thing to determine from where to fetch the data.
If git-annex determines that the special remote of this extension should provide the data
then it will rerun the registered command.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file datalad-getexec-0.1.0.tar.gz
.
File metadata
- Download URL: datalad-getexec-0.1.0.tar.gz
- Upload date:
- Size: 357.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | d9666de3a08aeb3fc7b0bc6327569aad6150d149098e3f577ea4aa33c2dd6bee |
|
MD5 | 55ca091856a1a95a6ad4127a606c1fd8 |
|
BLAKE2b-256 | 82ba45e6c0fb413f48e7d04867066eb43038c59a198a2355c8ce24a8accd4be5 |
File details
Details for the file datalad_getexec-0.1.0-py3-none-any.whl
.
File metadata
- Download URL: datalad_getexec-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 37796d5a0f463b8236fe1fa2790f893c2df6b3c9bc346e56f29a5d112a700475 |
|
MD5 | d0da401c820a3e198d4b4c7c9a39d904 |
|
BLAKE2b-256 | 6d04814627d389ccdbf40996f714022e6870db1b096373bb30da3126665415d8 |