bidsarray
Compile a tabular listing of input and output files from a BIDS dataset. The output can be fed into GNU parallel to run arbitrary commands over an entire BIDS dataset in parallel!
Command syntax
Prelude
The first part of a bidsarray call consists of prelude arguments that initialize the input and output directories:
bidsarray INPUT_DIR OUTPUT_DIR [--derivatives] \
[--participant-label LABEL ...] [--exclude-participant-label LABEL ...] \
[--pybidsdb-dir DIR] [--pybidsdb-reset]
- --derivatives enables indexing of derivative datasets (in the derivatives/ folder).
- --participant-label allows the specification of one or more subject labels (the part after sub- in sub-LABEL). Only files from these subjects will be produced.
- --exclude-participant-label does the same, but excludes the listed subjects.
- --pybidsdb-dir can be used to specify a pybids database created via the database_path argument of bids.BIDSLayout. This can speed up indexing for large datasets.
- --pybidsdb-reset forces reindexing of the database.
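For example, a prelude that indexes only two subjects of a hypothetical dataset at /data/bids, using a persistent pybids database, might look like this (all paths and labels here are illustrative):

bidsarray /data/bids /data/bids/derivatives/out \
--participant-label 001 002 \
--pybidsdb-dir /data/bids/.pybidsdb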
Components
Components are specified after the prelude, each separated by ::: (just like in GNU parallel!). So a complete command looks like this:
bidsarray <PRELUDE...> ::: <COMPONENT 1> ::: <COMPONENT 2> ::: ...
Each component may be specified as an input or an output. Inputs are read from an existing dataset; outputs are created based on the provided inputs.
Inputs
Input components are specified as follows:
::: --input [label] [--groupby ENTITY ...] [--aggregate ENTITY ...] [--filter ENTITY[:METHOD]=VALUE ...]
The label is optional: if provided, it will add a header row to the top of the tabular output. If a label is provided to any one component, all components must receive a label!
--filter narrows the set of selected paths to those containing the selected entity-values. For example, to select diffusion images, one might use:
::: --input --filter suffix=dwi datatype=dwi extension=.nii.gz
METHOD tells the filter how to perform the selection. By default, it looks for an exact match, but regular expressions can also be used via :match and :search:
::: --input --filter 'suffix:match=[Tt][12]w?'
Overall, --filter tends to REDUCE the number of rows in the output.
--groupby and --aggregate both mark variable parts of the path. Often, they'll be entities such as subject, session, and run. These variable entities are referred to generically as wildcards.
--groupby will create a separate row for each path matched by the filters and wildcards. This is used if you want to run a tool separately on each file. For instance, if you have a group of images and want to apply a smoothing function to each one separately, you may use:
::: --input --filter suffix=T1w extension=.nii.gz --groupby subject session
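On a hypothetical dataset with two subjects and two sessions each, this produces one row (one path) per subject-session combination:

sub-001/ses-01/anat/sub-001_ses-01_T1w.nii.gz
sub-001/ses-02/anat/sub-001_ses-02_T1w.nii.gz
sub-002/ses-01/anat/sub-002_ses-01_T1w.nii.gz
sub-002/ses-02/anat/sub-002_ses-02_T1w.nii.gz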
--aggregate joins multiple paths into the same row. This is useful if you want to perform an aggregation, such as an average or standard deviation. For example, if you want to get an average fractional anisotropy map from all subjects, you may use:
::: --input --filter desc=FA suffix=mdp extension=.nii.gz --aggregate subject session
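On the same hypothetical dataset, this would instead emit a single row in which all of the matched paths are joined by spaces:

sub-001/ses-01/dwi/sub-001_ses-01_desc-FA_mdp.nii.gz sub-001/ses-02/dwi/sub-001_ses-02_desc-FA_mdp.nii.gz sub-002/ses-01/dwi/sub-002_ses-01_desc-FA_mdp.nii.gz ...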
These two flags can be combined. So to get an average FA map for each subject across all sessions, you could use:
::: --input --filter desc=FA suffix=mdp extension=.nii.gz --groupby subject --aggregate session
Outputs
Outputs are generated using the bids function from snakebids. The syntax is:
::: --output [label] --entities [ENTITY=VALUE ...]
Each ENTITY=VALUE pair provided to --entities specifies a static entity value applied to every generated output path. For instance, continuing our smoothing example, you may use:
::: --input --filter suffix=T1w extension=.nii.gz --groupby subject session ::: --output --entities datatype=anat suffix=T1w desc=smoothed
Importantly, any wildcards specified using --groupby are AUTOMATICALLY provided to each output. So in the above example, our table will include a row for each subject-session combination, with a correctly formatted output path.
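On our hypothetical two-subject dataset, each row would then hold the input path and the derived output path as two tab-separated columns (the gap below stands for a tab; paths are illustrative):

sub-001/ses-01/anat/sub-001_ses-01_T1w.nii.gz    derivatives/out/sub-001/ses-01/anat/sub-001_ses-01_desc-smoothed_T1w.nii.gz
sub-001/ses-02/anat/sub-001_ses-02_T1w.nii.gz    derivatives/out/sub-001/ses-02/anat/sub-001_ses-02_desc-smoothed_T1w.nii.gz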
Feeding outputs to parallel
Basics
While the tabular output could be used for any number of purposes, it really shines in combination with GNU parallel. Parallel is a powerful and complicated tool; its full usage can be found in its documentation. Here we show just the basics of using it with bidsarray.
bidsarray <PRELUDE...> ::: <COMPONENT 1> ::: <COMPONENT 2> | parallel --colsep '\t' echo 1={1} 2={2}
The --colsep argument lets parallel read incoming columnar data. We pass \t as the argument so that it reads bidsarray's tab-separated output.
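On a table like the smoothing example above, this would print one line per row, for example:

1=sub-001/anat/sub-001_T1w.nii.gz 2=derivatives/out/sub-001/anat/sub-001_desc-smoothed_T1w.nii.gz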
Examples
In this example, we apply a transform to all of the fractional anisotropy maps of a preprocessed diffusion dataset:
bidsarray . derivatives/template --derivatives \
::: --input --filter suffix=mdp desc=FA extension=.nii.gz space=participant --groupby subject session \
::: --input --filter suffix=xfm from=participant to=MNI6 extension=.nii.gz --groupby subject session \
::: --input --filter suffix=T1w extension=.nii.gz space=MNI6 \
::: --output --entities desc=FA datatype=dwi space=MNI6 suffix=mdp.nii.gz |
parallel --bar --colsep '\t' antsApplyTransforms -d 3 -i {1} -o {4} -r {3} -t {2}
Using labels
The above examples use numeric IDs for argument substitution, which can become difficult to keep track of when handling many components. It's possible to use labels instead:
bidsarray . derivatives/template --derivatives \
::: --input image --filter suffix=mdp desc=FA extension=.nii.gz space=participant --groupby subject session \
::: --input transform --filter suffix=xfm from=participant to=MNI6 extension=.nii.gz --groupby subject session \
::: --input reference --filter suffix=T1w extension=.nii.gz space=MNI6 \
::: --output out --entities desc=FA datatype=dwi space=MNI6 suffix=mdp.nii.gz |
parallel --bar --colsep '\t' --header : antsApplyTransforms -d 3 -i {image} -o {out} -r {reference} -t {transform}
Note that the use of labels in parallel is not compatible with all features.
Aggregation commands with parallel
Parallel automatically shell-escapes each column when using --colsep, including any spaces, but bidsarray uses spaces to separate files that should be aggregated. So if we try to get the average of a set of files, parallel will read the filenames as one giant filename, and we need a trick to get this to work. There are two basic approaches (both compliments of this SO Q/A):
The simplest is to prepend the command with eval to remove all escapes:
bidsarray . derivatives/average \
::: --input --filter suffix=T1w extension=.nii.gz --groupby subject session --aggregate run \
::: --output --entities suffix=T1w.nii.gz datatype=anat |
parallel --bar --colsep '\t' eval mrmath {1} mean {2}
This may not work with complex commands, as eval will aggressively strip away quotes. For a more surgical approach, you can use the uq function:
bidsarray . derivatives/average \
::: --input --filter suffix=T1w extension=.nii.gz --groupby subject session --aggregate run \
::: --output --entities suffix=T1w.nii.gz datatype=anat |
parallel --bar --colsep '\t' mrmath '{=1 uq=}' mean {2}
uq will only apply to the targeted variable. This approach is not compatible with labels.
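The difference is easy to see without running any real image tool, using parallel's --dry-run flag on a contrived two-file row:

printf 'a.nii b.nii\tout.nii\n' | parallel --dry-run --colsep '\t' mrmath {1} mean {2}
# the two input paths remain escaped together as a single argument
printf 'a.nii b.nii\tout.nii\n' | parallel --dry-run --colsep '\t' mrmath '{=1 uq=}' mean {2}
# uq removes the escaping, so the paths become separate arguments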
Creating output folders
Note that for most commands (especially any involving --groupby), you will likely need to create the output folders as part of the command. Use a command like this:
bidsarray . derivatives/template --derivatives \
::: --input --filter suffix=mdp desc=FA extension=.nii.gz space=participant --groupby subject session \
::: --input --filter suffix=xfm from=participant to=MNI6 extension=.nii.gz --groupby subject session \
::: --input --filter suffix=T1w extension=.nii.gz space=MNI6 \
::: --output --entities desc=FA datatype=dwi space=MNI6 suffix=mdp.nii.gz |
parallel --bar --colsep '\t' mkdir -p {4//} \&\& antsApplyTransforms -d 3 -i {1} -o {4} -r {3} -t {2}
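Here {4//} combines the positional column 4 with parallel's {//} dirname modifier, so mkdir creates the output file's parent directory rather than a directory at the file's own path:

parallel echo {//} ::: /out/sub-001/dwi/sub-001_mdp.nii.gz
# prints: /out/sub-001/dwi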
Motivation and Design
bidsarray was built both to be a useful tool and a demonstration of the bidsapp module in snakebids. The parsing, components, and CLI are all provided by snakebids, so bidsarray can organize a tabular output with just a few small files.
snakebids.bidsapp uses a system of hooks and plugins to build an app. In bidsarray/run.py, three hooks can be seen:
- get_argv: Retrieve the provided CLI arguments and split them at :::. This allows the app to separately parse the command prelude and each of the components. The prelude arguments are returned to be parsed by the bidsapp, and the components are saved into the config to be parsed later.
- finalize_config: The prelude has now been parsed by bidsapp. We retrieve the component arguments we saved earlier and parse them (using functions in bidsarray/component.py). The results are saved back into the config.
- run: We use the configured components we calculated earlier and call generate_inputs, which uses pybids to parse the input dataset and create a BidsDataset. The methods on this dataset can be used to retrieve and organize the indexed paths.
snakebids.bidsapp.app is used to create the app with several plugins providing the basic functionality. The last plugin, sys.modules[__name__], loads the hooks defined in the run.py file so that the app will work. Finally, the entrypoint app.run() is called behind an if __name__ == "__main__" block. It's also specified in the pyproject.toml file as a script.