Converts and organises raw MRI data-sets according to the Brain Imaging Data Structure (BIDS)
BIDScoin
- The BIDScoin workflow
- The BIDScoin tools
- The bidsmap files
- BIDScoin functionality / TODO
- BIDScoin tutorial
BIDScoin is a Python command-line toolkit that converts ("coins") source-level (raw) MRI data-sets to nifti / json / tsv data-sets that are organized according to the Brain Imaging Data Structure, a.k.a. BIDS. Rather than depending on complex or ambiguous logic, BIDScoin uses a simple (but powerful) key-value approach to convert the raw source data into BIDS data. The key values that can be used in BIDScoin to map the data are:
1. Information in the MRI header files (DICOM, PAR/REC or .7 format; e.g. SeriesDescription)
2. Information from nifti headers (e.g. image dimensionality)
3. Information in the file structure (file- and/or directory names, e.g. number of files)
The key-value heuristics are stored in flexible, human-readable and broadly supported YAML files. The nifti and json files are generated with dcm2niix. For more information on the installation and requirements, see the installation guide.
Currently, BIDScoin is quite functional, but note that only option (1) has been implemented for DICOM files. Options (2) and (3) are planned for future versions, in which (3) will take precedence over (2), which in turn will take precedence over (1).
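Since only option (1) exists today, the following is purely a sketch of how the planned precedence could be expressed in Python, assuming each key source has already been read into a plain dictionary. The function `resolve_keys` and the example key names are hypothetical and not part of BIDScoin.

```python
from collections import ChainMap
from typing import Any, Mapping, Optional


def resolve_keys(dicom: Mapping[str, Any],
                 nifti: Optional[Mapping[str, Any]] = None,
                 filesystem: Optional[Mapping[str, Any]] = None) -> ChainMap:
    """Combine the key sources so that (3) file structure > (2) nifti > (1) DICOM."""
    # ChainMap searches its mappings from left to right, so the first mapping
    # that defines a key determines its value
    return ChainMap(dict(filesystem or {}), dict(nifti or {}), dict(dicom))


# Illustrative values only (hypothetical, not BIDScoin output)
keys = resolve_keys(
    dicom={"SeriesDescription": "t1_mprage_sag", "EchoTime": 2.98},
    nifti={"EchoTime": 3.0, "ndim": 3},
    filesystem={"SeriesDescription": "003-T1MPRAGE"},
)
print(keys["SeriesDescription"], keys["EchoTime"], keys["ndim"])  # prints: 003-T1MPRAGE 3.0 3
```

A `ChainMap` keeps the three sources separate instead of merging them into one dictionary, so it remains possible to see which source supplied a given value.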
BIDScoin is a user-friendly toolkit that requires no programming knowledge to use, just some basic file handling and, possibly, minor (YAML) text-editing skills.
The BIDScoin workflow
BIDScoin takes your raw data as well as a YAML file with the key-value mapping information as input, and returns a BIDS folder as output. Here is how to prepare the BIDScoin inputs:
- A minimally organised raw data folder, following a /raw/sub-[identifier]/ses-[identifier]/[seriesfolder]/[dicomfile] structure. This data organization is how users receive their data from the (Siemens) scanners at the DCCN (NB: the ses-[identifier] sub-folder is optional and can be left out).
  If your data is not already organized in this way, you can use the dicomsort.py command-line utility to move your unordered DICOM-files into a series-folder organization, with the DICOM series-folders being named [SeriesNumber]-[SeriesDescription]. Series folders contain a single data type and are typically acquired in a single run.
  Another command-line utility that can be helpful in organizing your raw data is rawmapper.py. This utility can show you an overview (map) of all the values of the DICOM fields of interest in your data-set and, optionally, use these fields to rename your raw data sub-folders (this can be handy e.g. if you manually entered subject-identifiers as [Additional info] at the scanner console and you want to use these to rename your subject folders).
  If these utilities do not satisfy your needs, then have a look at this reorganize_dicom_files tool.
- A YAML file with the key-value mapping information, i.e. a bidsmap. There are two ways to create such a bidsmap.
  The first applies if you are a new user working from scratch. In this case you would start with the bidstrainer.py command-line tool (see the BIDScoin workflow diagram and the bidstrainer section).
  If you have run the bidstrainer or if, e.g., you work in an institute where someone else (i.e. your MR physicist ;-)) has already performed the training procedure, you can use the training data to map all the files in your data-set with the bidsmapper.py command-line tool (see the bidsmapper section).
  The output of the bidsmapper is the complete bidsmap, which you can inspect to see if your raw data will be correctly mapped onto BIDS. If this is not the case, you can go back to the training procedure, change or add samples, and rerun the bidstrainer and bidsmapper until you have a suitable bidsmap. Alternatively, or additionally, you can edit the bidsmap directly (this requires more expert knowledge but can also be more powerful).
BIDScoin workflow. Left: New users would start with the bidstrainer, whose output can be fed into the bidsmapper to produce the bidsmap.yaml file. This file can (and should) be inspected and, in case of incorrect mappings, prompt the user to add raw training samples and re-run the training procedure (dashed arrow lines). Right: Institute users could start with an institute-provided bidsmap file (e.g. bidsmap_dccn.yaml) and directly use the bidsmapper. In case of incorrect mappings they could ask the institute for an updated bidsmap (dashed arrow line).
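Before running any BIDScoin tool, it can help to list what your raw folder actually contains. The sketch below walks a raw/sub-*/[ses-*/]series layout, treating the session level as optional as described above. The function names and the assumption that labels are alphanumeric are ours; this is not how dicomsort.py or BIDScoin itself scans folders.

```python
import re
from pathlib import Path
from typing import Iterator, List, Optional, Tuple

# Assumed label format: alphanumeric identifiers, as in sub-001 / ses-01
SUBJECT = re.compile(r"sub-[A-Za-z0-9]+")
SESSION = re.compile(r"ses-[A-Za-z0-9]+")


def subfolders(folder: Path, pattern: Optional[re.Pattern] = None) -> List[Path]:
    """Sorted sub-directories of folder, optionally filtered by a full-name regex."""
    return sorted(p for p in folder.iterdir()
                  if p.is_dir() and (pattern is None or pattern.fullmatch(p.name)))


def iter_series(rawfolder: Path) -> Iterator[Tuple[str, Optional[str], Path]]:
    """Yield (subject, session, seriesfolder) for a raw/sub-*/[ses-*/]series layout."""
    for subdir in subfolders(Path(rawfolder), SUBJECT):
        sessions = subfolders(subdir, SESSION)
        if not sessions:
            # The ses-[identifier] level is optional: series folders sit directly in sub-*
            for series in subfolders(subdir):
                yield subdir.name, None, series
        for sesdir in sessions:
            for series in subfolders(sesdir):
                yield subdir.name, sesdir.name, series
```

For example, `for sub, ses, series in iter_series(Path("/project/foo/raw")): print(sub, ses, series.name)` would print one line per series folder; folders that do not start with sub- are ignored.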
Once you have an organized raw data folder and a correct bidsmap, the actual conversion of the data-set to BIDS can be performed fully automatically by simply running the bidscoiner.py command-line tool (see the BIDScoin workflow diagram and the bidscoiner section).
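The order of the command-line calls in the workflow diagram can be sketched as a small Python helper. It only uses the positional arguments shown in the usage texts below and assumes the samples tree has already been filled before the final bidstrainer run. `workflow_commands` and `run` are hypothetical helpers, and `dry_run` defaults to True because the bidsmap.yaml file should be inspected between the bidsmapper and bidscoiner steps.

```python
import shlex
import subprocess
from typing import List, Optional


def workflow_commands(rawfolder: str, bidsfolder: str, bidsmap: Optional[str] = None,
                      train: bool = True) -> List[List[str]]:
    """Command lines of the BIDScoin workflow, in the order they are run."""
    commands = []
    if train:
        # New users (left-hand path): train on the samples tree first
        commands.append(["bidstrainer.py", bidsfolder])
    mapper = ["bidsmapper.py", rawfolder, bidsfolder]
    if bidsmap:
        # Institute users (right-hand path): pass the provided bidsmap, e.g. bidsmap_dccn
        mapper.append(bidsmap)
    commands.append(mapper)
    commands.append(["bidscoiner.py", rawfolder, bidsfolder])
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (default) or execute them, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


run(workflow_commands("/project/foo/raw", "/project/foo/bids"))
```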
The BIDScoin tools
Running the bidstrainer
usage: bidstrainer.py [-h] bidsfolder [samplefolder] [bidsmap]

Takes example files from the samples folder as training data and creates a key-value
mapping, i.e. a bidsmap_sample.yaml file, by associating the file attributes with the
file's BIDS-semantic pathname

positional arguments:
  bidsfolder    The destination folder with the bids data structure
  samplefolder  The root folder of the directory tree containing the sample
                files / training data. Optional argument, if left empty,
                bidsfolder/code/samples is used or such an empty directory
                tree is created
  bidsmap       The bidsmap YAML-file with the BIDS heuristics (optional
                argument, default: ./heuristics/bidsmap_template.yaml)

optional arguments:
  -h, --help    show this help message and exit

examples:
  bidstrainer.py /project/foo/bids
  bidstrainer.py /project/foo/bids /project/foo/samples bidsmap_custom
The core idea of the bidstrainer is that you know your own scan protocol and can therefore point out which files should go where in the BIDS folder. To do so, you have to place raw sample files for each of the BIDS data types / runs in your scan protocol (e.g. T1, fMRI, etc.) in the appropriate folder of a semantic folder tree (named samples, see the bidstrainer example). If you run bidstrainer.py with just the name of your bidsfolder, bidstrainer will create this semantic folder tree for you in the code subfolder (if it is not already there). Generally, it will be fairly straightforward to find your way in this semantic folder tree when placing your sample files, but if in doubt, have a look at the BIDS specification. Note that the deepest folder name in the tree denotes the BIDS suffix (e.g. "T1w"). You do not need to place samples of your non-BIDS data types / runs (such as localizer or spectroscopy scans) in the folder tree; these data types will automatically go into the "extra_data" folder.
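How a sample's location encodes its BIDS labels can be sketched as follows. This assumes a layout of samples/[modality]/.../[suffix]/[file], i.e. that the top-level folder is the BIDS modality and, as stated above, that the deepest folder is the BIDS suffix; any intermediate levels are ignored. `sample_labels` is a hypothetical helper, not the bidstrainer code.

```python
from pathlib import PurePath
from typing import Tuple


def sample_labels(samplefile: str, samplefolder: str) -> Tuple[str, str]:
    """Infer (modality, suffix) from a sample file's location in the samples tree."""
    # relative_to raises ValueError if samplefile is not inside samplefolder
    parts = PurePath(samplefile).relative_to(samplefolder).parts
    if len(parts) < 3:
        raise ValueError(f"{samplefile} is not inside a [modality]/.../[suffix] folder")
    modality = parts[0]   # top-level folder, e.g. "anat", "func" or "fmap"
    suffix = parts[-2]    # deepest folder name, e.g. "T1w" or "bold"
    return modality, suffix
```

For example, `sample_labels("/project/foo/bids/code/samples/anat/T1w/0001.dcm", "/project/foo/bids/code/samples")` gives `("anat", "T1w")`.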
Once all sample files have been put in the appropriate location, you can (re)run the bidstrainer to create a bidsmap file for your study. The bidstrainer reads a predefined set of (e.g. key DICOM) attributes from each sample file and, at the same time, takes the path-names of the sample files to infer the associated BIDS modality. In this way, a list of unique key-value mappings between sets of (DICOM) attributes and sets of BIDS labels is defined, the so-called bidsmap, which can be used as input for the bidsmapper tool. If the predefined set of attributes does not uniquely identify your particular scan sequences (not likely but possible), or if you simply prefer to use more or other attributes, you can (copy and) edit the bidsmap_template.yaml file in the heuristics folder and re-run the bidstrainer with this customized template as an input argument.
Bidstrainer example. The red arrow depicts a raw data sample (left file browser) that is copied to the appropriate location in the semantic folder tree (right file browser).
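The training step described above, turning (attributes, modality, suffix) triples into a list of unique mappings, can be sketched in a few lines of Python. It assumes the attributes have already been read from each sample into a dictionary with hashable values. The explicit `ValueError` for an attribute set that points to two different labels is our own addition, illustrating the "does not uniquely identify your scan sequences" case; it is not documented bidstrainer behaviour.

```python
from typing import Any, Dict, Iterable, List, Tuple

# (attributes read from a sample file, modality, suffix)
Sample = Tuple[Dict[str, Any], str, str]


def train_bidsmap(samples: Iterable[Sample]) -> Dict[str, List[dict]]:
    """Collect unique attribute-set -> BIDS-label mappings, grouped by modality."""
    bidsmap: Dict[str, List[dict]] = {}
    seen: Dict[tuple, Tuple[str, str]] = {}
    for attributes, modality, suffix in samples:
        key = tuple(sorted(attributes.items()))
        if key in seen:
            if seen[key] != (modality, suffix):
                # The attributes do not uniquely identify this scan sequence
                raise ValueError(f"{attributes} maps to {seen[key]} and {(modality, suffix)}")
            continue  # another sample of a run that is already in the bidsmap
        seen[key] = (modality, suffix)
        bidsmap.setdefault(modality, []).append({"attributes": dict(attributes), "suffix": suffix})
    return bidsmap
```

If the error is raised, the remedy suggested in the text applies: add more or other attributes to a copy of bidsmap_template.yaml.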
Running the bidsmapper
usage: bidsmapper.py [-h] [-a] rawfolder bidsfolder [bidsmap]

Creates a bidsmap.yaml YAML file that maps the information from all raw data to the
BIDS labels (see also [bidsmap_template.yaml] and [bidstrainer.py]). You can check
and edit the bidsmap.yaml file before passing it to [bidscoiner.py]

positional arguments:
  rawfolder        The source folder containing the raw data in
                   sub-#/ses-#/series format
  bidsfolder       The destination folder with the bids data structure
  bidsmap          The bidsmap YAML-file with the BIDS heuristics (optional
                   argument, default: bidsfolder/code/bidsmap_sample.yaml)

optional arguments:
  -h, --help       show this help message and exit
  -a, --automatic  If this flag is given the user will not be asked for help
                   if an unknown series is encountered

examples:
  bidsmapper.py /project/foo/raw /project/foo/bids
  bidsmapper.py /project/foo/raw /project/foo/bids bidsmap_dccn
The bidsmapper.py tool goes over all raw data folders of your dataset and saves the known and unknown key-value mappings in a (study-specific) bidsmap file. You can think of it as a dry run of how exactly the bidscoiner will convert the raw data into BIDS folders. It gives you the opportunity to inspect the resulting bidsmap.yaml file and check whether all data types / runs were recognized correctly, with proper BIDS labels, before doing the actual conversion to BIDS. Unexpected mappings or poor BIDS labels can result from incomplete training or from an incomplete bidsmap file that was provided to you. In that case you should either get an updated bidsmap file or redo the training with new sample files, and rerun the bidstrainer and bidsmapper until you have a suitable bidsmap.yaml file. You can of course also edit the bidsmap.yaml file directly, for instance by changing some of the automatically generated BIDS labels to your needs (e.g. "task_label").
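The known/unknown split that the bidsmapper performs can be sketched as follows, assuming a bidsmap already loaded into a dictionary of {modality: [{"attributes": ...}, ...]} and series attributes already read into dictionaries. The `ask` callback is a hypothetical stand-in for the interactive question; with `automatic=True` (mimicking -a/--automatic) unknown series are simply filed under extra_data. None of these names are the bidsmapper's real internals.

```python
from typing import Any, Callable, Dict, List, Optional

Attributes = Dict[str, Any]


def find_modality(series: Attributes, bidsmap: Dict[str, List[dict]]) -> Optional[str]:
    """Return the first modality with a run whose attributes all match the series."""
    for modality, runs in bidsmap.items():
        for run in runs:
            if all(series.get(key) == value for key, value in run["attributes"].items()):
                return modality
    return None


def map_series(series_list: List[Attributes], bidsmap: Dict[str, List[dict]],
               automatic: bool = False,
               ask: Optional[Callable[[Attributes], str]] = None) -> Dict[str, List[Attributes]]:
    """Sort series into known modalities; unknown series go to extra_data or to ask()."""
    mapped: Dict[str, List[Attributes]] = {}
    for series in series_list:
        modality = find_modality(series, bidsmap)
        if modality is None:
            # Mimics -a/--automatic: do not ask the user, keep it as extra_data
            modality = "extra_data" if automatic or ask is None else ask(series)
        mapped.setdefault(modality, []).append(series)
    return mapped
```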
Running the bidscoiner
usage: bidscoiner.py [-h] [-s [SUBJECTS [SUBJECTS ...]]] [-f] [-p]
                     [-b BIDSMAP]
                     rawfolder bidsfolder

Converts ("coins") datasets in the rawfolder to nifti / json / tsv datasets in the
bidsfolder according to the BIDS standard. Check and edit the bidsmap.yaml file to
your needs before running this function. Provenance, warnings and error messages are
stored in the ../bidsfolder/code/bidscoiner.log file

positional arguments:
  rawfolder             The source folder containing the raw data in
                        sub-#/ses-#/series format
  bidsfolder            The destination folder with the bids data structure

optional arguments:
  -h, --help            show this help message and exit
  -s [SUBJECTS [SUBJECTS ...]], --subjects [SUBJECTS [SUBJECTS ...]]
                        Space separated list of selected sub-# names / folders
                        to be processed. Otherwise all subjects in the
                        rawfolder will be selected
  -f, --force           If this flag is given subjects will be processed,
                        regardless of existing folders in the bidsfolder.
                        Otherwise existing folders will be skipped
  -p, --participants    If this flag is given those subjects that are in
                        participants.tsv will not be processed (also when the
                        --force flag is given). Otherwise the participants.tsv
                        table is ignored
  -b BIDSMAP, --bidsmap BIDSMAP
                        The bidsmap YAML-file with the study heuristics. If
                        the bidsmapfile is relative (i.e. no "/" in the name)
                        then it is assumed to be located in bidsfolder/code/.
                        Default: bidsmap.yaml

examples:
  bidscoiner.py /project/raw /project/bids
  bidscoiner.py -f /project/raw /project/bids -s sub-009 sub-030
The bidscoiner.py tool is the workhorse of the toolkit that will fully automatically convert your source-level (raw) MRI data-sets to BIDS-organized data-sets. To do so, it needs a bidsmap file, which is typically created by running the bidsmapper tool. You can run bidscoiner.py after all data is collected, or whenever new data has been added to the raw folder (provided that the scan protocol hasn't changed).
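Which subjects a (re)run would process follows from the -s, -f and -p options in the usage text above. The sketch below paraphrases that help text in Python; it is not the bidscoiner source. It assumes participants.tsv has the standard BIDS participant_id column, and `select_subjects` / `read_participants` are hypothetical names.

```python
import csv
from pathlib import Path
from typing import Iterable, List, Optional, Set


def read_participants(bidsfolder: Path) -> Set[str]:
    """participant_id values from bidsfolder/participants.tsv (empty if absent)."""
    tsv = Path(bidsfolder) / "participants.tsv"
    if not tsv.is_file():
        return set()
    with tsv.open(newline="") as f:
        return {row["participant_id"] for row in csv.DictReader(f, delimiter="\t")}


def select_subjects(rawfolder: Path, bidsfolder: Path,
                    subjects: Optional[Iterable[str]] = None,
                    force: bool = False, participants: bool = False) -> List[str]:
    """Which sub-* folders would be processed, following the -s/-f/-p help text."""
    wanted = set(subjects) if subjects is not None else None
    done = read_participants(bidsfolder) if participants else set()
    selected = []
    for subdir in sorted(Path(rawfolder).glob("sub-*")):
        sub = subdir.name
        if not subdir.is_dir() or (wanted is not None and sub not in wanted):
            continue  # -s: only the listed subjects are selected
        if sub in done:
            continue  # -p: listed in participants.tsv, skipped even with --force
        if (Path(bidsfolder) / sub).exists() and not force:
            continue  # existing output is skipped unless -f/--force is given
        selected.append(sub)
    return selected
```

This is why simply re-running the bidscoiner after new data arrives only converts the new subjects, while -f combined with -s lets you redo selected subjects.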
After a successful run of bidscoiner.py, the work of converting your data into a fully compliant BIDS dataset is unfortunately not yet complete and, depending on the complexity of your data-set, additional tools may need to be run and meta-data may need to be entered manually (not everything can be automated). For instance, you should update the content of the dataset_description.json and README files in your bids folder, and you may need to provide additional files such as *_scans.tsv, *_sessions.tsv or participants.json (see the BIDS specification for more information). Moreover, if you have behavioural log-files you will find that BIDScoin does not (yet) support converting these into BIDS-compliant *_events.tsv/json files (advanced users are encouraged to use the bidscoiner.py plug-in possibility and write their own log-file parser).
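As a starting point for such a parser, here is a minimal sketch that converts a log into a BIDS *_events.tsv table with the required onset and duration columns (in seconds) plus a trial_type column. The input format (a tab-separated log with Time, Event and Duration columns in milliseconds) is invented for illustration, and the function is not wired into the BIDScoin plug-in mechanism, whose interface is not documented here.

```python
import csv
from typing import TextIO


def log_to_events(log: TextIO, events: TextIO, t0_ms: float = 0.0) -> int:
    """Convert a hypothetical tab-separated log (Time, Event, Duration; all in ms)
    into a BIDS *_events.tsv table and return the number of events written."""
    writer = csv.writer(events, delimiter="\t", lineterminator="\n")
    writer.writerow(["onset", "duration", "trial_type"])
    count = 0
    for row in csv.DictReader(log, delimiter="\t"):
        # BIDS onsets and durations are in seconds; onsets are taken relative to
        # t0_ms, e.g. the log time of the first scanner trigger
        onset = (float(row["Time"]) - t0_ms) / 1000
        duration = float(row["Duration"]) / 1000
        writer.writerow([f"{onset:.3f}", f"{duration:.3f}", row["Event"]])
        count += 1
    return count
```

Usage: `with open("task.log") as log, open("sub-001_task-Stop_events.tsv", "w", newline="") as out: log_to_events(log, out, t0_ms=10000)` (file names and t0 are placeholders).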
If all of the above work is done, you can (and should) run the web-based bidsvalidator to check for inconsistencies or missing files in your bids data-set (NB: the bidsvalidator also exists as a command-line tool).
NB: The provenance of the produced BIDS data-sets is stored in the bids/code/bidscoiner.log file. This file is also very useful for debugging / tracking down bidsmapping issues.
The bidsmap files
A bidsmap file contains a collection of key-value dictionaries that define unique mappings between different types of raw data files (e.g. DICOM series) and their corresponding BIDS labels. As bidsmap files are both inputs and outputs of the different BIDScoin tools (except for bidscoiner.py, which has BIDS data as output; see the BIDScoin workflow), they are derivatives of each other and, as such, share the same basic structure. The bidsmap_template.yaml file is relatively empty and defines only which attributes (but not their values) are mapped to which BIDS labels. The bidsmap_[sample/site].yaml file contains actual attribute values (e.g. from training samples from a certain study or site) and their associated BIDS values. The final bidsmap.yaml file contains the attribute values and associated BIDS values for all types of data found in the entire raw data collection.
A bidsmap file consists of help-text, followed by several mapping sections, i.e. Options, DICOM, PAR, P7, Nifti, FileSystem and Plugin. Within each of these sections there are different sub-sections for the different BIDS modalities, i.e. for anat, func, dwi, fmap and beh. There are a few additional sections, i.e. participant_label, session_label and extra_data. Schematically, a bidsmap file has the following structure:
- Options (A list of general options that can be passed to the bidscoiner and its plug-ins)
- DICOM
  - participant_label [a DICOM field]
  - session_label [a DICOM field]
  - anat
    - attributes
      - [a DICOM field]
      - [another DICOM field]
      - [..]
    - acq_label
    - rec_label
    - run_index
    - mod_label
    - modality_label
    - ce_label
    - attributes
  - func
    - attributes
      - [a DICOM field]
      - [another DICOM field]
      - [..]
    - task_label
    - acq_label
    - [..]
    - attributes
  - dwi
    - [..]
  - fmap
    - [..]
  - beh
    - [..]
  - extra_data (all non-BIDS data)
    - [..]
- PAR.
- P7.
- Nifti.
- FileSystem.
- PlugIn. Name of the Python plug-in function (supported, but an experimental and untested feature)
Inside each BIDS modality, there can be multiple key-value mappings that map (e.g. DICOM) modality [attributes] to the BIDS [labels] (e.g. "task_label"), as indicated below:
Bidsmap_sample example. As indicated by the solid arrow line, the set of DICOM values (suitable to uniquely identify the DICOM series) is used here as a key-set that maps onto the set of BIDS labels. Note that certain BIDS labels are enclosed by pointy brackets, marking their dynamic value. In this bidsmap, as indicated by the dashed arrow line, that means that <ProtocolName> will be replaced at a later stage by "t1_mprage_sag_p2_iso_1.0". Also note that in this bidsmap there was only one T1-image, but there were two different fMRI runs (here because of multi-echo, but multiple tasks could also be listed).
Tips and tricks
Attribute list
The attribute value can also be a list, in which case a (DICOM) series is positively identified if its attribute value is in this list. If the attribute value is empty, it is not used to identify the series.
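The matching rule just described (all non-empty attributes must match, a list means "any of these values", an empty value is ignored) can be sketched like this, assuming the series attributes have already been read into a dictionary. `matches` is a hypothetical helper, not a BIDScoin function.

```python
from typing import Any, Mapping


def matches(series: Mapping[str, Any], attributes: Mapping[str, Any]) -> bool:
    """True if a series is positively identified by a bidsmap attributes block."""
    for key, expected in attributes.items():
        if expected is None or expected == "" or expected == []:
            continue  # empty attribute values are not used to identify the series
        actual = series.get(key)
        if isinstance(expected, list):
            if actual not in expected:  # list value: any of the listed values matches
                return False
        elif actual != expected:
            return False
    return True


attributes = {"SeriesDescription": ["ep2d_bold_reward", "ep2d_bold_stop"],
              "EchoTime": 30, "ImageType": ""}
print(matches({"SeriesDescription": "ep2d_bold_stop", "EchoTime": 30}, attributes))  # prints: True
```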
Dynamic values
The BIDS labels can be static, in which case the value is just a normal string, or dynamic, when the string is enclosed with pointy brackets like <attribute name> or <<argument1><argument2>> (see the example above). In case of single pointy brackets the value will be replaced during bidsmapper and bidscoiner runtime by the value of the attribute with that name. In case of double pointy brackets, the value will be updated for each subject/session during bidscoiner runtime (e.g. the <<runindex>> value will be increased if a file with the same runindex already exists in that directory).
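Both kinds of dynamic values can be sketched with regular expressions. `resolve_attribute` replaces a single-bracket <attribute name> by that attribute's value and leaves static and double-bracket values untouched; `resolve_runindex` increases a double-bracket run index while a file with that index already exists. The assumption that a run index is written as a starting number such as <<1>>, the `filename` callback and both function names are hypothetical illustrations, not BIDScoin's implementation.

```python
import re
from typing import Any, Callable, Collection, Mapping


def resolve_attribute(label: str, attributes: Mapping[str, Any]) -> str:
    """<name> -> value of attribute `name`; static labels and <<..>> values pass through."""
    match = re.fullmatch(r"<([^<>]+)>", label)
    return str(attributes[match.group(1)]) if match else label


def resolve_runindex(label: str, filename: Callable[[int], str],
                     existing: Collection[str]) -> str:
    """<<n>> -> the first run index >= n for which no file exists yet."""
    match = re.fullmatch(r"<<(\d+)>>", label)
    if not match:
        return label
    run = int(match.group(1))
    while filename(run) in existing:
        run += 1  # a file with this run index already exists: increase it
    return str(run)
```

For example, `resolve_attribute("<ProtocolName>", {"ProtocolName": "t1_mprage_sag_p2_iso_1.0"})` returns "t1_mprage_sag_p2_iso_1.0", as in the bidsmap example above.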
Field maps: IntendedFor
You can use the "IntendedFor" field to indicate for which runs (DICOM series) a fieldmap was intended. The dynamic value of the "IntendedFor" field can be a list of string patterns that is used to include those runs whose nifti pathname contains one of these patterns (e.g. <<task>> to include all functional runs, or <<Stop*Go><Reward>> to include "Stop1Go"-, "Stop2Go"- and "Reward"-runs).
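The pattern selection can be sketched as follows: the double-bracket value is split into its single-bracket patterns, and a nifti pathname is included if it contains any of them. Interpreting "*" with shell-style wildcard semantics (via `fnmatch`) is our assumption, as are the helper names; BIDScoin's own matching may differ in detail.

```python
import re
from fnmatch import fnmatchcase
from typing import Iterable, List


def parse_patterns(value: str) -> List[str]:
    """'<<Stop*Go><Reward>>' -> ['Stop*Go', 'Reward'] and '<<task>>' -> ['task']."""
    return re.findall(r"<([^<>]+)>", value)


def intended_for(value: str, niftis: Iterable[str]) -> List[str]:
    """Nifti pathnames that contain at least one of the (wildcard) patterns."""
    patterns = parse_patterns(value)
    return [path for path in niftis
            if any(fnmatchcase(path, f"*{pattern}*") for pattern in patterns)]
```

With runs named func/sub-001_task-Stop1Go_bold.nii.gz, ..._task-Stop2Go_bold.nii.gz and ..._task-Reward_bold.nii.gz plus an anatomical scan, both <<Stop*Go><Reward>> and <<task>> select the three functional runs and leave out the anatomical one.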
Plug-in functions
WIP
BIDScoin functionality / TODO
- DICOM source data
- PAR / REC source data
- P7 source data
- Nifti source data
- Fieldmaps
- Multi-echo data
- Multi-coil data
- Stimulus / behavioural logfiles
Are you a Python programmer with an interest in BIDS who knows all about GE and / or Philips data? Are you experienced with parsing stimulus presentation log-files? Or do you have ideas to improve this toolkit or its documentation? Have you come across bugs? Then you are highly encouraged to provide feedback or contribute to this project on https://github.com/Donders-Institute/bidscoin.
BIDScoin tutorial
This tutorial is specific to researchers from the DCCN and makes use of data-sets stored on its central file-system. However, it should not be difficult to use (at least part of) this tutorial for other data-sets as well.
- Preparation. Activate the bidscoin environment and create a tutorial playground folder in your home directory by executing these bash commands:
      module add bidscoin/1.4
      source activate /opt/bidscoin
      cp -r /opt/bidscoin/tutorial ~
  The new tutorial folder contains a raw source-data folder and a bids_ref reference BIDS folder, i.e. the end product of this tutorial. Let's begin by inspecting this new raw data collection:
  - Are the DICOM files for all the sub-*/ses-# folders organised in series-subfolders (e.g. sub-001/ses-01/003-T1MPRAGE/0001.dcm etc)? Use dicomsort.py if not
  - Use the rawmapper.py command to print out the DICOM values of the "EchoTime", "Sex" and "AcquisitionDate" of the fMRI series in the raw folder
- BIDS training. Now that we have some source data and have inspected its properties, we are ready to start with the actual BIDS coining process. The first step is to perform training on a few raw data samples:
  - Create a bids/code/samples folder tree in your tutorial folder with these bash commands:
      cd ~/tutorial
      bidstrainer.py bids
  - Put files (training data) in the right subfolders of this samples tree
  - Create a bids/code/bidsmap_sample.yaml bidsmap file by re-running the above bidstrainer.py bids command
  - Inspect the newly created bidsmap file. Can you recognise the key-value mappings? Which fields are going to end up in the filenames of the final BIDS datasets?
- BIDS mapping. Scan all folders in the raw data collection for unknown data by running the bidsmapper bash command:
      bidsmapper.py raw bids
  - Open the bids/code/bidsmap.yaml file and check the "extra_data" section for images that should go in the BIDS sections (e.g. T1, fMRI or DWI data). If so, add the missing training samples (check the messages in the command shell) to the samples folder tree and rerun the bidstrainer.py bids command.
  - In the bids/code/bidsmap.yaml file, rename the "task_label" of the functional scans to something more readable, e.g. "Reward" and "Stop"
  - Add a search pattern to the IntendedFor field such that it will select your fMRI runs
  - Change the options such that you will get non-zipped nifti data (i.e. *.nii instead of *.nii.gz) in your BIDS data collection
- BIDS coining. Convert your raw data collection into a BIDS collection by running the bidscoiner bash command (note that the input is the same as for the bidsmapper):
      bidscoiner.py raw bids
  - Check your bids/code/bidscoiner.log file for any errors or warnings
  - Compare the results in your bids/sub-# subject folders with the reference result in bids_ref. Are the file and folder names the same? Also check the json sidecar files of the fieldmaps. Do they have the right "EchoTime" and "IntendedFor" fields?
  - What happens if you re-run the bidscoiner.py command? Are the same subjects processed again? Re-run "sub-001".
  - Inspect the bids/participants.tsv file and decide if it is ok.
  - Update the dataset_description.json and README files in your bids folder
  - As a final step, run the bids-validator on your ~/tutorial/bids folder. Are you completely ready now to share this dataset?
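For the fieldmap check in the last tutorial step, a small script can list which fieldmap sidecars lack given fields. This is a sketch using only the standard library; `missing_sidecar_fields` is a hypothetical helper. Note that, per the BIDS specification, phase-difference fieldmaps use EchoTime1/EchoTime2 rather than EchoTime, so adjust `fields` to the fieldmap type you expect.

```python
import json
from pathlib import Path
from typing import Dict, List, Sequence


def missing_sidecar_fields(bidsfolder: Path,
                           fields: Sequence[str] = ("EchoTime", "IntendedFor")) -> Dict[str, List[str]]:
    """Map each fieldmap json sidecar that lacks any of `fields` to the missing names."""
    bidsfolder = Path(bidsfolder)
    problems: Dict[str, List[str]] = {}
    # "**" also matches zero folders, so this covers sub-*/fmap and sub-*/ses-*/fmap
    for sidecar in sorted(bidsfolder.glob("sub-*/**/fmap/*.json")):
        metadata = json.loads(sidecar.read_text())
        missing = [field for field in fields if field not in metadata]
        if missing:
            problems[sidecar.relative_to(bidsfolder).as_posix()] = missing
    return problems
```

In the tutorial, `print(missing_sidecar_fields(Path.home() / "tutorial" / "bids"))` would show which sidecars still need attention; an empty dictionary means every fieldmap sidecar contains both fields.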
Download files
Source Distribution
Built Distribution
File details
Details for the file bidscoin-1.4.tar.gz.
File metadata
- Download URL: bidscoin-1.4.tar.gz
- Upload date:
- Size: 34.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.1 setuptools/40.6.2 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | a2a900dced96dcb966034e05c1ef53cb935a38f492a5af2463890269f5748faf
MD5 | b8348f65bbfc0e59b321645ca2aeb95d
BLAKE2b-256 | 3ec1614c64318d15eaca67a31a3cf373040bae3b24946c5b17b00e6f8b1382e3
File details
Details for the file bidscoin-1.4-py3-none-any.whl.
File metadata
- Download URL: bidscoin-1.4-py3-none-any.whl
- Upload date:
- Size: 75.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.1 setuptools/40.6.2 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 40e6590d95923ed9b891b6f91bc8aec4cda27001437e153f0e8e5b2bf6f431f2
MD5 | 84f41bacdcc85eecfdd02a1356beca96
BLAKE2b-256 | d55cbf23e8ed70442610131d52761b0bdb5ce9a2fd8689e3a3783c703e60d289