Generic File Classifier
Project description
File Classifier
[[TOC]]
Overview
The file classifier gear provides a gear interface to the fw-classification toolkit and is essentially just a wrapper around fw-classification.
For documentation on classification in general, please consult the fw-classification documentation
The file classifier gear uses the file type of the provided input to determine the proper adapter to use.
Supported file types
Currently the gear supports classification of the following file types:
dicom
: via thefile.info.header.dicom
namespace which is populated using the file-metadata-importer gear.nifti
: via ajson
sidecar which is found in the same container as the input.
Usage
Prerequisites
Metadata
In general, since fw-classification acts on input metadata, the input file needs to have
it's metadata populated before running file-classifier. The metadata can live in a few
places depending on how the file will be classified. The most common would be in the
file.info.header.<file-type>
which will be populated by file-metadata-importer
. But
the metadata can also be in a separate file such as the sidecar.json
for NIfTIs, or in
the hierarchy such as acquisition label, file name, or custom information on any parent
container.
Profile
file-classifier ships with default profiles but the gear also accepts an input profile. If you have custom needs beyond what is in the default profile, you will need to override the default profiles. See Custom Classifications
Inputs
- file-input: The file to classify
- profile: Optional profile to use for classification, if passed in, this will override the default classification profile and use what was passed in. See documentation for creating a profile at the classification-toolkit docs
- classifications: An optional list of context classifications set at the project level, see Setting custom classifications. These classifications are added as the final block to the profile that is being used to classify, therefore they get highest priority.
Configuration
- debug (boolean, default False): Include debug statements in output.
- tag (str, default 'file-classifier'): String to tag the file after classification. Useful for gear-rule pipelines triggered by tags.
Which profile will be used?
The priority for determining which profile will be used is as so:
- Profile passed in via the optional input
profile
- Default profile
main.yml
described in the classification-profiles repo.
The profile being used will be printed out at the beginning of the gear.
!!! note After the profile has been determined, context classifications will be added as a block to that profile, i.e. context-classifications always have the highest priority.
Custom Classifications
Often the default profile will not have specific enough classification for a specific project. If you need to add custom classifications, there are two main ways to pass them in:
- Create a profile and attach it to your project
- Add custom classifications to the project custom information.
Create a profile
This is a better option if you will use these same custom classifications on multiple projects.
WARNING:
Creating a profile and passing it in as input will completely bypass the already pre-defined classifications, so if you want to keep those, you will need to either copy them, or include as a git profile:
For example, to add a custom classification of Deleted
when Protocol Name has been
deleted:
---
name: Custom classifier
includes:
# Include default MR
- https://gitlab.com/flywheel-io/scientific-solutions/lib/fw-classification-profiles$profiles/MR.yaml
profile:
- name: set_custom_deleted
description: |
Set custom deleted classification if ProtocolName was deleted
rules:
- match_type: 'all'
match:
- key: file.type
is: dicom
- key: file.info.header.dicom.ProtocolName
is: 'Deleted'
action:
- key: file.classification.Custom
add: 'Deleted'
Add custom classifications to project information
Custom classification can be added to project information. These can be added either
via the SDK or UI, and they follow the same structure as a fw-classification
profile
block.
!!! note
Project information classifications are added _fter_ the profile has been
determined, context classifications will be added as a block to that profile, i.e.
context-classifications always have the highest priority.
For example, adding the same ProtocolName
block via the SDK:
import flywheel
fw = flywheel.Client()
proj = fw.get_project(<proj_id>) # or use lookup()
existing_info = proj.info
# Initialize context classifications if they don't exist
existing_info.setdefault('classifications', [])
existing_info['classifications'].append(
{
'match': [
{
'key': 'file.type',
'is': 'dicom',
},
{
'key': 'file.info.header.dicom.ProtocolName',
'is': 'deleted',
}
],
'action': [
{'key': 'file.classification.Custom', 'add': 'Deleted'},
]
}
)
proj.replace_info(existing_info)
The gear will then record that it found these custom classifications in the job logs:
...
[552ms INFO ] Log level is INFO
[552ms INFO ] Using default profile 'main.yml'
[1152ms INFO ] Looking for custom classifications in project Q1_Q2_2022
[1152ms INFO ] Found custom classification in project context, parsed as:
If all of () executed, then execute the first match of the following:
-------------------- Rule 0 --------------------
Match if Any are True:
- file.type is dicom
- file.info.header.dicom.ProtocolName is deleted
Do the following:
- add Deleted to file.classification.Custom
[1152ms INFO ] Starting classification.
[1380ms INFO ] Running at acquisition level
...
You can also add these values via the UI:
Contributing
For more information about how to get started contributing to that gear, checkout CONTRIBUTING.md.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Hashes for fw_gear_file_classifier-0.6.4-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 58c9bf4e8687c17a1b8bec6c1402caf959ad766c62b7af9a16b14aa2b3065385 |
|
MD5 | c1a7b8d5c89eec585765126cea58b18b |
|
BLAKE2b-256 | 2d665d881b1d58d317ddd9802fc68841f8bfe49a10a55f785432300ae2ae8963 |