Preprocessing scripts for the DRAGON benchmark
DRAGON Preprocessing
This repository contains the preprocessing scripts for the DRAGON challenge.
If you are using this codebase or some part of it, please cite the following article: PENDING
BibTeX:
PENDING
Installation
dragon_prep can be pip-installed:
pip install dragon_prep
Alternatively, it can be installed from source:
git clone https://github.com/DIAGNijmegen/dragon_prep
cd dragon_prep
pip install -e .
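To confirm that either installation route worked, the installed version can be queried from Python. This is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of dragon_prep.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "dragon_prep") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("dragon_prep is not installed in this environment")
    else:
        print(f"dragon_prep version: {found}")
```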
The Docker image can be built after cloning the repository. The anonymisation code is not included due to privacy concerns, so you have to comment out the lines in the Dockerfile that copy and install the diag-radiology-report-anonymizer. The unmodified version is included to reflect the exact code used to prepare the DRAGON challenge resources.
git clone https://github.com/DIAGNijmegen/dragon_prep
cd dragon_prep
nano Dockerfile # comment out copying and installing the diag-radiology-report-anonymizer
./build.sh
If the build succeeds, this produces a Docker image named dragon_prep:latest.
Resources
The preprocessing scripts for the synthetic datasets can be found in src/dragon_prep and are named Task1xx_Example_yy.py. The preprocessing scripts for the datasets used in the test leaderboard of the DRAGON challenge can also be found in src/dragon_prep and are named Task0xx_yy.py. The datasets for the validation leaderboard are derived from the development data using the src/dragon_prep/make_debug_splits.py script. For the DRAGON challenge, all datasets were preprocessed using the preprocess.sh script.
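The naming scheme above can be used to tell the two groups of scripts apart programmatically. The following is a rough sketch, not part of dragon_prep: the regular expressions are an assumption derived from the Task1xx_Example_yy.py and Task0xx_yy.py patterns, and the helper names classify_script and list_scripts are hypothetical.

```python
import re
from pathlib import Path

# Assumed patterns, derived from the naming scheme described in this README.
SYNTHETIC = re.compile(r"^Task1\d{2}_Example_\w+\.py$")
BENCHMARK = re.compile(r"^Task0\d{2}_\w+\.py$")


def classify_script(filename: str) -> str:
    """Return 'synthetic', 'benchmark', or 'other' for a script file name."""
    if SYNTHETIC.match(filename):
        return "synthetic"
    if BENCHMARK.match(filename):
        return "benchmark"
    return "other"


def list_scripts(script_dir: str = "src/dragon_prep") -> dict:
    """Group the Python files in `script_dir` by category."""
    groups = {"synthetic": [], "benchmark": [], "other": []}
    directory = Path(script_dir)
    if not directory.is_dir():
        # Nothing to list, e.g. when not run from the repository root.
        return groups
    for path in sorted(directory.glob("*.py")):
        groups[classify_script(path.name)].append(path.name)
    return groups


if __name__ == "__main__":
    for category, names in list_scripts().items():
        print(f"{category}: {len(names)} script(s)")
```

Run from the root of a cloned repository, this prints how many scripts fall into each group; helper scripts such as make_debug_splits.py end up under "other".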
Usage
The synthetic datasets can be generated with any number of samples.
After installing the dragon_prep module:
python src/dragon_prep/Task101_Example_sl_bin_clf.py \
--output_dir=./output \
--num_examples={set any number you like}
Or, using the Docker container:
docker run --rm -it \
-v /path/to/store/data:/output \
dragon_prep:latest python /opt/app/dragon_prep/src/dragon_prep/Task101_Example_sl_bin_clf.py \
--num_examples={set any number you like}
# ... same for Task102_Example_sl_mc_clf.py, Task104_Example_ml_bin_clf.py, Task105_Example_ml_mc_clf.py, Task106_Example_sl_reg.py, Task107_Example_ml_reg.py, Task108_Example_sl_ner.py, Task109_Example_ml_ner.py
# for Task103_Example_mednli.py, setting the number of examples is not supported
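To generate all synthetic datasets in one go, the commands above can be assembled in a small driver script. This is a sketch under stated assumptions: the script names are the ones listed above, the helpers build_command and generate_all are hypothetical, and it is assumed (not confirmed here) that Task103_Example_mednli.py accepts --output_dir like the other scripts. By default it only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

# Script names as listed in this README.
SYNTHETIC_SCRIPTS = [
    "Task101_Example_sl_bin_clf.py",
    "Task102_Example_sl_mc_clf.py",
    "Task103_Example_mednli.py",
    "Task104_Example_ml_bin_clf.py",
    "Task105_Example_ml_mc_clf.py",
    "Task106_Example_sl_reg.py",
    "Task107_Example_ml_reg.py",
    "Task108_Example_sl_ner.py",
    "Task109_Example_ml_ner.py",
]
# Setting the number of examples is not supported for this script.
NO_NUM_EXAMPLES = {"Task103_Example_mednli.py"}


def build_command(script, output_dir, num_examples, script_dir="src/dragon_prep"):
    """Build the argument list for one synthetic dataset script."""
    cmd = [sys.executable, str(Path(script_dir) / script), f"--output_dir={output_dir}"]
    if script not in NO_NUM_EXAMPLES:
        cmd.append(f"--num_examples={num_examples}")
    return cmd


def generate_all(output_dir="./output", num_examples=100, dry_run=True):
    """Print (dry_run=True) or execute (dry_run=False) the command for every script."""
    commands = [build_command(s, output_dir, num_examples) for s in SYNTHETIC_SCRIPTS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first script that fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    generate_all(dry_run=True)
```

Calling generate_all(dry_run=False) from the repository root would run the scripts one after another; keeping the dry run as the default makes it easy to inspect the commands first.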
The preprocessing scripts for the tasks in the DRAGON benchmark are included for transparency and to provide building blocks for processing your own data. To run the end-to-end script on your own data, you can turn off the anonymisation functionality:
prepare_for_anon(df=df, output_dir=output_dir, task_name=task_name, tag_phi=False, apply_hips=False)
Managed By
Diagnostic Image Analysis Group, Radboud University Medical Center, Nijmegen, The Netherlands
Contact Information
Joeran Bosma: Joeran.Bosma@radboudumc.nl