
Preprocessing scripts for the DRAGON benchmark


DRAGON Preprocessing

This repository contains the preprocessing scripts for the DRAGON challenge.

If you are using this codebase or some part of it, please cite the following article: PENDING

BibTeX:

PENDING

Installation

dragon_prep can be pip-installed:

pip install dragon_prep

Alternatively, it can be installed from source:

git clone https://github.com/DIAGNijmegen/dragon_prep
cd dragon_prep
pip install -e .
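Either way, you can check which version of dragon_prep is installed by querying the standard library's importlib.metadata (Python 3.8+). This is a minimal sketch. The helper name installed_version is hypothetical and not part of dragon_prep, and it assumes the distribution is registered under the name dragon_prep.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "dragon_prep") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Raised when no installed distribution matches dist_name.
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"dragon_prep {v}" if v else "dragon_prep is not installed")
```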

The Docker image can be built after cloning the repository. The anonymisation code is not included due to privacy concerns, so you have to comment out the lines in the Dockerfile that copy and install the diag-radiology-report-anonymizer. The Dockerfile is included unmodified to reflect the exact setup used to prepare the DRAGON challenge resources.

git clone https://github.com/DIAGNijmegen/dragon_prep
cd dragon_prep
nano Dockerfile  # comment out the lines that copy and install the diag-radiology-report-anonymizer
./build.sh

If the build succeeds, this produces a Docker image named dragon_prep:latest.

Resources

The preprocessing scripts for the synthetic datasets are in src/dragon_prep and are named Task1xx_Example_yy.py. The preprocessing scripts for the datasets used in the test leaderboard of the DRAGON challenge are also in src/dragon_prep and are named Task0xx_yy.py. The datasets for the validation leaderboard are derived from the development data using the src/dragon_prep/make_debug_splits.py script. For the DRAGON challenge, all datasets were preprocessed using the preprocess.sh script.
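The naming convention can be expressed as two file-name patterns, which is handy when scripting over the repository. The helper below is hypothetical and not part of dragon_prep. Its regular expressions are inferred from the Task1xx_Example_yy.py and Task0xx_yy.py descriptions and assume that xx is always two digits.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional

# Assumed patterns, inferred from the naming scheme in this README:
#   synthetic examples: Task1xx_Example_yy.py
#   DRAGON benchmark tasks: Task0xx_yy.py
SYNTHETIC = re.compile(r"^Task1\d{2}_Example_\w+\.py$")
BENCHMARK = re.compile(r"^Task0\d{2}_\w+\.py$")


def script_kind(path: str) -> Optional[str]:
    """Classify a script by file name as 'synthetic', 'benchmark', or None."""
    name = Path(path).name
    if SYNTHETIC.match(name):
        return "synthetic"
    if BENCHMARK.match(name):
        return "benchmark"
    return None


def list_scripts(src_dir: str = "src/dragon_prep") -> Dict[str, List[str]]:
    """Group the Task*.py files in src_dir by kind, ignoring anything unmatched."""
    groups: Dict[str, List[str]] = {"synthetic": [], "benchmark": []}
    for p in sorted(Path(src_dir).glob("Task*.py")):
        kind = script_kind(p.name)
        if kind is not None:
            groups[kind].append(p.name)
    return groups
```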

Usage

The synthetic datasets can be generated with any number of samples.

After installing the dragon_prep module:

python src/dragon_prep/Task101_Example_sl_bin_clf.py \
    --output_dir=./output \
    --num_examples=100  # set any number you like

Or, using the Docker container:

docker run --rm -it \
    -v /path/to/store/data:/output \
    dragon_prep:latest python /opt/app/dragon_prep/src/dragon_prep/Task101_Example_sl_bin_clf.py \
        --num_examples=100  # set any number you like


# ... same for Task102_Example_sl_mc_clf.py, Task104_Example_ml_bin_clf.py, Task105_Example_ml_mc_clf.py, Task106_Example_sl_reg.py, Task107_Example_ml_reg.py, Task108_Example_sl_ner.py, Task109_Example_ml_ner.py
# for Task103_Example_mednli.py, setting the number of examples is not supported
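To generate all synthetic datasets in one go, the per-script commands can be looped over. This is a hypothetical helper, not part of dragon_prep. It assumes it is run from the repository root, that every script accepts --output_dir (assumed for Task103_Example_mednli.py as well), and that only Task103 rejects --num_examples, as noted above.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Script names as listed in this README (Task101 to Task109).
SYNTHETIC_SCRIPTS = [
    "Task101_Example_sl_bin_clf.py",
    "Task102_Example_sl_mc_clf.py",
    "Task103_Example_mednli.py",
    "Task104_Example_ml_bin_clf.py",
    "Task105_Example_ml_mc_clf.py",
    "Task106_Example_sl_reg.py",
    "Task107_Example_ml_reg.py",
    "Task108_Example_sl_ner.py",
    "Task109_Example_ml_ner.py",
]
# Per this README, setting the number of examples is not supported for Task103.
NO_NUM_EXAMPLES = {"Task103_Example_mednli.py"}


def build_commands(script_dir: str, output_dir: str, num_examples: int) -> List[List[str]]:
    """Return one argv list per synthetic script, mirroring the command shown above."""
    commands = []
    for script in SYNTHETIC_SCRIPTS:
        # Assumption: every script accepts --output_dir.
        cmd = [sys.executable, str(Path(script_dir) / script), f"--output_dir={output_dir}"]
        if script not in NO_NUM_EXAMPLES:
            cmd.append(f"--num_examples={num_examples}")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in build_commands("src/dragon_prep", "./output", 100):
        subprocess.run(cmd, check=True)  # stop at the first script that fails
```

Running the file executes the scripts one after another. check=True makes the loop stop at the first script that exits with a non-zero status instead of silently continuing.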

The preprocessing scripts for the tasks in the DRAGON benchmark are included for transparency and to provide building blocks for processing your own data. To run an end-to-end script on your own data, you can turn off the anonymisation functionality:

prepare_for_anon(df=df, output_dir=output_dir, task_name=task_name, tag_phi=False, apply_hips=False)

Managed By

Diagnostic Image Analysis Group, Radboud University Medical Center, Nijmegen, The Netherlands

Contact Information

Joeran Bosma: Joeran.Bosma@radboudumc.nl

Download files


Source Distribution

dragon_prep-0.2.9.tar.gz (54.5 kB)

Uploaded Source

Built Distribution


dragon_prep-0.2.9-py3-none-any.whl (113.8 kB)

Uploaded Python 3

File details

Details for the file dragon_prep-0.2.9.tar.gz.

File metadata

  • Download URL: dragon_prep-0.2.9.tar.gz
  • Upload date:
  • Size: 54.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for dragon_prep-0.2.9.tar.gz
Algorithm Hash digest
SHA256 8d4e8d9d7987a288187429decea02dd8e4d1fbb6fd5355a8e5d3d858ba2234e1
MD5 33f5953c4d6f2871afd79a3deed0087f
BLAKE2b-256 e7ba1ae05136bc87f2e79aa1ac0e97093ad53d87e03d2974ccbf7e2e750dc83f


File details

Details for the file dragon_prep-0.2.9-py3-none-any.whl.

File metadata

  • Download URL: dragon_prep-0.2.9-py3-none-any.whl
  • Upload date:
  • Size: 113.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for dragon_prep-0.2.9-py3-none-any.whl
Algorithm Hash digest
SHA256 e55b8be6277bf565d668c2ef5b6f8800a5a5c59d8ad5e655e254d8b636dfbfe3
MD5 c5b9af24b491f3506cbc10c11d7466e8
BLAKE2b-256 25d3fe8210de293b44121032d7b66e4b1f30ac6e8c8829a3496c9a61a6f83005
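If you download one of these files manually, you can compare its SHA256 digest with the published value before installing it. This is a minimal sketch using only hashlib. The helper names are hypothetical, and the expected digests are copied from the hash tables for release 0.2.9.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hashes listed for release 0.2.9.
EXPECTED_SHA256 = {
    "dragon_prep-0.2.9.tar.gz": "8d4e8d9d7987a288187429decea02dd8e4d1fbb6fd5355a8e5d3d858ba2234e1",
    "dragon_prep-0.2.9-py3-none-any.whl": "e55b8be6277bf565d668c2ef5b6f8800a5a5c59d8ad5e655e254d8b636dfbfe3",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 matches the published digest for its file name."""
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == expected[name]
```

For example, verify("dragon_prep-0.2.9.tar.gz") returns True only if the local file matches the published digest. pip can also enforce digests itself in hash-checking mode (--require-hashes).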

