
Baseline training algorithm for the DRAGON Challenge


DRAGON Baseline Algorithm

This repository provides the baseline code for the DRAGON Challenge.

If you are using DRAGON resources, please cite the following article:

J. S. Bosma, K. Dercksen, L. Builtjes, R. André, C. Roest, S. J. Fransen, C. R. Noordman, M. Navarro-Padilla, J. Lefkes, N. Alves, M. J. J. de Grauw, L. van Eekelen, J. M. A. Spronck, M. Schuurmans, A. Saha, J. J. Twilt, W. Aswolinskiy, W. Hendrix, B. de Wilde, D. Geijs, J. Veltman, D. Yakar, M. de Rooij, F. Ciompi, A. Hering, J. Geerdink, and H. Huisman on behalf of the DRAGON consortium. The DRAGON benchmark for clinical NLP. npj Digital Medicine 8, 289 (2025). https://doi.org/10.1038/s41746-025-01626-x

Download the citation file for your reference manager: BibTeX | RIS

Installation instructions

We strongly recommend installing the DRAGON baseline in a virtual environment! Pip or anaconda both work. Use a recent version of Python: 3.9 or newer is guaranteed to work.

pip install dragon_baseline
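For example, a minimal setup using Python's built-in venv module might look like this (commands shown for Linux/macOS; the activation path differs on Windows):

```shell
# Create and activate an isolated environment for the DRAGON baseline
python3 -m venv dragon-env
source dragon-env/bin/activate

# Install the baseline package into the environment
pip install dragon_baseline
```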

How to get started as AI developer?

Please see the dedicated development guide to get started with creating new solutions!

What is the DRAGON benchmark?

The DRAGON benchmark serves as an extensive resource for testing and advancing clinical NLP algorithms, particularly in the realm of automated data curation. In the context of medical imaging datasets, data curation involves selecting relevant studies, collecting key measurements, and determining the clinical outcomes as labels. Clinical reports are the primary source for these curation tasks. The DRAGON benchmark aims to catalyze the development of algorithms capable of addressing a broad spectrum of data curation tasks and introduces 28 clinically relevant tasks, as detailed here.

Accessing the reports in the DRAGON benchmark

All data, including clinical reports and associated labels, are stored in a sequestered manner on the Grand Challenge platform. This prevents users from directly accessing or viewing the data, preserving patient privacy by design. While participants cannot directly download or see the data, they do have full functional access for model training and validation through the platform interface. Keeping the test labels hidden helps to mitigate potential biases. To aid the development of solutions we provide synthetic datasets (see the development guide) for all task types and provide an example case for each of the tasks in the DRAGON manuscript and here.

What can the DRAGON algorithm do for you?

If you are a domain scientist (radiologist, pathologist, ...) looking to automate your own data curation, the DRAGON algorithm provides an out-of-the-box solution that typically performs well on report-based curation tasks, though results will vary with your dataset. Simply convert your dataset into the DRAGON format and train the baseline; little NLP expertise is required.

If you are an AI researcher developing NLP methods, DRAGON:

  • offers a fantastic out-of-the-box applicable baseline algorithm to compete against
  • can act as a method development framework to test your contribution on a large number of datasets without having to tune individual pipelines (for example evaluating a new loss function)
  • provides a strong starting point for further dataset-specific optimizations. This is particularly useful when competing in NLP challenges
  • provides a new perspective on the design of NLP methods: maybe you can find better connections between dataset properties and best-fitting NLP pipelines?

What is the scope of the DRAGON challenge?

The DRAGON benchmark focuses on clinical NLP with "closed questions" (see the eight task types at the top). This means that generative models are out of scope for the DRAGON challenge.

DRAGON relies on supervised learning, which means that you need to provide training cases for your application. The number of required training cases varies heavily with the complexity of the problem, so no one-size-fits-all number can be given here.

How does the DRAGON baseline work?

Given a new dataset, DRAGON will systematically analyze the provided training cases and create a 'dataset fingerprint'.

DRAGON configures its pipeline based on a two-step recipe:

  • Fixed parameters are not adapted. During development of the DRAGON baseline we identified a robust configuration (that is, certain architecture and training properties) that can simply be used all the time. This includes, for example, the loss function and learning rate.
  • Rule-based parameters use the dataset fingerprint to adapt certain pipeline properties by following hard-coded heuristic rules. For example, the regression target is transformed with a logarithmic function when the skew of the label distribution is more than one.
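As an illustration of such a heuristic rule, the sketch below log-transforms a regression target once its sample skewness exceeds one. The function names are ours for illustration, not the actual DRAGON baseline API:

```python
import numpy as np

def skewness(values: np.ndarray) -> float:
    """Fisher-Pearson coefficient of skewness of a 1-D sample."""
    centered = values - values.mean()
    return float((centered ** 3).mean() / values.std() ** 3)

def maybe_log_transform(labels: np.ndarray) -> tuple[np.ndarray, bool]:
    """Apply log1p when the label distribution is strongly right-skewed."""
    if skewness(labels) > 1.0:
        return np.log1p(labels), True
    return labels, False

# A right-skewed regression target (e.g. lesion sizes) triggers the rule:
rng = np.random.default_rng(0)
labels = rng.lognormal(mean=0.0, sigma=1.0, size=1000)
transformed, applied = maybe_log_transform(labels)
```

After the transform, the target is far closer to symmetric, which tends to stabilize regression training.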

How to get started to use existing DRAGON algorithms?

If you want to use an existing algorithm to annotate new data, you need:

  1. Manually annotated training data (provided by you)
  2. Manually annotated validation data (provided by you)
  3. The data you want to annotate (we call this "test data" because in the context of the benchmark, the algorithm will provide predictions for the test data)

First, prepare the data in the correct dataset convention: please see the dataset conversion for more information.
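As a rough sketch of what such a preparation step can look like, the snippet below writes train/validation/test splits as JSON. The field names ("uid", "text", "label") are illustrative assumptions; the authoritative schema is defined in the dataset conversion guide:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical example records; real labels depend on the task type.
records = [
    {"uid": "case_0001", "text": "No suspicious lesions identified.", "label": 0},
    {"uid": "case_0002", "text": "Lesion in the left lobe, PI-RADS 4.", "label": 1},
]

# Write one JSON file per split (same toy records reused here for brevity).
out_dir = Path(tempfile.mkdtemp())
for split, rows in {"train": records, "validation": records, "test": records}.items():
    (out_dir / f"{split}.json").write_text(json.dumps(rows, indent=2))
```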

You can use the algorithm on Grand Challenge or locally. Either way, the algorithm will fit the model to your training data, have it select the model checkpoint based on your validation data, and then produce the model predictions for the "test data". To use the algorithm on Grand Challenge, navigate to the leaderboard and select the algorithm you want [PENDING].
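The fit / checkpoint-selection / predict loop described above can be sketched as follows. ToyModel and run_workflow are stand-ins we made up to show the control flow, not the real dragon_baseline API:

```python
class ToyModel:
    """Trivial stand-in 'model': predicts the running mean of training labels."""

    def __init__(self):
        self.mean = 0.0
        self.n = 0

    def fit(self, data):
        for _text, label in data:
            self.n += 1
            self.mean += (label - self.mean) / self.n

    def evaluate(self, data):
        # Negative mean absolute error on validation data: higher is better.
        return -sum(abs(self.mean - label) for _text, label in data) / len(data)

    def snapshot(self):
        return self.mean

    def restore(self, checkpoint):
        self.mean = checkpoint

    def predict(self, data):
        return [self.mean for _ in data]


def run_workflow(train_data, val_data, test_data, make_model, epochs=3):
    """Fit on train, keep the checkpoint scoring best on validation, predict."""
    model = make_model()
    best_score, best_checkpoint = float("-inf"), None
    for _epoch in range(epochs):
        model.fit(train_data)             # training pass
        score = model.evaluate(val_data)  # checkpoint-selection metric
        if score > best_score:
            best_score, best_checkpoint = score, model.snapshot()
    model.restore(best_checkpoint)        # roll back to the best checkpoint
    return model.predict(test_data)


train = [("report a", 1.0), ("report b", 3.0)]
validation = [("report c", 2.0)]
test_reports = ["report d"]
predictions = run_workflow(train, validation, test_reports, ToyModel, epochs=2)
```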

If you prefer to perform these steps on your own hardware, please follow the steps in "How to get started as AI developer?" to learn how to set this up. You can find the GitHub repository of submissions on the leaderboard under the GitHub icon [PENDING].

Bringing in your own data

To format your own dataset for usage with the algorithms from the DRAGON challenge, check out the dataset convention.

Where does the DRAGON baseline perform well and where does it not perform?

The DRAGON baseline methods have been evaluated on the DRAGON test set, as prespecified in the DRAGON Statistical Analysis Plan.

Based on pre-defined performance thresholds, performance of the best model (RoBERTa large with domain-specific pretraining) was:

  • excellent for 10/28 tasks: T1, T2, T4, T5, T10, T15, T16, T19, T20, T21
  • good for 8/28 tasks: T3, T7, T11, T22, T23, T25, T26, T27
  • moderate for 6/28 tasks: T9, T13, T17, T18, T24, T28
  • poor for 4/28 tasks: T6, T8, T12, T14

For the full performance breakdown of each model, see the evaluation results.

Managed By

Diagnostic Image Analysis Group, Radboud University Medical Center, Nijmegen, The Netherlands

Contact Information

Joeran Bosma: Joeran.Bosma@radboudumc.nl

Download files

Download the file for your platform.

Source Distribution

dragon_baseline-0.4.6.tar.gz (51.6 kB)


Built Distribution


dragon_baseline-0.4.6-py3-none-any.whl (56.9 kB)


File details

Details for the file dragon_baseline-0.4.6.tar.gz.

File metadata

  • Download URL: dragon_baseline-0.4.6.tar.gz
  • Upload date:
  • Size: 51.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for dragon_baseline-0.4.6.tar.gz:

  • SHA256: 488f6adf6662532ddc794764ba7758f79965aece5c71ef87cc45e570bbd75ef9
  • MD5: 97f723a9836d29fbffd7eae27a51b5a4
  • BLAKE2b-256: 82fb927e7e27df4976cb3c68607f503567c1e86cdeef13b3e3bd36999bc4b1d4


File details

Details for the file dragon_baseline-0.4.6-py3-none-any.whl.


File hashes

Hashes for dragon_baseline-0.4.6-py3-none-any.whl:

  • SHA256: 86e0c681d68c9c562d831db694c0752737a02a97ed4a212ae57fb4dee4d91ffa
  • MD5: f09ea35cbc790179e9efdd8cd8f7c222
  • BLAKE2b-256: c7c49704b57eb85fb678e45b7f3f88e37bf85b41b99ac1410db43d6cdae6bb4b

