A collection of discourse parsers and associated code focused on readability and usability.

IUDEX

The Indiana University Discourse Exhibition (IUDEX) is a collection of parsers and other code related to discourse parsing.

Setup

For the latest release:

pip install larc-iudex

For the current state of master:

pip install git+https://github.com/larc-iu/iudex

Or for development:

git clone https://github.com/larc-iu/iudex && cd iudex
pip install -e .

Note that the command you will invoke is iudex, not larc-iudex.

Quick Start with Inference

Parse a sample document end-to-end with a pretrained DMRST model pulled from the HuggingFace Hub. From the command line:

iudex dmrst predict \
    --hub-id larc-iu/dmrst-gum-12.1.0 \
    --text "Although the experiment was carefully designed, the results were inconclusive. We plan to repeat it tonight."

This prints the parsed tree to stdout in .rs4 format:

<rst>
  <relations><!-- ... --></relations>
  <body>
    <segment id="1" parent="2" relname="adversative-concession">Although the experiment was carefully designed,</segment>
    <segment id="2" parent="4" relname="span">the results were inconclusive.</segment>
    <segment id="3" parent="5" relname="span">We plan to repeat it tonight.</segment>
    <group id="4" type="span" parent="3" relname="adversative-antithesis"/>
    <group id="5" type="span"/>
  </body>
</rst>

The same flow from Python:

from iudex.rst.parsers.dmrst.modeling_dmrst import DMRSTParser
parser = DMRSTParser.from_pretrained("larc-iu/dmrst-gum-12.1.0")
tree = parser.predict_from_text(
    "Although the experiment was carefully designed, "
    "the results were inconclusive. "
    "We plan to repeat it tonight."
)
print(tree.to_rs4_string())

Yields:

<rst>
  <relations><!-- ... --></relations>
  <body>
    <segment id="1" parent="2" relname="adversative-concession">Although the experiment was carefully designed,</segment>
    <segment id="2" parent="4" relname="span">the results were inconclusive.</segment>
    <segment id="3" parent="5" relname="span">We plan to repeat it tonight.</segment>
    <group id="4" type="span" parent="3" relname="adversative-antithesis"/>
    <group id="5" type="span"/>
  </body>
</rst>
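
The returned string is plain XML, so it can be inspected with the standard library. The following is a minimal sketch, not part of IUDEX: read_nodes is a hypothetical helper, and it assumes only the element and attribute names visible in the sample output (segment, group, id, parent, relname).

```python
import xml.etree.ElementTree as ET

# A copy of the sample output, with EDU text as in the input sentence.
RS_OUTPUT = """\
<rst>
  <relations><!-- ... --></relations>
  <body>
    <segment id="1" parent="2" relname="adversative-concession">Although the experiment was carefully designed,</segment>
    <segment id="2" parent="4" relname="span">the results were inconclusive.</segment>
    <segment id="3" parent="5" relname="span">We plan to repeat it tonight.</segment>
    <group id="4" type="span" parent="3" relname="adversative-antithesis"/>
    <group id="5" type="span"/>
  </body>
</rst>
"""


def read_nodes(rs_xml):
    """Return (edus, groups) as lists of dicts (hypothetical helper)."""
    body = ET.fromstring(rs_xml).find("body")
    edus = [
        {
            "id": seg.get("id"),
            "parent": seg.get("parent"),
            "relname": seg.get("relname"),
            "text": (seg.text or "").strip(),
        }
        for seg in body.iter("segment")
    ]
    groups = [dict(group.attrib) for group in body.iter("group")]
    return edus, groups


edus, groups = read_nodes(RS_OUTPUT)
for edu in edus:
    print(f'{edu["id"]} -> {edu["parent"]} [{edu["relname"]}] {edu["text"]}')

# Nodes without a parent attribute are roots of the tree.
roots = [group["id"] for group in groups if "parent" not in group]
print("root:", roots)
```

Each printed line shows one EDU, the node it attaches to, and the relation label on that attachment.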

Inference CLI

To identify a model on the command line, you may use a configuration file (--config), a PyTorch checkpoint (--checkpoint), or a HuggingFace Hub repository (--hub-id).

To provide input, you may specify an inline string (--text), a path to a raw text file or directory (--text-file, for parsers which support this), or an RS3/RS4 file or directory with gold EDUs already supplied (--input).

For --text-file and --input, results are written to --output-dir as .rs4 files.

# From the Hub, end-to-end on a directory of .txt files:
iudex dmrst predict \
    --hub-id larc-iu/dmrst-gum-12.1.0 \
    --text-file path/to/docs/ \
    --output-dir out/ \
    --device cuda

# From an explicit checkpoint:
iudex dmrst predict \
    --checkpoint checkpoints/<run_id>/best_model.pt \
    --text-file path/to/doc.txt \
    --output-dir out/

# From a trained run's config, parsing pre-segmented RS3/RS4 with gold EDUs:
iudex topdown_biaffine predict \
    --config configs/topdown_biaffine_rstdt.jsonnet \
    --input data/rstdt/test \
    --output-dir out/
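
If you would rather drive batch parsing from Python, the loop below is a rough equivalent of the --text-file directory mode. It is a sketch under assumptions: parse_directory is a hypothetical helper, it relies only on the predict_from_text and to_rs4_string calls shown in the Quick Start, and the <stem>.rs4 output naming is a guess rather than the CLI's documented behavior.

```python
from pathlib import Path


def parse_directory(parser, text_dir, output_dir):
    """Parse every .txt file in text_dir and write <stem>.rs4 to output_dir.

    `parser` is any object exposing predict_from_text(str) whose result
    exposes to_rs4_string(), as in the Quick Start example.
    The output naming scheme is an assumption, not IUDEX's documented one.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for txt_path in sorted(Path(text_dir).glob("*.txt")):
        tree = parser.predict_from_text(txt_path.read_text(encoding="utf-8"))
        target = out / f"{txt_path.stem}.rs4"
        target.write_text(tree.to_rs4_string(), encoding="utf-8")
        written.append(target)
    return written


# Usage with a real model (requires larc-iudex to be installed):
# from iudex.rst.parsers.dmrst.modeling_dmrst import DMRSTParser
# parser = DMRSTParser.from_pretrained("larc-iu/dmrst-gum-12.1.0")
# parse_directory(parser, "path/to/docs", "out")
```

Loading the model once and reusing it across files avoids paying the load cost per document.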

Available Models

All official IUDEX model releases are tagged with iudex on the HuggingFace Hub.

Training

To train a new top-down biaffine parser on RSTDT:

iudex topdown_biaffine train configs/topdown_biaffine_rstdt.jsonnet

Here, configs/topdown_biaffine_rstdt.jsonnet is a Jsonnet configuration file. You may either edit it in place or copy it to a new location and modify the copy.

Grabbing Example Configurations

Model configurations required for training are not bundled with the package distributed via PyPI.

To get them, you may visit the configs/ directory of the GitHub repository and manually download the configurations you're interested in.

If you want to grab all of them at once, you can use the command line like so:

bash / zsh / macOS / Linux:

curl -fL https://github.com/larc-iu/iudex/archive/refs/heads/master.tar.gz \
  | tar -xz --strip-components=1 --wildcards '*/configs'

Windows PowerShell:

Invoke-WebRequest https://github.com/larc-iu/iudex/archive/refs/heads/master.zip -OutFile iudex.zip
Expand-Archive iudex.zip -DestinationPath .
Move-Item iudex-master/configs configs
Remove-Item -Recurse -Force iudex-master, iudex.zip

Either approach leaves you with a local configs/ directory whose files you can edit and pass to iudex … train configs/<name>.jsonnet.
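
A platform-independent alternative is to do the same extraction from Python. This is a minimal sketch using only the standard library: extract_configs and download_configs are hypothetical helpers, and the archive URL is the one used in the shell command above.

```python
import tarfile
import urllib.request
from pathlib import Path, PurePosixPath

ARCHIVE_URL = "https://github.com/larc-iu/iudex/archive/refs/heads/master.tar.gz"


def extract_configs(fileobj, dest="."):
    """Copy every regular file under <top-level>/configs/ into dest/configs/."""
    dest = Path(dest)
    written = []
    # "r|gz" reads the archive as a forward-only stream, so a non-seekable
    # HTTP response works as well as a local file.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            parts = PurePosixPath(member.name).parts
            # parts[0] is the archive's top-level directory; it is dropped.
            if len(parts) < 3 or parts[1] != "configs" or not member.isfile():
                continue
            if ".." in parts:
                continue  # refuse paths that would escape dest
            target = dest.joinpath(*parts[1:])
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(tar.extractfile(member).read())
            written.append(target)
    return written


def download_configs(dest="."):
    """Stream the master archive and keep only its configs/ directory."""
    with urllib.request.urlopen(ARCHIVE_URL) as response:
        return extract_configs(response, dest)
```

Calling download_configs() from your working directory should leave the same configs/ directory as the shell commands; files are written by hand rather than through tarfile's extract methods so that behavior does not depend on the Python version's extraction filter defaults.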

Configuration Hashes

Your configuration is used as the basis for a unique hash, which (by default) corresponds to a directory under checkpoints/. This hash is used for several purposes. For example, running the same config again resumes from the last epoch's checkpoint last.pt automatically if the run was interrupted.
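
To make the idea concrete, here is one way a configuration can be mapped to a stable run directory. This is an illustrative sketch only: IUDEX's actual hashing scheme is not described here, and config_hash, resume_checkpoint, the SHA-256 choice, the 12-character truncation, the checkpoints/<hash>/last.pt layout, and the config keys are all assumptions.

```python
import hashlib
import json
from pathlib import Path


def config_hash(config, length=12):
    """Hash a config dict via its canonical JSON form (assumed scheme)."""
    # Sorting keys and fixing separators makes the hash independent of
    # key order and whitespace.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def resume_checkpoint(config, root="checkpoints"):
    """Return the run's last.pt if it exists, else None (assumed layout)."""
    last = Path(root) / config_hash(config) / "last.pt"
    return last if last.exists() else None


# Hypothetical config keys, for illustration only.
a = {"model_name": "xlm-roberta-base", "train_dir": "data/rstdt/train"}
b = {"train_dir": "data/rstdt/train", "model_name": "xlm-roberta-base"}
print(config_hash(a) == config_hash(b))  # prints True: key order is ignored
```

Under a scheme like this, any change to a configuration value produces a different hash and therefore a fresh run directory, while rerunning an unchanged configuration lands in the same directory and can pick up its last checkpoint.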

To view all runs and their status, you may run the runs list subcommand:

$ iudex runs list
                                                            Runs in checkpoints                                                            
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ run_id       ┃ run_name ┃ parser           ┃ model_name                   ┃ train_dir             ┃  best_val ┃ step ┃ modified         ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ 245b1d774676 │ -        │ dmrst            │ xlm-roberta-base             │ data/gum_12.1.0/train │    0.3099 │ 1704 │ 2026-05-18 18:02 │
│ 41bc0fe1dd50 │ -        │ topdown_biaffine │ SpanBERT/spanbert-base-cased │ data/rstdt/train      │    0.7576 │ 2149 │ 2026-05-18 13:51 │
│ 91525e48d63d │ -        │ topdown_biaffine │ SpanBERT/spanbert-base-cased │ data/gum_12.1.0/train │    0.6364 │ 1899 │ 2026-05-18 14:31 │
│ ad934ca992d4 │ -        │ dmrst            │ xlm-roberta-base             │ data/rstdt/train      │    0.4665 │ 3090 │ 2026-05-18 16:46 │
└──────────────┴──────────┴──────────────────┴──────────────────────────────┴───────────────────────┴───────────┴──────┴──────────────────┘

Monitoring with TensorBoard

Every run writes TensorBoard scalars (train loss, learning rate, gradient norm, and dev metrics) to <run_dir>/tb/. Point TensorBoard at your checkpoints directory to watch any run live or compare runs:

tensorboard --logdir checkpoints/

Pushing Models to HF Hub

You may host a trained model on the HuggingFace Hub using the parser's push subcommand, which uploads best_model.pt, config.json, and an auto-generated README.md in a single commit:

iudex topdown_biaffine push \
    --config configs/topdown_biaffine_rstdt.jsonnet \
    --repo-id larc-iu/topdown_biaffine-rstdt-coarse \
    [--private] [--message "..."] [--token $HF_TOKEN]

Citation

If you use IUDEX in your research, please cite it as:

Gessler, Luke. 2026. IUDEX: The Indiana University Discourse Exhibition. https://github.com/larc-iu/iudex.

BibTeX:

@misc{gessler-iudex-2026,
  author       = {Gessler, Luke},
  title        = {{IUDEX: The Indiana University Discourse Exhibition}},
  year         = {2026},
  howpublished = {\url{https://github.com/larc-iu/iudex}},
}

If you use one of the included parser re-implementations, please also cite the original paper (see each model's Hub card for the canonical reference).


Download files

Source Distribution

larc_iudex-0.1.0a7.tar.gz (90.0 kB)

Uploaded Source

Built Distribution

larc_iudex-0.1.0a7-py3-none-any.whl (101.5 kB)

Uploaded Python 3

File details

Details for the file larc_iudex-0.1.0a7.tar.gz.

File metadata

  • Download URL: larc_iudex-0.1.0a7.tar.gz
  • Upload date:
  • Size: 90.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for larc_iudex-0.1.0a7.tar.gz
Algorithm Hash digest
SHA256 ef937a684b0cdbe1e4e23b17bc9811680fceeec4a2f38141ab5b5f5487d94ced
MD5 1c191edc966705327a0323c757f7b3e0
BLAKE2b-256 a9f37b6802bd4a428861bc017607d9d12fe3de060e61f4b41c72811d85943c7d

Provenance

The following attestation bundles were made for larc_iudex-0.1.0a7.tar.gz:

Publisher: publish.yml on larc-iu/iudex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file larc_iudex-0.1.0a7-py3-none-any.whl.

File metadata

  • Download URL: larc_iudex-0.1.0a7-py3-none-any.whl
  • Upload date:
  • Size: 101.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for larc_iudex-0.1.0a7-py3-none-any.whl
Algorithm Hash digest
SHA256 7f1a80d5cf63a28a63b0ea43b9fdb85a83c43f1b366ca6a45c7825bc4dd5182c
MD5 6ec0e8a915631029127e96433cfd718a
BLAKE2b-256 e91898748e3b16154e498364c89b2ca0f9957d76e24ee4f88d14cc9bd219bbcc

Provenance

The following attestation bundles were made for larc_iudex-0.1.0a7-py3-none-any.whl:

Publisher: publish.yml on larc-iu/iudex
