A collection of discourse parsers and associated code focused on readability and usability.

IUDEX

The Indiana University Discourse EXhibition (IUDEX) is a collection of parsers and other code related to discourse parsing.

Setup

For the latest release:

pip install larc-iudex

For the current state of master:

pip install git+https://github.com/larc-iu/iudex

Or for development:

git clone https://github.com/larc-iu/iudex && cd iudex
pip install -e .

Note that the command you will invoke is iudex, not larc-iudex.

Grabbing Example Configurations

Model configurations required for training are not bundled with the package distributed via PyPI.

To get them, you may browse the configs directory of the GitHub repository and manually download the configurations you're interested in.

If you want to grab all of them at once, you can use the command line like so:

bash / zsh / macOS / Linux:

curl -fL https://github.com/larc-iu/iudex/archive/refs/heads/master.tar.gz \
  | tar -xz --strip-components=1 --wildcards '*/configs'

Windows PowerShell:

Invoke-WebRequest https://github.com/larc-iu/iudex/archive/refs/heads/master.zip -OutFile iudex.zip
Expand-Archive iudex.zip -DestinationPath .
Move-Item iudex-master/configs configs
Remove-Item -Recurse -Force iudex-master, iudex.zip

Either approach leaves you with a local configs/ directory whose files you can edit and pass to iudex … train configs/<name>.jsonnet.
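For a platform-independent alternative, the same extraction can be sketched with Python's standard library. This is a minimal sketch and not part of IUDEX: the names extract_configs and download_configs are hypothetical, and the code assumes the archive has a single top-level directory (such as iudex-master/) that contains configs/, as the commands above imply.

```python
import io
import tarfile
import urllib.request
from pathlib import Path, PurePosixPath

# URL taken from the curl command above.
ARCHIVE_URL = "https://github.com/larc-iu/iudex/archive/refs/heads/master.tar.gz"


def extract_configs(archive, dest="."):
    """Copy <top-level>/configs/** from a gzipped tar stream into dest/configs/.

    Hypothetical helper; returns the list of files written.
    """
    dest = Path(dest)
    written = []
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for member in tar:
            parts = PurePosixPath(member.name).parts
            # Keep only regular files under <top-level>/configs/.
            if not member.isfile() or len(parts) < 3 or parts[1] != "configs":
                continue
            if ".." in parts:  # refuse path traversal
                continue
            target = dest.joinpath(*parts[1:])  # drop the top-level directory
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(tar.extractfile(member).read())
            written.append(target)
    return written


def download_configs(dest="."):
    """Download the master archive and extract its configs/ directory."""
    with urllib.request.urlopen(ARCHIVE_URL) as response:
        return extract_configs(io.BytesIO(response.read()), dest)
```

Calling download_configs() from the directory where you plan to train should leave the same configs/ directory as the shell commands.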

Quick Start

Training

To train a new top-down biaffine parser on RSTDT:

iudex topdown_biaffine train configs/topdown_biaffine_rstdt.jsonnet

Note that configs/topdown_biaffine_rstdt.jsonnet is a configuration file. You may either edit it in place or copy it to a new location and modify the copy.

Configuration Hashes

Your configuration is used to compute a unique hash, which (by default) names a directory under checkpoints/. This hash serves several purposes. For example, if a run was interrupted, running the same config again automatically resumes from the last epoch's checkpoint, last.pt.
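To make the idea concrete, here is a minimal sketch of how a configuration could be mapped to a run directory and checked for a resumable checkpoint. It is an illustration under stated assumptions, not IUDEX's actual scheme: the function names, the use of SHA-256 over canonical JSON, and the 12-character truncation are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def config_hash(config, length=12):
    """Return a short, stable hex digest for a JSON-serializable config.

    Assumption: sorting keys makes the digest independent of key order.
    IUDEX's real hashing scheme may differ.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def run_dir(config, root="checkpoints"):
    """Map a config to its run directory, i.e. <root>/<hash>/."""
    return Path(root) / config_hash(config)


def resume_checkpoint(config, root="checkpoints"):
    """Return the path to last.pt if a previous run left one, else None."""
    candidate = run_dir(config, root) / "last.pt"
    return candidate if candidate.is_file() else None
```

Under this sketch, any change to the configuration yields a different hash and therefore a fresh run directory, while an unchanged configuration finds its earlier last.pt.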

To view all runs and their status, you may run the runs list subcommand:

$ iudex runs list
                                                            Runs in checkpoints                                                            
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ run_id       ┃ run_name ┃ parser           ┃ model_name                   ┃ train_dir             ┃  best_val ┃ step ┃ modified         ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ 007f69457a0c │ -        │ dmrst            │ xlm-roberta-base             │ data/rstdt/train      │ (no best) │    - │ 2026-05-13 16:49 │
│ f4f2bbc875b6 │ -        │ dmrst            │ xlm-roberta-base             │ data/rstdt/train      │ (no best) │    - │ 2026-05-13 16:26 │
└──────────────┴──────────┴──────────────────┴──────────────────────────────┴───────────────────────┴───────────┴──────┴──────────────────┘

Inference

To identify a model, you may use a configuration file (--config), a PyTorch checkpoint (--checkpoint), or a HuggingFace Hub repository (--hub-id).

To provide input, you may specify either an RS3/RS4 input file (--input) with gold EDUs already supplied, or (for parsers which support this) a plain text file (--input-text). Both arguments also support directories containing files of the appropriate type. Examples:

# From a trained run, parsing pre-segmented RS3/RS4:
iudex topdown_biaffine predict \
    --config configs/topdown_biaffine_rstdt.jsonnet \
    --input data/rstdt/test \
    --output-dir out/

# From an explicit checkpoint, end-to-end on raw text:
iudex topdown_biaffine predict \
    --checkpoint checkpoints/<run_id>/best_model.pt \
    --input-text path/to/doc.txt \
    --output-dir out/

# From the Hub:
iudex topdown_biaffine predict \
    --hub-id larc-iu/topdown_biaffine-rstdt-coarse \
    --input-text path/to/doc.txt \
    --output-dir out/ \
    --device cuda
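If you drive the CLI from a script, the flag combinations above can be assembled programmatically. The following is a minimal sketch: build_predict_command is a hypothetical helper, and it assumes that exactly one model flag and exactly one input flag should be passed per call, which the text above suggests but does not state. Only flags shown in the examples are used.

```python
import subprocess

MODEL_FLAGS = ("config", "checkpoint", "hub_id")
INPUT_FLAGS = ("input", "input_text")


def build_predict_command(parser, output_dir, device=None, **sources):
    """Build the argv list for `iudex <parser> predict`.

    Exactly one of config/checkpoint/hub_id and exactly one of
    input/input_text must be given as keyword arguments (an assumption).
    """
    unknown = set(sources) - set(MODEL_FLAGS) - set(INPUT_FLAGS)
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    for group in (MODEL_FLAGS, INPUT_FLAGS):
        given = [name for name in group if sources.get(name) is not None]
        if len(given) != 1:
            raise ValueError(f"expected exactly one of {group}, got {given}")

    argv = ["iudex", parser, "predict"]
    for name in MODEL_FLAGS + INPUT_FLAGS:
        value = sources.get(name)
        if value is not None:
            # hub_id -> --hub-id, input_text -> --input-text, and so on.
            argv += ["--" + name.replace("_", "-"), str(value)]
    argv += ["--output-dir", str(output_dir)]
    if device is not None:
        argv += ["--device", device]
    return argv


def run_predict(*args, **kwargs):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_predict_command(*args, **kwargs), check=True)
```

For example, run_predict("topdown_biaffine", "out/", hub_id="larc-iu/topdown_biaffine-rstdt-coarse", input_text="path/to/doc.txt", device="cuda") would issue the same command as the Hub example.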

Pushing Models to HF Hub

You may host a trained model on the HuggingFace Hub using each parser's push subcommand. The subcommand uploads best_model.pt, config.json, and an auto-generated README.md in a single commit:

iudex topdown_biaffine push \
    --config configs/topdown_biaffine_rstdt.jsonnet \
    --repo-id larc-iu/topdown_biaffine-rstdt-coarse \
    [--private] [--message "..."] [--token $HF_TOKEN]

Programmatic API

Beyond the CLI, you may also use IUDEX as a library:

from iudex.rst.parsers.topdown_biaffine import TopdownBiaffineParser

parser = TopdownBiaffineParser.from_pretrained("larc-iu/topdown_biaffine-rstdt-coarse")
tree = parser.predict_from_text("Your document text here. Multiple sentences are fine.")
print(tree.to_rs4_string())

from_pretrained accepts a Hub repo id, a local run directory, or a path to a .pt file. Optional keyword arguments include device, revision, cache_dir, and token.
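The same two calls extend naturally to a folder of documents. The sketch below is an illustration only: parse_directory is a hypothetical helper, and the choice of .txt inputs and .rs4 output names is an assumption. It relies solely on predict_from_text and to_rs4_string as shown above, so it works with any object that exposes those methods.

```python
from pathlib import Path


def parse_directory(parser, input_dir, output_dir):
    """Parse every .txt file in input_dir and write one .rs4 file per document.

    `parser` is any object exposing predict_from_text(str) whose result
    exposes to_rs4_string(), as in the snippet above.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for text_path in sorted(Path(input_dir).glob("*.txt")):
        tree = parser.predict_from_text(text_path.read_text(encoding="utf-8"))
        out_path = output_dir / (text_path.stem + ".rs4")
        out_path.write_text(tree.to_rs4_string(), encoding="utf-8")
        written.append(out_path)
    return written
```

With a parser loaded through from_pretrained, a call such as parse_directory(parser, "docs/", "out/") would write one RS4 file per input document; the directory names here are placeholders.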

Citation

If you use IUDEX in your research, please cite it as:

Gessler, Luke. 2026. IUDEX: The Indiana University Discourse Exhibition. https://github.com/larc-iu/iudex.

BibTeX:

@misc{gessler-iudex-2026,
  author       = {Gessler, Luke},
  title        = {{IUDEX: The Indiana University Discourse Exhibition}},
  year         = {2026},
  howpublished = {\url{https://github.com/larc-iu/iudex}},
}

If you use one of the included parser re-implementations, please also cite the original paper (see each model's Hub card for the canonical reference).

