IsaNLP RST Parser: A library for parsing Rhetorical Structure Theory trees.

IsaNLP RST Parser

This library provides several versions of the Rhetorical Structure Theory (RST) parser for multiple languages. Below, you will find instructions on how to set up and run the parser either locally or using Docker.

Performance
Installation & Quick Start
Visualizing the RST Tree
Advanced Usage
Docker Setup
Citation

Performance

The table below reports end-to-end scores (segmentation and tree building from raw text) for each released model on standard RST corpora.

Supported languages (listed as `all` in the table below): English (eng), Czech (ces), German (deu), Basque (eus), Persian (fas), French (fra), Dutch (nld), Brazilian Portuguese (por), Russian (rus), Spanish (spa), and Chinese (zho).

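Detailed end-to-end performance metrics. Seg is the EDU segmentation score; S, N, R, and Full are tree-construction scores for spans, nuclearity, relations, and the full tree (span, nuclearity, and relation together).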

Tag / Version	Languages	Train Data	Test Data	Seg	S	N	R	Full
`rstdt`	eng	eng.rst.rstdt	eng.rst.rstdt	97.8	75.6	65.0	55.6	53.9
`gumrrg`	eng, rus	eng.erst.gum, rus.rst.rrg	eng.erst.gum	95.5	67.4	56.2	49.6	48.7
			rus.rst.rrg	97.0	67.1	54.6	46.5	45.4
`rstreebank`	rus	rus.rst.rrt	rus.rst.rrt	92.1	66.2	53.1	46.1	46.2
`unirst`	all	all	ces.rst.crdt	94.5	59.1	41.2	28.6	28.0
			deu.rst.pcc	96.5	67.3	47.4	34.1	32.1
			eng.erst.gum	95.3	67.3	55.6	48.5	47.4
			eng.rst.oll	92.5	55.7	39.0	27.5	26.3
			eng.rst.rstdt	98.1	76.7	65.5	55.2	53.6
			eng.rst.sts	91.2	43.3	31.3	19.4	18.7
			eng.rst.umuc	88.8	52.6	40.6	26.2	25.8
			eus.rst.ert	92.5	66.0	50.3	34.9	34.7
			fas.rst.prstc	94.7	63.0	50.2	40.8	40.7
			fra.sdrt.annodis	91.3	58.6	48.9	30.6	30.3
			nld.rst.nldt	98.0	61.8	49.8	36.8	35.8
			por.rst.cstn	93.9	68.4	52.8	44.9	44.5
			rus.rst.rrg	96.4	67.4	54.0	46.3	45.1
			rus.rst.rrt	90.7	63.0	49.0	42.3	42.2
			spa.rst.rststb	93.4	63.5	50.3	36.0	36.0
			spa.rst.sctb	85.5	55.1	46.8	39.1	39.1
			zho.rst.gcdt	93.0	64.5	50.7	45.9	44.6
			zho.rst.sctb	95.4	67.5	51.5	39.9	39.9
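As a hedged illustration of reading the table, the sketch below copies the Full scores for the four test sets on which more than one tag was evaluated and picks the best-scoring tag. `FULL_SCORES` and `best_tag` are hypothetical names, not part of isanlp_rst; every other test set in the table was evaluated only with `unirst`.

```python
from typing import Tuple

# Full scores copied from the performance table, for the test sets
# that were evaluated with more than one tag.
FULL_SCORES = {
    "eng.rst.rstdt": {"rstdt": 53.9, "unirst": 53.6},
    "eng.erst.gum": {"gumrrg": 48.7, "unirst": 47.4},
    "rus.rst.rrg": {"gumrrg": 45.4, "unirst": 45.1},
    "rus.rst.rrt": {"rstreebank": 46.2, "unirst": 42.2},
}


def best_tag(test_set: str) -> Tuple[str, float]:
    """Return (tag, Full score) of the highest-scoring tag on a test set."""
    scores = FULL_SCORES.get(test_set)
    if not scores:
        raise KeyError(f"no multi-tag scores recorded for {test_set!r}")
    tag = max(scores, key=scores.get)
    return tag, scores[tag]


for name in FULL_SCORES:
    print(name, best_tag(name))
```

Several gaps are fractions of a point, so other factors, such as needing one model for several languages, may matter more than the raw ranking.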

Installation & Quick Start

This guide covers the most common use case: running the parser locally.

1. Installation

Install isanlp from GitHub and isanlp_rst from PyPI:

pip uninstall isanlp -y && pip install git+https://github.com/iinemo/isanlp.git
pip install isanlp_rst

2. Basic Usage

The following example initializes and runs the parser.

from isanlp_rst.parser import Parser

# Define the version of the model you want to use
version = 'gumrrg'  # Choose from {'gumrrg', 'rstdt', 'rstreebank'}

# Initialize the parser
parser = Parser(hf_model_name='tchewik/isanlp_rst_v3', 
                hf_model_version=version, 
                cuda_device=0) # Use -1 for CPU

text = """
On Saturday, in the ninth edition of the T20 Men's Cricket World Cup, Team India won against South Africa by seven runs. 
The final match was played at the Kensington Oval Stadium in Barbados. This marks India's second win in the T20 World Cup, 
which was co-hosted by the West Indies and the USA between June 2 and June 29.

After winning the toss, India decided to bat first and scored 176 runs for the loss of seven wickets. 
Virat Kohli top-scored with 76 runs, followed by Axar Patel with 47 runs. Hardik Pandya took three wickets, 
and Jasprit Bumrah took two wickets.
"""

# Parse the text to obtain the RST tree
res = parser(text)  # res['rst'] holds the resulting binary discourse tree(s) as a list

# Inspect the structure of the root node
print(vars(res['rst'][0]))
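The example above passes `cuda_device=0` and notes that `-1` selects the CPU. A minimal sketch for choosing the value automatically, assuming the models run on PyTorch (an assumption; check your installation), could look like this. `pick_cuda_device` is a hypothetical helper.

```python
def pick_cuda_device() -> int:
    """Return 0 (first GPU) if PyTorch reports CUDA, otherwise -1 (CPU)."""
    try:
        import torch  # assumption: installed as a dependency of the parser
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_cuda_device()
print("cuda_device =", device)

# from isanlp_rst.parser import Parser
# parser = Parser(hf_model_name='tchewik/isanlp_rst_v3',
#                 hf_model_version='gumrrg',
#                 cuda_device=device)
```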

To use the multilingual UniRST model, set hf_model_version='unirst' and choose the relation inventory with relinventory='lang.framework.corpus', using one of the test-set names listed for unirst in the performance table (for example, 'eng.erst.gum'). The default inventory for UniRST is eng.rst.rstdt.

parser = Parser(hf_model_name='tchewik/isanlp_rst_v3',
                hf_model_version='unirst',
                cuda_device=0,
                relinventory='eng.erst.gum')
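A hedged sketch for catching typos in `relinventory` before the model is loaded: the allowed values are copied from the unirst rows of the performance table. `UNIRST_INVENTORIES` and `unirst_kwargs` are hypothetical names, not library API, and whether the library validates the value itself is not stated here.

```python
# Test-set names from the unirst rows of the performance table.
UNIRST_INVENTORIES = frozenset({
    "ces.rst.crdt", "deu.rst.pcc", "eng.erst.gum", "eng.rst.oll",
    "eng.rst.rstdt", "eng.rst.sts", "eng.rst.umuc", "eus.rst.ert",
    "fas.rst.prstc", "fra.sdrt.annodis", "nld.rst.nldt", "por.rst.cstn",
    "rus.rst.rrg", "rus.rst.rrt", "spa.rst.rststb", "spa.rst.sctb",
    "zho.rst.gcdt", "zho.rst.sctb",
})


def unirst_kwargs(relinventory: str = "eng.rst.rstdt", cuda_device: int = 0) -> dict:
    """Build keyword arguments for Parser(...) after checking relinventory."""
    if relinventory not in UNIRST_INVENTORIES:
        raise ValueError(f"unknown relation inventory: {relinventory!r}")
    return {
        "hf_model_name": "tchewik/isanlp_rst_v3",
        "hf_model_version": "unirst",
        "cuda_device": cuda_device,
        "relinventory": relinventory,
    }


kwargs = unirst_kwargs("deu.rst.pcc")
# parser = Parser(**kwargs)
```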

3. Understanding the Output

The parser returns an RST tree with a recursive structure. Each node (Discourse Unit) contains:

{
 'id': 21,
 'left': (id=14, start=1, end=323),  # Left child node
 'right': (id=20, start=324, end=570), # Right child node
 'relation': 'elaboration',           # Rhetorical relation
 'nuclearity': 'NS',                 # Nucleus-Satellite status
 'entropy': 0.92,                    # Entropy of the split
 'start': 1,                         # Start character offset
 'end': 570,                         # End character offset
 'text': "On Saturday, ... took two wickets."
}
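These fields are enough to walk a tree. The sketch below uses a hypothetical `Node` dataclass as a stand-in for the library's DiscourseUnit so that it runs on its own. The helper functions only read `left`, `right`, `relation`, `nuclearity`, and `text`, so they should also work on `res['rst'][0]`, assuming that leaf (EDU) nodes have `left` and `right` set to `None`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    """Hypothetical stand-in with the fields shown in the output above."""
    id: int
    start: int
    end: int
    text: str
    relation: Optional[str] = None
    nuclearity: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def iter_nodes(node) -> Iterator:
    """Yield every node in pre-order using an explicit stack (no recursion)."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        # Push the right child first so the left subtree is visited first.
        for child in (current.right, current.left):
            if child is not None:
                stack.append(child)


def edus(tree) -> list:
    """Return leaf texts from left to right (assumes leaves have no children)."""
    return [n.text for n in iter_nodes(tree) if n.left is None and n.right is None]


def relation_counts(tree) -> Counter:
    """Count (relation, nuclearity) pairs over internal nodes."""
    return Counter(
        (n.relation, n.nuclearity)
        for n in iter_nodes(tree)
        if n.left is not None and n.right is not None
    )


a = Node(1, 0, 20, "India won the final.")
b = Node(2, 21, 47, "It was played in Barbados.")
root = Node(3, 0, 47, a.text + " " + b.text, "elaboration", "NS", a, b)
print(edus(root))
print(relation_counts(root))
```

The explicit stack avoids Python's recursion limit on very deep trees.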

Visualizing the RST Tree

There are several ways to visualize the output.

1. Save to RS3 Format

First, export the parsed tree to the standard .rs3 format.

res['rst'][0].to_rs3('filename.rs3')

You can open filename.rs3 in external tools like RSTTool or rstWeb for editing.
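RS3 is an XML format, so an exported file can also be inspected with Python's standard library. A minimal sketch, assuming the usual RS3 layout (`<rel>` declarations in the header and `<segment>` elements with optional `parent` and `relname` attributes in the body); `read_rs3` is a hypothetical helper, and the small inline document stands in for a real export.

```python
import xml.etree.ElementTree as ET


def read_rs3(data):
    """Parse RS3 content (str or bytes) into (segments, relation_names).

    segments is a list of (id, parent, relname, text) tuples in document order.
    """
    root = ET.fromstring(data)
    relations = [rel.get("name") for rel in root.iter("rel")]
    segments = [
        (seg.get("id"), seg.get("parent"), seg.get("relname"), (seg.text or "").strip())
        for seg in root.iter("segment")
    ]
    return segments, relations


SAMPLE = """<rst>
  <header>
    <relations>
      <rel name="elaboration" type="rst"/>
    </relations>
  </header>
  <body>
    <segment id="1">India won the final.</segment>
    <segment id="2" parent="1" relname="elaboration">It was played in Barbados.</segment>
  </body>
</rst>"""

segments, relations = read_rs3(SAMPLE)
print(segments)
print(relations)
```

For a real export, pass the raw bytes, e.g. `read_rs3(Path("filename.rs3").read_bytes())` with `from pathlib import Path`, so that the XML parser honours any encoding declaration in the file.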

2. View in Jupyter / Colab

Render the tree directly in your notebook.

import io, contextlib
import isanlp_rst

# Suppress the HTML string from being printed
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    isanlp_rst.render("filename.rs3")

# If you’re in Google Colab, use colab=True to sync the cell height
# isanlp_rst.render("filename.rs3", colab=True)

[Figure: illustration of the parsing visualization]

3. Export to PNG or PDF

To export the visualization, you'll first need to install Playwright:

pip install playwright
playwright install chromium

Then, you can export the .rs3 file:

import isanlp_rst

# Export to PNG
isanlp_rst.to_png("filename.rs3", "filename.png")

# Export to PDF
isanlp_rst.to_pdf("filename.rs3", "filename.pdf")
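To export many trees at once, a loop over a directory is enough. In this sketch, `export_all` is a hypothetical helper that takes the exporter as an argument, so it can be used with `isanlp_rst.to_png` or `isanlp_rst.to_pdf` (both called as `(source, target)` in the example above) and tested without Playwright.

```python
from pathlib import Path
from typing import Callable, List


def export_all(directory, exporter: Callable[[str, str], None], suffix: str = ".png") -> List[Path]:
    """Call exporter(src, dst) for every .rs3 file in directory, in name order.

    Each output is written next to its source with the given suffix.
    Returns the list of output paths.
    """
    outputs = []
    for src in sorted(Path(directory).glob("*.rs3")):
        dst = src.with_suffix(suffix)
        exporter(str(src), str(dst))
        outputs.append(dst)
    return outputs


# With isanlp_rst and Playwright installed:
# import isanlp_rst
# export_all("trees/", isanlp_rst.to_png, ".png")
# export_all("trees/", isanlp_rst.to_pdf, ".pdf")
```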

Advanced Usage

Parsing Pre-Segmented EDUs

If your text is already segmented into elementary discourse units (EDUs), pass the list of segments to parser.from_edus() instead of raw text:

my_edus = [
    "On Saturday, Team India won against South Africa.",
    "The final match was played in Barbados."
]

res = parser.from_edus(my_edus)
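If no EDU segmenter is at hand but you want to try `from_edus`, one crude option is to treat each sentence as a segment. This is an assumption made for illustration only: real EDUs are often smaller than sentences, so trees built this way will be coarser. `sentences_as_edus` is a hypothetical helper based on a simple regular expression.

```python
import re


def sentences_as_edus(text: str) -> list:
    """Split after '.', '!' or '?' followed by whitespace; drop empty pieces.

    Sentences are coarser than EDUs, so this is only a rough stand-in
    for a real discourse segmenter.
    """
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in pieces if p]


my_edus = sentences_as_edus(
    "On Saturday, Team India won against South Africa. "
    "The final match was played in Barbados."
)
print(my_edus)
# res = parser.from_edus(my_edus)
```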

Memory Management for Large Datasets

When parsing many documents, the resulting DiscourseUnit trees can consume significant memory, as each node stores its corresponding text span.

You can use res['rst'][0].clear_textfields() to recursively remove all text from the tree, leaving only the structure (IDs, relations, and character offsets). This makes the tree object lightweight for storage (e.g., pickling).

Later, you can use .fill_textfields(full_text) to repopulate the tree using the original text.

Important: Do not call .to_rs3() on a tree whose text fields have been cleared; repopulate them with .fill_textfields() first.

Docker Setup

You can run the parser as a service using Docker. This is currently available for tags: rstdt, gumrrg, rstreebank.

Run the Docker container:

# Pull and run the 'rstreebank' model; host port 3335 is mapped to the container's port 3333
docker run --rm -p 3335:3333 --name rst_rrt tchewik/isanlp_rst:3.0-rstreebank

Connect to the container with the isanlp client. The client machine needs only isanlp, not isanlp_rst:

pip install git+https://github.com/iinemo/isanlp.git

from isanlp import PipelineCommon
from isanlp.processor_remote import ProcessorRemote

# Connect to the running container
address_rst = ('127.0.0.1', 3335)

ppl = PipelineCommon([
    (ProcessorRemote(address_rst[0], address_rst[1], 'default'),
     ['text'],
     {'rst': 'rst'})
])

res = ppl(text)
# res['rst'] will contain the binary discourse tree
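The container needs time to download and load the model after it starts, so early requests may fail. A generic retry wrapper is one way to handle this. `call_with_retries` is a hypothetical helper, and because the exception `ProcessorRemote` raises for a server that is not ready yet is not documented here, the sketch retries on any exception.

```python
import time


def call_with_retries(func, *args, attempts: int = 5, delay: float = 10.0, **kwargs):
    """Call func(*args, **kwargs), retrying on any exception.

    Sleeps `delay` seconds between attempts and re-raises the last
    exception once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)


# res = call_with_retries(ppl, text, attempts=10, delay=15.0)
```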

Citation

If you use the IsaNLP RST Parser in your research, please cite our work:

For rstdt, gumrrg, and rstreebank models:

@inproceedings{chistova-2024-bilingual,
 title = "Bilingual Rhetorical Structure Parsing with Large Parallel Annotations",
 author = "Chistova, Elena",
 booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
 month = aug,
 year = "2024",
 address = "Bangkok, Thailand and virtual meeting",
 publisher = "Association for Computational Linguistics",
 url = "https://aclanthology.org/2024.findings-acl.577",
 pages = "9689--9706"
}

For the unirst model:

@inproceedings{chistova-2025-bridging,
  title = "Bridging Discourse Treebanks with a Unified Rhetorical Structure Parser",
  author = "Chistova, Elena",
  booktitle = "Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)",
  month = nov,
  year = "2025",
  address = "Suzhou, China",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.codi-1.17/",
  pages = "197--208"
 }

Project details

Release history

3.2.0 (this version): Nov 7, 2025
3.1.3: Nov 5, 2025
3.1.2: Oct 25, 2025
3.1.1: Oct 10, 2025
3.1.1a1 (pre-release): Oct 25, 2025
3.1.1a0 (pre-release): Oct 21, 2025
3.1.0: Sep 26, 2025
3.0.1a5 (pre-release): Aug 26, 2024
3.0.1a4 (pre-release): Aug 25, 2024
3.0.1a0 (pre-release): Aug 1, 2024

Download files

Source Distribution

isanlp_rst-3.2.0.tar.gz (328.1 kB)

Uploaded Nov 7, 2025 Source

Built Distribution

isanlp_rst-3.2.0-py3-none-any.whl (347.1 kB)

Uploaded Nov 7, 2025 Python 3

File details

Details for the file isanlp_rst-3.2.0.tar.gz.

File metadata

Download URL: isanlp_rst-3.2.0.tar.gz
Upload date: Nov 7, 2025
Size: 328.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for isanlp_rst-3.2.0.tar.gz
Algorithm	Hash digest
SHA256	`5235fa3370f3a03cb8f8d3d6afcfc64d3bfe44776f45f70e3cf98be766e90df5`
MD5	`b60239202a1470a1ce70a881bdcc2012`
BLAKE2b-256	`c8d7c583312d17c0ab8622e29d6be6d6f41b8587308d5fad12afa7e094396dcd`

File details

Details for the file isanlp_rst-3.2.0-py3-none-any.whl.

File metadata

Download URL: isanlp_rst-3.2.0-py3-none-any.whl
Upload date: Nov 7, 2025
Size: 347.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for isanlp_rst-3.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b9fe5e63041734e244e78c526532054b5492013d051c3231e3049216c5731250`
MD5	`2387328a5e39cdb32d4ed2a0af67b218`
BLAKE2b-256	`2ff33d147ab1ecfb5d48f7420a8a0d849d19bf012d5bea460034625e074e4500`
