
RT-SEG — Reasoning Trace Segmentation

rt_seg is a Python 3.12.x package for segmenting reasoning traces into coherent chunks and (optionally) assigning a label to each chunk.

The main entry point is:

RTSeg

(from rt_segmentation.seg_factory)

It orchestrates one or more segmentation engines and — if multiple engines are used — an offset aligner that fuses their boundaries into a single segmentation.


https://github.com/user-attachments/assets/d38d8e4f-ab49-4c16-a9eb-ab7d789e64ce


Installation

Install from PyPI

pip install rt-seg

Development Install (repo checkout)

python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Install the TUI App with Docker/Podman

Note: requires an NVIDIA GPU with a driver that supports CUDA 12.4.1 or newer.

docker build -f docker/Dockerfile -t mytui:gpu .
docker run -it --rm --gpus all mytui:gpu

# podman build -f docker/Dockerfile -t mytui:gpu .
# podman run -it --rm --device nvidia.com/gpu=all mytui:gpu

Core Concepts

What RTSeg Returns

Calling a configured RTSeg instance on a trace produces:

  • offsets: list[tuple[int, int]] — character offsets into the trace
  • labels: list[str] — one label per segment

You can reconstruct segments via:

segments = [trace[s:e] for (s, e) in offsets]
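
For example, with a hypothetical two-span result, the reconstruction round-trips exactly (assuming the offsets cover the trace contiguously, which is the usual contract for character-span segmentations):

```python
trace = "First step. Then second step."

# Hypothetical segmentation output for this trace (not produced by rt_seg here).
offsets = [(0, 12), (12, 29)]
labels = ["setup", "continuation"]

segments = [trace[s:e] for (s, e) in offsets]

# Contiguous, non-overlapping spans tile the trace exactly.
assert "".join(segments) == trace
print(segments)  # ['First step. ', 'Then second step.']
```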

Segmentation Base Unit (seg_base_unit)

Most engines operate on a base segmentation first:

  • "clause" (default) → finer granularity
  • "sent" → coarser segmentation

Quickstart — Single Engine

from rt_seg import RTSeg
from rt_seg import RTRuleRegex

trace = "First step... Then second step... Finally conclude."

segmentor = RTSeg(
    engines=RTRuleRegex,
    seg_base_unit="clause",
)

offsets, labels = segmentor(trace)

for (s, e), label in zip(offsets, labels):
    print(label, "=>", trace[s:e])

Multiple Engines + Late Fusion

If you pass multiple engines, you must provide an aligner.

from rt_seg import RTSeg
from rt_seg import RTRuleRegex
from rt_seg import RTBERTopicSegmentation
from rt_seg import OffsetFusionGraph

segmentor = RTSeg(
    engines=[RTRuleRegex, RTBERTopicSegmentation],
    aligner=OffsetFusionGraph,
    label_fusion_type="concat",  # or "majority"
    seg_base_unit="clause",
)

offsets, labels = segmentor(trace)

Label Fusion Modes

  • "majority" — choose most frequent label
  • "concat" — concatenate labels (useful for debugging)

Available Engines

Rule-Based

  • RTRuleRegex
  • RTNewLine

Probabilistic

  • RTLLMForcedDecoderBased
  • RTLLMSurprisal
  • RTLLMEntropy
  • RTLLMTopKShift
  • RTLLMFlatnessBreak

LLM Discourse / Reasoning Schemas

  • RTLLMThoughtAnchor
  • RTLLMReasoningFlow
  • RTLLMArgument

LLM

  • RTLLMOffsetBased
  • RTLLMSegUnitBased

PRM-Based

  • RTPRMBase

Topic / Semantic / NLI

  • RTBERTopicSegmentation
  • RTEmbeddingBasedSemanticShift
  • RTEntailmentBasedSegmentation
  • RTZeroShotSeqClassification
  • RTZeroShotSeqClassificationRF
  • RTZeroShotSeqClassificationTA

Engine Configuration

You can override engine parameters at call time:

offsets, labels = segmentor(
    trace,
    model_name="Qwen/Qwen2.5-7B-Instruct",
    chunk_size=200,
)

Available Aligners

  • OffsetFusionGraph
  • OffsetFusionFuzzy
  • OffsetFusionIntersect
  • OffsetFusionMerge
  • OffsetFusionVoting

Strategy                 Behavior
Intersect                Conservative
Merge                    Permissive
Voting / Graph / Fuzzy   Balanced (recommended)
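
The conservative-vs-permissive distinction can be sketched on boundary sets (a simplified model; the shipped aligners fuse full offset spans, not just boundary points):

```python
def fuse_boundaries(engine_boundaries, strategy="intersect"):
    sets = [set(b) for b in engine_boundaries]
    if strategy == "intersect":
        # Conservative: keep only boundaries every engine agrees on.
        fused = set.intersection(*sets)
    elif strategy == "merge":
        # Permissive: keep every boundary any engine proposed.
        fused = set.union(*sets)
    else:
        raise ValueError(strategy)
    return sorted(fused)

a = [0, 40, 95, 120]  # boundaries from engine 1
b = [0, 42, 95, 120]  # boundaries from engine 2
print(fuse_boundaries([a, b], "intersect"))  # [0, 95, 120]
print(fuse_boundaries([a, b], "merge"))      # [0, 40, 42, 95, 120]
```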

Implementing a Custom Engine

from rt_seg import SegBase

class MyEngine(SegBase):
    @staticmethod
    def _segment(trace: str, **kwargs) -> tuple[list[tuple[int, int]], list[str]]:
        # Minimal engine: return the whole trace as one unlabeled segment.
        offsets = [(0, len(trace))]
        labels = ["UNK"]
        return offsets, labels

Using Base Offsets

base_offsets = SegBase.get_base_offsets(trace, seg_base_unit="clause")

Implementing a Custom Aligner

class MyOffsetFusion:
    @staticmethod
    def fuse(engine_offsets: list[list[tuple[int, int]]], **kwargs):
        # Trivial strategy: trust the first engine's segmentation.
        return engine_offsets[0]
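
A slightly less trivial custom aligner might union all engines' boundary points and rebuild spans from them (a hedged sketch; UnionOffsetFusion is hypothetical, not one of the shipped aligners):

```python
class UnionOffsetFusion:
    """Hypothetical aligner: union all boundary points, then rebuild spans."""

    @staticmethod
    def fuse(engine_offsets, **kwargs):
        # Collect every start/end position proposed by any engine.
        points = sorted({p for offs in engine_offsets for (s, e) in offs for p in (s, e)})
        # Adjacent points become the fused spans.
        return list(zip(points, points[1:]))

fused = UnionOffsetFusion.fuse([[(0, 10), (10, 30)], [(0, 18), (18, 30)]])
print(fused)  # [(0, 10), (10, 18), (18, 30)]
```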

Running the TUI (Without Docker)

python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m tui

If needed:

python src/tui.py

SurrealDB (Optional — Reproducible Experiments)

SurrealDB is required only for the full experiment pipeline.


1️⃣ Start SurrealDB (Docker Recommended)

docker run --rm -it \
  -p 8000:8000 \
  -v "$(pwd)/data:/data" \
  surrealdb/surrealdb:latest \
  start --user root --pass root file:/data/surreal.db

Endpoints:

  • WebSocket: ws://127.0.0.1:8000/rpc
  • HTTP: http://127.0.0.1:8000

2️⃣ Import Database Snapshot

surreal import \
  --endpoint ws://127.0.0.1:8000/rpc \
  --username root \
  --password root \
  --namespace NR \
  --database RT \
  ./data/YOUR_EXPORT_FILE.surql

⚠️ Make sure namespace/database match your config.


3️⃣ Configure data/sdb_login.json

{
  "user": "root",
  "pwd": "root",
  "ns": "NR",
  "db": "RT",
  "url": "ws://127.0.0.1:8000/rpc"
}

4️⃣ Run Experiment Scripts

python src/eval_main.py
python src/evo.py

Docker + GPU Setup

Requirements

  • Linux
  • NVIDIA GPU
  • NVIDIA driver
  • Docker
  • NVIDIA Container Toolkit

Verify:

nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

CUDA Compatibility Rule

Host driver CUDA ≥ Container CUDA

Host   Container   Result
12.8   12.4        ✅ works (driver ≥ container)
12.8   13.1        ❌ fails (container CUDA newer than driver)
13.x   12.4        ✅ works

Build Image

docker build -f docker/Dockerfile -t rt-seg:gpu .

Run

./run_tui_app_docker.sh

Internally:

docker run -it --rm --gpus all rt-seg:gpu

Summary

RT-SEG provides:

  • Modular segmentation engines
  • Late fusion strategies
  • LLM-based reasoning segmentation
  • Reproducible DB-backed experiments
  • GPU Docker deployment
