rt_seg is a Python 3.12.x package for segmenting reasoning traces into coherent chunks and (optionally) assigning a label to each chunk.
RT-SEG — Reasoning Trace Segmentation
The main entry point is RTSeg (from rt_segmentation.seg_factory).
It orchestrates one or more segmentation engines and — if multiple engines are used — an offset aligner that fuses their boundaries into a single segmentation.
https://github.com/user-attachments/assets/d38d8e4f-ab49-4c16-a9eb-ab7d789e64ce
Installation
Install from PyPI (once published)
pip install rt-seg
Development Install (repo checkout)
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Install the TUI app with Docker/Podman
Note: requires an NVIDIA GPU and a host driver that supports CUDA 12.4.1 or newer.
docker build -f docker/Dockerfile -t mytui:gpu .
docker run -it --rm --gpus all mytui:gpu
# podman build -f docker/Dockerfile -t mytui:gpu .
# podman run -it --rm --device nvidia.com/gpu=all mytui:gpu
Core Concepts
What RTSeg Returns
RTSeg(trace) produces:
- offsets (list[tuple[int, int]]): character offsets into the trace
- labels (list[str]): one label per segment
You can reconstruct segments via:
segments = [trace[s:e] for (s, e) in offsets]
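The slicing above implies half-open (start, end) character spans. A minimal sketch of a helper that checks spans before slicing is shown below; the helper name and the requirement that spans are ordered and non-overlapping are assumptions made for illustration, and the example data is hand-written rather than produced by an engine.

```python
def segments_from_offsets(trace: str, offsets: list[tuple[int, int]]) -> list[str]:
    """Slice a trace into segment strings after validating the spans."""
    previous_end = 0
    for start, end in offsets:
        # Assumption: spans are ordered, non-overlapping, and inside the trace.
        if not (previous_end <= start <= end <= len(trace)):
            raise ValueError(f"invalid span ({start}, {end})")
        previous_end = end
    return [trace[start:end] for start, end in offsets]


# Hand-written example data (not the output of an rt_seg engine).
trace = "First step. Then second step."
offsets = [(0, 11), (12, 29)]
labels = ["step", "step"]

segments = segments_from_offsets(trace, offsets)
for label, segment in zip(labels, segments):
    print(label, "=>", segment)
```

Validating first makes a malformed span fail loudly instead of silently producing an empty or truncated segment.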
Segmentation Base Unit (seg_base_unit)
Most engines operate on a base segmentation first:
- "clause" (default) → finer granularity
- "sent" → coarser segmentation
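To illustrate the difference in granularity, here is a toy splitter. It is only a sketch: the punctuation-based rule and the function name are assumptions, and the package's own base segmentation may work quite differently.

```python
import re


def toy_base_offsets(trace: str, seg_base_unit: str = "clause") -> list[tuple[int, int]]:
    """Toy rule: sentences end at . ! ?; clauses additionally end at , ; :"""
    if seg_base_unit == "sent":
        breakers = ".!?"
    elif seg_base_unit == "clause":
        breakers = ".!?,;:"
    else:
        raise ValueError(f"unknown seg_base_unit: {seg_base_unit!r}")
    escaped = re.escape(breakers)
    # One unit = a run of non-breaker characters plus any trailing breakers.
    pattern = re.compile(rf"[^{escaped}]+[{escaped}]*")
    offsets = []
    for match in pattern.finditer(trace):
        start, end = match.span()
        text = match.group()
        # Skip leading whitespace so spans start at the first visible character.
        start += len(text) - len(text.lstrip())
        if start < end:
            offsets.append((start, end))
    return offsets


trace = "First, compute x. Then, check it."
clauses = [trace[s:e] for s, e in toy_base_offsets(trace, "clause")]
sentences = [trace[s:e] for s, e in toy_base_offsets(trace, "sent")]
print(clauses)
print(sentences)
```

On this input the clause setting yields four units and the sentence setting yields two, which is the finer-versus-coarser trade-off described above.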
Quickstart — Single Engine
from rt_seg import RTSeg
from rt_seg import RTRuleRegex
trace = "First step... Then second step... Finally conclude."
segmentor = RTSeg(
    engines=RTRuleRegex,
    seg_base_unit="clause",
)
offsets, labels = segmentor(trace)
for (s, e), label in zip(offsets, labels):
    print(label, "=>", trace[s:e])
Multiple Engines + Late Fusion
If you pass multiple engines, you must provide an aligner.
from rt_seg import RTSeg
from rt_seg import RTRuleRegex
from rt_seg import RTBERTopicSegmentation
from rt_seg import OffsetFusionGraph
segmentor = RTSeg(
    engines=[RTRuleRegex, RTBERTopicSegmentation],
    aligner=OffsetFusionGraph,
    label_fusion_type="concat",  # or "majority"
    seg_base_unit="clause",
)
offsets, labels = segmentor(trace)
Label Fusion Modes
- "majority": choose the most frequent label
- "concat": concatenate labels (useful for debugging)
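The two modes can be sketched for a single fused segment as follows. This is an illustration only: the function name, the "|" separator, and the tie-breaking behavior are assumptions, not the package's documented behavior.

```python
from collections import Counter


def fuse_labels(labels: list[str], label_fusion_type: str = "majority", sep: str = "|") -> str:
    """Combine the labels that several engines assigned to one segment."""
    if not labels:
        raise ValueError("at least one label is required")
    if label_fusion_type == "majority":
        # most_common breaks ties in favor of the label seen first.
        return Counter(labels).most_common(1)[0][0]
    if label_fusion_type == "concat":
        # Keeps every engine's vote visible, which helps when debugging.
        return sep.join(labels)
    raise ValueError(f"unknown label_fusion_type: {label_fusion_type!r}")


engine_labels = ["plan", "verify", "plan"]
print(fuse_labels(engine_labels, "majority"))
print(fuse_labels(engine_labels, "concat"))
```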
Available Engines
Rule-Based
- RTRuleRegex
- RTNewLine
Probabilistic
- RTLLMForcedDecoderBased
- RTLLMSurprisal
- RTLLMEntropy
- RTLLMTopKShift
- RTLLMFlatnessBreak
LLM Discourse / Reasoning Schemas
- RTLLMThoughtAnchor
- RTLLMReasoningFlow
- RTLLMArgument
LLM
- RTLLMOffsetBased
- RTLLMSegUnitBased
PRM-Based
- RTPRMBase
Topic / Semantic / NLI
- RTBERTopicSegmentation
- RTEmbeddingBasedSemanticShift
- RTEntailmentBasedSegmentation
- RTZeroShotSeqClassification
- RTZeroShotSeqClassificationRF
- RTZeroShotSeqClassificationTA
Engine Configuration
You can override engine parameters at call time:
offsets, labels = segmentor(
    trace,
    model_name="Qwen/Qwen2.5-7B-Instruct",
    chunk_size=200,
)
Available Aligners
- OffsetFusionGraph
- OffsetFusionFuzzy
- OffsetFusionIntersect
- OffsetFusionMerge
- OffsetFusionVoting
| Strategy | Behavior |
|---|---|
| Intersect | Conservative |
| Merge | Permissive |
| Voting / Graph / Fuzzy | Balanced (recommended) |
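The table gives only one-word descriptions, so the following sketch shows one plausible reading rather than the package's implementation: "intersect" keeps only the cut points every engine agrees on, and "merge" keeps every cut point any engine proposes. The function names and the assumption that each engine returns contiguous spans over the same trace are hypothetical.

```python
Span = tuple[int, int]


def cut_points(offsets: list[Span]) -> set[int]:
    """Interior boundaries of a contiguous segmentation (all ends but the last)."""
    return {end for _, end in offsets[:-1]}


def rebuild(cuts: set[int], length: int) -> list[Span]:
    """Turn a set of cut points back into contiguous spans over [0, length)."""
    edges = [0, *sorted(cuts), length]
    return list(zip(edges, edges[1:]))


def fuse_intersect(engine_offsets: list[list[Span]], length: int) -> list[Span]:
    # Conservative: a boundary survives only if every engine proposed it.
    return rebuild(set.intersection(*map(cut_points, engine_offsets)), length)


def fuse_merge(engine_offsets: list[list[Span]], length: int) -> list[Span]:
    # Permissive: a boundary survives if any engine proposed it.
    return rebuild(set.union(*map(cut_points, engine_offsets)), length)


engine_a = [(0, 10), (10, 20), (20, 30)]
engine_b = [(0, 10), (10, 30)]
print(fuse_intersect([engine_a, engine_b], 30))
print(fuse_merge([engine_a, engine_b], 30))
```

Under this reading, intersecting can only reduce the number of segments and merging can only increase it, which matches the "conservative" and "permissive" labels.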
Implementing a Custom Engine
from typing import Tuple, List
from rt_seg import SegBase
class MyEngine(SegBase):
    @staticmethod
    def _segment(trace: str, **kwargs) -> Tuple[List[tuple[int, int]], List[str]]:
        offsets = [(0, len(trace))]
        labels = ["UNK"]
        return offsets, labels
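As a slightly less trivial body for _segment, the standalone function below splits a trace on blank lines and returns the same (offsets, labels) pair. It is a sketch that does not import rt_seg; the function name and the "label" keyword are hypothetical, and wiring it into a SegBase subclass is left as shown above.

```python
import re


def segment_by_paragraph(trace: str, **kwargs) -> tuple[list[tuple[int, int]], list[str]]:
    """Split on blank lines; every paragraph becomes one segment."""
    offsets: list[tuple[int, int]] = []
    start = 0
    for gap in re.finditer(r"\n\s*\n", trace):
        if gap.start() > start:
            offsets.append((start, gap.start()))
        start = gap.end()  # the next paragraph begins after the blank line(s)
    if start < len(trace):
        offsets.append((start, len(trace)))
    # Hypothetical keyword: one fixed label for every segment.
    labels = [kwargs.get("label", "UNK")] * len(offsets)
    return offsets, labels


trace = "Plan the proof.\n\nCheck the base case.\n\nConclude."
offsets, labels = segment_by_paragraph(trace)
for (s, e), label in zip(offsets, labels):
    print(label, "=>", trace[s:e])
```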
Using Base Offsets
base_offsets = SegBase.get_base_offsets(trace, seg_base_unit="clause")
Implementing a Custom Aligner
from typing import List, Tuple
class MyOffsetFusion:
    @staticmethod
    def fuse(engine_offsets: List[List[Tuple[int, int]]], **kwargs):
        return engine_offsets[0]
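A custom aligner that does real fusion might look like the sketch below, which keeps a boundary when a majority of engines propose it. It follows the fuse signature shown above, but the class name, the "min_votes" keyword, and the assumption of contiguous per-engine spans are hypothetical and not taken from the package.

```python
from collections import Counter
from typing import List, Tuple


class MajorityVoteFusion:
    """Hypothetical aligner: keep a cut point if enough engines propose it."""

    @staticmethod
    def fuse(engine_offsets: List[List[Tuple[int, int]]], **kwargs) -> List[Tuple[int, int]]:
        if not engine_offsets or not all(engine_offsets):
            raise ValueError("every engine must return at least one span")
        start = min(offsets[0][0] for offsets in engine_offsets)
        end = max(offsets[-1][1] for offsets in engine_offsets)
        # Count how many engines end a (non-final) segment at each position.
        votes = Counter(e for offsets in engine_offsets for _, e in offsets[:-1])
        # Hypothetical keyword; the default is a strict majority of engines.
        min_votes = kwargs.get("min_votes", len(engine_offsets) // 2 + 1)
        cuts = sorted(c for c, n in votes.items() if n >= min_votes and start < c < end)
        edges = [start, *cuts, end]
        return list(zip(edges, edges[1:]))


engine_offsets = [
    [(0, 10), (10, 20), (20, 30)],
    [(0, 10), (10, 30)],
    [(0, 12), (12, 20), (20, 30)],
]
print(MajorityVoteFusion.fuse(engine_offsets))
print(MajorityVoteFusion.fuse(engine_offsets, min_votes=3))
```

Raising min_votes moves the result toward the conservative end of the strategy table; lowering it to 1 makes it permissive.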
Running the TUI (Without Docker)
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m tui
If the module form does not work, run the script directly:
python src/tui.py
SurrealDB (Optional — Reproducible Experiments)
Required only for the full experiment pipeline.
1️⃣ Start SurrealDB (Docker Recommended)
docker run --rm -it \
-p 8000:8000 \
-v "$(pwd)/data:/data" \
surrealdb/surrealdb:latest \
start --user root --pass root file:/data/surreal.db
Endpoints:
- WebSocket: ws://127.0.0.1:8000/rpc
- HTTP: http://127.0.0.1:8000
2️⃣ Import Database Snapshot
surreal import \
--endpoint ws://127.0.0.1:8000/rpc \
--username root \
--password root \
--namespace NR \
--database RT \
./data/YOUR_EXPORT_FILE.surql
⚠️ Make sure namespace/database match your config.
3️⃣ Configure data/sdb_login.json
{
  "user": "root",
  "pwd": "root",
  "ns": "NR",
  "db": "RT",
  "url": "ws://127.0.0.1:8000/rpc"
}
4️⃣ Run Experiment Scripts
python src/eval_main.py
python src/evo.py
Docker + GPU Setup
Requirements
- Linux
- NVIDIA GPU
- NVIDIA driver
- Docker
- NVIDIA Container Toolkit
Verify:
nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
CUDA Compatibility Rule
Host driver CUDA ≥ Container CUDA
| Host | Container | Result |
|---|---|---|
| 12.8 | 12.4 | ✅ |
| 12.8 | 13.1 | ❌ |
| 13.x | 12.4 | ✅ |
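The rule can be expressed as a comparison of (major, minor) version tuples. This is a simplified sketch of the table above with hypothetical function names; it ignores patch levels and any forward-compatibility packages.

```python
def parse_cuda_version(text: str) -> tuple[int, int]:
    """Parse 'major.minor[.patch]' into (major, minor); a missing minor counts as 0."""
    major, minor = (text.split(".") + ["0"])[:2]
    return int(major), int(minor)


def container_can_run(host_cuda: str, container_cuda: str) -> bool:
    """True when the host driver's CUDA version is at least the container's."""
    return parse_cuda_version(host_cuda) >= parse_cuda_version(container_cuda)


for host, container in [("12.8", "12.4"), ("12.8", "13.1"), ("13.0", "12.4")]:
    verdict = "ok" if container_can_run(host, container) else "incompatible"
    print(f"host {host}, container {container}: {verdict}")
```

Comparing tuples rather than strings or floats avoids mistakes such as treating 12.10 as older than 12.4.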
Build Image
docker build -f docker/Dockerfile -t rt-seg:gpu .
Run
./run_tui_app_docker.sh
Internally:
docker run -it --rm --gpus all rt-seg:gpu
Summary
RT-SEG provides:
- Modular segmentation engines
- Late fusion strategies
- LLM-based reasoning segmentation
- Reproducible DB-backed experiments
- GPU Docker deployment