Dialogue Memory Pipeline
LLM-driven dialogue segmentation and episodic memory construction pipeline.
dialogue_memory_pipeline turns a dialogue transcript into:
- candidate topic-shift boundaries
- per-utterance local discourse states
- finalized dialogue segments
- episode-style memory records for each segment
The package is built around an OpenAI-compatible JSON LLM client and is intended for dialogue understanding, segmentation, and memory construction workflows.
Project Status
This package is currently in active development and should be treated as an alpha release.
- APIs, behavior, and output formats may still change.
- The package is not yet production-ready.
- At the moment, the only provider setup that is tested and supported is Bailian (DashScope) using its OpenAI-compatible endpoint.
If you adopt this package, it is best to assume early-adopter usage rather than stable general availability.
What It Does
Given a sequence of utterances, the pipeline runs four stages:
- Candidate boundary generation: scores every possible boundary between adjacent utterances and keeps only the highest-confidence candidates.
- Local state extraction: extracts a structured state for each utterance, including topic, intent, entities, cue markers, and obligation signals.
- Transition judgment and segmentation: walks candidate boundaries in order and decides whether each one starts a new segment.
- Episodic memory building: produces one memory record per final segment.
The top-level entrypoint is DialogueSegmentationPipeline.
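As a rough sketch of the stage-3 bookkeeping described above, accepted candidate boundaries split the turn sequence into segments, with min_segment_len enforced on both sides. The judge callback here is a stand-in for the LLM transition judgment; this is illustrative, not the package's internal API:

```python
def segment(n_turns, candidates, judge, min_segment_len=2):
    """Greedy split: walk candidate boundaries left to right; each accepted
    boundary closes the current segment.

    candidates: boundary indices (a boundary b means "split after turn b").
    judge: callable(boundary) -> bool, standing in for the LLM judgment.
    Returns a list of (start, end) inclusive turn spans.
    """
    segments = []
    start = 0
    for b in sorted(candidates):
        left_len = b - start + 1
        right_len = n_turns - b - 1
        # Only split if both resulting sides can satisfy the minimum length.
        if judge(b) and left_len >= min_segment_len and right_len >= min_segment_len:
            segments.append((start, b))
            start = b + 1
    segments.append((start, n_turns - 1))
    return segments
```

For example, with six turns and accepted candidates after turns 1 and 3, this yields the spans (0, 1), (2, 3), and (4, 5).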
Features
- End-to-end dialogue segmentation and memory generation
- OpenAI-compatible client with optional custom base_url
- Structured JSON outputs for every pipeline stage
- Configurable candidate selection thresholding
- Simple API for loading dialogues from JSON files
Installation
Install the package from source:
git clone https://github.com/Keyan0412/dialogue_memory_pipeline.git
cd dialogue_memory_pipeline
python -m venv .venv
source .venv/bin/activate
pip install .
Required dependencies:
- openai
- python-dotenv
Environment Variables
The pipeline currently expects a Bailian (DashScope) API key and endpoint through an OpenAI-compatible interface.
Supported environment variables:
- OPENAI_API_KEY
- OPENAI_BASE_URL (optional)
- OPENAI_MODEL (optional when using from_env)
Example .env:
OPENAI_API_KEY=YOUR_API_KEY
OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
OPENAI_MODEL=qwen3.5-plus
At this stage, other OpenAI-compatible providers may or may not work, but they are not yet officially supported by this package.
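In practice the variables above are just read from the process environment; a loader such as python-dotenv (already a dependency) can populate os.environ from the .env file first. A minimal sketch of that lookup, mirroring the documented defaults:

```python
import os

# Read the variables the pipeline expects. python-dotenv's load_dotenv()
# can populate os.environ from a local .env file before this runs.
api_key = os.environ.get("OPENAI_API_KEY")            # required for real calls
base_url = os.environ.get("OPENAI_BASE_URL")          # optional
model = os.environ.get("OPENAI_MODEL", "qwen3.5-plus")  # documented fallback model
```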
Quick Start
Use the pipeline with environment variables
from dialogue_memory_pipeline import (
DialogueSegmentationPipeline,
PipelineConfig,
load_sample_dialogue,
)
dialogue = load_sample_dialogue()
config = PipelineConfig(
top_p_candidates=0.30,
min_candidate_score=0.20,
right_preview_window=3,
min_segment_len=2,
)
pipeline = DialogueSegmentationPipeline.from_env(config=config)
result = pipeline.run(dialogue)
print(result["segments"])
print(result["episodes"])
Use the pipeline with explicit credentials
from dialogue_memory_pipeline import DialogueSegmentationPipeline, load_sample_dialogue
dialogue = load_sample_dialogue()
pipeline = DialogueSegmentationPipeline.from_openai(
model="qwen3.5-plus",
api_key="YOUR_API_KEY",
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
result = pipeline.run(dialogue)
Public API
The package exports:
from dialogue_memory_pipeline import (
DialogueSegmentationPipeline,
PipelineConfig,
load_dialogue,
load_sample_dialogue,
)
DialogueSegmentationPipeline
Constructors:
- DialogueSegmentationPipeline(llm, config=None)
- DialogueSegmentationPipeline.from_env(model=None, config=None)
- DialogueSegmentationPipeline.from_openai(model, api_key=None, base_url=None, config=None)
Main method:
run(utterances) -> dict
PipelineConfig
Current configuration fields:
PipelineConfig(
top_p_candidates=0.30,
min_candidate_score=0.20,
right_preview_window=3,
min_segment_len=2,
)
Field meanings:
- top_p_candidates: fraction of available boundaries to keep after scoring. If a dialogue has N utterances, there are N - 1 possible boundaries. The retained candidate count is ceil((N - 1) * top_p_candidates), with a minimum of 1 whenever any boundary exists.
- min_candidate_score: minimum boundary score to keep before the top-p cap is applied.
- right_preview_window: number of right-side local states shown to the transition judge when evaluating a candidate split.
- min_segment_len: minimum allowed segment length used during segmentation and the cleanup merge.
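The candidate-selection rule described for top_p_candidates and min_candidate_score can be sketched as follows (a minimal re-derivation of the documented formula, not the package's internal code):

```python
import math

def retained_cap(n_utterances: int, top_p: float) -> int:
    """Top-p cap on candidates: ceil((N - 1) * top_p), with a minimum of 1
    whenever any boundary exists."""
    boundaries = n_utterances - 1
    if boundaries <= 0:
        return 0
    return max(1, math.ceil(boundaries * top_p))

def select_candidates(scored, n_utterances, top_p=0.30, min_score=0.20):
    """scored: list of (boundary_after_turn, score) pairs.
    Applies the min-score filter first, then keeps the top-p highest scores."""
    kept = [c for c in scored if c[1] >= min_score]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:retained_cap(n_utterances, top_p)]
```

With the defaults, an 11-utterance dialogue has 10 boundaries and retains at most ceil(10 * 0.30) = 3 candidates.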
Input Format
load_sample_dialogue() loads the packaged example dialogue shipped in the wheel.
load_dialogue(...) expects a JSON file containing a list of utterances:
[
{
"turn_id": 0,
"speaker": "user",
"text": "I need to reschedule my flight."
},
{
"turn_id": 1,
"speaker": "assistant",
"text": "Sure, what is your booking number?"
}
]
Each item must contain:
- turn_id
- speaker
- text
Output Structure
pipeline.run(...) returns a dictionary with these top-level keys:
- candidates
- local_states
- decisions
- segments
- episodes
- timing
candidates
Each candidate includes:
- boundary_after_turn
- score
- left_turn_id
- right_turn_id
- left_text
- right_text
- reasoning
- source
local_states
Each local state includes:
- turn_id
- speaker
- summary_topic
- intent
- salient_entities
- cue_markers
- obligation.opens
- obligation.resolves
segments
Each finalized segment includes:
- segment_id
- utterance_span
- utterances
- local_states
- segment_state
segment_state contains:
- stable_topic
- discourse_goal
- focus_topics
- entity_core
- open_obligations
- dominant_relation
episodes
Each episodic memory record includes:
- episode_id
- utterance_span
- utterances
- retrieval_summary
- key_entities
- importance
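Since each episode record carries an importance score, one simple downstream use is ranking episodes for retrieval. This is a sketch over the documented keys, not a feature of the package:

```python
def top_episodes(result, k=3):
    """Return the k highest-importance episode records from a pipeline.run
    result dictionary."""
    return sorted(
        result["episodes"],
        key=lambda e: e.get("importance", 0.0),
        reverse=True,
    )[:k]
```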
Running the Included Scripts
Full pipeline demo
python scripts/demo.py
Optional flags:
python scripts/demo.py --output outputs/demo_output.json --model qwen3.5-plus
Candidate boundary generator test
python scripts/test_candidate_generator.py --top-p 0.30 --min-score 0.40
This writes a JSON report with:
- all scored boundaries
- filtered candidate boundaries
- the effective candidate-generation config
Repository Layout
- src/dialogue_memory_pipeline/ - importable package
- src/dialogue_memory_pipeline/clients/ - LLM client adapters
- src/dialogue_memory_pipeline/core/ - shared dataclasses and schemas
- src/dialogue_memory_pipeline/modules/ - pipeline modules for boundary generation, state extraction, transition judgment, and memory building
- src/dialogue_memory_pipeline/data/ - packaged sample dialogue data included in the wheel
- examples/ - small usage examples
- scripts/ - runnable scripts for demos and module-level testing
- tests/ - local test coverage for packaging and defensive parsing behavior
- outputs/ - generated artifacts
Model and Provider Notes
- The current tested provider is Bailian (DashScope) via its OpenAI-compatible endpoint.
- OPENAI_BASE_URL should currently point to the Bailian compatible endpoint unless you are experimenting on your own.
- from_env() defaults to OPENAI_MODEL when set; otherwise it falls back to qwen3.5-plus.
- Support for additional providers is not finalized yet.
Current Limitations
- The project is still in alpha and may change in breaking ways.
- Only Bailian (DashScope) API credentials and endpoint configuration are currently supported.
- The implementation is fully LLM-driven; there is no local fallback model path in the package.
- Transition-judge behavior is model-dependent because split decisions are generated by the LLM.
- The implementation depends on an OpenAI-compatible JSON-capable model endpoint.
License
This project is released under the Apache License 2.0. See LICENSE.