Synthetic Dialogue Generation and Analysis

SDialog is a modular Python toolkit for synthetic dialog generation, evaluation, and analysis. It standardizes a Dialog schema and offers persona‑driven multi‑agent simulation with LLMs, composable orchestration, built‑in metrics, and mechanistic interpretability—so you can generate reliable, controllable dialog data at scale.

Quick links: Docs · API · Demo (Colab) · Tutorials · Issues

✨ Key features

  • Standard Dialog schema with JSON import/export (aiming to help standardize dialog datasets with community support)
  • Persona‑driven multi‑agent simulation with contexts, tools, and thoughts
  • Composable orchestration for precise control over behavior and flow
  • Built‑in evaluation (metrics + LLM‑as‑judge) for comparison and iteration
  • Native mechanistic interpretability (inspect and steer activations)
  • Easy creation of user-defined components by inheriting from base classes (personas, metrics, orchestrators, etc.)
  • Interoperability across OpenAI, HuggingFace, Ollama, AWS, and more

If you are building controlled conversational simulations, benchmarking dialog models, producing synthetic training corpora, or probing internal model behavior, SDialog provides an end-to-end workflow.

⚡ Installation

pip install sdialog

🏁 Quickstart: 60‑second tour

Short example showing personas, agents, a simple rule (orchestrator), and a tool.

import sdialog
from sdialog import Context
from sdialog.agents import Agent
from sdialog.personas import Persona
from sdialog.orchestrators import SimpleReflexOrchestrator

# Set your preferred backend/model and parameters
sdialog.config.llm("openai:gpt-4.1", temperature=0.9)

# Define personas and shared context
alice = Persona(name="Alice", role="barista", personality="cheerful")
bob   = Persona(name="Bob", role="customer", personality="curious")
ctx = Context(location="Downtown cafe", topics=["coffee"]) 

# (Optional) Define tools for the agents
# Just any user-defined function, let's define a mock one for our agent
def lookup_menu(item: str) -> dict:
    return {"item": item, "specials": ["vanilla latte", "cold brew"]}

# (Optional) Define orchestrators for the agents
# Let's define a simple rule-based orchestrator
react = SimpleReflexOrchestrator(
    condition=lambda utt: "decaf" in utt.lower(),
    instruction="Explain decaf options and suggest one."
)

# Create the agents
barista = Agent(persona=alice, tools=[lookup_menu])
customer = Agent(persona=bob, first_utterance="Hi!")

# Add orchestrators to your agent using pipe-like composition
barista = barista | react

# Generate three dialogs!
for ix in range(3):
    dialog = customer.dialog_with(barista, context=ctx)
    dialog.print(orchestration=True)
    dialog.to_file(f"dialog_{ix}.json")
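The `condition`/`instruction` pair and the `|` operator can be pictured with a small, dependency-free sketch. `ReflexRule`, `ToyAgent`, and `pending_instructions` are hypothetical names, not SDialog classes. The sketch assumes that a reflex orchestrator checks each incoming utterance and, when its condition holds, hands its instruction to the agent, and that `|` returns an agent with the orchestrator attached.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ReflexRule:
    """Hypothetical rule: when `condition` holds for an utterance, emit `instruction`."""
    condition: Callable[[str], bool]
    instruction: str

    def instruct(self, utterance: str) -> Optional[str]:
        return self.instruction if self.condition(utterance) else None


@dataclass
class ToyAgent:
    """Hypothetical agent holding an ordered list of attached rules."""
    name: str
    rules: List[ReflexRule] = field(default_factory=list)

    def __or__(self, rule: ReflexRule) -> "ToyAgent":
        # Pipe-style composition: build a new agent with the rule appended,
        # leaving the original agent untouched.
        return ToyAgent(self.name, self.rules + [rule])

    def pending_instructions(self, utterance: str) -> List[str]:
        # Ask every rule in order; keep only the instructions that fired.
        fired = (rule.instruct(utterance) for rule in self.rules)
        return [text for text in fired if text is not None]


react = ReflexRule(
    condition=lambda utt: "decaf" in utt.lower(),
    instruction="Explain decaf options and suggest one.",
)
barista = ToyAgent("Alice") | react

print(barista.pending_instructions("Do you have DECAF?"))
# ['Explain decaf options and suggest one.']
print(barista.pending_instructions("A cold brew, please."))
# []
```

Returning a new agent from `__or__` instead of mutating the original keeps `ToyAgent("Alice")` and `ToyAgent("Alice") | react` independent, which is one reasonable reading of the pipe syntax; SDialog's own semantics may differ.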

Note: Load a saved dialog later:

from sdialog import Dialog
my_dialog = Dialog.from_file("dialog_0.json")
my_dialog.print()
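Because dialogs are stored as JSON, a save/load round trip should reproduce an equal object. The sketch below illustrates that property with a hypothetical two-field schema (`speaker`, `text`) built from standard-library dataclasses. It is not SDialog's actual `Dialog` schema, and it uses strings instead of files to stay self-contained.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Turn:
    speaker: str
    text: str


@dataclass
class ToyDialog:
    """Hypothetical minimal schema; SDialog's real Dialog has its own fields."""
    turns: List[Turn] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts the nested Turn objects to plain dicts
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, raw: str) -> "ToyDialog":
        data = json.loads(raw)
        return cls(turns=[Turn(**t) for t in data["turns"]])


original = ToyDialog([Turn("Bob", "Hi!"), Turn("Alice", "Welcome! Coffee?")])
restored = ToyDialog.from_json(original.to_json())
assert restored == original  # dataclass equality compares field by field
```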

Generate personas and contexts for your agents automatically when you need diversity, and use the .set() method when you need more control:

from sdialog.personas import Doctor, Patient
from sdialog.generators import PersonaGenerator, ContextGenerator
from sdialog import Context

# By default, all attribute values will be LLM generated.
doc = PersonaGenerator(Doctor(specialty="Cardiology")).generate()
pat = PersonaGenerator(Patient(symptoms="chest pain")).generate()

# Alternatively, specify how you want each attribute to be generated
ctx_base = Context(location="emergency room")
ctx_gen = ContextGenerator(ctx_base)
ctx_gen.set(
    objects=get_objects_from_db,  # A user-defined function
    circumstances="{csv:circumstances:./data/circumstances.csv}",  # A CSV file
    goals="{llm:Suggest a realistic goal for the context}"  # LLM-generated, but following a specific instruction
)
ctx = ctx_gen.generate()
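The `.set()` values above mix three kinds of sources: a callable, a `{csv:column:path}` spec, and an `{llm:instruction}` spec, while unset attributes fall back to LLM generation. One way a resolver could dispatch on them is sketched below. Everything here is hypothetical (`resolve`, `fill`, and the `fake_llm` stub standing in for a real model call); SDialog's generators may parse and sample differently. The sketch also assumes the CSV has a header row naming the column and samples one row uniformly at random; the usage leaves the CSV branch unexercised since it needs a file on disk.

```python
import csv
import random
import re
from typing import Any, Dict, List

# Matches "{kind:rest}", e.g. "{csv:circumstances:./data/circumstances.csv}".
SPEC = re.compile(r"^\{(\w+):(.*)\}$", re.DOTALL)


def fake_llm(instruction: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a placeholder string."""
    return f"<generated: {instruction}>"


def resolve(spec: Any, rng: random.Random) -> Any:
    """Turn one attribute spec into a concrete value."""
    if callable(spec):
        return spec()  # user-defined function
    if isinstance(spec, str):
        match = SPEC.match(spec)
        if match:
            kind, rest = match.groups()
            if kind == "csv":
                column, path = rest.split(":", 1)  # "column:path"
                with open(path, newline="") as f:
                    values = [row[column] for row in csv.DictReader(f)]
                return rng.choice(values)  # sample one row uniformly
            if kind == "llm":
                return fake_llm(rest)
    return spec  # anything else is used as a literal value


def fill(fixed: Dict[str, Any], specs: Dict[str, Any],
         field_names: List[str], seed: int = 0) -> Dict[str, Any]:
    """Keep fixed values, resolve explicit specs, and send the rest to the LLM stub."""
    rng = random.Random(seed)
    out: Dict[str, Any] = {}
    for name in field_names:
        if name in fixed:
            out[name] = fixed[name]
        elif name in specs:
            out[name] = resolve(specs[name], rng)
        else:
            out[name] = fake_llm(f"Generate a value for '{name}'")
    return out


ctx = fill(
    fixed={"location": "emergency room"},
    specs={"goals": "{llm:Suggest a realistic goal for the context}",
           "objects": lambda: ["stretcher", "ECG monitor"]},
    field_names=["location", "goals", "objects", "circumstances"],
)
print(ctx)
```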

Tip: 🕹️ 👉 Check out our demo notebook in Colab to play around with SDialog.

📊 Evaluate and compare

Use built‑in metrics (readability, flow, linguistic features, LLM judges) or easily create new ones, then aggregate and compare datasets via DatasetComparator.

from sdialog.evaluation import LLMJudgeRealDialog, LinguisticFeatureScore
from sdialog.evaluation import FrequencyEvaluator, MeanEvaluator
from sdialog.evaluation import DatasetComparator

reference = [...]   # list[Dialog]
candidate = [...]   # list[Dialog]

judge  = LLMJudgeRealDialog()
flesch = LinguisticFeatureScore(feature="flesch-reading-ease")

comparator = DatasetComparator([
  FrequencyEvaluator(judge, name="Realistic dialog rate"),
  MeanEvaluator(flesch, name="Mean Flesch Reading Ease"),
])

results = comparator({"reference": reference, "candidate": candidate})

# Plot results for each evaluator
comparator.plot()
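Judging from their names, `FrequencyEvaluator` and `MeanEvaluator` reduce per-dialog outputs differently: the first reports how often a binary judge says yes, the second averages a numeric score. A plain-Python sketch of that aggregation follows; the toy judge and score are assumptions for illustration, not SDialog code.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

ToyDialog = List[str]  # hypothetical: a dialog as a plain list of utterances


def frequency(judge: Callable[[ToyDialog], bool], dialogs: Sequence[ToyDialog]) -> float:
    """Share of dialogs the binary judge accepts (the idea behind a frequency evaluator)."""
    return sum(1 for d in dialogs if judge(d)) / len(dialogs)


def average(score: Callable[[ToyDialog], float], dialogs: Sequence[ToyDialog]) -> float:
    """Mean of a per-dialog numeric score (the idea behind a mean evaluator)."""
    return mean(score(d) for d in dialogs)


def compare(datasets: Dict[str, Sequence[ToyDialog]],
            judge: Callable[[ToyDialog], bool],
            score: Callable[[ToyDialog], float]) -> Dict[str, Dict[str, float]]:
    """Apply both aggregations to every named dataset."""
    return {name: {"rate": frequency(judge, ds), "mean": average(score, ds)}
            for name, ds in datasets.items()}


def toy_judge(dialog: ToyDialog) -> bool:
    return len(dialog) >= 2  # toy rule: "realistic" if it has at least two turns


def toy_score(dialog: ToyDialog) -> float:
    return float(len(dialog))  # toy numeric score: number of turns


results = compare(
    {"reference": [["Hi", "Hello"], ["Hey", "Yo", "Sup"]],
     "candidate": [["Hi"], ["Hi", "Bye"]]},
    toy_judge, toy_score,
)
print(results)
# {'reference': {'rate': 1.0, 'mean': 2.5}, 'candidate': {'rate': 0.5, 'mean': 1.5}}
```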

Tip: See the evaluation tutorial.

🧠 Mechanistic interpretability

Attach Inspectors to capture per‑token activations and optionally steer (add/ablate directions) to analyze or intervene in model behavior.

from sdialog.interpretability import Inspector
from sdialog.agents import Agent

agent = Agent(name="Bob")
inspector = Inspector(target="model.layers.16.post_attention_layernorm")
agent = agent | inspector

agent("How are you?")
agent("Cool!")

# Let's get the last response's first token activation vector!
act = inspector[-1][0].act # [response index][token index]
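Activation capture of this kind is commonly built on forward hooks (in PyTorch, `nn.Module.register_forward_hook`). To stay dependency-free, the sketch below uses hypothetical stand-ins, `ToyLayer` and `ActivationRecorder`, that only illustrate the idea: a hook observes every layer output, and activations are grouped so that `recorder[response][token]` returns one vector, mirroring the `[response index][token index]` indexing above. How SDialog's `Inspector` is implemented internally is not assumed.

```python
from typing import Callable, List

Vector = List[float]


class ToyLayer:
    """Stand-in for a model submodule (in PyTorch this would be an nn.Module)."""

    def __init__(self) -> None:
        self._hooks: List[Callable[[Vector], None]] = []

    def register_hook(self, fn: Callable[[Vector], None]) -> None:
        self._hooks.append(fn)

    def __call__(self, x: Vector) -> Vector:
        out = [2.0 * v for v in x]  # arbitrary toy computation
        for fn in self._hooks:
            fn(out)  # every hook observes the layer's output
        return out


class ActivationRecorder:
    """Stores one activation vector per generated token, grouped by response."""

    def __init__(self) -> None:
        self.responses: List[List[Vector]] = []

    def new_response(self) -> None:
        self.responses.append([])

    def __call__(self, act: Vector) -> None:
        self.responses[-1].append(list(act))  # copy so later edits don't leak in

    def __getitem__(self, i: int) -> List[Vector]:
        return self.responses[i]


layer, recorder = ToyLayer(), ActivationRecorder()
layer.register_hook(recorder)

# Simulate two responses; each inner list is one token's input vector.
for response in [[[1.0, 0.0]], [[0.5, 0.5], [0.0, 1.0]]]:
    recorder.new_response()
    for token_vec in response:
        layer(token_vec)

print(recorder[-1][0])  # last response, first token
# [1.0, 1.0]
```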

Steering intervention (subtracting a direction):

import torch

anger_direction = torch.load("anger_direction.pt")  # A direction vector (e.g., PCA / difference-in-mean vector)
agent_steered = agent | inspector - anger_direction  # Ablate the anger direction from the target activations

agent_steered("You are an extremely upset assistant")  # Agent "can't get angry anymore" :)
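Directional ablation usually means removing the component of each activation h along the unit direction u = d/‖d‖, that is h' = h − (h·u)u, so the result is orthogonal to d; additive steering instead computes h + αd. The pure-Python sketch below shows both operations on toy vectors. Whether SDialog's `inspector - direction` applies exactly this projection is an assumption here, and the helper names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def ablate(h: Vector, d: Vector) -> Vector:
    """Project out direction d: h - (h . u) u, where u = d / |d|."""
    norm = math.sqrt(dot(d, d))
    u = [x / norm for x in d]
    coeff = dot(h, u)  # how far h extends along the unit direction
    return [x - coeff * y for x, y in zip(h, u)]


def add_direction(h: Vector, d: Vector, alpha: float = 1.0) -> Vector:
    """Additive steering: move h by alpha * d."""
    return [x + alpha * y for x, y in zip(h, d)]


h = [3.0, 4.0, 1.0]
anger = [0.0, 2.0, 0.0]  # toy direction; real ones are estimated from data
h_clean = ablate(h, anger)
print(h_clean)              # [3.0, 0.0, 1.0]
print(dot(h_clean, anger))  # 0.0
```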

Tip: See the tutorial on using SDialog to remove the refusal capability from Llama 3.2.

🔧 Interoperability

Many backends are supported. Use the "BACKEND:MODEL" string format either to set a global default LLM for all components or to pass one to a specific component:

import sdialog

# Change the default global LLM
sdialog.config.llm("ollama:qwen3:14b")
# Any argument supported by the chosen backend/model can also be given, for example
sdialog.config.llm("ollama:qwen3:14b",
                   temperature=0.7,
                   base_url="https://my-ollama-endpoint.com:123")  # Remote Ollama server
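The model part of the string may itself contain colons (`qwen3:14b`, `anthropic.claude-3-5-sonnet-20240620-v1:0`), so a "BACKEND:MODEL" string has to be split at the first colon only. A minimal sketch of such a parser; `parse_model_spec` is a hypothetical helper, not part of SDialog.

```python
from typing import Tuple


def parse_model_spec(spec: str) -> Tuple[str, str]:
    """Split "BACKEND:MODEL" at the first colon, since model names may contain colons."""
    backend, sep, model = spec.partition(":")
    if not sep or not backend or not model:
        raise ValueError(f"Expected 'BACKEND:MODEL', got {spec!r}")
    return backend, model


print(parse_model_spec("ollama:qwen3:14b"))
# ('ollama', 'qwen3:14b')
```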

Any LLM-powered component can also take a specific model and its parameters as arguments, overriding the global default:

from sdialog.agents import Agent

my_agent = Agent(model="amazon:anthropic.claude-3-5-sonnet-20240620-v1:0",
                 region_name="us-east-1")
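The precedence rule can be summarized as: an explicit per-component model wins; otherwise the global default applies. The sketch below encodes that rule with a hypothetical `GLOBAL_LLM` dict and `effective_llm` function. Two details are assumptions, since the text above does not settle them: parameters passed without a model are merged over the global ones, and a per-component model does not inherit the global parameters.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical global default, mirroring sdialog.config.llm("ollama:qwen3:14b", temperature=0.7).
GLOBAL_LLM: Dict[str, Any] = {"model": "ollama:qwen3:14b", "params": {"temperature": 0.7}}


def effective_llm(model: Optional[str] = None, **params: Any) -> Tuple[str, Dict[str, Any]]:
    """Return the (model, parameters) pair a component would use."""
    if model is None:
        # No per-component model: use the global one; explicit params win on conflicts.
        return GLOBAL_LLM["model"], {**GLOBAL_LLM["params"], **params}
    # Per-component model: use it with only the parameters passed alongside it.
    return model, dict(params)


print(effective_llm())
print(effective_llm("amazon:anthropic.claude-3-5-sonnet-20240620-v1:0", region_name="us-east-1"))
```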

📖 Documentation and tutorials

🤝 Contributing

See CONTRIBUTING.md. We welcome issues, feature requests, and pull requests. If you want to add personas, agents, orchestrators, generators, evaluators, or tutorials, please open an issue or submit a PR, and help us make SDialog better 👍

This project follows the all-contributors specification. Contributors (💻 code, 🤔 ideas, 📖 documentation):

  • Sergio Burdisso 💻 🤔 📖
  • Labrak Yanis 💻 🤔
  • Séverin 💻 🤔
  • Ricard Marxer 💻 🤔
  • Thomas Schaaf 💻
  • David Liu 💻
  • ahassoo1 🤔 💻
  • Pawel Cyrta 💻 🤔
  • ABCDEFGHIJKL 💻

🙏 Acknowledgments

This work was supported by the European Union Horizon 2020 project ELOQUENCE (grant number 101070558).

The initial development of this project began in preparation for the 2025 Jelinek Memorial Summer Workshop on Speech and Language Technologies (JSALT 2025) as part of the "Play your Part" research group.

📝 License

MIT License
Copyright (c) 2025 Idiap Research Institute


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sdialog-0.3.0.tar.gz (1.2 MB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sdialog-0.3.0-py3-none-any.whl (1.3 MB)

Uploaded Python 3

File details

Details for the file sdialog-0.3.0.tar.gz.

File metadata

  • Download URL: sdialog-0.3.0.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.20

File hashes

Hashes for sdialog-0.3.0.tar.gz
Algorithm Hash digest
SHA256 1b26f1075091cb8e22d9c7f33196c17eb11bb383341d746d09db035ca2b4e615
MD5 8e71b96bde943fbeab02fbe6c13ef7f6
BLAKE2b-256 b29fc663bab684fcde135ae0327635a5d99dd90b774b1c6a81315ff4703fc1b4

File details

Details for the file sdialog-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: sdialog-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 1.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.20

File hashes

Hashes for sdialog-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 77f5d2b4f30ae0d0e4064be18f768904267c1cb2c4d5b9aba58f6f66798f4c8c
MD5 8afffff26cee49cd4a6d35e36dfdbb28
BLAKE2b-256 6d4f57624859e3c2064581b69a02107ee5462a5a33ecd03b872cb186695dbef4
