Speech Synthesis Markdown (SSMD) is a lightweight alternative syntax for SSML.

SSMD - Speech Synthesis Markdown

SSMD (Speech Synthesis Markdown) is a lightweight Python library that provides a human-friendly markdown-like syntax for creating SSML (Speech Synthesis Markup Language) documents. It's designed to make TTS (Text-to-Speech) content more readable and maintainable.

Features

✨ Markdown-like syntax - More intuitive than raw SSML
🎯 Full SSML support - All major SSML features covered
🔄 Bidirectional - Convert SSMD↔SSML or strip to plain text
📝 Document-centric - Build, edit, and export TTS documents
🎛️ TTS capabilities - Auto-filter features based on engine support
🎨 Extensible - Custom extensions for platform-specific features
🧪 Spec-driven - Follows the official SSMD specification

Installation

pip install ssmd

SSMD includes intelligent sentence detection via phrasplit (regex mode by default - fast and lightweight).

Optional: Enhanced Accuracy with spaCy

For best sentence detection accuracy, especially with complex or informal text, install spaCy support:

pip install "ssmd[spacy]"

# Install language models for the languages you need
python -m spacy download en_core_web_sm  # English (small, ~30MB)
python -m spacy download en_core_web_md  # English (medium, better accuracy, ~100MB)
python -m spacy download en_core_web_lg  # English (large, best accuracy, ~500MB)
python -m spacy download fr_core_news_sm  # French
python -m spacy download de_core_news_sm  # German
python -m spacy download es_core_news_sm  # Spanish
# See https://spacy.io/models for all available models

Performance comparison:

Mode	Speed	Accuracy	Size	Use Case
Regex (default)	~60x faster	~85-90%	0 MB	Simple text, speed-critical
spaCy small models	Baseline	~95%	~30 MB	Balanced accuracy/performance
spaCy large models	Slower	~98%+	~500 MB	Best accuracy, complex text
spaCy transformer	Slowest	~99%+	~1 GB	Research, maximum quality

Without spaCy, SSMD uses fast regex-based sentence splitting that works great for well-formatted text. With spaCy, you get ML-powered detection for complex cases like abbreviations, URLs, and informal writing.

Or install from source:

git clone https://github.com/holgern/ssmd.git
cd ssmd
pip install -e .

Quick Start

Basic Usage

import ssmd

# Convert SSMD to SSML
ssml = ssmd.to_ssml("Hello *world*!")
print(ssml)
# Output: <speak>Hello <emphasis>world</emphasis>!</speak>

# Strip SSMD markup for plain text
plain = ssmd.to_text("Hello *world* @marker!")
print(plain)
# Output: Hello world!

# Convert SSML back to SSMD
ssmd_text = ssmd.from_ssml('<speak><emphasis>Hello</emphasis></speak>')
print(ssmd_text)
# Output: *Hello*

Document API - Build TTS Content Incrementally

from ssmd import Document

# Create a document and build it piece by piece
doc = Document()
doc.add_sentence("Hello and *welcome* to SSMD!")
doc.add_sentence("This is a great tool for TTS.")
doc.add_paragraph("Let's start a new paragraph here.")

# Export to different formats
ssml = doc.to_ssml()      # SSML output
markdown = doc.to_ssmd()  # SSMD markdown
text = doc.to_text()      # Plain text

# Access document content
print(doc.ssmd)           # Raw SSMD content
print(len(doc))           # Number of sentences

TTS Streaming Integration

Perfect for streaming TTS where you process sentences one at a time:

from ssmd import Document

# Create document with configuration
doc = Document(
    config={'auto_sentence_tags': True},
    capabilities='pyttsx3'  # Auto-filter for pyttsx3 support
)

# Build the document
doc.add_paragraph("# Chapter 1: Introduction")
doc.add_sentence("Welcome to the *amazing* world of SSMD!")
doc.add_sentence("This makes TTS content much easier to write.")
doc.add_paragraph("# Chapter 2: Features")
doc.add_sentence("You can use all kinds of markup.")
doc.add_sentence("Including ...500ms pauses and [special pronunciations](ph: speSl).")

# Iterate through sentences for TTS
for i, sentence in enumerate(doc.sentences(), 1):
    print(f"Sentence {i}: {sentence}")
    # Your TTS engine here:
    # tts_engine.speak(sentence)
    # await tts_engine.wait_until_done()

# Or access specific sentences
print(f"Total sentences: {len(doc)}")
print(f"First sentence: {doc[0]}")
print(f"Last sentence: {doc[-1]}")

Document Editing

from ssmd import Document

# Load from existing content
doc = Document("First sentence. Second sentence. Third sentence.")

# Edit like a list
doc[0] = "Modified first sentence."
del doc[1]  # Remove second sentence

# String operations
doc.replace("sentence", "line")

# Merge documents
doc2 = Document("Additional content.")
doc.merge(doc2)

# Split into individual sentences
sentences = doc.split()  # Returns list of Document objects

TTS Engine Capabilities

SSMD can automatically filter SSML features based on your TTS engine's capabilities. This ensures compatibility by stripping unsupported tags to plain text.

Using Presets

from ssmd import Document

# Use a preset for your TTS engine
doc = Document("*Hello* [world](en)!", capabilities='pyttsx3')
ssml = doc.to_ssml()

# pyttsx3 doesn't support emphasis or language tags, so they're stripped:
# <speak>Hello world!</speak>

Available Presets:

minimal - Plain text only (no SSML)
pyttsx3 - Minimal support (basic prosody only)
espeak - Moderate support (breaks, language, prosody, phonemes)
google / azure / microsoft - Full SSML support
polly / amazon - Full support + Amazon extensions (whisper, DRC)
full - All features enabled

Custom Capabilities

from ssmd import Document, TTSCapabilities

# Define exactly what your TTS supports
caps = TTSCapabilities(
    emphasis=False,      # No <emphasis> support
    break_tags=True,     # Supports <break>
    paragraph=True,      # Supports <p>
    language=False,      # No language switching
    prosody=True,        # Supports volume/rate/pitch
    say_as=False,        # No <say-as>
    audio=False,         # No audio files
    mark=False,          # No markers
)

doc = Document("*Hello* world!", capabilities=caps)
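To make the filtering idea concrete, here is a minimal, self-contained sketch of capability-based stripping. It does not use or reproduce ssmd's internals: `CapsSketch` and `filter_ssml` are hypothetical names, and the regex approach is an assumption chosen for brevity.

```python
import re
from dataclasses import dataclass


@dataclass
class CapsSketch:
    """Hypothetical stand-in for a capability set (not ssmd's TTSCapabilities)."""
    emphasis: bool = True
    break_tags: bool = True
    prosody: bool = True


# Map each capability flag to the SSML element it controls.
_TAGS = {"emphasis": "emphasis", "break_tags": "break", "prosody": "prosody"}


def filter_ssml(ssml: str, caps: CapsSketch) -> str:
    """Drop unsupported elements but keep the text they wrap."""
    for flag, tag in _TAGS.items():
        if not getattr(caps, flag):
            # Matches <tag ...>, </tag>, and self-closing <tag .../>.
            ssml = re.sub(rf"</?{tag}\b[^>]*>", "", ssml)
    return ssml


caps = CapsSketch(emphasis=False)
print(filter_ssml("<speak><emphasis>Hello</emphasis> world!</speak>", caps))
# → <speak>Hello world!</speak>
```

A real implementation would more likely filter while generating SSML rather than post-process the string, but the effect on the output is the same: unsupported tags disappear and their text survives.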

Capability-Aware Streaming

from ssmd import Document

# Create document for specific TTS engine
doc = Document(capabilities='espeak')

# Build content with various features
doc.add_paragraph("# Welcome")
doc.add_sentence("*Hello* world!")
doc.add_sentence("[Bonjour](fr) everyone!")

# All sentences are filtered for eSpeak compatibility
for sentence in doc.sentences():
    # Features eSpeak doesn't support are automatically removed
    # (tts_engine is a placeholder for your own TTS client)
    tts_engine.speak(sentence)

Comparison of Engine Outputs:

Same input: *Hello* world... [this is loud](v: 5)!

Engine	Output
minimal	`<speak>Hello world... this is loud!</speak>`
pyttsx3	`<speak>Hello world... <prosody volume="x-loud">this is loud</prosody>!</speak>`
espeak	`<speak>Hello world<break time="1000ms"/> <prosody volume="x-loud">this is loud</prosody>!</speak>`
google	`<speak><p><emphasis>Hello</emphasis> world<break time="1000ms"/> <prosody volume="x-loud">this is loud</prosody>!</p></speak>`

See examples/tts_with_capabilities.py for a complete demonstration.

SSMD Syntax Reference

Text & Emphasis

SSMD supports all four SSML emphasis levels:

# Moderate emphasis (default)
ssmd.to_ssml("*emphasized text*")
# → <speak><emphasis>emphasized text</emphasis></speak>

# Strong emphasis
ssmd.to_ssml("**very important**")
# → <speak><emphasis level="strong">very important</emphasis></speak>

# Reduced emphasis (subtle)
ssmd.to_ssml("_less important_")
# → <speak><emphasis level="reduced">less important</emphasis></speak>

# No emphasis (explicit, rarely used)
ssmd.to_ssml("[monotone](emphasis: none)")
# → <speak><emphasis level="none">monotone</emphasis></speak>
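The mapping from emphasis markers to SSML can be pictured with a small regex sketch. This is an illustration only, not ssmd's parser; `emphasis_to_ssml` is a hypothetical helper, and it ignores the explicit `[text](emphasis: none)` form.

```python
import re


def emphasis_to_ssml(text: str) -> str:
    """Toy conversion of *x*, **x** and _x_ to <emphasis> elements."""
    # Handle ** before * so "**x**" is not read as two single-star spans.
    text = re.sub(r"\*\*(.+?)\*\*", r'<emphasis level="strong">\1</emphasis>', text)
    text = re.sub(r"\*(.+?)\*", r"<emphasis>\1</emphasis>", text)
    # Underscores only count at word boundaries, so snake_case is left alone.
    text = re.sub(r"(?<!\w)_(.+?)_(?!\w)", r'<emphasis level="reduced">\1</emphasis>', text)
    return f"<speak>{text}</speak>"


print(emphasis_to_ssml("Hello *world*!"))
# → <speak>Hello <emphasis>world</emphasis>!</speak>
```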

Breaks & Pauses

# Specific time (required - bare ... is preserved as ellipsis)
ssmd.to_ssml("Hello ...500ms world")
ssmd.to_ssml("Hello ...2s world")
ssmd.to_ssml("Hello ...1s world")

# Strength-based
ssmd.to_ssml("Hello ...n world")  # none
ssmd.to_ssml("Hello ...w world")  # weak (x-weak)
ssmd.to_ssml("Hello ...c world")  # comma (medium)
ssmd.to_ssml("Hello ...s world")  # sentence (strong)
ssmd.to_ssml("Hello ...p world")  # paragraph (x-strong)
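The break tokens above translate to `<break>` elements in a predictable way. The sketch below shows one way to express that mapping; `break_tag` is a hypothetical helper, and the strength names are taken from the comments in the list above.

```python
import re

# Strength suffixes and the SSML strength each one stands for.
STRENGTHS = {"n": "none", "w": "x-weak", "c": "medium", "s": "strong", "p": "x-strong"}


def break_tag(token: str) -> str:
    """Turn a token such as '...500ms' or '...s' into a <break/> element."""
    m = re.fullmatch(r"\.\.\.(\d+(?:ms|s)|[nwcsp])", token)
    if m is None:
        return token  # a bare "..." (or anything else) stays as written
    value = m.group(1)
    if value in STRENGTHS:
        return f'<break strength="{STRENGTHS[value]}"/>'
    return f'<break time="{value}"/>'


print(break_tag("...500ms"))  # → <break time="500ms"/>
print(break_tag("...s"))      # → <break strength="strong"/>
print(break_tag("..."))       # → ...
```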

Paragraphs

text = """First paragraph here.
Second line of first paragraph.

Second paragraph starts here."""

ssmd.to_ssml(text)
# → <speak><p>First paragraph here.
#    Second line of first paragraph.</p><p>Second paragraph starts here.</p></speak>

Language

# Auto-complete language codes
ssmd.to_ssml('[Bonjour](fr) world')
# → <speak><lang xml:lang="fr-FR">Bonjour</lang> world</speak>

# Explicit locale
ssmd.to_ssml('[Cheerio](en-GB)')
# → <speak><lang xml:lang="en-GB">Cheerio</lang></speak>
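Auto-completion of short language codes can be thought of as a lookup with a pass-through for full locales. The table below is an assumption for illustration: only `fr` → `fr-FR` is confirmed by the example above, and `complete_lang` / `lang_tag` are hypothetical helpers.

```python
# Hypothetical default regions; only "fr" -> "fr-FR" appears in the README.
DEFAULT_REGIONS = {"fr": "fr-FR", "de": "de-DE", "es": "es-ES", "en": "en-US"}


def complete_lang(code: str) -> str:
    """Expand a bare language code; leave explicit locales untouched."""
    if "-" in code:
        return code
    return DEFAULT_REGIONS.get(code.lower(), code)


def lang_tag(text: str, code: str) -> str:
    return f'<lang xml:lang="{complete_lang(code)}">{text}</lang>'


print(lang_tag("Bonjour", "fr"))
# → <lang xml:lang="fr-FR">Bonjour</lang>
```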

Voice Selection

SSMD supports two ways to specify voices: inline annotations for short phrases and block directives for longer passages (ideal for dialogue and scripts).

Inline Voice Annotations

Perfect for short voice changes within a sentence:

# Simple voice name
ssmd.to_ssml('[Hello](voice: Joanna)')
# → <speak><voice name="Joanna">Hello</voice></speak>

# Cloud TTS voice name (e.g., Google Wavenet, AWS Polly)
ssmd.to_ssml('[Hello](voice: en-US-Wavenet-A)')
# → <speak><voice name="en-US-Wavenet-A">Hello</voice></speak>

# Language and gender
ssmd.to_ssml('[Bonjour](voice: fr-FR, gender: female)')
# → <speak><voice language="fr-FR" gender="female">Bonjour</voice></speak>

# All attributes (language, gender, variant)
ssmd.to_ssml('[Text](voice: en-GB, gender: male, variant: 1)')
# → <speak><voice language="en-GB" gender="male" variant="1">Text</voice></speak>

Voice Directives (Block Syntax)

Perfect for dialogue, podcasts, and scripts with multiple speakers:

# Use @voice: name or @voice(name) for clean dialogue formatting
script = """
@voice: af_sarah
Welcome to Tech Talk! I'm Sarah, and today we're diving into the fascinating
world of text-to-speech technology.
...s

@voice: am_michael
And I'm Michael! We've got an amazing episode lined up. The advances in neural
TTS have been incredible lately.
...s

@voice: af_sarah
So what are we covering today?
"""

ssmd.to_ssml(script)
# Each voice directive creates a separate voice block in SSML

Voice directives support all voice attributes:

# Language and gender
multilingual = """
@voice: fr-FR, gender: female
Bonjour! Comment allez-vous aujourd'hui?

@voice: en-GB, gender: male
Hello there! Lovely weather we're having.

@voice: es-ES, gender: female, variant: 1
¡Hola! ¿Cómo estás?
"""

Voice directive features:

Use @voice: name or @voice(name) syntax
Supports all attributes: language, gender, variant
Applies to all text until the next directive or paragraph break
Automatically detected on SSML→SSMD conversion for long voice blocks
Much more readable than inline annotations for dialogue
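As a rough picture of how block directives partition a script, the sketch below groups lines under the most recent `@voice` directive. It is not ssmd's parser: `split_voice_blocks` is a hypothetical helper, it keeps the directive's attribute string unparsed, and it ignores the paragraph-break rule for simplicity.

```python
import re

# Accepts both "@voice: name" and "@voice(name)".
DIRECTIVE = re.compile(r"^@voice(?::\s*|\()([^)\n]+?)\)?\s*$")


def split_voice_blocks(script: str):
    """Return a list of (voice, text) pairs; voice is None before any directive."""
    blocks = []
    voice, lines = None, []
    for line in script.splitlines():
        m = DIRECTIVE.match(line.strip())
        if m:
            # Close the block collected so far, if it has any content.
            if any(ln.strip() for ln in lines):
                blocks.append((voice, "\n".join(lines).strip()))
            voice, lines = m.group(1).strip(), []
        else:
            lines.append(line)
    if any(ln.strip() for ln in lines):
        blocks.append((voice, "\n".join(lines).strip()))
    return blocks


script = "@voice: sarah\nHello from Sarah\n\n@voice(michael)\nHello from Michael\n"
print(split_voice_blocks(script))
# → [('sarah', 'Hello from Sarah'), ('michael', 'Hello from Michael')]
```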

Mixing both styles:

# Block directive for main speaker, inline for interruptions
text = """
@voice: sarah
Hello everyone, [but wait!](voice: michael) Michael interrupts...

@voice: michael
Sorry, I had to jump in there!
"""

Phonetic Pronunciation

# X-SAMPA notation (converted to IPA automatically)
ssmd.to_ssml('[tomato](ph: t@meItoU)')

# Direct IPA
ssmd.to_ssml('[tomato](ipa: təˈmeɪtoʊ)')
# → <speak><phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme></speak>
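X-SAMPA is an ASCII encoding of IPA, so the conversion is essentially a symbol lookup. The sketch below covers only a handful of single-character symbols and is not the conversion table ssmd ships; `xsampa_to_ipa` is a hypothetical helper, and multi-character X-SAMPA symbols are deliberately left out.

```python
# A small subset of the standard X-SAMPA to IPA correspondences.
XSAMPA_TO_IPA = {
    "@": "ə", "I": "ɪ", "U": "ʊ", "E": "ɛ", "{": "æ", "A": "ɑ", "O": "ɔ",
    "S": "ʃ", "Z": "ʒ", "T": "θ", "D": "ð", "N": "ŋ",
    '"': "ˈ", "%": "ˌ", ":": "ː",
}


def xsampa_to_ipa(xsampa: str) -> str:
    """Map each character through the table; unknown characters pass through."""
    return "".join(XSAMPA_TO_IPA.get(ch, ch) for ch in xsampa)


print(xsampa_to_ipa("t@meItoU"))   # → təmeɪtoʊ
print(xsampa_to_ipa('t@"meItoU'))  # → təˈmeɪtoʊ
```

Note that the stress mark only appears in the IPA output when the X-SAMPA input carries the `"` stress symbol.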

Prosody (Volume, Rate, Pitch)

Shorthand Notation

# Volume
ssmd.to_ssml("~silent~")      # silent
ssmd.to_ssml("--whisper--")   # x-soft
ssmd.to_ssml("-soft-")        # soft
ssmd.to_ssml("+loud+")        # loud
ssmd.to_ssml("++very loud++") # x-loud

# Rate
ssmd.to_ssml("<<very slow<<")  # x-slow
ssmd.to_ssml("<slow<")         # slow
ssmd.to_ssml(">fast>")         # fast
ssmd.to_ssml(">>very fast>>")  # x-fast

# Pitch
ssmd.to_ssml("__very low__")   # x-low
ssmd.to_ssml("_low_")          # low
ssmd.to_ssml("^high^")         # high
ssmd.to_ssml("^^very high^^")  # x-high

Explicit Notation

# Combined (volume, rate, pitch)
ssmd.to_ssml('[loud and fast](vrp: 555)')
# → <prosody volume="x-loud" rate="x-fast" pitch="x-high">loud and fast</prosody>

# Individual attributes
ssmd.to_ssml('[text](v: 5, r: 3, p: 1)')
# → <prosody volume="x-loud" rate="medium" pitch="x-low">text</prosody>

# Relative values
ssmd.to_ssml('[louder](v: +10dB)')
ssmd.to_ssml('[higher](p: +20%)')
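The numeric scale in the explicit notation lines up with the named SSML levels. The sketch below spells that out; the end points and midpoint (1, 3, 5) follow the examples above, while the intermediate levels and volume 0 are assumptions inferred from the shorthand list, and `prosody_tag` is a hypothetical helper.

```python
VOLUME = {0: "silent", 1: "x-soft", 2: "soft", 3: "medium", 4: "loud", 5: "x-loud"}
RATE = {1: "x-slow", 2: "slow", 3: "medium", 4: "fast", 5: "x-fast"}
PITCH = {1: "x-low", 2: "low", 3: "medium", 4: "high", 5: "x-high"}


def prosody_tag(text, v=None, r=None, p=None):
    """Wrap text in <prosody> using numeric volume/rate/pitch levels."""
    attrs = []
    if v is not None:
        attrs.append(f'volume="{VOLUME[v]}"')
    if r is not None:
        attrs.append(f'rate="{RATE[r]}"')
    if p is not None:
        attrs.append(f'pitch="{PITCH[p]}"')
    if not attrs:
        return text  # nothing to apply
    return f"<prosody {' '.join(attrs)}>{text}</prosody>"


print(prosody_tag("text", v=5, r=3, p=1))
# → <prosody volume="x-loud" rate="medium" pitch="x-low">text</prosody>

# The combined "vrp: 555" form is three digits in volume, rate, pitch order.
print(prosody_tag("loud and fast", *map(int, "555")))
# → <prosody volume="x-loud" rate="x-fast" pitch="x-high">loud and fast</prosody>
```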

Substitution (Aliases)

ssmd.to_ssml('[H2O](sub: water)')
# → <speak><sub alias="water">H2O</sub></speak>

ssmd.to_ssml('[AWS](sub: Amazon Web Services)')
# → <speak><sub alias="Amazon Web Services">AWS</sub></speak>

Say-As

# Telephone numbers
ssmd.to_ssml('[+1-555-0123](as: telephone)')

# Dates with format
ssmd.to_ssml('[31.12.2024](as: date, format: "dd.mm.yyyy")')

# Say-as with detail attribute (for verbosity control)
ssmd.to_ssml('[123](as: cardinal, detail: 2)')
# → <speak><say-as interpret-as="cardinal" detail="2">123</say-as></speak>

ssmd.to_ssml('[12/31/2024](as: date, format: "mdy", detail: 1)')
# → <speak><say-as interpret-as="date" format="mdy" detail="1">12/31/2024</say-as></speak>

# Spell out
ssmd.to_ssml('[NASA](as: character)')

# Numbers
ssmd.to_ssml('[123](as: cardinal)')
ssmd.to_ssml('[1st](as: ordinal)')

# Expletives (beeped)
ssmd.to_ssml('[damn](as: expletive)')

Audio Files

# Basic audio with description
ssmd.to_ssml('[doorbell](https://example.com/sounds/bell.mp3)')
# → <audio src="https://example.com/sounds/bell.mp3"><desc>doorbell</desc></audio>

# With fallback text
ssmd.to_ssml('[cat purring](cat.ogg Sound file not loaded)')
# → <audio src="cat.ogg"><desc>cat purring</desc>Sound file not loaded</audio>

# No description
ssmd.to_ssml('[](beep.mp3)')
# → <audio src="beep.mp3"></audio>

# Advanced audio attributes
# Clip audio (play from 5s to 30s)
ssmd.to_ssml('[music](song.mp3 clip: 5s-30s)')
# → <audio src="song.mp3" clipBegin="5s" clipEnd="30s"><desc>music</desc></audio>

# Speed control
ssmd.to_ssml('[announcement](speech.mp3 speed: 150%)')
# → <audio src="speech.mp3" speed="150%"><desc>announcement</desc></audio>

# Repeat count
ssmd.to_ssml('[jingle](ad.mp3 repeat: 3)')
# → <audio src="ad.mp3" repeatCount="3"><desc>jingle</desc></audio>

# Volume level
ssmd.to_ssml('[alarm](alert.mp3 level: +6dB)')
# → <audio src="alert.mp3" soundLevel="+6dB"><desc>alarm</desc></audio>

# Combine multiple attributes with fallback text
ssmd.to_ssml('[background](music.mp3 clip: 0s-10s, speed: 120%, level: -3dB Fallback text)')
# → <audio src="music.mp3" clipBegin="0s" clipEnd="10s" speed="120%" soundLevel="-3dB">
#    <desc>background</desc>Fallback text</audio>

Markers

ssmd.to_ssml('I always wanted a @animal cat as a pet.')
# → <speak>I always wanted a <mark name="animal"/> cat as a pet.</speak>

# Markers are removed in plain text (with smart whitespace handling)
ssmd.to_text('word @marker word')
# → "word word" (not "word  word")
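Both marker behaviours (conversion to `<mark>` and whitespace-aware removal) fit in a few lines of regex. This is a sketch of the idea rather than ssmd's implementation; `marks_to_ssml` and `strip_marks` are hypothetical helpers, and directives such as `@voice:` are not special-cased here.

```python
import re

# "@name" counts as a marker only when it does not follow a word character,
# so e-mail addresses like user@example.com are left alone.
MARK = re.compile(r"(?<!\w)@(\w+)")


def marks_to_ssml(text: str) -> str:
    return "<speak>" + MARK.sub(r'<mark name="\1"/>', text) + "</speak>"


def strip_marks(text: str) -> str:
    # Remove the marker together with the whitespace before it,
    # so no double space is left behind.
    return re.sub(r"\s*(?<!\w)@\w+", "", text)


print(marks_to_ssml("I always wanted a @animal cat as a pet."))
# → <speak>I always wanted a <mark name="animal"/> cat as a pet.</speak>
print(strip_marks("word @marker word"))
# → word word
```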

Headings

from ssmd import Document

doc = Document(config={
    'heading_levels': {
        1: [('emphasis', 'strong'), ('pause', '300ms')],
        2: [('emphasis', 'moderate'), ('pause', '75ms')],
        3: [('prosody', {'rate': 'slow'}), ('pause', '50ms')],
    }
})

doc.add("""
# Chapter 1
## Section 1.1
### Subsection
""")

ssml = doc.to_ssml()
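To see what a `heading_levels` entry could mean in practice, the sketch below applies a list of `(effect, value)` pairs to a heading's text. The output shape is an assumption, not ssmd's documented output, and `render_heading` is a hypothetical helper.

```python
def render_heading(text, effects):
    """Apply (kind, value) effects to heading text; pauses go after the text."""
    out, trailing = text, ""
    for kind, value in effects:
        if kind == "emphasis":
            out = f'<emphasis level="{value}">{out}</emphasis>'
        elif kind == "prosody":
            attrs = " ".join(f'{k}="{v}"' for k, v in value.items())
            out = f"<prosody {attrs}>{out}</prosody>"
        elif kind == "pause":
            trailing += f'<break time="{value}"/>'
    return out + trailing


print(render_heading("Chapter 1", [("emphasis", "strong"), ("pause", "300ms")]))
# → <emphasis level="strong">Chapter 1</emphasis><break time="300ms"/>
```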

Extensions (Platform-Specific)

# Amazon Polly whisper effect
ssmd.to_ssml('[whispered text](ext: whisper)')
# → <speak><amazon:effect name="whispered">whispered text</amazon:effect></speak>

# Custom extensions
doc = Document(config={
    'extensions': {
        'custom': lambda text: f'<custom-tag>{text}</custom-tag>'
    }
})
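An extension table like the one above is just a mapping from a name to a function that wraps text. The sketch below shows how `[text](ext: name)` annotations could be dispatched through such a table; `apply_extensions` is a hypothetical helper and the fallback behaviour for unknown names is an assumption.

```python
import re

# Matches "[text](ext: name)".
EXT = re.compile(r"\[([^\]]*)\]\(ext:\s*(\w+)\)")


def apply_extensions(text: str, extensions: dict) -> str:
    """Replace each ext annotation with the output of its handler."""
    def repl(match):
        body, name = match.group(1), match.group(2)
        handler = extensions.get(name)
        # Unknown extensions fall back to the plain text.
        return handler(body) if handler else body
    return EXT.sub(repl, text)


extensions = {
    "whisper": lambda t: f'<amazon:effect name="whispered">{t}</amazon:effect>',
    "custom": lambda t: f"<custom-tag>{t}</custom-tag>",
}

print(apply_extensions("[hi](ext: custom) there", extensions))
# → <custom-tag>hi</custom-tag> there
```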

Google Cloud TTS Speaking Styles

Google Cloud TTS supports speaking styles via the google:style extension. You can use SSMD's extension system to add these styles:

from ssmd import Document

# Configure Google TTS styles
doc = Document(config={
    'extensions': {
        'cheerful': lambda text: f'<google:style name="cheerful">{text}</google:style>',
        'calm': lambda text: f'<google:style name="calm">{text}</google:style>',
        'empathetic': lambda text: f'<google:style name="empathetic">{text}</google:style>',
        'apologetic': lambda text: f'<google:style name="apologetic">{text}</google:style>',
        'firm': lambda text: f'<google:style name="firm">{text}</google:style>',
    }
})

# Use styles in your content
doc.add_sentence("[Welcome to our service!](ext: cheerful)")
doc.add_sentence("[We apologize for the inconvenience.](ext: apologetic)")
doc.add_sentence("[Please remain calm.](ext: calm)")

ssml = doc.to_ssml()
# → <speak>
#    <google:style name="cheerful">Welcome to our service!</google:style>
#    <google:style name="apologetic">We apologize for the inconvenience.</google:style>
#    <google:style name="calm">Please remain calm.</google:style>
#    </speak>

Available Google TTS Styles:

cheerful - Upbeat and positive tone
calm - Relaxed and soothing tone
empathetic - Understanding and compassionate tone
apologetic - Sorry and regretful tone
firm - Confident and authoritative tone
news - Professional news anchor tone
conversational - Natural conversation tone

Note: These styles are only supported by specific Google Cloud TTS voices (typically Neural2 and Studio voices). See the Google Cloud TTS documentation for voice compatibility.

For a complete example, see examples/google_tts_styles.py:

python examples/google_tts_styles.py

Parser API - Extract Structured Data

The SSMD parser provides an alternative to SSML generation by extracting structured segments from SSMD text. This is useful when you need programmatic control over SSMD features or want to build custom TTS pipelines.

When to Use the Parser

Custom TTS integration - Process SSMD features programmatically
Text transformations - Handle say-as, substitution, and phoneme conversions
Multi-voice dialogue - Build voice-specific processing pipelines
Feature extraction - Analyze SSMD content without generating SSML

Quick Example

from ssmd import parse_sentences

script = """
@voice: sarah
Hello! Call [+1-555-0123](as: telephone) for info.
[H2O](sub: water) is important.

@voice: michael
Thanks *Sarah*!
"""

# Parse into structured sentences
sentences = parse_sentences(script)

for sentence in sentences:
    # Get voice configuration
    voice_name = sentence.voice.name if sentence.voice else "default"

    # Process each segment
    full_text = ""
    for seg in sentence.segments:
        # Handle text transformations
        if seg.say_as:
            # convert_say_as is a placeholder for your own conversion logic,
            # keyed on interpret_as (e.g. "telephone")
            text = convert_say_as(seg.text, seg.say_as.interpret_as)
        elif seg.substitution:
            # Use substitution text instead of original
            text = seg.substitution
        elif seg.phoneme:
            # Use phoneme for pronunciation
            text = seg.text  # TTS engine handles phoneme
        else:
            text = seg.text

        full_text += text

    # Speak the complete sentence (tts is a placeholder for your TTS client)
    tts.speak(full_text, voice=voice_name)
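The example above calls a `convert_say_as` function that the library does not provide; it stands for whatever text normalisation your engine needs. A minimal sketch of such a function, covering only two interpretation types and passing everything else through unchanged, might look like this:

```python
def convert_say_as(text: str, interpret_as: str) -> str:
    """Hypothetical normaliser for engines without native <say-as> support."""
    if interpret_as in ("character", "characters"):
        # Spell out letter by letter: "NASA" -> "N A S A".
        return " ".join(text)
    if interpret_as == "telephone":
        # Read digits one at a time and drop punctuation.
        return " ".join(ch for ch in text if ch.isdigit())
    return text  # unknown types are left for the engine to handle


print(convert_say_as("NASA", "character"))         # → N A S A
print(convert_say_as("+1-555-0123", "telephone"))  # → 1 5 5 5 0 1 2 3
```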

Parser Functions

`parse_sentences(text, **options)` → `list[SSMDSentence]`

Parse SSMD text into structured sentences with segments.

Parameters:

text (str): SSMD text to parse
sentence_detection (bool): Split text into sentences (default: True)
include_default_voice (bool): Include text before first @voice directive (default: True)
capabilities (TTSCapabilities | str): Filter features based on TTS engine support
language (str): Language code for sentence detection (default: "en")
model_size (str): spaCy model size - "sm", "md", "lg", "trf" (default: "sm")
spacy_model (str): Custom spaCy model name (overrides model_size)
use_spacy (bool): If False, use fast regex splitting instead of spaCy (default: True)

Returns: List of SSMDSentence objects

Example:

from ssmd import parse_sentences

# Default: uses the small spaCy model (en_core_web_sm) when spaCy is installed,
# otherwise falls back to regex splitting
sentences = parse_sentences("Hello *world*! This is great.")

for sent in sentences:
    print(f"Voice: {sent.voice.name if sent.voice else 'default'}")
    print(f"Segments: {len(sent.segments)}")
    for seg in sent.segments:
        print(f"  - {seg.text!r} (emphasis={seg.emphasis})")

# Fast mode: no spaCy required (uses regex)
sentences = parse_sentences("Hello world. Fast mode.", use_spacy=False)

# High quality: use large spaCy model for better accuracy
sentences = parse_sentences("Complex text here.", model_size="lg")

# Custom model: use domain-specific spaCy model
sentences = parse_sentences("Medical text.", spacy_model="en_core_sci_md")

Sentence Detection Configuration:

SSMD supports flexible sentence detection with quality/speed tradeoffs:

Fast mode (use_spacy=False): Regex-based splitting, no dependencies, ~60x faster
Auto-detect (default): Uses spaCy if installed, falls back to regex
Small models (model_size="sm"): Best balance of speed and accuracy
Medium models (model_size="md"): Better accuracy for complex text
Large models (model_size="lg"): Best accuracy, slower
Transformer models (model_size="trf"): Research-grade accuracy, slowest

The parser works out-of-the-box with fast regex mode. Install ssmd[spacy] and language models for ML-powered accuracy.
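The trade-off between regex and ML-based splitting is easy to see with a naive splitter. The sketch below is not the pattern phrasplit uses; `split_sentences` is a hypothetical helper that splits on whitespace following `.`, `!` or `?`.

```python
import re


def split_sentences(text: str):
    """Naive regex splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# Well-formatted text splits cleanly.
print(split_sentences("Hello world. Fast mode."))
# → ['Hello world.', 'Fast mode.']

# Abbreviations trip up a naive pattern, which is where a trained model helps.
print(split_sentences("Dr. Smith arrived. He sat down."))
# → ['Dr.', 'Smith arrived.', 'He sat down.']
```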

Installation note: Larger spaCy models need manual installation:

# First install spaCy support
pip install "ssmd[spacy]"

# Then install models
python -m spacy download en_core_web_md
python -m spacy download fr_core_news_md

# Large models
python -m spacy download en_core_web_lg

# Transformer models
python -m spacy download en_core_web_trf

`parse_segments(text, **options)` → `list[SSMDSegment]`

Parse SSMD text into segments without sentence grouping.

Parameters:

text (str): SSMD text to parse
capabilities (TTSCapabilities | str): Filter features based on TTS engine support

Returns: List of SSMDSegment objects

Example:

from ssmd import parse_segments

segments = parse_segments("Call [+1-555-0123](as: telephone) now")

for seg in segments:
    if seg.say_as:
        print(f"Say-as: {seg.text!r} as {seg.say_as.interpret_as}")

`parse_voice_blocks(text)` → `list[tuple[VoiceAttrs | None, str]]`

Split text by voice directives.

Returns: List of (voice_attrs, text) tuples

Example:

from ssmd import parse_voice_blocks

blocks = parse_voice_blocks("""
@voice: sarah
Hello from Sarah

@voice: michael
Hello from Michael
""")

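for voice, text in blocks:
    # voice is None for text that precedes the first @voice directive
    name = voice.name if voice else "default"
    print(f"{name}: {text.strip()}")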

Data Structures

`SSMDSentence`

Represents a complete sentence with voice context.

Attributes:

segments (list[SSMDSegment]): List of text segments
voice (VoiceAttrs | None): Voice configuration
is_paragraph_end (bool): Whether sentence ends a paragraph

`SSMDSegment`

Represents a text segment with metadata.

Attributes:

text (str): The text content
emphasis (bool): Emphasis flag
prosody (ProsodyAttrs | None): Volume, rate, pitch
language (str | None): Language code (e.g., "fr-FR")
breaks_after (list[BreakAttrs]): Pauses after this segment
say_as (SayAsAttrs | None): Say-as interpretation
substitution (str | None): Substitution text
phoneme (str | None): Phonetic pronunciation (IPA)
audio (AudioAttrs | None): Audio file info
marks (list[str]): Marker names

`VoiceAttrs`

Voice configuration attributes.

Attributes:

name (str | None): Voice name (e.g., "sarah", "en-US-Wavenet-A")
language (str | None): Language code (e.g., "en-US")
gender (str | None): Gender ("male", "female", "neutral")
variant (int | None): Voice variant number

`ProsodyAttrs`

Prosody (volume, rate, pitch) attributes.

Attributes:

volume (str | None): Volume level (e.g., "x-loud", "+10dB")
rate (str | None): Speech rate (e.g., "fast", "120%")
pitch (str | None): Pitch level (e.g., "high", "+20%")

`BreakAttrs`

Pause/break attributes.

Attributes:

time (str | None): Break duration (e.g., "500ms", "2s")
strength (str | None): Break strength (e.g., "weak", "strong")

`SayAsAttrs`

Say-as interpretation attributes.

Attributes:

interpret_as (str): Interpretation type (e.g., "telephone", "date")
format (str | None): Format string (e.g., "mdy" for dates)
detail (int | None): Verbosity level (1-2, platform-specific)

`AudioAttrs`

Audio file attributes.

Attributes:

src (str): Audio file URL
alt_text (str | None): Alternative text if audio fails
clip_begin (str | None): Start time for audio clip (e.g., "5s")
clip_end (str | None): End time for audio clip (e.g., "30s")
speed (str | None): Playback speed (e.g., "150%")
repeat_count (int | None): Number of times to repeat
repeat_dur (str | None): Duration to repeat (e.g., "10s")
sound_level (str | None): Volume adjustment (e.g., "+6dB", "-3dB")
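For readers who want a mental model of these structures, the sketch below mirrors a subset of the attributes listed above as plain dataclasses. The class names are deliberately different from the library's (`SegmentSketch`, not `SSMDSegment`), and the `plain_text` method is a hypothetical addition for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BreakSketch:
    time: Optional[str] = None      # e.g. "500ms"
    strength: Optional[str] = None  # e.g. "strong"


@dataclass
class SegmentSketch:
    text: str
    emphasis: bool = False
    language: Optional[str] = None
    substitution: Optional[str] = None
    breaks_after: List[BreakSketch] = field(default_factory=list)
    marks: List[str] = field(default_factory=list)


@dataclass
class SentenceSketch:
    segments: List[SegmentSketch] = field(default_factory=list)
    is_paragraph_end: bool = False

    def plain_text(self) -> str:
        # Prefer the substitution text when a segment has one.
        return "".join(seg.substitution or seg.text for seg in self.segments)


sentence = SentenceSketch([
    SegmentSketch("Drink "),
    SegmentSketch("H2O", substitution="water"),
    SegmentSketch("."),
])
print(sentence.plain_text())
# → Drink water.
```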

Complete Example

See examples/parser_demo.py for a comprehensive demonstration of all parser features:

python examples/parser_demo.py

The demo shows:

Basic segment parsing
Text transformations (say-as, substitution, phoneme)
Voice block handling
Complete TTS workflow with sentence assembly
Prosody and language annotations
Advanced sentence parsing options
Mock TTS integration

API Reference

Module Functions

`ssmd.to_ssml(ssmd_text, **config)` → `str`

Convert SSMD markup to SSML.

Parameters:

ssmd_text (str): SSMD markdown text
**config: Optional configuration parameters

Returns: SSML string

`ssmd.to_text(ssmd_text, **config)` → `str`

Convert SSMD to plain text (strips all markup).

Parameters:

ssmd_text (str): SSMD markdown text
**config: Optional configuration parameters

Returns: Plain text string

`ssmd.from_ssml(ssml_text, **config)` → `str`

Convert SSML to SSMD format.

Parameters:

ssml_text (str): SSML XML string
**config: Optional configuration parameters

Returns: SSMD markdown string

Document Class

`Document(content="", config=None, capabilities=None)`

Main document container for building and managing TTS content.

Parameters:

content (str): Optional initial SSMD content
config (dict): Configuration options
capabilities (TTSCapabilities | str): TTS capabilities preset or object

Building Methods:

add(text) → Add text without separator (returns self for chaining)
add_sentence(text) → Add text with \n separator
add_paragraph(text) → Add text with \n\n separator

Export Methods:

to_ssml() → Export to SSML string
to_ssmd() → Export to SSMD string
to_text() → Export to plain text

Class Methods:

Document.from_ssml(ssml, **config) → Create from SSML
Document.from_text(text, **config) → Create from text

Properties:

ssmd → Raw SSMD content
config → Configuration dict
capabilities → TTS capabilities

List-like Interface:

len(doc) → Number of sentences
doc[i] → Get sentence by index (SSML)
doc[i] = text → Replace sentence
del doc[i] → Delete sentence
doc += text → Append content

Iteration:

sentences() → Iterator yielding SSML sentences
sentences(as_documents=True) → Iterator yielding Document objects

Editing Methods:

insert(index, text, separator="") → Insert text at index
remove(index) → Remove sentence
clear() → Remove all content
replace(old, new, count=-1) → Replace text

Advanced Methods:

merge(other_doc, separator="\n\n") → Merge another document
split() → Split into sentence Documents
get_fragment(index) → Get raw fragment by index

Real-World TTS Example

import asyncio
from ssmd import Document

# Your TTS engine (example with pyttsx3, kokoro-tts, etc.)
class TTSEngine:
    async def speak(self, ssml: str):
        """Speak SSML text."""
        # Implementation depends on your TTS engine
        pass

    async def wait_until_done(self):
        """Wait for speech to complete."""
        pass

async def read_document(content: str, tts: TTSEngine):
    """Read an SSMD document sentence by sentence."""
    doc = Document(content, config={'auto_sentence_tags': True})

    print(f"Reading document with {len(doc)} sentences...")

    for i in range(len(doc)):
        sentence = doc[i]
        print(f"[{i+1}/{len(doc)}] Speaking...")
        await tts.speak(sentence)
        await tts.wait_until_done()

    print("Done!")

# Usage
document = """
# Welcome
Hello and *welcome* to our presentation!
Today we'll discuss some exciting topics.

# Topic 1
First ...500ms let's talk about SSMD.
It makes writing TTS content [much easier](v: 4, p: 4)!

# Conclusion
Thank you for listening @end_marker!
"""

# Run the coroutine (swap TTSEngine for your real engine wrapper)
asyncio.run(read_document(document, TTSEngine()))

Development

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run with coverage
pytest --cov=ssmd --cov-report=html

# Run specific test file
pytest tests/test_basic.py -v

Code Quality

# Format with ruff
ruff format ssmd/ tests/

# Lint
ruff check ssmd/ tests/

# Type check
mypy ssmd/

Specification

This implementation follows the SSMD Specification with additional features inspired by the JavaScript implementation.

Implemented Features

✅ Text
✅ Emphasis (*text*, **strong**, _reduced_, [text](emphasis: none))
✅ Break (...500ms, ...2s, ...n/w/c/s/p)
✅ Language ([text](en), [text](en-GB))
✅ Voice inline ([text](voice: Joanna), [text](voice: en-US, gender: female))
✅ Voice directives (@voice: name)
✅ Mark (@marker)
✅ Paragraph (\n\n)
✅ Phoneme ([text](ph: xsampa), [text](ipa: ipa))
✅ Prosody shorthand (++loud++, >>fast>>, ^^high^^)
✅ Prosody explicit ([text](vrp: 555), [text](v: 5))
✅ Substitution ([text](sub: alias))
✅ Say-as ([text](as: telephone), [text](as: date, detail: 1))
✅ Audio ([desc](url.mp3 alt), [desc](url.mp3 clip: 5s-30s, speed: 120%))
✅ Headings (# ## ###)
✅ Extensions ([text](ext: whisper), Google TTS styles)
✅ Auto-sentence tags (<s>)
✅ SSML ↔ SSMD bidirectional conversion

Related Projects

SSMD (Ruby) - Original reference implementation
SSMD (JavaScript) - JavaScript implementation
Speech Markdown - Alternative specification

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Original SSMD specification by machisuji
JavaScript implementation by fabien88
X-SAMPA to IPA conversion table from the Ruby implementation
