
GL Speech Python Client - Language binding SDK for Prosa Speech API


GL Speech SDK

A Python SDK for interacting with the GL Speech API, providing speech-to-text (STT), text-to-speech (TTS), and webhook management capabilities.

Prerequisites

  • Python 3.11 or 3.12 (>=3.11,<3.13)

Installation

pip install gl-speech

Or with uv:

uv add gl-speech

Quick Start

STT and TTS use separate API keys, and each can be given its own base URL, so create one client per service. Webhooks for STT jobs are managed through the STT client; webhooks for TTS jobs are managed through the TTS client.

from gl_speech_sdk import SpeechClient

stt_client = SpeechClient(api_key="your-stt-api-key", base_url="https://api.prosa.ai/v2/speech/")
tts_client = SpeechClient(api_key="your-tts-api-key", base_url="https://api.prosa.ai/v2/speech/")

# Speech-to-Text
result = stt_client.stt.transcribe(
    data="<base64-encoded-audio>",
    model="stt-general",
    wait=True
)
print(result.result)

# Text-to-Speech
result = tts_client.tts.synthesize(
    text="Hello, world!",
    model="tts-dimas-formal",
    wait=True
)
print(result.result)

# Webhooks: STT job events use STT client, TTS job events use TTS client
stt_endpoint = stt_client.webhooks.create_endpoint(
    url="https://your-server.com/webhook-stt",
    event_filters=["stt.job.completed"]
)
tts_endpoint = tts_client.webhooks.create_endpoint(
    url="https://your-server.com/webhook-tts",
    event_filters=["tts.job.completed"]
)

Configuration

Environment Variables

Set separate credentials for STT and TTS:

  • GLSPEECH_STT_API_KEY: API key for Speech-to-Text
  • GLSPEECH_STT_BASE_URL: Base URL for STT (default: https://api.prosa.ai/v2/speech/)
  • GLSPEECH_TTS_API_KEY: API key for Text-to-Speech
  • GLSPEECH_TTS_BASE_URL: Base URL for TTS (default: https://api.prosa.ai/v2/speech/)

Webhook management is per job type: use the STT client for STT job webhooks, the TTS client for TTS job webhooks.
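
The variables above can be read into a small config object before building the clients. This is a minimal sketch: load_config and ServiceConfig are hypothetical helper names, not part of the SDK, and the fallback URL is the default listed above.

```python
import os
from dataclasses import dataclass

# Default base URL as listed in the environment variable table above.
DEFAULT_BASE_URL = "https://api.prosa.ai/v2/speech/"


@dataclass(frozen=True)
class ServiceConfig:
    """Credentials for one service (hypothetical helper, not an SDK class)."""
    api_key: str
    base_url: str


def load_config(service: str, env=None) -> ServiceConfig:
    """Read GLSPEECH_<SERVICE>_API_KEY and GLSPEECH_<SERVICE>_BASE_URL.

    `service` is "STT" or "TTS"; `env` defaults to os.environ and can be
    replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    prefix = f"GLSPEECH_{service.upper()}_"
    api_key = env.get(prefix + "API_KEY")
    if not api_key:
        raise RuntimeError(f"{prefix}API_KEY is not set")
    # Fall back to the documented default when no base URL is configured.
    base_url = env.get(prefix + "BASE_URL") or DEFAULT_BASE_URL
    return ServiceConfig(api_key=api_key, base_url=base_url)
```

The result can then be passed on as SpeechClient(api_key=cfg.api_key, base_url=cfg.base_url), once for "STT" and once for "TTS".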

Client Initialization

from gl_speech_sdk import SpeechClient

# Two clients (STT and TTS have different keys)
stt_client = SpeechClient(
    api_key="your-stt-api-key",
    base_url="https://api.prosa.ai/v2/speech/",
    timeout=60.0,
    default_headers={"X-Custom-Header": "value"}
)
tts_client = SpeechClient(
    api_key="your-tts-api-key",
    base_url="https://api.prosa.ai/v2/speech/",
    timeout=60.0,
    default_headers={"X-Custom-Header": "value"}
)

# Or from environment variables. They are set in code here only for
# illustration; normally they come from your shell or deployment config.
import os
os.environ["GLSPEECH_STT_API_KEY"] = "your-stt-api-key"
os.environ["GLSPEECH_STT_BASE_URL"] = "https://api.prosa.ai/v2/speech/"
os.environ["GLSPEECH_TTS_API_KEY"] = "your-tts-api-key"
os.environ["GLSPEECH_TTS_BASE_URL"] = "https://api.prosa.ai/v2/speech/"
stt_client = SpeechClient(api_key=os.getenv("GLSPEECH_STT_API_KEY"), base_url=os.getenv("GLSPEECH_STT_BASE_URL"))
tts_client = SpeechClient(api_key=os.getenv("GLSPEECH_TTS_API_KEY"), base_url=os.getenv("GLSPEECH_TTS_BASE_URL"))

Speech-to-Text (STT)

Use stt_client for all STT operations.

List Available Models

models = stt_client.stt.list_models()
for model in models:
    print(f"{model['name']}: {model['label']}")

Transcribe Audio

# Synchronous (wait for result)
result = stt_client.stt.transcribe(
    model="stt-general",
    wait=True,
    data="<base64-encoded-audio>",
    label="My audio file"
)
print(result.result)

# Asynchronous (get job_id, poll later)
result = stt_client.stt.transcribe(
    model="stt-general",
    wait=False,
    uri="https://example.com/audio.wav"
)
job_id = result.job_id

# Check status
status = stt_client.stt.get_status(job_id)
print(f"Status: {status.status}, Progress: {status.progress}")

# Get result when complete
result = stt_client.stt.get_job(job_id)
print(result.result)
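
The data argument in the examples above is a base64 string. A minimal sketch for producing it from a local file follows; encode_audio_file is a hypothetical helper, and it assumes the API expects plain standard base64 with no data: URI prefix, which this README does not spell out.

```python
import base64
from pathlib import Path


def encode_audio_file(path) -> str:
    """Return the file's raw bytes as a standard base64 ASCII string."""
    # Assumption: no "data:audio/...;base64," prefix is required.
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

The returned string would be passed as data=encode_audio_file("audio.wav") in a transcribe call.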

Advanced Configuration

result = stt_client.stt.transcribe(
    model="stt-general",
    wait=True,
    data="<base64-encoded-audio>",
    speaker_count=2,           # Expected number of speakers
    include_filler=True,       # Include filler words
    auto_punctuation=True,     # Auto-add punctuation
    enable_spoken_numerals=True,  # Convert "one" to "1"
    enable_speech_insights=True,  # Speech analytics
    enable_voice_insights=True,   # Voice analytics
)

List and Manage Jobs

# List jobs with filters
jobs = stt_client.stt.list_jobs(
    page=1,
    per_page=10,
    from_date="2024-01-01",
    until_date="2024-01-31",
    query_text="hello"
)

# Archive a job
stt_client.stt.archive(job_id)
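
The from_date and until_date filters above take YYYY-MM-DD strings. A small sketch for building a full calendar month range is shown below; month_range is a hypothetical helper, and whether until_date is treated as inclusive is not stated in this README.

```python
import calendar
from datetime import date


def month_range(year: int, month: int) -> tuple[str, str]:
    """Return (first_day, last_day) of a month as ISO date strings."""
    # monthrange returns (weekday of first day, number of days in month).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1).isoformat(), date(year, month, last_day).isoformat()
```

For example, from_date, until_date = month_range(2024, 1) gives the two values used in the list_jobs call above.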

Text-to-Speech (TTS)

Use tts_client for all TTS operations.

List Available Models

models = tts_client.tts.list_models()
for model in models:
    print(f"{model['name']}: {model['voice']} ({model['gender']})")

Synthesize Speech

# Synchronous (wait for result)
result = tts_client.tts.synthesize(
    text="Hello, world!",
    model="tts-dimas-formal",
    wait=True
)
audio_data = result.result["data"]  # Base64-encoded audio

# Get as signed URL instead
result = tts_client.tts.synthesize(
    text="Hello, world!",
    model="tts-dimas-formal",
    wait=True,
    as_signed_url=True
)
audio_url = result.result["path"]

# Asynchronous
result = tts_client.tts.synthesize(
    text="Long text content...",
    model="tts-dimas-formal",
    wait=False
)
job_id = result.job_id

# Fetch the job; call again later if it has not completed yet
result = tts_client.tts.get_job(job_id, as_signed_url=True)
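
For asynchronous jobs, a polling loop around get_status can wait until the job reaches a final state. The sketch below is generic: wait_for_job is a hypothetical helper, it relies only on the status attribute shown in the STT status example, and the status names in done_statuses are assumptions that should be checked against real API responses.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[str], object],
    job_id: str,
    done_statuses=("complete", "failed"),  # assumed names, verify against the API
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call get_status(job_id) until its .status is in done_statuses.

    Returns the last status object, or raises TimeoutError once
    `timeout` seconds have passed without reaching a final status.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if getattr(status, "status", None) in done_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} seconds")
        sleep(interval)
```

It would be called as wait_for_job(tts_client.tts.get_status, job_id), followed by get_job to fetch the result.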

Advanced Configuration

result = tts_client.tts.synthesize(
    text="Hello, world!",
    model="tts-dimas-formal",
    wait=True,
    pitch=0.5,           # Pitch adjustment (-1.0 to 1.0)
    tempo=1.2,           # Speed adjustment (0.5 to 2.0)
    audio_format="mp3",  # "opus", "mp3", or "wav"
    label="My synthesis"
)
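
When the result is returned as base64 (result.result["data"] in the synthesis example), it can be decoded and written to disk. A minimal sketch, assuming the decoded bytes are a complete audio file in the requested audio_format; save_audio is a hypothetical helper, and using the format name as the file extension is a convenience choice, not something the API prescribes.

```python
import base64
from pathlib import Path

# Formats listed in the audio_format comment above.
SUPPORTED_FORMATS = {"opus", "mp3", "wav"}


def save_audio(b64_data: str, stem: str, audio_format: str) -> Path:
    """Decode base64 audio and write it to '<stem>.<audio_format>'."""
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio_format: {audio_format!r}")
    out_path = Path(f"{stem}.{audio_format}")
    out_path.write_bytes(base64.b64decode(b64_data))
    return out_path
```

For example, save_audio(result.result["data"], "hello", "mp3") would write hello.mp3 in the current directory.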

List and Manage Jobs

# List jobs
jobs = tts_client.tts.list_jobs(page=1, per_page=10)

# Count jobs
count = tts_client.tts.count_jobs(from_date="2024-01-01")

# Get job status
status = tts_client.tts.get_status(job_id)

# Archive a job
tts_client.tts.archive(job_id)

Webhook Management

Webhooks and API keys

Webhook management is split by job type:

  • STT job webhooks (e.g. stt.job.completed) use the STT API key and base URL → use stt_client.webhooks.
  • TTS job webhooks (e.g. tts.job.completed) use the TTS API key and base URL → use tts_client.webhooks.

Create and manage endpoints on the client that matches the events you want to receive.

Create a Webhook Endpoint

# Endpoint for STT job events (uses STT key)
stt_endpoint = stt_client.webhooks.create_endpoint(
    url="https://your-server.com/webhook-stt",
    event_filters=["stt.job.completed"],
    ssl_verification=True
)

# Endpoint for TTS job events (uses TTS key)
tts_endpoint = tts_client.webhooks.create_endpoint(
    url="https://your-server.com/webhook-tts",
    event_filters=["tts.job.completed"],
    ssl_verification=True
)
print(f"STT Endpoint ID: {stt_endpoint.id}, Secret: {stt_endpoint.secrets[0].key}")
print(f"TTS Endpoint ID: {tts_endpoint.id}, Secret: {tts_endpoint.secrets[0].key}")

List Endpoints

# List STT webhook endpoints
stt_endpoints = stt_client.webhooks.list_endpoints()

# List TTS webhook endpoints
tts_endpoints = tts_client.webhooks.list_endpoints()
for ep in tts_endpoints:
    print(f"{ep.id}: {ep.url}")

Update and Delete Endpoints

# Update or delete an endpoint on the same client that created it (STT or TTS)
stt_client.webhooks.update_endpoint(
    endpoint_id="endpoint-123",
    url="https://your-server.com/new-webhook",
    event_filters=[]
)
stt_client.webhooks.delete_endpoint("endpoint-123")

Rotate Secrets

tts_client.webhooks.rotate_secret(
    endpoint_id="endpoint-123",
    days=3,  # Old secret valid for 3 days
    hours=0
)
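
While an old secret remains valid after rotation, a webhook receiver has to accept deliveries signed with either secret. This README does not describe how deliveries are signed, so the sketch below is an assumption throughout: it supposes an HMAC-SHA256 hex digest computed over the raw request body, and signature_matches is a hypothetical helper. Check the actual signing scheme before relying on it.

```python
import hashlib
import hmac


def signature_matches(body: bytes, received_sig: str, secrets: list[str]) -> bool:
    """Return True if received_sig matches the body under any active secret.

    ASSUMPTION: the signature is hex(HMAC-SHA256(secret, raw body)).
    The real scheme is not documented here.
    """
    for secret in secrets:
        expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
        # compare_digest avoids leaking match position through timing.
        if hmac.compare_digest(expected.encode("utf-8"), received_sig.encode("utf-8")):
            return True
    return False
```

During the rotation window the receiver would pass both keys, for example signature_matches(raw_body, header_value, [new_key, old_key]), and drop the old key once it expires.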

Event Management

# List events (from the client whose webhooks you're querying)
events = tts_client.webhooks.list_events(
    from_date="2024-01-01",
    to_date="2024-01-31"
)
event = tts_client.webhooks.get_event("event-123")
print(event.data)

Delivery Management

deliveries = tts_client.webhooks.list_deliveries("endpoint-123")
ticket = tts_client.webhooks.replay_delivery("delivery-123")
tickets = tts_client.webhooks.replay_failed_deliveries("endpoint-123")
ticket = tts_client.webhooks.test_endpoint("endpoint-123")

Error Handling

import httpx
from gl_speech_sdk import SpeechClient

stt_client = SpeechClient(api_key="your-stt-api-key", base_url="https://api.prosa.ai/v2/speech/")

try:
    result = stt_client.stt.transcribe(model="stt-general", data="invalid")
except httpx.HTTPStatusError as e:
    print(f"HTTP Error: {e.response.status_code}")
    print(f"Response: {e.response.text}")
except ValueError as e:
    print(f"Validation Error: {e}")
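
Transient failures can be retried with exponential backoff around any SDK call. The sketch below is generic and uses only the standard library; call_with_retry is a hypothetical helper, and which exceptions are safe to retry is a judgment call (connection-level errors such as httpx.TransportError are a common choice, while validation errors usually should not be retried).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: tuple = (Exception,),
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types.

    Waits base_delay, then 2x, 4x, ... between attempts and re-raises
    the last exception once `attempts` calls have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A call would look like call_with_retry(lambda: stt_client.stt.get_status(job_id), retry_on=(httpx.TransportError,)).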

License

MIT License.
