Python SDK for Soniox speech-to-text API

Soniox Python SDK

Official Python SDK for the Soniox Speech-to-Text API. Built on httpx, with both synchronous and asynchronous support.

Features

  • 🎯 Complete API Coverage: Full support for Soniox REST API
  • Async & Sync: Full support for both synchronous and asynchronous operations
  • 🔒 Type Safe: Built with Pydantic v2 for robust type checking and validation
  • 📝 Comprehensive Logging: Built-in logging with the soniox logger
  • 🌍 60+ Languages: Transcribe speech in multiple languages with language hints
  • 🎭 Speaker Diarization: Identify different speakers in audio
  • 🔍 Language Identification: Automatic language detection
  • 📊 Word-Level Timestamps: Get precise timing for each word
  • 🎯 Context Support: Improve accuracy with domain-specific context

Installation

pip install soniox

Quick Start

Authentication

Set your API key as an environment variable:

export SONIOX_API_KEY="your-api-key-here"

Or pass it directly when initializing the client:

from soniox import SonioxClient

client = SonioxClient(api_key="your-api-key-here")

Basic Usage

Transcribe an Audio File

Synchronous:

import time
from soniox import SonioxClient

client = SonioxClient()

# Submit transcription job
job = client.transcribe_file("path/to/audio.wav")
print(f"Job ID: {job.id}")
print(f"Status: {job.status}")

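# Poll for completion; stop on failure instead of looping forever
while True:
    job = client.get_transcription_job(job.id)
    if job.status == "completed":
        break
    if job.status == "error":
        raise RuntimeError(f"Transcription failed: {job.error_message}")
    time.sleep(1)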

# Get the transcript
result = client.get_transcription_result(job.id)
print(f"Transcript: {result.text}")
print(f"Tokens: {len(result.tokens)}")
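
The polling loop above can be wrapped in a small reusable helper that also gives up after a deadline. This is a minimal sketch, not part of the SDK: `wait_for_job`, `TranscriptionFailed`, and the `poll_interval` and `timeout` parameters are hypothetical names. It relies only on `get_transcription_job()` and on the `status` and `error_message` fields documented in this README.

```python
import time


class TranscriptionFailed(RuntimeError):
    """Hypothetical exception for this sketch; not provided by the SDK."""


def wait_for_job(client, job_id, poll_interval=1.0, timeout=300.0):
    """Poll get_transcription_job() until the job finishes or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        job = client.get_transcription_job(job_id)
        if job.status == "completed":
            return job
        if job.status == "error":
            # Surface the server-side failure instead of polling forever
            raise TranscriptionFailed(job.error_message or "transcription failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.status!r} after {timeout}s")
        time.sleep(poll_interval)
```

With a real client this would be called as `job = wait_for_job(client, job.id)` before fetching the result.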

Asynchronous:

import asyncio
from soniox import SonioxClient

async def transcribe():
    client = SonioxClient()
    
    # Submit transcription job
    job = await client.transcribe_file_async("path/to/audio.wav")
    print(f"Job ID: {job.id}")
    
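    # Poll for completion; stop on failure instead of looping forever
    while True:
        job = await client.get_transcription_job_async(job.id)
        if job.status == "completed":
            break
        if job.status == "error":
            raise RuntimeError(f"Transcription failed: {job.error_message}")
        await asyncio.sleep(1)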
    
    # Get the transcript
    result = await client.get_transcription_result_async(job.id)
    print(f"Transcript: {result.text}")

asyncio.run(transcribe())
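
The async methods make it possible to submit several files at once. The sketch below shows one way to do that with a concurrency cap. `submit_many` and `max_concurrent` are hypothetical names, and the only SDK call assumed is `transcribe_file_async()` as described above; whether the API tolerates a given level of concurrency is not stated in this README.

```python
import asyncio


async def submit_many(client, paths, max_concurrent=4):
    """Submit several files concurrently, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def submit_one(path):
        # The semaphore limits how many uploads are in flight at once
        async with semaphore:
            return await client.transcribe_file_async(path)

    # gather() preserves input order, so jobs[i] corresponds to paths[i]
    return await asyncio.gather(*(submit_one(path) for path in paths))
```

Typical use would be `jobs = asyncio.run(submit_many(SonioxClient(), ["a.wav", "b.wav"]))`, followed by polling each job as shown earlier.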

Transcribe with Custom Configuration

You can pass configuration options either as a TranscriptionConfig object or as keyword arguments:

from soniox import SonioxClient
from soniox.languages import Language
from soniox.types import TranscriptionConfig

client = SonioxClient()

# Using TranscriptionConfig
config = TranscriptionConfig(
    model="stt-async-preview",
    language_hints=[Language.en],
    enable_speaker_diarization=True,
    context="Medical terminology context"
)
job = client.transcribe_file("audio.wav", config=config)

# Or using kwargs
job = client.transcribe_file(
    "audio.wav",
    model="stt-async-preview",
    enable_speaker_diarization=True
)

Advanced Features

Speaker Diarization

Identify different speakers in your audio:

import time
from soniox import SonioxClient

client = SonioxClient()

# Submit job with speaker diarization
job = client.transcribe_file(
    "path/to/audio.wav",
    enable_speaker_diarization=True
)

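# Wait for completion; stop on failure instead of looping forever
while True:
    job = client.get_transcription_job(job.id)
    if job.status == "completed":
        break
    if job.status == "error":
        raise RuntimeError(f"Transcription failed: {job.error_message}")
    time.sleep(1)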

# Get results with speaker information
result = client.get_transcription_result(job.id)
for token in result.tokens:
    if token.speaker:
        print(f"Speaker {token.speaker}: {token.text}")
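
Printing one line per token gets noisy on long recordings. A possible refinement is to merge consecutive tokens from the same speaker into turns. In this sketch `speaker_turns` is a hypothetical helper that uses only the `speaker` and `text` token fields; the `separator` default assumes tokens are whole words without their own spacing, so pass `separator=""` if the token text already carries spaces.

```python
from itertools import groupby


def speaker_turns(tokens, separator=" "):
    """Merge consecutive tokens from the same speaker into (speaker, text) turns."""
    turns = []
    # groupby() only groups adjacent items, which is exactly what a "turn" is
    for speaker, group in groupby(tokens, key=lambda token: token.speaker):
        text = separator.join(token.text for token in group).strip()
        turns.append((speaker, text))
    return turns
```

Usage would look like `for speaker, text in speaker_turns(result.tokens): print(f"Speaker {speaker}: {text}")`.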

Language Identification

Automatically identify the language being spoken:

from soniox import SonioxClient
from soniox.languages import Language

client = SonioxClient()

job = client.transcribe_file(
    "multilingual_audio.wav",
    language_hints=[Language.en, Language.es, Language.fr],
    enable_language_identification=True
)

Context for Improved Accuracy

Provide context to improve recognition of domain-specific terms:

from soniox import SonioxClient

client = SonioxClient()

job = client.transcribe_file(
    "medical_audio.wav",
    context="Medical terminology: hypertension, cardiovascular, stethoscope"
)

Configuration

Client Options

from soniox import SonioxClient

client = SonioxClient(
    api_key="your-api-key",           # API key (or use SONIOX_API_KEY env var)
    base_url="https://api.soniox.com", # Custom base URL (optional)
    timeout=60.0                       # Request timeout in seconds
)

Logging

The SDK uses Python's standard logging module with the logger name soniox:

import logging

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("soniox")
logger.setLevel(logging.DEBUG)

# Or configure it your way
import logging

handler = logging.StreamHandler()
handler.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)

logger = logging.getLogger("soniox")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

API Reference

SonioxClient

Main client for interacting with the Soniox API.

Methods

transcribe_file(file_path, config=None, **kwargs) → TranscriptionJob

Submit an audio file for transcription.

Parameters:

  • file_path (str): Path to audio file
  • config (TranscriptionConfig, optional): Configuration object
  • **kwargs: Configuration options (used if config is None)
    • model (str): Model to use (default: "stt-async-preview")
    • language_hints (list[Language]): Language hints for better accuracy
    • enable_speaker_diarization (bool): Enable speaker diarization
    • enable_language_identification (bool): Enable language identification
    • context (str): Context for improved accuracy
    • webhook_url (str): Webhook URL for completion notification
    • client_reference_id (str): Your reference ID

Returns: TranscriptionJob - Job object with status information

Raises:

  • FileNotFoundError: If file doesn't exist
  • SonioxAPIError: If API returns an error

get_transcription_job(job_id) → TranscriptionJob

Get the status of a transcription job.

Parameters:

  • job_id (str): Job ID from transcribe_file()

Returns: TranscriptionJob - Updated job status

get_transcription_result(job_id) → TranscriptionResult

Get the transcript once the job is completed.

Parameters:

  • job_id (str): Job ID from completed transcription

Returns: TranscriptionResult - Transcript with tokens

Raises:

  • SonioxAPIError: If job is not completed or not found

transcribe_file_async(file_path, config=None, **kwargs) → TranscriptionJob

Async version of transcribe_file().

get_transcription_job_async(job_id) → TranscriptionJob

Async version of get_transcription_job().

get_transcription_result_async(job_id) → TranscriptionResult

Async version of get_transcription_result().

Models

TranscriptionJob

Transcription job status and metadata.

Fields:

  • id (str): Job ID (UUID)
  • status (TranscriptionJobStatus): Job status ("queued", "processing", "completed", "error")
  • created_at (datetime): Job creation timestamp
  • filename (str): Original filename
  • file_id (str | None): Uploaded file ID
  • audio_url (str | None): Audio URL if provided
  • audio_duration_ms (int | None): Audio duration in milliseconds
  • error_message (str | None): Error message if failed
  • All configuration fields from TranscriptionConfig

TranscriptionResult

Transcription result with full transcript.

Fields:

  • id (str): Transcript ID (matches job ID)
  • text (str): Full transcribed text
  • tokens (list[Token]): Word-level tokens with timing

Token

Word-level transcription token.

Fields:

  • text (str): Token text
  • start_ms (int): Start time in milliseconds
  • end_ms (int): End time in milliseconds
  • confidence (float): Confidence score (0-1)
  • speaker (str | None): Speaker ID if diarization enabled
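
The timing and confidence fields lend themselves to small post-processing helpers. The sketch below is illustrative only: `format_timestamp` and `low_confidence_tokens` are hypothetical names, and the 0.6 threshold is an arbitrary example value rather than a recommendation from the SDK.

```python
def format_timestamp(ms):
    """Render milliseconds as HH:MM:SS,mmm (the layout used by SRT subtitles)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def low_confidence_tokens(tokens, threshold=0.6):
    """Return tokens whose confidence falls below the threshold, for manual review."""
    return [token for token in tokens if token.confidence < threshold]
```

For example, `format_timestamp(token.start_ms)` gives a readable position for each entry returned by `low_confidence_tokens(result.tokens)`.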

TranscriptionConfig

Configuration for transcription jobs.

Fields:

  • model (str): Model to use (default: "stt-async-preview")
  • language_hints (list[Language] | None): Language hints
  • enable_language_identification (bool): Enable language detection
  • enable_speaker_diarization (bool): Enable speaker diarization
  • context (str | None): Context for improved accuracy
  • client_reference_id (str | None): Your reference ID
  • webhook_url (str | None): Webhook URL
  • webhook_auth_header_name (str | None): Webhook auth header name
  • webhook_auth_header_value (str | None): Webhook auth header value
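
On the receiving side, a webhook handler would typically check the configured auth header before trusting a notification. The sketch below assumes that the webhook request carries the header named by `webhook_auth_header_name` with the value from `webhook_auth_header_value`; this README does not describe the request format, so treat that as an assumption. `is_authorized` is a hypothetical helper, independent of any web framework.

```python
import hmac


def is_authorized(headers, header_name, expected_value):
    """Return True if the request headers carry the expected auth header value."""
    # HTTP header names are case-insensitive, so normalise before lookup
    normalised = {name.lower(): value for name, value in headers.items()}
    received = normalised.get(header_name.lower())
    if received is None:
        return False
    # compare_digest avoids leaking the secret through timing differences
    return hmac.compare_digest(received.encode(), expected_value.encode())
```

A handler would call `is_authorized(request_headers, "X-Webhook-Secret", secret)` and reject the request when it returns False; the header name here is only an example.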

FileUploadResponse

Response from file upload.

Fields:

  • id (str): File ID
  • filename (str): Original filename
  • size (int): File size in bytes
  • created_at (datetime): Upload timestamp
  • client_reference_id (str | None): Your reference ID

Exceptions

  • SonioxError: Base exception for all Soniox errors
  • SonioxAuthenticationError: Raised when authentication fails
  • SonioxAPIError: Raised when API returns an error response
  • SonioxRateLimitError: Raised when rate limit is exceeded

Error Handling

import time
from soniox import SonioxClient
from soniox.exceptions import (
    SonioxAPIError,
    SonioxAuthenticationError,
    SonioxRateLimitError,
)

client = SonioxClient()

try:
    # Submit transcription
    job = client.transcribe_file("audio.wav")
    
    # Wait for completion
    while True:
        job = client.get_transcription_job(job.id)
        if job.status == "completed":
            break
        elif job.status == "error":
            print(f"Transcription failed: {job.error_message}")
            break
        time.sleep(1)
    
    # Get result
    if job.status == "completed":
        result = client.get_transcription_result(job.id)
        print(result.text)

except FileNotFoundError:
    print("Audio file not found")
except SonioxAuthenticationError as e:
    print(f"Authentication failed: {e}")
except SonioxRateLimitError as e:
    print(f"Rate limit exceeded: {e}")
    print(f"Status code: {e.status_code}")
except SonioxAPIError as e:
    print(f"API error: {e}")
    print(f"Status code: {e.status_code}")
    print(f"Response: {e.response_body}")
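
Rate-limit errors are usually transient, so one option is to retry with exponential backoff rather than fail immediately. This is a generic sketch, not an SDK feature: `call_with_retry` and its parameters are hypothetical, and the exception class to retry on is passed in by the caller.

```python
import time


def call_with_retry(func, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on the given exception type(s) with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the last error
            # Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

With the SDK this could be used as `call_with_retry(lambda: client.transcribe_file("audio.wav"), retry_on=SonioxRateLimitError)`.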

Testing

Run tests with pytest:

# Install development dependencies
pip install -e ".[dev,test]"

# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html --cov-report=term-missing

# Run specific test file
pytest tests/test_models.py

# Run with verbose output
pytest -v

See tests/README.md for more details on the test suite.

Development

# Clone the repository
git clone https://github.com/mahdikiani/soniox-sdk.git
cd soniox-sdk

# Install in editable mode with dev dependencies
pip install -e ".[dev,test]"

# Run linter
ruff check src/

# Run type checker
mypy src/

License

This project is licensed under the MIT License - see the LICENSE.txt file for details.

Changelog

See CHANGELOG.md for version history and updates.


Made with ❤️ by Mahdi Kiani
