
Sber Salute Speech API


Sber Salute Speech Python API

A Python client for Sber's Salute Speech Recognition service with a simple, async-first API.

Features

  • OpenAI Whisper-like API for ease of use
  • Asynchronous (asyncio-based) API, so transcription requests do not block and can run concurrently
  • Comprehensive error handling
  • Support for multiple audio formats
  • Command-line interface for quick transcription

Installation

pip install salute_speech

Quick Start

from salute_speech.speech_recognition import SaluteSpeechClient
import asyncio
import os

async def main():
    # Initialize the client (from environment variable)
    client = SaluteSpeechClient(client_credentials=os.getenv("SBER_SPEECH_API_KEY"))
    
    # Open and transcribe an audio file
    with open("audio.mp3", "rb") as audio_file:
        result = await client.audio.transcriptions.create(
            file=audio_file,
            language="ru-RU"
        )
        print(result.text)

# Run the async function
asyncio.run(main())
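`create()` is a coroutine, so you can transcribe several files concurrently with `asyncio.gather`. The sketch below relies only on the `client.audio.transcriptions.create(file=..., language=...)` call shown above. The helper name `transcribe_many` and the `max_concurrency` cap are illustrative assumptions and are not part of the package. This page does not say whether the service limits concurrent requests.

```python
import asyncio
from typing import Any, Dict, List


async def transcribe_many(
    client: Any,
    paths: List[str],
    language: str = "ru-RU",
    max_concurrency: int = 4,
) -> Dict[str, str]:
    """Hypothetical helper: transcribe several files concurrently.

    `client` must expose client.audio.transcriptions.create(),
    as SaluteSpeechClient does in the Quick Start.
    """
    # Cap the number of in-flight requests (the default of 4 is an assumption).
    semaphore = asyncio.Semaphore(max_concurrency)

    async def transcribe_one(path: str) -> str:
        async with semaphore:
            with open(path, "rb") as audio_file:
                result = await client.audio.transcriptions.create(
                    file=audio_file, language=language
                )
        return result.text

    # gather() preserves input order, so the texts line up with the paths.
    texts = await asyncio.gather(*(transcribe_one(p) for p in paths))
    return dict(zip(paths, texts))
```

Call it from inside `main()` as `texts = await transcribe_many(client, ["a.mp3", "b.mp3"])`.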

API Reference

SaluteSpeechClient

The main client class that provides access to the Sber Speech API.

client = SaluteSpeechClient(client_credentials="your_credentials_here")

client.audio.transcriptions.create()

Creates a transcription for the given audio file.

Parameters:

  • file (BinaryIO): An audio file opened in binary mode
  • language (str, optional): Language code for transcription. Defaults to "ru-RU"
  • prompt (str, optional): Optional prompt to guide transcription
  • response_format (str, optional): Format of the response. Currently only "text" is supported
  • poll_interval (float, optional): Interval between status checks in seconds. Defaults to 1.0

Returns:

  • TranscriptionResponse object with:
    • text: The transcribed text
    • status: Status of the transcription job
    • task_id: ID of the transcription task
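The `poll_interval` parameter suggests that `create()` submits a job and then checks its status until the job finishes. The following hypothetical sketch illustrates such a polling loop; it is not the library's implementation.

- `get_status` stands in for whatever status request the client actually makes.
- The terminal status strings `"DONE"` and `"ERROR"` are assumptions.
- The `timeout` argument is also an assumption.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_for_task(
    get_status: Callable[[str], Awaitable[str]],
    task_id: str,
    poll_interval: float = 1.0,
    timeout: float = 600.0,
) -> str:
    """Hypothetical polling loop: return the final status of `task_id`.

    `get_status` is an assumed coroutine returning the task's status string.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await get_status(task_id)
        # Assumed terminal states; the real service may use different names.
        if status == "DONE":
            return status
        if status == "ERROR":
            raise RuntimeError(f"task {task_id} failed")
        if loop.time() >= deadline:
            raise TimeoutError(f"task {task_id} still {status} after {timeout}s")
        # Sleep between checks instead of busy-waiting.
        await asyncio.sleep(poll_interval)
```

A longer `poll_interval` means fewer status requests, at the cost of noticing completion later.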

Example:

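async def transcribe_meeting(client):
    # open() is a regular (synchronous) context manager, so use "with", not "async with"
    with open("meeting.mp3", "rb") as audio_file:
        result = await client.audio.transcriptions.create(
            file=audio_file,
            language="ru-RU"
        )
        print(result.text)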

Supported Audio Formats

The service supports the following audio formats:

Format           Max Channels  Sample Rate Range
PCM_S16LE (WAV)  8             8,000 - 96,000 Hz
OPUS             1             Any
MP3              2             Any
FLAC             8             Any
ALAW             8             8,000 - 96,000 Hz
MULAW            8             8,000 - 96,000 Hz

Audio parameters are automatically detected and validated using the AudioValidator class.
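For WAV input, you can check the channel and sample-rate limits from the table locally before upload with the standard-library `wave` module. This minimal sketch assumes a 16-bit PCM WAV file. The function name `check_wav_limits` is hypothetical, and the function does not replicate the package's `AudioValidator`.

```python
import wave

# Limits for PCM_S16LE (WAV), taken from the table above.
WAV_MAX_CHANNELS = 8
WAV_MIN_RATE_HZ = 8_000
WAV_MAX_RATE_HZ = 96_000


def check_wav_limits(path: str) -> None:
    """Hypothetical pre-upload check; raises ValueError if the file is out of range."""
    # wave only reads uncompressed PCM; other encodings raise wave.Error here.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        sample_width = wav.getsampwidth()
    if sample_width != 2:
        raise ValueError(f"expected 16-bit PCM, got {8 * sample_width}-bit samples")
    if channels > WAV_MAX_CHANNELS:
        raise ValueError(f"{channels} channels exceeds the limit of {WAV_MAX_CHANNELS}")
    if not WAV_MIN_RATE_HZ <= rate <= WAV_MAX_RATE_HZ:
        raise ValueError(
            f"sample rate {rate} Hz is outside {WAV_MIN_RATE_HZ}-{WAV_MAX_RATE_HZ} Hz"
        )
```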

Error Handling

The client provides structured error handling with specific exception classes:

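async def transcribe_safely(client, audio_file):
    # TokenRequestError, FileUploadError, TaskStatusResponseError, ValidationError
    # and SberSpeechError are the package's exception classes; import them first.
    try:
        return await client.audio.transcriptions.create(file=audio_file)
    except TokenRequestError as e:
        print(f"Authentication error: {e}")
    except FileUploadError as e:
        print(f"Upload failed: {e}")
    except TaskStatusResponseError as e:
        print(f"Transcription task failed: {e}")
    except ValidationError as e:
        print(f"Audio validation failed: {e}")
    except SberSpeechError as e:
        # Caught last, after the more specific errors above
        print(f"General API error: {e}")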
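Some of these errors, such as a failed upload, may be transient. The following is a minimal retry-with-exponential-backoff sketch. Which exception classes are worth retrying is an assumption you should decide for your workload; authentication failures usually are not. The helper `retry_async` is hypothetical and is not part of the package.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(
    make_call: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Hypothetical helper: await make_call(), retrying on the given exceptions.

    The delay doubles after each failure: base_delay, 2 * base_delay, ...
    """
    for attempt in range(attempts):
        try:
            # make_call must create a fresh coroutine each time;
            # a coroutine object cannot be awaited twice.
            return await make_call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

To retry an upload, pass a small async function that calls `audio_file.seek(0)` before `client.audio.transcriptions.create(file=audio_file)`. The rewind matters because a failed attempt may already have consumed part of the file. Pass `retry_on=(FileUploadError,)` alongside it.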

Token Management

Authentication tokens are automatically managed by the TokenManager class, which:

  • Caches tokens to minimize API requests
  • Refreshes tokens when they expire
  • Validates token format and expiration
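A small synchronous sketch can illustrate the caching-and-refresh behaviour described above. This is not the package's `TokenManager`. The following are assumptions made for illustration:

- the class name `CachedToken`;
- the `fetch` callable returning `(token, expires_at)`;
- the 60-second refresh margin.

```python
import time
from typing import Callable, Optional, Tuple


class CachedToken:
    """Hypothetical token cache: reuses a token until shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        refresh_margin: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch              # returns (token, expiry as a Unix timestamp)
        self._margin = refresh_margin    # refresh this many seconds before expiry
        self._clock = clock              # injectable so tests can control time
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Only request a new token when none is cached or it is about to expire.
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            token, expires_at = self._fetch()
            if not token:
                raise ValueError("received an empty token")
            self._token, self._expires_at = token, expires_at
        return self._token
```

Refreshing slightly early avoids sending a request with a token that expires while the request is in flight.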

Command Line Interface

The package includes a command-line interface for quick transcription tasks:

# Set your API key as an environment variable
export SBER_SPEECH_API_KEY=your_key_here

Basic Usage:

salute_speech --help

Transcribe to text:

# Prepare audio (recommended: convert to mono; this command also resamples to 16 kHz)
ffmpeg -i video.mp4 -ac 1 -ar 16000 audio.wav

# Transcribe to text
salute_speech transcribe-audio audio.wav -o transcript.txt

Transcribe to WebVTT:

salute_speech transcribe-audio audio.wav -o transcript.vtt
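To transcribe a whole directory, you can drive the CLI from Python with `subprocess`. This sketch uses only the documented `salute_speech transcribe-audio <input> -o <output>` form. The function names and the choice to write each output next to its input are assumptions.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(directory: str, out_ext: str = "vtt") -> List[List[str]]:
    """Hypothetical helper: one CLI invocation per .wav file in `directory`."""
    commands = []
    for wav in sorted(Path(directory).glob("*.wav")):
        output = wav.with_suffix("." + out_ext)  # e.g. talk.wav -> talk.vtt
        commands.append(
            ["salute_speech", "transcribe-audio", str(wav), "-o", str(output)]
        )
    return commands


def run_all(directory: str, out_ext: str = "vtt") -> None:
    # Assumes SBER_SPEECH_API_KEY is already exported in the environment.
    for cmd in build_commands(directory, out_ext):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Passing the arguments as a list rather than a shell string keeps file names with spaces intact.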

Supported output formats:

  • txt - Plain text
  • vtt - WebVTT subtitles
  • srt - SubRip subtitles
  • tsv - Tab-separated values
  • json - JSON format with detailed information
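The two subtitle formats differ mainly in their cue syntax:

- WebVTT starts with a `WEBVTT` header and writes timestamps as `HH:MM:SS.mmm`.
- SRT numbers each cue from 1 and puts a comma before the milliseconds.

The sketch below renders a list of `(start, end, text)` segments in both formats. The segment tuple is a hypothetical input shape, not the CLI's internal data structure.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text); assumed shape


def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Format seconds as HH:MM:SS<marker>mmm ('.' for WebVTT, ',' for SRT)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def to_vtt(segments: List[Segment]) -> str:
    lines = ["WEBVTT", ""]  # mandatory header followed by a blank line
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}")
        lines.extend([text, ""])
    return "\n".join(lines)


def to_srt(segments: List[Segment]) -> str:
    lines = []
    for index, (start, end, text) in enumerate(segments, start=1):
        lines.append(str(index))  # SRT cues are numbered from 1
        lines.append(f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}")
        lines.extend([text, ""])
    return "\n".join(lines)
```

For example, `format_timestamp(3661.5, ".")` gives `01:01:01.500`.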

Note: Each audio channel is transcribed separately, so converting to mono is recommended for most cases.
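If ffmpeg is not available, you can downmix a 16-bit PCM WAV file to mono with the standard library alone by averaging the channels of each frame. The name `downmix_to_mono` is hypothetical. Unlike the ffmpeg command above, this minimal sketch does not resample, and it handles only 16-bit samples.

```python
import array
import sys
import wave


def downmix_to_mono(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: average all channels of a 16-bit PCM WAV into one."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = src.getnchannels()
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    # Frames are interleaved: [ch0, ch1, ..., ch0, ch1, ...]
    mono = array.array("h", (
        sum(samples[i:i + n_channels]) // n_channels
        for i in range(0, len(samples), n_channels)
    ))
    if sys.byteorder == "big":
        mono.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)  # sample rate is kept as-is
        dst.writeframes(mono.tobytes())
```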

Advanced Configuration

For advanced use cases, you can customize the speech recognition parameters:

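import asyncio
import os

from salute_speech.speech_recognition import SaluteSpeechClient, SpeechRecognitionConfig

config = SpeechRecognitionConfig(
    hypotheses_count=3,              # Number of transcription variants
    enable_profanity_filter=True,    # Filter out profanity
    max_speech_timeout="30s",        # Maximum timeout for speech segments
    speaker_separation=True          # Enable speaker separation
)

async def main():
    client = SaluteSpeechClient(client_credentials=os.getenv("SBER_SPEECH_API_KEY"))
    with open("audio.mp3", "rb") as audio_file:
        result = await client.audio.transcriptions.create(
            file=audio_file,
            language="ru-RU",
            config=config
        )
        print(result.text)

asyncio.run(main())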


Download files


Source Distribution

salute_speech-1.3.0.tar.gz (29.0 kB)

Built Distribution

salute_speech-1.3.0-py3-none-any.whl (24.9 kB)

File details

Details for the file salute_speech-1.3.0.tar.gz.

File metadata

  • Download URL: salute_speech-1.3.0.tar.gz
  • Size: 29.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for salute_speech-1.3.0.tar.gz
Algorithm     Hash digest
SHA256        aa6b7bbb4d268ac8a10318b7b275088c27b9c7cf83ccb47f108ac5efa534b0bc
MD5           4439ef381c009f036b3a7834d08ff4be
BLAKE2b-256   58477b1ef27d3097274c627709c217fdf5dad06b09c305f3fa743fb748927d6f
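To check a downloaded archive against the digests above, compare its SHA256 hash using Python's `hashlib`. This is a minimal sketch. The path is wherever you saved the download, and the helper names are illustrative.

```python
import hashlib

# SHA256 digest of salute_speech-1.3.0.tar.gz, copied from the table above.
EXPECTED_SHA256 = "aa6b7bbb4d268ac8a10318b7b275088c27b9c7cf83ccb47f108ac5efa534b0bc"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_SHA256) -> bool:
    # hexdigest() is lowercase, so normalise the expected value before comparing.
    return sha256_of(path) == expected.lower()
```

pip can also enforce this itself in hash-checking mode, using `--hash=sha256:...` entries in a requirements file.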


File details

Details for the file salute_speech-1.3.0-py3-none-any.whl.

File metadata

  • Download URL: salute_speech-1.3.0-py3-none-any.whl
  • Size: 24.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for salute_speech-1.3.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        e241c66925a217b8fe0d277f446074fa97d63f5067f8b0c331813e0a4723dc93
MD5           44465d030647171a14fa3d4b49016fd2
BLAKE2b-256   9c0e7bf80ef6ae7c9d292344d8e116788aa6378f4b5a8537573a0d60adb4501b

