Sber Salute Speech API

Sber Salute Speech Python API

A Python client for Sber's Salute Speech Recognition service with a simple, async-first API.

Features

- OpenAI Whisper-like API for ease of use
- Asynchronous API for compatibility and better performance
- Comprehensive error handling
- Support for multiple audio formats
- Command-line interface for quick transcription

Installation

pip install salute_speech

Quick Start

from salute_speech.speech_recognition import SaluteSpeechClient
import asyncio
import os

async def main():
    # Initialize the client (from environment variable)
    client = SaluteSpeechClient(client_credentials=os.getenv("SBER_SPEECH_API_KEY"))
    
    # Open and transcribe an audio file
    with open("audio.mp3", "rb") as audio_file:
        result = await client.audio.transcriptions.create(
            file=audio_file,
            language="ru-RU"
        )
        print(result.text)

# Run the async function
asyncio.run(main())
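
Because the API is asynchronous, several files can be transcribed concurrently. The following is a minimal sketch that relies only on the `client.audio.transcriptions.create()` call shown above. The helper names `transcribe_file` and `transcribe_many` and the `max_concurrent` limit are illustrative assumptions, not part of the package.

```python
import asyncio
from typing import Any, Dict, Iterable


async def transcribe_file(client: Any, path: str, language: str = "ru-RU") -> str:
    """Transcribe one file with the documented create() call and return its text."""
    with open(path, "rb") as audio_file:
        result = await client.audio.transcriptions.create(
            file=audio_file,
            language=language,
        )
    return result.text


async def transcribe_many(
    client: Any, paths: Iterable[str], max_concurrent: int = 3
) -> Dict[str, str]:
    """Transcribe several files concurrently, at most max_concurrent at a time."""
    # The semaphore caps how many requests are in flight at once (assumed limit).
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(path: str) -> str:
        async with semaphore:
            return await transcribe_file(client, path)

    path_list = list(paths)
    texts = await asyncio.gather(*(worker(p) for p in path_list))
    return dict(zip(path_list, texts))
```

Inside an async function, `await transcribe_many(client, ["a.mp3", "b.mp3"])` would return a dictionary mapping each path to its transcribed text.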

API Reference

SaluteSpeechClient

The main client class that provides access to the Sber Speech API.

client = SaluteSpeechClient(client_credentials="your_credentials_here")

`client.audio.transcriptions.create()`

Creates a transcription for the given audio file.

Parameters:

- file (BinaryIO): An audio file opened in binary mode
- language (str, optional): Language code for transcription. Defaults to "ru-RU"
- prompt (str, optional): A prompt to guide transcription
- response_format (str, optional): Format of the response. Currently only "text" is supported
- poll_interval (float, optional): Interval between status checks, in seconds. Defaults to 1.0

Returns:

TranscriptionResponse object with:
- text: The transcribed text
- status: Status of the transcription job
- task_id: ID of the transcription task

Example:

with open("meeting.mp3", "rb") as audio_file:
    result = await client.audio.transcriptions.create(
        file=audio_file,
        language="ru-RU"
    )
    print(result.text)
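
The `poll_interval` parameter suggests that the client submits a job and then checks its status repeatedly until it finishes. The loop below is a generic sketch of that pattern, not the package's actual implementation. The `check_status` callable, the status strings, and the `timeout` argument are all hypothetical.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_done(
    check_status: Callable[[], Awaitable[str]],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> str:
    """Call check_status() every poll_interval seconds until a final state is reported."""
    deadline = time.monotonic() + timeout
    while True:
        status = await check_status()
        if status in ("DONE", "ERROR"):  # hypothetical final states
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish in time")
        # Yield to the event loop between checks instead of busy-waiting.
        await asyncio.sleep(poll_interval)
```

A smaller `poll_interval` notices completion sooner at the cost of more status requests.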

Supported Audio Formats

The service supports the following audio formats:

Format	Max Channels	Sample Rate Range
PCM_S16LE (WAV)	8	8,000 - 96,000 Hz
OPUS	1	Any
MP3	2	Any
FLAC	8	Any
ALAW	8	8,000 - 96,000 Hz
MULAW	8	8,000 - 96,000 Hz

Audio parameters are automatically detected and validated using the AudioValidator class.
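
To illustrate the kind of check involved, here is a minimal sketch that tests a WAV file against the PCM_S16LE row of the table using only the standard-library `wave` module. It is not the package's `AudioValidator`; the function name `check_wav` and the error messages are assumptions.

```python
import wave
from typing import BinaryIO

# Limits for PCM_S16LE (WAV), taken from the table above.
MAX_CHANNELS = 8
MIN_RATE_HZ = 8_000
MAX_RATE_HZ = 96_000


def check_wav(audio_file: BinaryIO) -> None:
    """Raise ValueError if a WAV stream falls outside the PCM_S16LE limits."""
    with wave.open(audio_file, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        sample_width = wav.getsampwidth()  # bytes per sample
    audio_file.seek(0)  # rewind so the file can still be uploaded afterwards

    if sample_width != 2:
        raise ValueError(f"expected 16-bit samples, got {8 * sample_width}-bit")
    if channels > MAX_CHANNELS:
        raise ValueError(f"too many channels: {channels} > {MAX_CHANNELS}")
    if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
        raise ValueError(f"sample rate {rate} Hz is outside the supported range")
```

The `wave` module reads only uncompressed PCM WAV files and raises `wave.Error` for anything else, so the other formats in the table would need a different tool.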

Error Handling

The client provides structured error handling with specific exception classes:

try:
    result = await client.audio.transcriptions.create(file=audio_file)
except TokenRequestError as e:
    print(f"Authentication error: {e}")
except FileUploadError as e:
    print(f"Upload failed: {e}")
except TaskStatusResponseError as e:
    print(f"Transcription task failed: {e}")
except ValidationError as e:
    print(f"Audio validation failed: {e}")
except SberSpeechError as e:
    print(f"General API error: {e}")
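
Some failures, such as an interrupted upload, may be worth retrying. The helper below is a generic retry sketch with exponential backoff. It is not part of the package, and whether a given error is safe to retry is an assumption the caller has to make. The exception classes are passed in by the caller, so the sketch does not depend on where the package defines them.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(
    operation: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Run operation(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller handle the error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

When the operation reads from an open file, the file would need to be rewound with `seek(0)` before each new attempt.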

Token Management

Authentication tokens are automatically managed by the TokenManager class, which:

- Caches tokens to minimize API requests
- Refreshes tokens when they expire
- Validates token format and expiration
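
The caching and refresh behavior described above can be pictured with a small sketch. This is not the package's `TokenManager`: the class name `CachedToken`, the `fetch` callable, and the 60-second safety margin are hypothetical.

```python
import time
from typing import Callable, Optional, Tuple


class CachedToken:
    """Cache a token and fetch a new one shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        clock: Callable[[], float] = time.time,
        margin: float = 60.0,
    ) -> None:
        self._fetch = fetch    # returns (token, expiry as a Unix timestamp)
        self._clock = clock    # injectable so the cache can be tested
        self._margin = margin  # refresh this many seconds before expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one if it is missing or stale."""
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            self._token, self._expires_at = self._fetch()
        return self._token
```

Refreshing slightly before the expiry time avoids sending a token that expires while the request is in flight.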

Command Line Interface

The package includes a command-line interface for quick transcription tasks:

# Set your API key as an environment variable
export SBER_SPEECH_API_KEY=your_key_here

Basic Usage:

salute_speech --help

Transcribe to text:

# Prepare audio (recommended: convert to mono; this example also resamples to 16 kHz)
ffmpeg -i video.mp4 -ac 1 -ar 16000 audio.wav

# Transcribe to text
salute_speech transcribe-audio audio.wav -o transcript.txt

Transcribe to WebVTT:

salute_speech transcribe-audio audio.wav -o transcript.vtt

Supported output formats:

- txt: Plain text
- vtt: WebVTT subtitles
- srt: SubRip subtitles
- tsv: Tab-separated values
- json: JSON format with detailed information

Note: Each audio channel is transcribed separately, so converting to mono is recommended for most cases.
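
The two shell steps above can also be driven from Python. The sketch below only assembles the same `ffmpeg` and `salute_speech transcribe-audio` commands shown in this section and runs them with `subprocess`. The function names are illustrative, and it assumes that the output format follows the extension of the `-o` file, as the examples above suggest.

```python
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_command(source: Path, wav_path: Path) -> List[str]:
    """Build the ffmpeg call that converts the input to mono 16 kHz audio."""
    return ["ffmpeg", "-i", str(source), "-ac", "1", "-ar", "16000", str(wav_path)]


def transcribe_command(wav_path: Path, output_path: Path) -> List[str]:
    """Build the salute_speech CLI call shown above."""
    return ["salute_speech", "transcribe-audio", str(wav_path), "-o", str(output_path)]


def transcribe_video(source: Path, output_suffix: str = ".txt") -> Path:
    """Convert a media file to mono WAV, transcribe it, and return the output path."""
    wav_path = source.with_suffix(".wav")
    output_path = source.with_suffix(output_suffix)
    # check=True raises CalledProcessError if either tool exits with an error.
    subprocess.run(ffmpeg_command(source, wav_path), check=True)
    subprocess.run(transcribe_command(wav_path, output_path), check=True)
    return output_path
```

Both tools must be on the `PATH`, and `SBER_SPEECH_API_KEY` must be set in the environment as described above.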

Advanced Configuration

For advanced use cases, you can customize the speech recognition parameters:

from salute_speech.speech_recognition import SpeechRecognitionConfig

config = SpeechRecognitionConfig(
    hypotheses_count=3,              # Number of transcription variants
    enable_profanity_filter=True,    # Filter out profanity
    max_speech_timeout="30s",        # Maximum timeout for speech segments
    speaker_separation=True          # Enable speaker separation
)

result = await client.audio.transcriptions.create(
    file=audio_file,
    language="ru-RU",
    config=config
)

Release history

- 1.3.0 (this version): Apr 2, 2025
- 1.2.4: Dec 1, 2024
- 1.2.3: Dec 1, 2024
- 1.2.2: Dec 1, 2024
- 1.2.1: Nov 30, 2024
- 1.1.1: Dec 5, 2023

Download files

Download the file for your platform.

Source Distribution

salute_speech-1.3.0.tar.gz (29.0 kB)

Uploaded Apr 2, 2025 Source

Built Distribution

salute_speech-1.3.0-py3-none-any.whl (24.9 kB)

Uploaded Apr 2, 2025 Python 3

File details

Details for the file salute_speech-1.3.0.tar.gz.

File metadata

Download URL: salute_speech-1.3.0.tar.gz
Upload date: Apr 2, 2025
Size: 29.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for salute_speech-1.3.0.tar.gz
Algorithm	Hash digest
SHA256	`aa6b7bbb4d268ac8a10318b7b275088c27b9c7cf83ccb47f108ac5efa534b0bc`
MD5	`4439ef381c009f036b3a7834d08ff4be`
BLAKE2b-256	`58477b1ef27d3097274c627709c217fdf5dad06b09c305f3fa743fb748927d6f`


File details

Details for the file salute_speech-1.3.0-py3-none-any.whl.

File metadata

Download URL: salute_speech-1.3.0-py3-none-any.whl
Upload date: Apr 2, 2025
Size: 24.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for salute_speech-1.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e241c66925a217b8fe0d277f446074fa97d63f5067f8b0c331813e0a4723dc93`
MD5	`44465d030647171a14fa3d4b49016fd2`
BLAKE2b-256	`9c0e7bf80ef6ae7c9d292344d8e116788aa6378f4b5a8537573a0d60adb4501b`

