Sber Salute Speech API
Sber Salute Speech Python API
A Python client for Sber's Salute Speech Recognition service with a simple, async-first API.
Features
- OpenAI Whisper-like API for ease of use
- Asynchronous API for compatibility and better performance
- Comprehensive error handling
- Support for multiple audio formats
- Command-line interface for quick transcription
Installation
pip install salute_speech
Quick Start
from salute_speech.speech_recognition import SaluteSpeechClient
import asyncio
import os

async def main():
    # Initialize the client (from environment variable)
    client = SaluteSpeechClient(client_credentials=os.getenv("SBER_SPEECH_API_KEY"))
    # Open and transcribe an audio file
    with open("audio.mp3", "rb") as audio_file:
        result = await client.audio.transcriptions.create(
            file=audio_file,
            language="ru-RU"
        )
    print(result.text)

# Run the async function
asyncio.run(main())
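Because the API is asynchronous, several files can be transcribed concurrently. The following is a minimal sketch and not part of the package: `transcribe_many` and its `transcribe_one` parameter are hypothetical names, and the default concurrency cap is an arbitrary choice for illustration, not a documented service limit.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def transcribe_many(
    paths: Iterable[str],
    transcribe_one: Callable[[str], Awaitable[str]],
    max_concurrency: int = 3,  # assumption: arbitrary cap, not a documented limit
) -> List[str]:
    """Run transcribe_one over paths concurrently, preserving input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(path: str) -> str:
        # At most max_concurrency transcriptions are in flight at once.
        async with semaphore:
            return await transcribe_one(path)

    return await asyncio.gather(*(guarded(p) for p in paths))
```

In real use, `transcribe_one` would open the file and await `client.audio.transcriptions.create(...)` as in the Quick Start, then return `result.text`.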
API Reference
SaluteSpeechClient
The main client class that provides access to the Sber Speech API.
client = SaluteSpeechClient(client_credentials="your_credentials_here")
client.audio.transcriptions.create()
Creates a transcription for the given audio file.
Parameters:
- file (BinaryIO): An audio file opened in binary mode
- language (str, optional): Language code for transcription. Defaults to "ru-RU"
- prompt (str, optional): Optional prompt to guide transcription
- response_format (str, optional): Format of the response. Currently only "text" is supported
- poll_interval (float, optional): Interval between status checks in seconds. Defaults to 1.0
Returns:
TranscriptionResponse object with:
- text: The transcribed text
- status: Status of the transcription job
- task_id: ID of the transcription task
Example:
# The built-in open() is a regular (synchronous) context manager
with open("meeting.mp3", "rb") as audio_file:
    result = await client.audio.transcriptions.create(
        file=audio_file,
        language="ru-RU"
    )
print(result.text)
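The `poll_interval` parameter implies that the client repeatedly checks the status of a transcription task until it finishes. The loop below is a generic sketch of that pattern, not the package's actual implementation: `wait_for_task`, `get_status`, the `timeout` parameter, and the status strings "DONE" and "ERROR" are all assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_for_task(
    get_status: Callable[[], Awaitable[str]],
    poll_interval: float = 1.0,
    timeout: float = 300.0,  # assumption: not a parameter of the real client
) -> str:
    """Poll get_status until it reports a terminal state or the timeout elapses."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await get_status()
        # "DONE" and "ERROR" are placeholder terminal states for this sketch.
        if status in ("DONE", "ERROR"):
            return status
        if loop.time() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        # Yield to the event loop between checks instead of busy-waiting.
        await asyncio.sleep(poll_interval)
```

A shorter interval returns results sooner at the cost of more status requests; a longer one does the opposite.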
Supported Audio Formats
The service supports the following audio formats:
Format | Max Channels | Sample Rate Range |
---|---|---|
PCM_S16LE (WAV) | 8 | 8,000 - 96,000 Hz |
OPUS | 1 | Any |
MP3 | 2 | Any |
FLAC | 8 | Any |
ALAW | 8 | 8,000 - 96,000 Hz |
MULAW | 8 | 8,000 - 96,000 Hz |
Audio parameters are automatically detected and validated using the AudioValidator class.
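To make the limits in the table concrete, here is a small standalone check of channel count and sample rate against them. It is a sketch only and does not reproduce AudioValidator: `FORMAT_LIMITS` and `check_audio_params` are hypothetical names, and the lower bound of one channel is an assumption.

```python
from typing import Optional

# (max channels, allowed sample-rate range in Hz, or None for "Any"),
# copied from the table above.
FORMAT_LIMITS = {
    "PCM_S16LE": (8, (8_000, 96_000)),
    "OPUS": (1, None),
    "MP3": (2, None),
    "FLAC": (8, None),
    "ALAW": (8, (8_000, 96_000)),
    "MULAW": (8, (8_000, 96_000)),
}


def check_audio_params(fmt: str, channels: int, sample_rate: int) -> Optional[str]:
    """Return None if the parameters fit the table, else a short reason."""
    if fmt not in FORMAT_LIMITS:
        return f"unsupported format: {fmt}"
    max_channels, rate_range = FORMAT_LIMITS[fmt]
    if not 1 <= channels <= max_channels:
        return f"{fmt} allows 1 to {max_channels} channel(s), got {channels}"
    if rate_range is not None:
        low, high = rate_range
        if not low <= sample_rate <= high:
            return f"{fmt} sample rate must be {low} to {high} Hz, got {sample_rate}"
    return None
```

For example, `check_audio_params("OPUS", 2, 48000)` returns a reason string because OPUS is limited to one channel.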
Error Handling
The client provides structured error handling with specific exception classes:
try:
    result = await client.audio.transcriptions.create(file=audio_file)
except TokenRequestError as e:
    print(f"Authentication error: {e}")
except FileUploadError as e:
    print(f"Upload failed: {e}")
except TaskStatusResponseError as e:
    print(f"Transcription task failed: {e}")
except ValidationError as e:
    print(f"Audio validation failed: {e}")
except SberSpeechError as e:
    print(f"General API error: {e}")
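Some failures, such as an interrupted upload, may be worth retrying rather than reporting immediately. The helper below is a generic retry-with-backoff sketch, not something the package provides: `retry_async` is a hypothetical name, and which of the exception classes above are actually transient is an assumption you would need to verify.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(
    operation: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Await operation(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

If the operation reads from an open file object, it should reopen or rewind the file on each attempt, since a failed attempt may already have consumed part of it.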
Token Management
Authentication tokens are automatically managed by the TokenManager class, which:
- Caches tokens to minimize API requests
- Refreshes tokens when they expire
- Validates token format and expiration
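The caching and refresh behaviour listed above can be illustrated with a small standalone class. This is a sketch of the general technique, not TokenManager's code: `CachedToken`, the `fetch` callable, and the 60-second refresh margin are assumptions introduced for the example.

```python
import time
from typing import Callable, Optional, Tuple


class CachedToken:
    """Cache a token and refresh it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        refresh_margin: float = 60.0,  # assumption: refresh one minute early
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch  # returns (token, expires_at as a Unix timestamp)
        self._margin = refresh_margin
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Reuse the cached token unless it is missing or about to expire.
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            self._token, self._expires_at = self._fetch()
        return self._token
```

Refreshing slightly before the expiry time avoids sending a request with a token that expires while the request is in flight.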
Command Line Interface
The package includes a command-line interface for quick transcription tasks:
# Set your API key as an environment variable
export SBER_SPEECH_API_KEY=your_key_here
Basic Usage:
salute_speech --help
Transcribe to text:
# Prepare audio (recommended: convert to mono)
ffmpeg -i video.mp4 -ac 1 -ar 16000 audio.wav
# Transcribe to text
salute_speech transcribe-audio audio.wav -o transcript.txt
Transcribe to WebVTT:
salute_speech transcribe-audio audio.wav -o transcript.vtt
Supported output formats:
- txt: Plain text
- vtt: WebVTT subtitles
- srt: SubRip subtitles
- tsv: Tab-separated values
- json: JSON format with detailed information
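The two subtitle formats differ mainly in their timestamp syntax: WebVTT writes `HH:MM:SS.mmm` and SubRip writes `HH:MM:SS,mmm`. The sketch below shows how a timed segment could be rendered as a WebVTT cue. It is illustrative only and does not reflect how the CLI builds its output: `format_timestamp`, `vtt_cue`, and the (start, end, text) segment shape are assumptions.

```python
def format_timestamp(seconds: float, decimal_marker: str = ".") -> str:
    """Format seconds as HH:MM:SS.mmm (WebVTT) or HH:MM:SS,mmm (SubRip)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def vtt_cue(start: float, end: float, text: str) -> str:
    """Build one WebVTT cue from a hypothetical (start, end, text) segment."""
    return f"{format_timestamp(start)} --> {format_timestamp(end)}\n{text}"
```

Passing `decimal_marker=","` to `format_timestamp` yields the SubRip form of the same timestamp.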
Note: Each audio channel is transcribed separately, so converting to mono is recommended for most cases.
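Before transcribing, it can help to check whether a file has more than one channel and, if so, downmix it. The sketch below uses the standard-library `wave` module for the check and rebuilds the ffmpeg command shown above as an argument list. Both function names are hypothetical, and the check only works for PCM WAV files.

```python
import wave
from typing import List


def wav_channel_count(path: str) -> int:
    """Return the number of channels in a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnchannels()


def mono_ffmpeg_command(src: str, dst: str, sample_rate: int = 16_000) -> List[str]:
    """Build the ffmpeg downmix command as an argument list."""
    # -ac 1 downmixes to one channel; -ar sets the output sample rate.
    return ["ffmpeg", "-i", src, "-ac", "1", "-ar", str(sample_rate), dst]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, provided ffmpeg is installed and on the PATH.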
Advanced Configuration
For advanced use cases, you can customize the speech recognition parameters:
from salute_speech.speech_recognition import SpeechRecognitionConfig

config = SpeechRecognitionConfig(
    hypotheses_count=3,            # Number of transcription variants
    enable_profanity_filter=True,  # Filter out profanity
    max_speech_timeout="30s",      # Maximum timeout for speech segments
    speaker_separation=True        # Enable speaker separation
)

result = await client.audio.transcriptions.create(
    file=audio_file,
    language="ru-RU",
    config=config
)