Python SDK for Speechall API - Speech-to-text transcription service

Speechall Python SDK

Python SDK for the Speechall API - A powerful speech-to-text transcription service supporting multiple AI models and providers.

Features

  • Multiple AI Models: Access speech-to-text models from several providers (OpenAI Whisper and others)
  • Flexible Input: Transcribe local audio files or remote URLs
  • Rich Output Formats: Get results in text, JSON, SRT, or VTT formats
  • Speaker Diarization: Identify and separate different speakers in audio
  • Custom Vocabulary: Improve accuracy with domain-specific terms
  • Replacement Rules: Apply custom text transformations to transcriptions
  • Language Support: Auto-detect languages or specify from a wide range of supported languages
  • Async Support: An async client built on httpx for use with async/await

Installation

pip install speechall

Quick Start

Basic Transcription

import os
from speechall import SpeechallApi

# Initialize the client
client = SpeechallApi(token=os.getenv("SPEECHALL_API_TOKEN"))

# Transcribe a local audio file
with open("audio.mp3", "rb") as audio_file:
    audio_data = audio_file.read()

response = client.speech_to_text.transcribe(
    model="openai.whisper-1",
    request=audio_data,
    language="en",
    output_format="json",
    punctuation=True
)

print(response.text)

Transcribe Remote Audio

import os
from speechall import SpeechallApi

client = SpeechallApi(token=os.getenv("SPEECHALL_API_TOKEN"))

response = client.speech_to_text.transcribe_remote(
    file_url="https://example.com/audio.mp3",
    model="openai.whisper-1",
    language="auto",  # Auto-detect language
    output_format="json"
)

print(response.text)

Advanced Features

Speaker Diarization

Identify different speakers in your audio:

response = client.speech_to_text.transcribe(
    model="openai.whisper-1",
    request=audio_data,
    language="en",
    output_format="json",
    diarization=True,
    speakers_expected=2
)

for segment in response.segments:
    print(f"[Speaker {segment.speaker}] {segment.text}")
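
Diarized output often contains several consecutive segments from the same speaker. The sketch below merges them into speaker turns. It assumes only what the loop above uses, namely that each segment has `speaker` and `text` attributes; `merge_speaker_turns` and the demo data are hypothetical and not part of the SDK.

```python
from itertools import groupby
from types import SimpleNamespace


def merge_speaker_turns(segments):
    """Collapse consecutive segments from the same speaker into (speaker, text) turns."""
    turns = []
    for speaker, group in groupby(segments, key=lambda seg: seg.speaker):
        text = " ".join(seg.text.strip() for seg in group)
        turns.append((speaker, text))
    return turns


# Stand-in objects shaped like response.segments (made-up data for illustration).
demo_segments = [
    SimpleNamespace(speaker=0, text="Hello there."),
    SimpleNamespace(speaker=0, text="How are you?"),
    SimpleNamespace(speaker=1, text="Fine, thanks."),
]

for speaker, text in merge_speaker_turns(demo_segments):
    print(f"[Speaker {speaker}] {text}")
# [Speaker 0] Hello there. How are you?
# [Speaker 1] Fine, thanks.
```

With a real response, pass `response.segments` in place of `demo_segments`.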

Custom Vocabulary

Improve accuracy for specific terms:

response = client.speech_to_text.transcribe(
    model="openai.whisper-1",
    request=audio_data,
    language="en",
    output_format="json",
    custom_vocabulary=["Kubernetes", "API", "Docker", "microservices"]
)
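
Vocabulary lists are often assembled from glossaries or spreadsheets and pick up blanks and duplicates along the way. A minimal sketch of tidying such a list before passing it as `custom_vocabulary`; `clean_vocabulary` is a hypothetical helper, and whether the service itself tolerates duplicates is not stated here.

```python
def clean_vocabulary(terms):
    """Strip whitespace, drop empty entries, and remove exact duplicates, keeping order."""
    stripped = (term.strip() for term in terms)
    # dict.fromkeys preserves insertion order, so the first occurrence wins.
    return list(dict.fromkeys(term for term in stripped if term))


raw_terms = ["Kubernetes", " API ", "", "Kubernetes", "Docker"]
vocabulary = clean_vocabulary(raw_terms)
print(vocabulary)  # ['Kubernetes', 'API', 'Docker']
```

The result can then be passed as `custom_vocabulary=vocabulary`.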

Replacement Rules

Apply custom text transformations:

from speechall import ReplacementRule, ExactRule

replacement_rules = [
    ReplacementRule(
        rule=ExactRule(find="API", replace="Application Programming Interface")
    )
]

response = client.speech_to_text.transcribe_remote(
    file_url="https://example.com/audio.mp3",
    model="openai.whisper-1",
    language="en",
    output_format="json",
    replacement_ruleset=replacement_rules
)
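
To get a rough idea of what a set of exact rules will do before sending a request, the rules can be approximated locally. This is only a sketch: it uses plain, case-sensitive substring replacement, and the service's actual matching behavior (case sensitivity, word boundaries, rule ordering) is not described here and may differ. `preview_exact_rules` is a hypothetical helper, not an SDK function.

```python
def preview_exact_rules(text, rules):
    """Apply (find, replace) pairs in order using plain substring replacement."""
    for find, replace in rules:
        text = text.replace(find, replace)
    return text


sample = "The API is rate limited."
rules = [("API", "Application Programming Interface")]
print(preview_exact_rules(sample, rules))
# The Application Programming Interface is rate limited.
```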

List Available Models

models = client.speech_to_text.list_speech_to_text_models()

for model in models:
    print(f"{model.model_identifier}: {model.display_name}")
    print(f"  Provider: {model.provider}")
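
When the model list is long, grouping identifiers by provider can make it easier to scan. The sketch below relies only on the `provider` and `model_identifier` attributes used above; `group_by_provider` and the demo entries (apart from `openai.whisper-1`) are hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_provider(models):
    """Return a dict mapping each provider to the list of its model identifiers."""
    grouped = defaultdict(list)
    for model in models:
        grouped[model.provider].append(model.model_identifier)
    return dict(grouped)


# Stand-in objects shaped like the entries returned by list_speech_to_text_models().
demo_models = [
    SimpleNamespace(model_identifier="openai.whisper-1", provider="openai"),
    SimpleNamespace(model_identifier="example.model-a", provider="example"),  # made up
    SimpleNamespace(model_identifier="example.model-b", provider="example"),  # made up
]

for provider, identifiers in group_by_provider(demo_models).items():
    print(f"{provider}: {', '.join(identifiers)}")
```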

Configuration

Authentication

Get your API token from speechall.com and set it as an environment variable:

export SPEECHALL_API_TOKEN="your-token-here"

Or pass it directly when initializing the client:

from speechall import SpeechallApi

client = SpeechallApi(token="your-token-here")
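
Note that `os.getenv` returns `None` when the variable is unset, so a missing token may only surface later as a failed request. A small sketch of failing early with a clear message; `load_token` is a hypothetical helper, not part of the SDK.

```python
import os


def load_token(env=None, var_name="SPEECHALL_API_TOKEN"):
    """Return the API token from the environment, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    token = (env.get(var_name) or "").strip()
    if not token:
        raise RuntimeError(f"{var_name} is not set; export it before creating the client")
    return token


# Demonstration with an explicit mapping instead of the real environment.
print(load_token({"SPEECHALL_API_TOKEN": "demo-token"}))  # demo-token
```

In application code this would be used as `SpeechallApi(token=load_token())`.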

Output Formats

  • text: Plain text transcription
  • json: JSON with detailed information (segments, timestamps, metadata)
  • json_text: JSON with simplified text output
  • srt: SubRip subtitle format
  • vtt: WebVTT subtitle format
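
When saving results next to the source audio, it helps to derive the file extension from the chosen format. A minimal sketch follows; the extension mapping is a conventional choice made for this example rather than something the SDK defines, and `output_path` is a hypothetical helper.

```python
from pathlib import Path

# Conventional extensions for each output format listed above (an assumption of this sketch).
EXTENSIONS = {
    "text": ".txt",
    "json": ".json",
    "json_text": ".json",
    "srt": ".srt",
    "vtt": ".vtt",
}


def output_path(audio_path, output_format):
    """Return the audio path with its suffix swapped for one matching the output format."""
    if output_format not in EXTENSIONS:
        raise ValueError(f"unknown output format: {output_format!r}")
    return Path(audio_path).with_suffix(EXTENSIONS[output_format])


print(output_path("intro.mp3", "srt"))  # intro.srt
```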

Language Codes

Use ISO 639-1 language codes (e.g., en, es, fr, de) or auto for automatic detection.
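
A quick local check can catch obviously malformed values, such as three-letter codes, before a request is sent. The sketch below validates only the shape of the value (two letters, or `auto`); it does not know which languages a given model supports. `normalize_language` is a hypothetical helper.

```python
import re

_TWO_LETTER_CODE = re.compile(r"[a-z]{2}")


def normalize_language(code):
    """Lower-case and trim a language code; accept 'auto' or any two-letter code."""
    code = code.strip().lower()
    if code == "auto" or _TWO_LETTER_CODE.fullmatch(code):
        return code
    raise ValueError(f"expected a two-letter ISO 639-1 code or 'auto', got {code!r}")


print(normalize_language(" EN "))  # en
print(normalize_language("auto"))  # auto
```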

API Reference

Client Classes

  • SpeechallApi: Main client for the Speechall API
  • AsyncSpeechallApi: Async client for the Speechall API
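
The async client is useful for transcribing several files concurrently. The sketch below assumes that `AsyncSpeechallApi` exposes the same `speech_to_text.transcribe_remote()` method as the sync client, returning an awaitable; that is not confirmed here. `transcribe_urls` and `max_concurrency` are hypothetical names introduced for this example.

```python
import asyncio


async def transcribe_urls(client, urls, *, max_concurrency=4, **options):
    """Transcribe several remote files concurrently, returning results in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def transcribe_one(url):
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            # Assumed: the async client mirrors the sync method as a coroutine.
            return await client.speech_to_text.transcribe_remote(file_url=url, **options)

    return await asyncio.gather(*(transcribe_one(url) for url in urls))
```

Under that assumption, a call would look like `asyncio.run(transcribe_urls(AsyncSpeechallApi(token=...), urls, model="openai.whisper-1", language="en", output_format="json"))`.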

Main Methods

speech_to_text.transcribe()

Transcribe audio passed as raw bytes, such as the contents of a local file.

Parameters:

  • model (str): Model identifier (e.g., "openai.whisper-1")
  • request (bytes): Audio file content
  • language (str): Language code or "auto"
  • output_format (str): Output format (text, json, json_text, srt, vtt)
  • punctuation (bool): Enable automatic punctuation
  • diarization (bool): Enable speaker identification
  • speakers_expected (int, optional): Expected number of speakers
  • custom_vocabulary (list, optional): List of custom terms
  • initial_prompt (str, optional): Context prompt for the model
  • temperature (float, optional): Model temperature (0.0-1.0)
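
Some of these constraints can be checked locally before a request is made. The sketch below builds a keyword-argument dict and validates the documented temperature range. The defaults for `language` and `output_format` are choices made for this example, not SDK defaults, and the rule that `speakers_expected` must be at least 1 is an assumption. `build_options` is a hypothetical helper.

```python
def build_options(model, *, language="auto", output_format="json",
                  diarization=False, speakers_expected=None, temperature=None):
    """Validate transcription options and return them as keyword arguments."""
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if speakers_expected is not None and speakers_expected < 1:
        raise ValueError("speakers_expected must be at least 1")  # assumed constraint
    options = {
        "model": model,
        "language": language,
        "output_format": output_format,
        "diarization": diarization,
        "speakers_expected": speakers_expected,
        "temperature": temperature,
    }
    # Leave out unset optional parameters so they are not sent at all.
    return {key: value for key, value in options.items() if value is not None}


options = build_options("openai.whisper-1", language="en", temperature=0.2)
print(options)
```

The result can be unpacked into a call, for example `client.speech_to_text.transcribe(request=audio_data, **options)`.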

speech_to_text.transcribe_remote()

Transcribe audio from a URL.

Parameters: The same as transcribe(), except that file_url (str) replaces request.
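
Because the two methods differ only in how the audio is supplied, a thin wrapper can choose between them based on the input. A minimal sketch, assuming a client shaped like the ones in the Quick Start; `transcribe_source` is a hypothetical helper, not part of the SDK.

```python
from pathlib import Path
from urllib.parse import urlparse


def transcribe_source(client, source, **options):
    """Call transcribe_remote() for http(s) URLs and transcribe() for local paths."""
    if urlparse(str(source)).scheme in ("http", "https"):
        return client.speech_to_text.transcribe_remote(file_url=str(source), **options)
    # Anything else is treated as a local file and read into memory.
    audio_data = Path(source).read_bytes()
    return client.speech_to_text.transcribe(request=audio_data, **options)
```

For example, `transcribe_source(client, "audio.mp3", model="openai.whisper-1", language="en", output_format="json")` reads the local file, while passing an `https://` URL sends the URL to the service instead.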

speech_to_text.list_speech_to_text_models()

List all available models.

Examples

Check out the examples directory for more detailed usage examples.

Requirements

  • Python 3.8+
  • httpx >= 0.27.0
  • pydantic >= 2.0.0
  • typing-extensions >= 4.0.0

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Type checking
mypy .

License

MIT License - see LICENSE file for details
