
LangChain integration for Soniox


Soniox LangChain Integration

Get started using the Soniox audio transcription loader in LangChain.

Setup

Install the package:

pip install langchain-soniox

Credentials

Get your Soniox API key from the Soniox Console and set it as an environment variable:

export SONIOX_API_KEY=your_api_key
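If you would rather fail fast with a clear message when the key is missing, a small check can run before you create a loader. This is a minimal sketch: `require_soniox_api_key` is a hypothetical helper, not part of `langchain-soniox`, and it only reads the `SONIOX_API_KEY` variable shown above.

```python
import os


def require_soniox_api_key() -> str:
    """Return SONIOX_API_KEY or raise a descriptive error (hypothetical helper)."""
    key = os.environ.get("SONIOX_API_KEY", "").strip()
    if not key:
        # Either export the variable or pass api_key= explicitly to the loader
        raise RuntimeError(
            "SONIOX_API_KEY is not set; export it or pass api_key= to the loader"
        )
    return key
```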

Usage

Basic transcription

Transcribe audio files using the SonioxDocumentLoader:

from langchain_soniox import SonioxDocumentLoader

# Using a URL
loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3"
)

docs = list(loader.lazy_load())
print(docs[0].page_content)  # Transcribed text

You can also load audio from a local file or from bytes:

# Using a local file path
loader = SonioxDocumentLoader(file_path="/path/to/audio.mp3")

# Using binary data
with open("/path/to/audio.mp3", "rb") as f:
    audio_bytes = f.read()
loader = SonioxDocumentLoader(file_data=audio_bytes)
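Because the loader accepts the audio through three different keyword arguments, code that handles mixed inputs may want to choose the right one automatically. The sketch below assumes that strings starting with `http://` or `https://` are URLs and that any other string is a local path; `soniox_source_kwargs` is a hypothetical helper that only builds the keyword dictionary and never contacts Soniox.

```python
from pathlib import Path
from typing import Union


def soniox_source_kwargs(source: Union[str, Path, bytes, bytearray]) -> dict:
    """Map one audio source to exactly one of file_url, file_path, or file_data.

    Hypothetical helper: assumes http(s) strings are URLs, other strings are paths.
    """
    if isinstance(source, (bytes, bytearray)):
        return {"file_data": bytes(source)}
    if isinstance(source, Path):
        return {"file_path": str(source)}
    if isinstance(source, str):
        if source.startswith(("http://", "https://")):
            return {"file_url": source}
        return {"file_path": source}
    raise TypeError(f"Unsupported audio source type: {type(source).__name__}")
```

For example, `SonioxDocumentLoader(**soniox_source_kwargs("/path/to/audio.mp3"))` would create a loader for a local file.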

Async transcription

For async operations, use alazy_load():

import asyncio
from langchain_soniox import SonioxDocumentLoader

async def transcribe_async():
    loader = SonioxDocumentLoader(
        file_url="https://soniox.com/media/examples/coffee_shop.mp3"
    )

    docs = [doc async for doc in loader.alazy_load()]
    print(docs[0].page_content)

asyncio.run(transcribe_async())
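Since each `alazy_load()` call waits on its own transcription, several independent files can be awaited together with `asyncio.gather`. This is a minimal sketch that assumes separate loader instances are safe to run concurrently (this page does not say so explicitly); `first_transcript` and `transcribe_many` are hypothetical names.

```python
import asyncio
from typing import Iterable, List


async def first_transcript(loader) -> str:
    """Drain loader.alazy_load() and return the first document's text."""
    docs = [doc async for doc in loader.alazy_load()]
    return docs[0].page_content


async def transcribe_many(loaders: Iterable) -> List[str]:
    """Run several loaders concurrently; results keep the input order."""
    return list(await asyncio.gather(*(first_transcript(loader) for loader in loaders)))
```

With real loaders this could look like `asyncio.run(transcribe_many([SonioxDocumentLoader(file_url=u) for u in urls]))`, where `urls` is your own list of audio URLs.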

Advanced usage

Language hints

Soniox automatically detects and transcribes speech in 60+ languages. When you know which languages are likely to appear in your audio, provide language_hints to improve accuracy by biasing recognition toward those languages.

Language hints do not restrict recognition — they only bias the model toward the specified languages, while still allowing other languages to be detected if present.

from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        language_hints=["en", "es"],
    ),
)

docs = list(loader.lazy_load())

For more details, see the Soniox language hints documentation.

Speaker diarization

Enable speaker identification to distinguish between different speakers:

from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        enable_speaker_diarization=True,
    ),
)

docs = list(loader.lazy_load())

# Access speaker information in the metadata
current_speaker = None
output = ""
for token in docs[0].metadata["tokens"]:
    if current_speaker != token["speaker"]:
        current_speaker = token["speaker"]
        output += f"\nSpeaker {current_speaker}: {token['text'].lstrip()}"
    else:
        output += token["text"]
print(output)
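The loop above keeps only the text. If you also need timing, for example to build subtitles or jump to a speaker turn, the same pass can keep `start_ms` and `end_ms`, which every token carries according to the token fields listed under Return value. A minimal sketch; `speaker_segments` is a hypothetical helper that works on the plain token dictionaries from `docs[0].metadata["tokens"]`.

```python
from typing import Dict, List


def speaker_segments(tokens: List[Dict]) -> List[Dict]:
    """Merge consecutive tokens from the same speaker into timed segments."""
    segments: List[Dict] = []
    for token in tokens:
        speaker = token.get("speaker")
        if segments and segments[-1]["speaker"] == speaker:
            # Same speaker continues: extend the current segment
            segment = segments[-1]
            segment["text"] += token["text"]
            segment["end_ms"] = token["end_ms"]
        else:
            # Speaker changed (or first token): open a new segment
            segments.append(
                {
                    "speaker": speaker,
                    "start_ms": token["start_ms"],
                    "end_ms": token["end_ms"],
                    "text": token["text"].lstrip(),
                }
            )
    return segments
```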

Language identification

Enable automatic language detection and identification:

from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        enable_language_identification=True,
    ),
)

docs = list(loader.lazy_load())

# Access language information in the metadata
current_language = None
output = ""
for token in docs[0].metadata["tokens"]:
    if current_language != token["language"]:
        current_language = token["language"]
        output += f"\n[{current_language}] {token['text'].lstrip()}"
    else:
        output += token["text"]
print(output)
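To get a rough sense of how much of a recording is spoken in each language, you can sum token durations per `language` value. This is a minimal sketch: it ignores silence between tokens, so the totals approximate speech time rather than wall-clock time, and `milliseconds_per_language` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def milliseconds_per_language(tokens: List[Dict]) -> Dict[str, int]:
    """Sum token durations (end_ms - start_ms) for each detected language."""
    totals: Dict[str, int] = defaultdict(int)
    for token in tokens:
        language = token.get("language")
        if language is None:
            continue  # skip tokens without language information
        totals[language] += token["end_ms"] - token["start_ms"]
    return dict(totals)
```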

Context for improved accuracy

Provide domain-specific context to improve transcription accuracy. Context helps the model understand your domain, recognize important terms, and apply custom vocabulary.

The context object supports four optional sections: general, text, terms, and translation_terms.

from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    StructuredContext,
    StructuredContextGeneralItem,
    StructuredContextTranslationTerm,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        context=StructuredContext(
            # Structured key-value information (domain, topic, intent, etc.)
            general=[
                StructuredContextGeneralItem(key="domain", value="Healthcare"),
                StructuredContextGeneralItem(
                    key="topic", value="Diabetes management consultation"
                ),
                StructuredContextGeneralItem(key="doctor", value="Dr. Martha Smith"),
            ],
            # Longer free-form background text or related documents
            text="The patient has a history of...",
            # Domain-specific or uncommon words
            terms=["Celebrex", "Zyrtec", "Xanax"],
            # Custom translations for ambiguous terms
            translation_terms=[
                StructuredContextTranslationTerm(
                    source="Mr. Smith", target="Sr. Smith"
                ),
                StructuredContextTranslationTerm(source="MRI", target="RM"),
            ],
        ),
    ),
)

docs = list(loader.lazy_load())

For more details, see the Soniox context documentation.

Translation

Translate from any detected language to a target language:

from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    TranslationConfig,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        translation=TranslationConfig(
            type="one_way",
            target_language="fr",
        ),
        language_hints=["en"],
    ),
)

docs = list(loader.lazy_load())

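# Collect original and translated text separately
original_text = ""
translated_text = ""
for token in docs[0].metadata["tokens"]:
    if token["translation_status"] == "translation":
        translated_text += token["text"]
    else:
        # "original" and "none" tokens are part of the spoken transcript
        original_text += token["text"]

print(original_text)
print(translated_text)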

You can also transcribe and translate between two languages simultaneously using the two_way translation type. For more details, see the Soniox translation documentation.

API reference

Constructor parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_path | str | No* | None | Path to local audio file to transcribe |
| file_data | bytes | No* | None | Binary data of audio file to transcribe |
| file_url | str | No* | None | URL of audio file to transcribe |
| api_key | str | No | SONIOX_API_KEY env var | Soniox API key |
| base_url | str | No | https://api.soniox.com/v1 | API base URL (see regional endpoints) |
| options | SonioxTranscriptionOptions | No | SonioxTranscriptionOptions() | Transcription options |
| polling_interval_seconds | float | No | 1.0 | Time between status polls (seconds) |
| timeout_seconds | float | No | 300.0 (5 minutes) | Maximum time to wait for transcription (seconds) |
| http_request_timeout_seconds | float | No | 60.0 | Timeout for individual HTTP requests (seconds) |

* You must specify exactly one of: file_path, file_data, or file_url.
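Together, polling_interval_seconds and timeout_seconds describe a poll-and-wait loop. The sketch below shows how two such settings typically interact; it is an assumption about the general pattern, not the loader's actual implementation, and `wait_until_done` with its `is_done` callback is hypothetical. The defaults mirror the table above.

```python
import time
from typing import Callable


def wait_until_done(
    is_done: Callable[[], bool],
    polling_interval_seconds: float = 1.0,
    timeout_seconds: float = 300.0,
) -> None:
    """Call is_done() every polling_interval_seconds until it returns True.

    Raises TimeoutError once timeout_seconds have elapsed without completion.
    """
    deadline = time.monotonic() + timeout_seconds
    while not is_done():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not finished after {timeout_seconds} seconds")
        time.sleep(polling_interval_seconds)
```

In practice you only pass these values to the constructor, for example `SonioxDocumentLoader(file_url=..., polling_interval_seconds=2.0, timeout_seconds=600.0)` if you expect a long recording to need more than five minutes.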

Transcription options

The SonioxTranscriptionOptions class supports these parameters:

| Parameter | Type | Description |
|---|---|---|
| model | str | Async model to use (see available models) |
| language_hints | list[str] | Language hints for transcription (ISO language codes) |
| language_hints_strict | bool | Enforce strict language hints |
| enable_speaker_diarization | bool | Enable speaker identification |
| enable_language_identification | bool | Enable language detection |
| translation | TranslationConfig | Translation configuration |
| context | StructuredContext | Context for improved accuracy |
| client_reference_id | str | Custom reference ID for your records |
| webhook_url | str | Webhook URL for completion notifications |
| webhook_auth_header_name | str | Custom auth header name for webhook |
| webhook_auth_header_value | str | Custom auth header value for webhook |

Browse the API documentation for a full list of supported options.

Return value

The lazy_load() and alazy_load() methods yield a single Document object:

Document(
    page_content=str,  # The transcribed text
    metadata={
        "source": str,  # File URL, path, or "file_upload"
        "transcription_id": str,  # Unique transcription ID
        "audio_duration_ms": int,  # Audio duration in milliseconds
        "model": str,  # Model used for transcription
        "created_at": str,  # ISO 8601 timestamp
        "tokens": list[dict],  # Detailed token-level information
    }
)

The tokens list in metadata contains one entry per transcribed token, with these fields:

  • text: The transcribed text
  • start_ms: Start time in milliseconds
  • end_ms: End time in milliseconds
  • speaker: Speaker ID (if diarization enabled), for example "1", "2", etc.
  • language: Detected language (if identification enabled), for example "en", "fr", etc.
  • translation_status: Translation status ("original", "translation" or "none")
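Because each token carries start_ms and end_ms, you can render a simple timestamped view of the transcript, for example to review it alongside the audio. A minimal sketch; `format_ms` and `timestamped_lines` are hypothetical helpers, and the output is plain text rather than a standard subtitle format.

```python
from typing import Dict, List


def format_ms(ms: int) -> str:
    """Format a millisecond offset as HH:MM:SS.mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def timestamped_lines(tokens: List[Dict]) -> List[str]:
    """Render each token as '[start --> end] text'."""
    return [
        f"[{format_ms(t['start_ms'])} --> {format_ms(t['end_ms'])}] {t['text'].strip()}"
        for t in tokens
    ]
```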

For more details, see the Soniox API reference.




Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

langchain_soniox-0.1.0.tar.gz (90.2 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

langchain_soniox-0.1.0-py3-none-any.whl (9.9 kB)

Uploaded Python 3

File details

Details for the file langchain_soniox-0.1.0.tar.gz.

File metadata

  • Download URL: langchain_soniox-0.1.0.tar.gz
  • Upload date:
  • Size: 90.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.2

File hashes

Hashes for langchain_soniox-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 21ac4b80a53a80293ff83ec0ed527851d2f67b6b2e50d4985f46ec55b3520811 |
| MD5 | c2f6a1294c92dac922a8b84ac8b8a22a |
| BLAKE2b-256 | 607d1cab4481e9685c8e4f128e6c596ea234b817c530a8daefc0ee56dd05823d |

See more details on using hashes here.

File details

Details for the file langchain_soniox-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for langchain_soniox-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e088477e46b021bad0fbaa23afb2d0fb8f2b1683fb6cc3012c9b09469bf29a0 |
| MD5 | 60a29466de986179e50685425f921b64 |
| BLAKE2b-256 | 14e4f31e2ddfc74cd2f80b55f93146551ec29fa871af6dc8eb7c3a313969c693 |

See more details on using hashes here.
