LangChain integration for Soniox
Soniox LangChain Integration
Get started using the Soniox audio transcription loader in LangChain.
Setup
Install the package:
pip install langchain-soniox
Credentials
Get your Soniox API key from the Soniox Console and set it as an environment variable:
export SONIOX_API_KEY=your_api_key
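A script can check for the variable up front rather than failing on the first request. The following is a minimal sketch using only the standard library; `get_soniox_api_key` is a hypothetical helper for illustration and is not part of langchain-soniox.

```python
import os


def get_soniox_api_key(env=os.environ):
    """Return the Soniox API key from the environment, or raise if it is unset.

    Hypothetical helper for illustration; not part of langchain-soniox.
    """
    # Treat a missing variable and a blank one the same way.
    key = env.get("SONIOX_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SONIOX_API_KEY is not set; export it before creating a loader."
        )
    return key
```

The returned value can also be passed explicitly through the api_key constructor parameter listed in the API reference.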
Usage
Basic transcription
Transcribe audio files using the SonioxDocumentLoader:
from langchain_soniox import SonioxDocumentLoader
# Using a URL
loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3"
)
docs = list(loader.lazy_load())
print(docs[0].page_content)  # Transcribed text
You can also load audio from a local file or from bytes:
# Using a local file path
loader = SonioxDocumentLoader(file_path="/path/to/audio.mp3")
# Using binary data
with open("/path/to/audio.mp3", "rb") as f:
    audio_bytes = f.read()
loader = SonioxDocumentLoader(file_data=audio_bytes)
Async transcription
For async operations, use alazy_load():
import asyncio
from langchain_soniox import SonioxDocumentLoader
async def transcribe_async():
    loader = SonioxDocumentLoader(
        file_url="https://soniox.com/media/examples/coffee_shop.mp3"
    )
    docs = [doc async for doc in loader.alazy_load()]
    print(docs[0].page_content)
asyncio.run(transcribe_async())
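Because alazy_load() is an async generator, several files can be transcribed concurrently. The following is a rough sketch under stated assumptions: `transcribe_many` and `max_concurrency` are hypothetical names, the helper accepts any object exposing `alazy_load()`, and it assumes your account's rate limits allow the chosen concurrency.

```python
import asyncio


async def transcribe_many(loaders, max_concurrency=4):
    """Run alazy_load() on several loaders at once.

    Hypothetical helper: returns one list of documents per loader,
    in the same order as the input.
    """
    # Cap the number of transcriptions in flight at any time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(loader):
        async with semaphore:
            return [doc async for doc in loader.alazy_load()]

    return await asyncio.gather(*(run(loader) for loader in loaders))


# Example usage (requires langchain-soniox and a valid API key):
# loaders = [SonioxDocumentLoader(file_url=url) for url in urls]
# results = asyncio.run(transcribe_many(loaders))
```

asyncio.gather preserves input order, so results[i] corresponds to loaders[i].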
Advanced usage
Language hints
Soniox automatically detects and transcribes speech in 60+ languages. When you know which languages are likely to appear in your audio, provide language_hints to improve accuracy by biasing recognition toward those languages.
Language hints do not restrict recognition. They only bias the model toward the specified languages, so other languages can still be detected if present.
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)
loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        language_hints=["en", "es"],
    ),
)
docs = list(loader.lazy_load())
For more details, see the Soniox language hints documentation.
Speaker diarization
Enable speaker identification to distinguish between different speakers:
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)
loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        enable_speaker_diarization=True,
    ),
)
docs = list(loader.lazy_load())
# Access speaker information in the metadata
current_speaker = None
output = ""
for token in docs[0].metadata["tokens"]:
    if current_speaker != token["speaker"]:
        current_speaker = token["speaker"]
        output += f"\nSpeaker {current_speaker}: {token['text'].lstrip()}"
    else:
        output += token["text"]
print(output)
Language identification
Enable automatic language detection and identification:
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)
loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        enable_language_identification=True,
    ),
)
docs = list(loader.lazy_load())
# Access language information in the metadata
current_language = None
output = ""
for token in docs[0].metadata["tokens"]:
    if current_language != token["language"]:
        current_language = token["language"]
        output += f"\n[{current_language}] {token['text'].lstrip()}"
    else:
        output += token["text"]
print(output)
Context for improved accuracy
Provide domain-specific context to improve transcription accuracy. Context helps the model understand your domain, recognize important terms, and apply custom vocabulary.
The context object supports four optional sections (general, text, terms, and translation_terms):
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    StructuredContext,
    StructuredContextGeneralItem,
    StructuredContextTranslationTerm,
)
loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        context=StructuredContext(
            # Structured key-value information (domain, topic, intent, etc.)
            general=[
                StructuredContextGeneralItem(key="domain", value="Healthcare"),
                StructuredContextGeneralItem(
                    key="topic", value="Diabetes management consultation"
                ),
                StructuredContextGeneralItem(key="doctor", value="Dr. Martha Smith"),
            ],
            # Longer free-form background text or related documents
            text="The patient has a history of...",
            # Domain-specific or uncommon words
            terms=["Celebrex", "Zyrtec", "Xanax"],
            # Custom translations for ambiguous terms
            translation_terms=[
                StructuredContextTranslationTerm(
                    source="Mr. Smith", target="Sr. Smith"
                ),
                StructuredContextTranslationTerm(source="MRI", target="RM"),
            ],
        ),
    ),
)
docs = list(loader.lazy_load())
For more details, see the Soniox context documentation.
Translation
Translate from any detected language to a target language:
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    TranslationConfig,
)
loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        translation=TranslationConfig(
            type="one_way",
            target_language="fr",
        ),
        language_hints=["en"],
    ),
)
docs = list(loader.lazy_load())
translated_text = ""
original_text = ""
for token in docs[0].metadata["tokens"]:
    if token["translation_status"] == "translation":
        translated_text += token["text"]
    else:
        original_text += token["text"]
print("Original text:", original_text)
print("Translated text:", translated_text)
You can also transcribe and translate between two languages simultaneously using the two_way translation type. For more details, see the Soniox translation documentation.
API reference
Constructor parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_path | str | No* | None | Path to local audio file to transcribe |
| file_data | bytes | No* | None | Binary data of audio file to transcribe |
| file_url | str | No* | None | URL of audio file to transcribe |
| api_key | str | No | SONIOX_API_KEY env var | Soniox API key |
| base_url | str | No | https://api.soniox.com/v1 | API base URL (see regional endpoints) |
| options | SonioxTranscriptionOptions | No | SonioxTranscriptionOptions() | Transcription options |
| polling_interval_seconds | float | No | 1.0 | Time between status polls (seconds) |
| timeout_seconds | float | No | 300.0 (5 minutes) | Maximum time to wait for transcription |
| http_request_timeout_seconds | float | No | 60.0 | Timeout for individual HTTP requests |
* You must specify exactly one of: file_path, file_data, or file_url.
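When the audio source comes from user input or configuration, the exactly-one rule can be checked before constructing the loader. The following is a minimal sketch; `pick_audio_source` is a hypothetical helper, and how the library itself validates these arguments is not described here.

```python
def pick_audio_source(file_path=None, file_data=None, file_url=None):
    """Return the single provided source as a keyword-argument dict.

    Hypothetical pre-check, not part of langchain-soniox. Raises
    ValueError unless exactly one source is given.
    """
    candidates = {
        "file_path": file_path,
        "file_data": file_data,
        "file_url": file_url,
    }
    # Keep only the sources that were actually supplied.
    provided = {name: value for name, value in candidates.items() if value is not None}
    if len(provided) != 1:
        given = ", ".join(provided) or "none"
        raise ValueError(
            "Specify exactly one of file_path, file_data, or file_url "
            f"(got: {given})."
        )
    return provided


# Example usage (requires langchain-soniox):
# loader = SonioxDocumentLoader(**pick_audio_source(file_url="https://..."))
```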
Transcription options
The SonioxTranscriptionOptions class supports these parameters:
| Parameter | Type | Description |
|---|---|---|
| model | str | Async model to use (see available models) |
| language_hints | list[str] | Language hints for transcription (ISO language codes) |
| language_hints_strict | bool | Enforce strict language hints |
| enable_speaker_diarization | bool | Enable speaker identification |
| enable_language_identification | bool | Enable language detection |
| translation | TranslationConfig | Translation configuration |
| context | StructuredContext | Context for improved accuracy |
| client_reference_id | str | Custom reference ID for your records |
| webhook_url | str | Webhook URL for completion notifications |
| webhook_auth_header_name | str | Custom auth header name for webhook |
| webhook_auth_header_value | str | Custom auth header value for webhook |
Browse the API documentation for a full list of supported options.
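On the receiving side, a webhook endpoint can compare the incoming header against the configured webhook_auth_header_name and webhook_auth_header_value. The sketch below assumes Soniox sends that header unchanged with each webhook request; `webhook_is_authorized` is a hypothetical helper, and the webhook payload format is not covered here.

```python
import hmac


def webhook_is_authorized(headers, header_name, expected_value):
    """Check that a request carries the expected auth header.

    Hypothetical helper: `headers` is a mapping of header names to values,
    as provided by most web frameworks.
    """
    wanted = header_name.lower()
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == wanted:
            # Constant-time comparison avoids leaking the secret via timing.
            return hmac.compare_digest(
                value.encode("utf-8"), expected_value.encode("utf-8")
            )
    return False
```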
Return value
The lazy_load() and alazy_load() methods yield a single Document object:
Document(
    page_content=str,  # The transcribed text
    metadata={
        "source": str,  # File URL, path, or "file_upload"
        "transcription_id": str,  # Unique transcription ID
        "audio_duration_ms": int,  # Audio duration in milliseconds
        "model": str,  # Model used for transcription
        "created_at": str,  # ISO 8601 timestamp
        "tokens": list[dict],  # Detailed token-level information
    }
)
The tokens array in metadata includes detailed information for each transcribed token:
- text: The transcribed text
- start_ms: Start time in milliseconds
- end_ms: End time in milliseconds
- speaker: Speaker ID (if diarization enabled), for example "1", "2", etc.
- language: Detected language (if identification enabled), for example "en", "fr", etc.
- translation_status: Translation status ("original", "translation", or "none")
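The token fields can be combined to produce timestamped speaker turns. The following is a minimal sketch that works on plain dictionaries shaped like the tokens described above; `Turn` and `speaker_turns` are hypothetical names, and the sketch assumes tokens arrive in time order with diarization enabled.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start_ms: int
    end_ms: int
    text: str


def speaker_turns(tokens):
    """Group consecutive tokens from the same speaker into turns.

    Hypothetical helper for illustration; not part of langchain-soniox.
    """
    turns = []
    for token in tokens:
        speaker = token.get("speaker")
        if turns and turns[-1].speaker == speaker:
            # Same speaker: extend the current turn.
            last = turns[-1]
            last.text += token["text"]
            end = token.get("end_ms")
            if end is not None:
                last.end_ms = end
        else:
            # Speaker changed: start a new turn, dropping leading whitespace.
            turns.append(
                Turn(
                    speaker=speaker,
                    start_ms=token.get("start_ms") or 0,
                    end_ms=token.get("end_ms") or 0,
                    text=token["text"].lstrip(),
                )
            )
    return turns


# Example usage with a loaded document:
# for turn in speaker_turns(docs[0].metadata["tokens"]):
#     print(f"[{turn.start_ms}-{turn.end_ms} ms] Speaker {turn.speaker}: {turn.text}")
```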
For more details, see the Soniox API reference.