Fish Audio TTS and STT integration for Vision Agents
Fish Audio Plugin
A high-quality Text-to-Speech (TTS) and Speech-to-Text (STT) plugin for Vision Agents that uses the Fish Audio API.
Installation
pip install vision-agents-plugins-fish
Usage
Text-to-Speech (TTS)
from vision_agents.plugins.fish import TTS
from getstream.video.rtc.audio_track import AudioStreamTrack
# Initialize with API key from environment variable
tts = TTS()
# Or specify API key directly
tts = TTS(api_key="your_fish_audio_api_key")
# Create an audio track to output speech
track = AudioStreamTrack(framerate=16000)
tts.set_output_track(track)
# Register event handlers
@tts.events.subscribe
async def on_audio(event):
    print(f"Received audio chunk: {len(event.audio_data)} bytes")
# Send text to be converted to speech (await must run inside an async function)
await tts.send("Hello, this is a test of the Fish Audio text-to-speech plugin.")
Speech-to-Text (STT)
from vision_agents.plugins.fish import STT
from getstream.video.rtc.track_util import PcmData
# Initialize with API key from environment variable
stt = STT()
# Or specify API key directly and language
stt = STT(api_key="your_fish_audio_api_key", language="en")
# Register event handlers
@stt.events.subscribe
async def on_transcript(event):
    print(f"Transcript: {event.text}")
# Process audio data (audio_samples holds your mono 16-bit PCM samples;
# await must run inside an async function)
pcm_data = PcmData(samples=audio_samples, sample_rate=16000)
await stt.process_audio(pcm_data)
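The STT example leaves audio_samples undefined. As a minimal sketch for trying the call out, the following generates one second of a 440 Hz tone as mono 16-bit samples at 16 kHz using only the standard library. The helper name make_sine_samples is hypothetical, and the container type that PcmData expects is not stated here, so a conversion (for instance to a NumPy int16 array) may be needed before passing the samples on.

```python
import math
from array import array


def make_sine_samples(
    freq_hz: float = 440.0,
    seconds: float = 1.0,
    sample_rate: int = 16000,
    amplitude: float = 0.5,
) -> array:
    """Return mono signed 16-bit samples of a sine tone (hypothetical helper)."""
    n = int(seconds * sample_rate)
    peak = int(amplitude * 32767)  # scale to the int16 range
    return array(
        "h",  # "h" = signed 16-bit integers
        (int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate)) for i in range(n)),
    )


# One second of test audio at the sample rate used in the example above.
audio_samples = make_sine_samples()
```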
Configuration Options
TTS Options
- api_key: Fish Audio API key (default: reads from the FISH_API_KEY environment variable)
- reference_id: Optional reference voice ID to use for synthesis
- base_url: Optional custom API endpoint (default: uses Fish Audio's default endpoint)
- client: Optionally pass in your own instance of the Fish Audio Session
STT Options
- api_key: Fish Audio API key (default: reads from the FISH_API_KEY environment variable)
- language: Language code for transcription (e.g., "en", "zh"). If None, automatic language detection is used
- ignore_timestamps: Skip timestamp processing for faster results (default: False)
- sample_rate: Sample rate of the audio in Hz (default: 16000)
- base_url: Optional custom API endpoint
- client: Optionally pass in your own instance of the Fish Audio Session
Reference Audio
Fish Audio supports using reference audio for voice cloning:
from vision_agents.plugins.fish import TTS
# Using a reference voice ID
tts = TTS(reference_id="your_reference_voice_id")
# Or pass reference audio dynamically when sending text
# (See Fish Audio SDK documentation for advanced usage)
Supported Languages (STT)
Fish Audio STT supports multiple languages with automatic detection. Common language codes include:
- en: English
- zh: Chinese
- es: Spanish
- fr: French
- de: German
- ja: Japanese
- ko: Korean
- pt: Portuguese
For automatic language detection, set language=None (default).
Supported Audio Formats (STT)
The STT implementation accepts PCM audio data and converts it to WAV format internally. Supported configurations:
- Maximum audio size: 100MB
- Maximum duration: 60 minutes
- Sample rate: 16kHz or higher recommended
- Format: Mono, 16-bit PCM
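To make the constraints above concrete, here is a rough sketch of wrapping mono 16-bit PCM in a WAV container with the standard library and checking it against the stated limits. It illustrates the general technique only and is not the plugin's internal code. The helper names are hypothetical, reading "100MB" as 100 MiB is an assumption, and so is applying the limit to the raw PCM payload.

```python
import io
import wave

MAX_BYTES = 100 * 1024 * 1024  # assumption: "100MB" interpreted as 100 MiB
MAX_SECONDS = 60 * 60          # 60 minutes


def pcm_duration_seconds(pcm: bytes, sample_rate: int = 16000) -> float:
    """Duration of mono 16-bit PCM, which uses 2 bytes per sample."""
    return len(pcm) / (2 * sample_rate)


def within_limits(pcm: bytes, sample_rate: int = 16000) -> bool:
    """Check the payload against the documented size and duration limits."""
    return (
        len(pcm) <= MAX_BYTES
        and pcm_duration_seconds(pcm, sample_rate) <= MAX_SECONDS
    )


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw mono 16-bit PCM in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

At 16 kHz mono 16-bit, audio takes 32,000 bytes per second, so 60 minutes is about 115 MB. Under these assumptions the size limit is reached before the duration limit, at roughly 52 to 55 minutes depending on how 100MB is defined.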
Requirements
- Python 3.10+
- fish-audio-sdk>=2025.4.2
Getting Your API Key
- Sign up for a Fish Audio account at https://fish.audio
- Navigate to the API Keys section in your dashboard
- Create a new API key
- Set the FISH_API_KEY environment variable or pass it directly to the plugin
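The last step offers two ways to supply the key. As a small sketch of the usual precedence (an explicit argument wins, otherwise the environment variable is used), the hypothetical helper below can serve as a pre-flight check before constructing TTS or STT. How the plugin itself reports a missing key is not documented here, so the error raised is an assumption of this sketch.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit key if given, else FISH_API_KEY from the environment.

    Hypothetical helper, not part of the plugin.
    """
    key = explicit or env.get("FISH_API_KEY")
    if not key:
        raise ValueError(
            "No Fish Audio API key found: set FISH_API_KEY or pass api_key=..."
        )
    return key
```

The resolved value can then be passed as api_key when constructing TTS or STT.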