Fish Audio TTS and STT integration for Vision Agents

Fish Audio Plugin

A high-quality Text-to-Speech (TTS) and Speech-to-Text (STT) plugin for Vision Agents that uses the Fish Audio API.

Installation

uv add "vision-agents[fish]"
# or install the plugin package directly
uv add vision-agents-plugins-fish

Usage

Text-to-Speech (TTS)

import asyncio

from getstream.video.rtc.audio_track import AudioStreamTrack
from vision_agents.plugins.fish import TTS


async def main():
    # Initialize with the API key from the FISH_API_KEY environment variable
    tts = TTS()
    # Or specify the API key directly:
    # tts = TTS(api_key="your_fish_audio_api_key")

    # Create a 16 kHz audio track to output speech
    track = AudioStreamTrack(framerate=16000)
    tts.set_output_track(track)

    # Register an event handler that logs each audio chunk
    @tts.events.subscribe
    async def on_audio(event):
        print(f"Received audio chunk: {len(event.audio_data)} bytes")

    # Send text to be converted to speech
    async for chunk in tts.send_iter("Hello, this is a test of the Fish Audio text-to-speech plugin."):
        pass


# The example uses `async for`, so it must run inside an event loop
asyncio.run(main())
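The handler above reports chunk sizes in bytes. To turn that into playback time, the sketch below assumes the chunks are mono 16-bit (s16) PCM at the 16 kHz framerate used for the track; that format is an assumption, so check what your track actually carries. `chunk_duration_ms` is a hypothetical helper, not part of the plugin.

```python
def chunk_duration_ms(
    num_bytes: int, sample_rate: int = 16000, channels: int = 1, sample_width: int = 2
) -> float:
    """Return the playback duration, in milliseconds, of a raw PCM chunk.

    Assumes interleaved PCM where every sample takes `sample_width` bytes
    per channel (2 bytes for 16-bit audio).
    """
    bytes_per_second = sample_rate * channels * sample_width
    return num_bytes * 1000 / bytes_per_second


# 16 kHz mono 16-bit audio is 32,000 bytes per second,
# so a 3,200-byte chunk holds 100 ms of speech.
print(chunk_duration_ms(3200))  # 100.0
```

At a 48 kHz framerate the same byte count covers a third of the time, which is why the track's framerate matters when interpreting these logs.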

Speech-to-Text (STT)

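import asyncio

import numpy as np
from getstream.video.rtc.track_util import PcmData
from vision_agents.plugins.fish import STT


async def main():
    # Initialize with the API key from the FISH_API_KEY environment variable
    stt = STT()
    # Or specify the API key and language directly:
    # stt = STT(api_key="your_fish_audio_api_key", language="en")

    # Register an event handler that prints each transcript
    @stt.events.subscribe
    async def on_transcript(event):
        print(f"Transcript: {event.text}")

    # Placeholder input: one second of 16 kHz mono 16-bit silence.
    # Replace with samples from your own audio source.
    audio_samples = np.zeros(16000, dtype=np.int16)

    # Process audio data
    pcm_data = PcmData(samples=audio_samples, sample_rate=16000)
    await stt.process_audio(pcm_data)


# `await` is only valid inside a coroutine, so run the example in an event loop
asyncio.run(main())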

Configuration Options

TTS Options

  • api_key: Fish Audio API key (default: reads from FISH_API_KEY environment variable)
  • reference_id: Optional reference voice ID to use for synthesis
  • base_url: Optional custom API endpoint (default: uses Fish Audio's default endpoint)
  • client: Optionally pass your own Fish Audio Session instance

STT Options

  • api_key: Fish Audio API key (default: reads from FISH_API_KEY environment variable)
  • language: Language code for transcription (e.g., "en", "zh"). If None (the default), the language is detected automatically
  • ignore_timestamps: Skip timestamp processing for faster results (default: False)
  • sample_rate: Sample rate of the audio in Hz (default: 16000)
  • base_url: Optional custom API endpoint (default: uses Fish Audio's default endpoint)
  • client: Optionally pass your own Fish Audio Session instance

Reference Audio

Fish Audio supports using reference audio for voice cloning:

from vision_agents.plugins.fish import TTS

# Using a reference voice ID
tts = TTS(reference_id="your_reference_voice_id")

# Or pass reference audio dynamically when sending text
# (See Fish Audio SDK documentation for advanced usage)

Supported Languages (STT)

Fish Audio STT supports multiple languages with automatic detection. Common language codes include:

  • en - English
  • zh - Chinese
  • es - Spanish
  • fr - French
  • de - German
  • ja - Japanese
  • ko - Korean
  • pt - Portuguese

For automatic language detection, leave language unset or pass language=None (the default).
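If the language comes from user input or a config file, it helps to normalize it before passing it to STT(language=...). The sketch below is a hypothetical helper, not part of the plugin: it maps empty input to None (automatic detection), lowercases everything else, and passes unknown codes through unchanged, because the list above is described as common codes rather than a complete set.

```python
from typing import Optional

# Common codes listed in this README; the service may accept others.
COMMON_LANGUAGES = {
    "en": "English", "zh": "Chinese", "es": "Spanish", "fr": "French",
    "de": "German", "ja": "Japanese", "ko": "Korean", "pt": "Portuguese",
}


def normalize_language(code: Optional[str]) -> Optional[str]:
    """Map user input to a value for STT(language=...).

    Missing or blank input becomes None, which requests automatic detection.
    Anything else is stripped and lowercased but otherwise passed through.
    """
    if code is None or not code.strip():
        return None
    return code.strip().lower()


def describe_language(code: Optional[str]) -> str:
    """Return a human-readable label for a normalized language code."""
    if code is None:
        return "automatic detection"
    return COMMON_LANGUAGES.get(code, code)


print(describe_language(normalize_language(" EN ")))  # English
print(describe_language(normalize_language("")))      # automatic detection
```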

Supported Audio Formats (STT)

The STT implementation accepts PCM audio data and converts it to WAV format internally. Limits and recommendations:

  • Maximum audio size: 100 MB
  • Maximum duration: 60 minutes
  • Sample rate: 16 kHz or higher recommended
  • Format: Mono, 16-bit PCM
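The PCM-to-WAV step can be illustrated with Python's standard `wave` module. This is a minimal sketch of the idea, not the plugin's actual code: `pcm16_mono_to_wav` is a hypothetical name, reading "100MB" as 100,000,000 bytes is an assumption, and so is applying that cap to the encoded WAV rather than the raw PCM.

```python
import io
import wave

# Limits from this README. Treating "100MB" as 100,000,000 bytes (not 100 MiB)
# and checking it against the WAV output are assumptions.
MAX_BYTES = 100_000_000
MAX_SECONDS = 60 * 60


def pcm16_mono_to_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw mono 16-bit PCM (native byte order) in an in-memory WAV file."""
    if len(pcm) % 2:
        raise ValueError("16-bit PCM must contain an even number of bytes")
    duration = len(pcm) / (2 * sample_rate)  # 2 bytes per mono sample
    if duration > MAX_SECONDS:
        raise ValueError(f"audio is {duration:.0f} s; limit is {MAX_SECONDS} s")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable
    data = buf.getvalue()
    if len(data) > MAX_BYTES:
        raise ValueError(f"WAV is {len(data)} bytes; limit is {MAX_BYTES}")
    return data


wav_bytes = pcm16_mono_to_wav(b"\x00\x00" * 16000)  # one second of silence
print(len(wav_bytes))  # 32044 (44-byte header + 32,000 bytes of samples)
```

At 16 kHz mono 16-bit, PCM runs at 32,000 bytes per second, so an hour of audio is about 115.2 MB. Under the decimal-MB reading, the size cap is therefore reached before the duration cap, after roughly 52 minutes (100,000,000 / 32,000 = 3,125 s).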

Requirements

  • Python 3.10+
  • fish-audio-sdk>=2025.4.2

Getting Your API Key

  1. Sign up for a Fish Audio account at https://fish.audio
  2. Navigate to the API Keys section in your dashboard
  3. Create a new API key
  4. Set the FISH_API_KEY environment variable or pass it directly to the plugin
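Both TTS and STT document the same precedence for the key: an explicit api_key argument, otherwise the FISH_API_KEY environment variable. The sketch below restates that rule as a hypothetical helper (not part of the plugin) that fails early with a clear message when neither source is set; whether the plugin itself raises in that case is not stated here.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key if given, otherwise fall back to FISH_API_KEY.

    Hypothetical helper mirroring the precedence described in this README.
    """
    key = api_key or os.environ.get("FISH_API_KEY")
    if not key:
        raise RuntimeError(
            "No Fish Audio API key: pass api_key=... or set FISH_API_KEY"
        )
    return key
```

Keeping the key in the environment (for example via `export FISH_API_KEY=...` in your shell) lets the same code run unchanged across machines without committing the key to source control.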

Download files

Source Distribution

vision_agents_plugins_fish-0.6.1.tar.gz (5.7 kB)

Uploaded Source

Built Distribution

vision_agents_plugins_fish-0.6.1-py3-none-any.whl (15.0 kB)

Uploaded Python 3

File details

Details for the file vision_agents_plugins_fish-0.6.1.tar.gz.

File metadata

  • Download URL: vision_agents_plugins_fish-0.6.1.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 (uv publish, macOS)

File hashes

Hashes for vision_agents_plugins_fish-0.6.1.tar.gz
  • SHA256: 454ffc10816aac8bf2656b2b6810b3f5ce60e3ac21280e40a5287418c095e7fc
  • MD5: c93e35a5b9e935af26acb0f45314b081
  • BLAKE2b-256: ec815f3ca26af76abe7a6598829b9fc3c7a66020fbd7c1e261241baa83d188af

File details

Details for the file vision_agents_plugins_fish-0.6.1-py3-none-any.whl.

File metadata

  • Download URL: vision_agents_plugins_fish-0.6.1-py3-none-any.whl
  • Upload date:
  • Size: 15.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.6 (uv publish, macOS)

File hashes

Hashes for vision_agents_plugins_fish-0.6.1-py3-none-any.whl
  • SHA256: 0a817a5965886cfb6de632a913ae2a8f18bfd11c2b18323af54fe70b4481fa59
  • MD5: f6d572a3cbdcdf0a7803ce0d70792ac9
  • BLAKE2b-256: 05f55eaad9840985df435b2d385ab458da008aa7c312c18192a5e96084a643f7