
High-performance speech synthesis


fish_speech_python

Python bindings for the Fish Speech Candle implementation, using PyO3.

Supported Platforms

Warning: Read this list carefully. Hardware support is very limited, and this library will probably not work on unsupported hardware.

You have been warned.

Python: 3.9+.

OS + hardware:

  • Linux:
    • CPU: x86_64, glibc 2.34+
      • Example: Ubuntu 22.04 IS supported, Ubuntu 20.04 IS NOT supported
    • GPU: NVIDIA, CUDA 11.8+, compute capability >= 8.0 (RTX 30 series or newer, A100 or newer)
      • Example: 2080 Ti is NOT supported (Turing, compute capability 7.5)
      • Example: RX 5700 XT is NOT supported (AMD)
      • NOTE: We don't currently have a CUDA build matrix, so the wheels are compiled against CUDA 11.8; sorry. They should work with newer CUDA versions, but please use the Rust runtime if you need full optimization.
  • macOS 14.0+ on Apple Silicon (M1 or later)

Windows and AMD hardware will never be supported, so don't ask. Feel free to raise an issue if you need ARM or Alpine Linux.
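Before installing, the Linux requirements above can be checked from Python. This is a minimal, hypothetical preflight sketch, not part of fish_speech: it only compares version numbers against the thresholds listed (glibc 2.34+, Python 3.9+, compute capability 8.0+) and does not detect the GPU itself. If PyTorch happens to be installed, `torch.cuda.get_device_capability()` returns the capability as a `(major, minor)` tuple you can pass in.

```python
import platform
import sys


def _version_tuple(version: str) -> tuple:
    """Turn a dotted version string like '2.35' into (2, 35)."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def linux_cpu_supported(glibc_version: str, python_version=sys.version_info[:2]) -> bool:
    """True if glibc >= 2.34 and Python >= 3.9, per the platform list above."""
    return _version_tuple(glibc_version) >= (2, 34) and tuple(python_version) >= (3, 9)


def cuda_gpu_supported(compute_capability: tuple) -> bool:
    """True if the GPU's compute capability is at least 8.0 (Ampere or newer)."""
    return tuple(compute_capability) >= (8, 0)


if __name__ == "__main__":
    libc_name, libc_version = platform.libc_ver()
    if sys.platform.startswith("linux") and libc_name == "glibc":
        print("Linux CPU requirements met:", linux_cpu_supported(libc_version))
    else:
        print("Not a glibc-based Linux system; see the macOS entry above.")
```

For reference, Ubuntu 22.04 ships glibc 2.35 and Ubuntu 20.04 ships glibc 2.31, which is why the former passes and the latter does not; an RTX 3090 reports compute capability (8, 6) and a 2080 Ti reports (7, 5).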

Installation

# From PyPI
pip install fish_speech_rs

That's it. Note that the package installs as fish_speech_rs but is imported as fish_speech.

Usage

Codec

This is the low-level API: it compresses PCM audio into codes and decompresses codes back into PCM.

from fish_speech import FireflyCodec
import numpy as np
# optional but highly recommended
from huggingface_hub import snapshot_download

# This just returns a directory path.
# Substitute with your own directory path if you don't want to download from Hugging Face.
dir = snapshot_download("jkeisling/fish-speech-1.5")

# Load the codec model (set device to "cuda" for speed)
codec = FireflyCodec(
    dir,
    version="1.5",  # Supports 1.2 to 1.5; 1.5 is default
    device="cuda"    # Or "cpu" (much slower), "metal" on Apple Silicon
)

# 1 s of random audio. Substitute with your own audio.
# You must resample to codec.sample_rate yourself; the soundfile library is recommended for loading audio.
pcm = np.random.randn(1, 1, codec.sample_rate).astype(np.float32)  # (batch, channels, samples)
# Encode raw PCM into compressed codes
codes = codec.encode(pcm)

# Decode the compressed codes back into PCM
decoded_pcm = codec.decode(codes)
  • encode input: raw float32 PCM shaped (batch, channels, samples) at codec.sample_rate (44.1 kHz); resampling is up to you
  • encode output: NumPy uint32 "codes" (compressed speech)
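Since resampling and reshaping are left to you, here is a minimal sketch of that preparation step. `to_codec_input` is a hypothetical helper, not part of fish_speech. It assumes the codec expects mono input, as the (1, 1, samples) shape in the example suggests, and it resamples with plain linear interpolation, which is a crude stand-in for a proper band-limited resampler: fine for a quick test, not for quality work.

```python
import numpy as np


def to_codec_input(audio: np.ndarray, source_rate: int, target_rate: int) -> np.ndarray:
    """Convert audio to a float32 array shaped (batch=1, channels=1, samples).

    `audio` is either (samples,) or (samples, channels), the layouts
    soundfile.read returns for mono and multichannel files.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # downmix to mono (assumption: codec is mono)
    if source_rate != target_rate:
        # Linear-interpolation resampling onto the target sample grid.
        n_out = int(round(audio.shape[0] * target_rate / source_rate))
        old_t = np.arange(audio.shape[0]) / source_rate
        new_t = np.arange(n_out) / target_rate
        audio = np.interp(new_t, old_t, audio).astype(np.float32)
    return audio[np.newaxis, np.newaxis, :]
```

With soundfile, usage would be `data, sr = soundfile.read("reference.wav")` followed by `pcm = to_codec_input(data, sr, codec.sample_rate)`; the file name is a placeholder.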

LM

The language model (LM) takes text and turns it into speech codes, which you then decode back to audio.

from fish_speech import LM
# Reuses `dir`, `codec`, and `codes` from the codec example above

# Load the TTS model
lm = LM(
    dir,
    version="1.5",
    device="cuda",
    # bf16 only recommended for CUDA, otherwise leave it default (f32)
    dtype="bf16"
)

# Build the speaker prompt from reference audio and its transcript
speaker_prompt = lm.get_speaker_prompt([{
    'text': 'foobar',  # placeholder: the transcript of the reference audio
    'codes': codes     # codes for that audio, from the previous encoding step
}], sysprompt="Speak out the provided text.")

# Generate speech codes
# Text chunking and normalization are your responsibility (sorry!);
# official text preprocessing helper function coming soon
generated_codes = lm.generate(["This is a test", "This is another test"], speaker_prompt=speaker_prompt)

# Decode to PCM audio using codec from earlier
pcm = codec.decode(generated_codes)
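Because text chunking is your responsibility, here is one minimal way to do it until the official helper lands. `chunk_text` and `synthesize` are hypothetical helpers, not part of fish_speech; they use only the `lm.generate` and `codec.decode` calls shown above. Sentence splitting is a naive regular expression (it will also split after abbreviations like "Dr."), no text normalization is done, and the `max_chars=200` default is an arbitrary choice, not a documented model limit.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own (oversized) chunk
    rather than being cut mid-sentence.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize(lm, codec, text: str, speaker_prompt, max_chars: int = 200):
    """Chunk `text`, generate codes for every chunk, and decode them to PCM."""
    generated_codes = lm.generate(chunk_text(text, max_chars), speaker_prompt=speaker_prompt)
    return codec.decode(generated_codes)
```

For example, `chunk_text("One. Two. Three.", max_chars=9)` packs the first two sentences together and returns `["One. Two.", "Three."]`.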

If you're in a Jupyter notebook, you can use the following code to play the audio in a widget:

# assumes you ran the above code
from IPython.display import Audio

Audio(pcm.flatten(), rate=codec.sample_rate)
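Outside a notebook, you will probably want the audio in a file instead. This sketch uses only the standard-library `wave` module, so it needs no extra dependency. `save_wav` is a hypothetical helper, and it assumes the decoded PCM is floating point in roughly [-1, 1] with a single channel; values outside that range are clipped.

```python
import wave

import numpy as np


def save_wav(pcm: np.ndarray, sample_rate: int, path: str) -> None:
    """Write float PCM (any shape, flattened to one mono channel) as a 16-bit WAV file."""
    samples = np.asarray(pcm, dtype=np.float32).flatten()
    # Clip to [-1, 1] and scale to little-endian int16, the layout WAV expects.
    ints = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(ints.tobytes())
```

Calling `save_wav(pcm, codec.sample_rate, "output.wav")` after the LM example writes the result to output.wav in the current directory.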

Developing

Requires Python and Rust toolchains. Clone this repo, then, from its root:

  1. python -m venv .venv && source .venv/bin/activate
  2. pip install -r requirements.txt
  3. maturin develop



Download files

Source distributions: none are available for this release.

Built distributions (version 0.3.0):

  • fish_speech_rs-0.3.0-cp313-cp313-manylinux_2_34_x86_64.whl (5.1 MB): CPython 3.13, manylinux (glibc 2.34+) x86-64
  • fish_speech_rs-0.3.0-cp313-cp313-macosx_11_0_arm64.whl (3.2 MB): CPython 3.13, macOS 11.0+ ARM64
  • fish_speech_rs-0.3.0-cp312-cp312-manylinux_2_34_x86_64.whl (5.1 MB): CPython 3.12, manylinux (glibc 2.34+) x86-64
  • fish_speech_rs-0.3.0-cp312-cp312-macosx_11_0_arm64.whl (3.2 MB): CPython 3.12, macOS 11.0+ ARM64
  • fish_speech_rs-0.3.0-cp311-cp311-manylinux_2_34_x86_64.whl (5.1 MB): CPython 3.11, manylinux (glibc 2.34+) x86-64
  • fish_speech_rs-0.3.0-cp311-cp311-macosx_11_0_arm64.whl (3.2 MB): CPython 3.11, macOS 11.0+ ARM64
  • fish_speech_rs-0.3.0-cp310-cp310-manylinux_2_34_x86_64.whl (5.1 MB): CPython 3.10, manylinux (glibc 2.34+) x86-64
  • fish_speech_rs-0.3.0-cp310-cp310-macosx_11_0_arm64.whl (3.2 MB): CPython 3.10, macOS 11.0+ ARM64
  • fish_speech_rs-0.3.0-cp39-cp39-manylinux_2_34_x86_64.whl (5.1 MB): CPython 3.9, manylinux (glibc 2.34+) x86-64
  • fish_speech_rs-0.3.0-cp39-cp39-macosx_11_0_arm64.whl (3.2 MB): CPython 3.9, macOS 11.0+ ARM64

File hashes

fish_speech_rs-0.3.0-cp313-cp313-manylinux_2_34_x86_64.whl
  SHA256 fa387dbed8a12b4bda6ef99d9a0b5a573b7b07aeee2f4b0b939a783952d4326a
  MD5 701dfa33c5b304bd3fad9d97add8447d
  BLAKE2b-256 3289a1a4da204193e19df5833abddfe06a98b946aa67351f251668ce15800397

fish_speech_rs-0.3.0-cp313-cp313-macosx_11_0_arm64.whl
  SHA256 d9515f218e984ef03b005bce6df17ad2755ec3199b897a0122fd8bc88f3be8a2
  MD5 66c70d4080664141199bc1dc7f0021da
  BLAKE2b-256 8edafe7d59e47d78f085fe249f07f19d7e1a858d48621d7347a1247197a2f8c7

fish_speech_rs-0.3.0-cp312-cp312-manylinux_2_34_x86_64.whl
  SHA256 5b6eb7e2c2f92516df80142b13e561bda68c557bf63f26bd9eebc6921db02f2d
  MD5 ebea20694586b53b264bdcf72acd8ac1
  BLAKE2b-256 aeaca182c249be70190301944720d911e6fa1ac369ae4a3c2f409be38d818905

fish_speech_rs-0.3.0-cp312-cp312-macosx_11_0_arm64.whl
  SHA256 f6824d2d94c003671fd7d17917c7e9e20b7d8d1678cd056eaa88b59441aaaf16
  MD5 06d222b2cc4739a3d689931ecaaac9cb
  BLAKE2b-256 1be14b343617723d2f143314e9af484df679d5e19af54ded2dfeca7605b3b299

fish_speech_rs-0.3.0-cp311-cp311-manylinux_2_34_x86_64.whl
  SHA256 0314e9fc1a0c87a510a8b3f1063dc4badac3ff6ed9c9b8f0991b71ade724aca1
  MD5 401c7dca7f1f51d05a3067b1680965aa
  BLAKE2b-256 bc8e8ee54c5bf677e38be21bbcac31b910bdd8a51c32da341bb6bd4d56cb1234

fish_speech_rs-0.3.0-cp311-cp311-macosx_11_0_arm64.whl
  SHA256 41b7b467bf618db39319ec48000fc25e39575e2783f38f289394dc375d81a3c0
  MD5 2a466704de31bfd7dafb3a5c76a53852
  BLAKE2b-256 a2bb428a4506a1b37d2830f0d9798ad5cd21c2b5a4f9dfbb927c8121a415f214

fish_speech_rs-0.3.0-cp310-cp310-manylinux_2_34_x86_64.whl
  SHA256 0e2071f89d3d799f36ba9ec55067d39703cef503cc61d46608dcac280c102ed0
  MD5 aaf3b096b77d3a15ee7701c8db34e00b
  BLAKE2b-256 a181621cfaee05aecd0b286e81e31d6abee0012382f260f9523a63940a82b3c9

fish_speech_rs-0.3.0-cp310-cp310-macosx_11_0_arm64.whl
  SHA256 e511d2fe8f14cffbfdb13cab0fb90ce61eae7a0e75fd5c06d3aeb90c827b9341
  MD5 609e95af9994fb2e4eab1b49e5ac91cf
  BLAKE2b-256 8fc2ac0ddc33a059e500b7e4c88ce64df20bd93ef0a177ce8d037db369504e12

fish_speech_rs-0.3.0-cp39-cp39-manylinux_2_34_x86_64.whl
  SHA256 812c155cfd59eb3be09c6c7bf17ddab73b07e9e02725b046dfdee07797052ee9
  MD5 50b70538a0233a2c9785c1b050118b8e
  BLAKE2b-256 75c528c748bff15f7c50d56aad49b9ded9bd01c2095f6bfd6146075f015b6b5f

fish_speech_rs-0.3.0-cp39-cp39-macosx_11_0_arm64.whl
  SHA256 88736af77f252d16a6eba3232a7f359a4071e46e037b1588e8af4c566f3ff823
  MD5 30127f0590411fb63c22eb4bcf8ebe27
  BLAKE2b-256 27b27bd9717cc1005ff2de3ad5536d4a3c28adcb5fef7e33fa6d00f2f927bb39
