High-performance speech synthesis

fish_speech_python

Python bindings for the Fish Speech Candle implementation, using PyO3.

Supported Platforms

WARNING: Read this list very carefully. Hardware support is very limited. If you try to use this library on unsupported hardware, it will probably not work.

You have been warned.

Python: 3.9+.

OS + hardware:

  • Linux:
    • CPU: x86_64, glibc 2.34+
      • Example: Ubuntu 22.04 IS supported, Ubuntu 20.04 IS NOT supported
    • GPU: Nvidia CUDA 12+ with compute capability >= 8.0 (RTX 30 series+, A100 series+)
      • Example: 2080 Ti is NOT supported (Turing)
      • Example: RX5700 XT is NOT supported (AMD)
  • macOS: Apple Silicon (M1+), macOS 14.0+ (Sonoma)

Windows and AMD hardware will never be supported, so don't ask. Feel free to raise an issue if you need ARM or Alpine Linux.
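
The OS and CPU requirements above can be checked before installing. The following is a minimal sketch using only the standard library; `check_platform` is a hypothetical helper, not part of this package. It does not check the GPU requirements (CUDA version, compute capability) or the macOS version.

```python
import platform
import sys


def check_platform(system, machine, libc, python_version):
    """Return a list of reasons the platform looks unsupported (empty if none)."""
    problems = []
    if python_version < (3, 9):
        problems.append("Python 3.9+ is required")
    if system == "Linux":
        if machine != "x86_64":
            problems.append("only x86_64 is supported on Linux")
        name, version = libc
        if name != "glibc" or not version:
            problems.append("glibc not detected (musl/Alpine is not supported)")
        else:
            # Compare only major.minor, e.g. "2.35" -> (2, 35).
            parts = tuple(int(p) for p in version.split(".")[:2])
            if parts < (2, 34):
                problems.append("glibc 2.34+ is required, found " + version)
    elif system == "Darwin":
        if machine != "arm64":
            problems.append("only Apple Silicon (arm64) is supported on macOS")
    else:
        problems.append(system + " is not supported")
    return problems


if __name__ == "__main__":
    issues = check_platform(
        platform.system(),
        platform.machine(),
        platform.libc_ver(),
        sys.version_info[:2],
    )
    print("looks supported" if not issues else "\n".join(issues))
```

For example, Ubuntu 20.04 ships glibc 2.31, so the check reports a glibc problem there, while Ubuntu 22.04 (glibc 2.35) passes.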

Installation

# From PyPI
pip install fish_speech_rs

and done.

Usage

Codec

This is the low-level API. You feed it PCM audio, it compresses it into codes, and then decompresses it back into PCM.

from fish_speech import FireflyCodec
import numpy as np
# optional but highly recommended
from huggingface_hub import snapshot_download

# This just returns a directory path.
# Substitute with your own directory path if you don't want to download from Hugging Face.
dir = snapshot_download("jkeisling/fish-speech-1.5")

# Load the codec model (set device to "cuda" for speed)
codec = FireflyCodec(
    dir,
    version="1.5",  # Supports 1.2 to 1.5; 1.5 is default
    device="cuda"    # Or "cpu" (much slower), "metal" on Apple Silicon
)

# 1s of random audio. Substitute with your own audio.
# You will need to resample to codec.sample_rate yourself.
# soundfile is recommended for reading audio files (it does not resample).
pcm = np.random.randn(1, 1, codec.sample_rate).astype(np.float32)  # (batch, channels, samples)
# Encode raw PCM into compressed codes
codes = codec.encode(pcm)

# Decode the compressed codes back into PCM
decoded_pcm = codec.decode(codes)
  • Input: Raw PCM audio (please handle resampling to 44.1 kHz yourself)
  • Output: Encoded Numpy uint32 “codes” (compressed speech)
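
Since resampling is left to the caller, here is a rough sketch of preparing mono audio for `codec.encode`. `prepare_pcm` is a hypothetical helper, and it uses plain linear interpolation with NumPy, which is not anti-aliased; a dedicated resampling library will sound better. It only illustrates the target rate and the `(batch, channels, samples)` float32 layout used in the example above.

```python
import numpy as np


def prepare_pcm(samples, source_rate, target_rate):
    """Resample mono audio and shape it as (batch, channels, samples) float32."""
    samples = np.asarray(samples, dtype=np.float32)
    if samples.ndim != 1:
        raise ValueError("expected mono audio as a 1-D array")
    if source_rate != target_rate:
        n_out = int(round(len(samples) * target_rate / source_rate))
        # Positions of the output samples, measured in input-sample units.
        # np.interp clamps positions past the last input sample.
        positions = np.arange(n_out) * (source_rate / target_rate)
        samples = np.interp(positions, np.arange(len(samples)), samples)
    return samples.astype(np.float32).reshape(1, 1, -1)
```

Assuming `audio` is a 1-D array recorded at 16 kHz, usage would look like `codes = codec.encode(prepare_pcm(audio, 16000, codec.sample_rate))`.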

LM

The language model (LM) takes text and turns it into speech codes, which you then decode back to audio.

from fish_speech import LM

# Load the TTS model
lm = LM(
    dir,
    version="1.5",
    device="cuda",
    # bf16 only recommended for CUDA, otherwise leave it default (f32)
    dtype="bf16"
)

# Extract the speaker prompt from reference audio
speaker_prompt = lm.get_speaker_prompt([{
    'text': 'foobar',
    'codes': codes  # From previous encoding step
}], sysprompt="Speak out the provided text.")

# Generate speech codes
# Text chunking and normalization are your responsibility (sorry!);
# official text preprocessing helper function coming soon
generated_codes = lm.generate(["This is a test", "This is another test"], speaker_prompt=speaker_prompt)

# Decode to PCM audio using codec from earlier
pcm = codec.decode(generated_codes)
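
Because text chunking is left to the caller, a simple sentence-based splitter may be enough to get started. This is only a sketch: `chunk_text` and its `max_chars` limit are hypothetical and not part of this library, and the sentence regex is naive about abbreviations such as "e.g.".

```python
import re


def chunk_text(text, max_chars=200):
    """Group sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole.
    """
    # Collapse whitespace runs, then split after sentence-ending punctuation.
    normalized = " ".join(text.split())
    sentences = re.split(r"(?<=[.!?])\s+", normalized)
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = (current + " " + sentence) if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The resulting list of strings has the same form as the list passed to `lm.generate` above, e.g. `lm.generate(chunk_text(long_text), speaker_prompt=speaker_prompt)`.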

If you're in a Jupyter notebook, you can use the following code to play the audio in a widget:

# assumes you ran the above code
from IPython.display import Audio

Audio(pcm.flatten(), rate=codec.sample_rate)
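
Outside a notebook, the audio can be written to disk instead. The sketch below uses the standard-library `wave` module and assumes the decoded PCM is mono float audio in the range [-1, 1]; that range is an assumption, not something this README states. `save_wav` is a hypothetical helper.

```python
import wave

import numpy as np


def save_wav(path, pcm, sample_rate):
    """Write float PCM (assumed to lie in [-1, 1]) to a 16-bit mono WAV file."""
    samples = np.asarray(pcm, dtype=np.float32).flatten()
    # Clip to the valid range, then scale to little-endian signed 16-bit integers.
    int16 = (np.clip(samples, -1.0, 1.0) * 32767.0).astype("<i2")
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(int16.tobytes())
```

With the variables from the earlier examples, this would be called as `save_wav("output.wav", pcm, codec.sample_rate)`.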

Developing

Requires the Python and Rust toolchains. Clone this repo, then run:

  1. python -m venv .venv && source .venv/bin/activate
  2. pip install -r requirements.txt
  3. maturin develop
