High-performance speech synthesis
fish_speech_python
Python bindings for the Fish Speech Candle implementation, using PyO3.
Supported Platforms
[!WARNING] Read this list very carefully. Hardware support is very limited. If you try to use this library on unsupported hardware, it will probably not work.
You have been warned.
Python: 3.9+.
OS + hardware:
- Linux:
  - CPU: x86_64, glibc 2.34+
    - Example: Ubuntu 22.04 IS supported, Ubuntu 20.04 IS NOT supported
  - GPU: Nvidia CUDA 12+ with compute capability >= 8.0 (RTX 30 series+, A100 series+)
    - Example: 2080 Ti is NOT supported (Turing)
    - Example: RX5700 XT is NOT supported (AMD)
- macOS (M1+, 14.0+ (Sonoma))
Windows and AMD hardware will never be supported, so don't ask. Feel free to raise an issue if you need ARM or Alpine Linux.
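If you want a quick sanity check before installing, the sketch below compares the current machine against the CPU, OS, and Python requirements listed above. It uses only the Python standard library; `check_platform` is a hypothetical helper, not part of this package, and it does not inspect the GPU, the CUDA version, or the macOS release.

```python
import platform
import sys


def check_platform(system, machine, python_version, libc=("", "")):
    """Return a list of problems; an empty list means no obvious mismatch."""
    problems = []
    if tuple(python_version[:2]) < (3, 9):
        problems.append("Python 3.9+ is required")
    if system == "Linux":
        if machine != "x86_64":
            problems.append(f"Linux on {machine} is not supported (x86_64 only)")
        name, version = libc
        if name != "glibc":
            problems.append("glibc 2.34+ is required (no glibc detected)")
        else:
            # Compare only major.minor, e.g. "2.35" -> (2, 35)
            parts = tuple(int(p) for p in version.split(".")[:2])
            if parts < (2, 34):
                problems.append(f"glibc {version} is too old (2.34+ required)")
    elif system == "Darwin":
        if machine != "arm64":
            problems.append("macOS requires Apple Silicon (M1+)")
    else:
        problems.append(f"{system} is not supported")
    return problems


if __name__ == "__main__":
    issues = check_platform(
        platform.system(),
        platform.machine(),
        sys.version_info,
        platform.libc_ver(),
    )
    print("\n".join(issues) if issues else "No obvious platform mismatch found.")
```

An empty result only means the basic requirements look satisfied; it is not a guarantee that the library will work on that machine.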
Installation
# From PyPI
pip install fish_speech_rs
and done.
Usage
Codec
This is the low-level API. You feed it PCM audio, it compresses it into codes, and then decompresses it back into PCM.
from fish_speech import FireflyCodec
import numpy as np
# optional but highly recommended
from huggingface_hub import snapshot_download
# This just returns a directory path.
# Substitute with your own directory path if you don't want to download from Hugging Face.
dir = snapshot_download("jkeisling/fish-speech-1.5")
# Load the codec model (set device to "cuda" for speed)
codec = FireflyCodec(
dir,
version="1.5", # Supports 1.2 to 1.5; 1.5 is default
device="cuda" # Or "cpu" (much slower), "metal" on Apple Silicon
)
# 1s of random audio. Substitute with your own audio.
# You will need to resample to codec.sample_rate yourself. Soundfile is recommended.
pcm = np.random.randn(1, 1, codec.sample_rate).astype(np.float32) # (batch, channels, samples)
# Encode raw PCM into compressed codes
codes = codec.encode(pcm)
# Decode the compressed codes back into PCM
decoded_pcm = codec.decode(codes)
- Input: Raw PCM audio (please handle resampling to 44.1 kHz yourself)
- Output: Encoded NumPy uint32 “codes” (compressed speech)
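Since resampling is left to you, here is a minimal sketch of getting audio into the `(batch, channels, samples)` float32 layout used in the example above. `prepare_pcm` is a hypothetical helper, not part of this library. It assumes mono input is what the codec wants (as in the random-audio example) and uses plain linear interpolation via `np.interp`, which is a stand-in for a proper band-limited resampler and will be lower quality.

```python
import numpy as np


def prepare_pcm(samples, source_rate, target_rate):
    """Convert (samples,) or (samples, channels) audio to float32 of shape (1, 1, n)."""
    audio = np.asarray(samples, dtype=np.float32)
    if audio.ndim == 2:
        # Downmix to mono by averaging channels (assumption: the codec takes mono)
        audio = audio.mean(axis=1)
    if source_rate != target_rate:
        n_out = int(round(len(audio) * target_rate / source_rate))
        old_t = np.arange(len(audio)) / source_rate
        new_t = np.arange(n_out) / target_rate
        # Linear interpolation: simple, but not a high-quality resampler
        audio = np.interp(new_t, old_t, audio)
    # Add batch and channel axes: (n,) -> (1, 1, n)
    return audio.astype(np.float32)[np.newaxis, np.newaxis, :]
```

With soundfile, `samples, rate = soundfile.read(path)` returns an array shaped `(frames,)` or `(frames, channels)`, so `prepare_pcm(samples, rate, codec.sample_rate)` should produce something you can pass to `codec.encode`. For real use, consider swapping the interpolation step for a dedicated resampling library.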
LM
The language model (LM) takes text and turns it into speech codes, which you then decode back to audio.
from fish_speech import LM
from typing import List
# Load the TTS model
lm = LM(
dir,
version="1.5",
device="cuda",
# bf16 only recommended for CUDA, otherwise leave it default (f32)
dtype="bf16"
)
# Extract the speaker prompt from reference audio
speaker_prompt = lm.get_speaker_prompt([{
'text': 'foobar',
'codes': codes # From previous encoding step
}], sysprompt="Speak out the provided text.")
# Generate speech codes
# Text chunking and normalization are your responsibility (sorry!);
# official text preprocessing helper function coming soon
generated_codes = lm.generate(["This is a test", "This is another test"], speaker_prompt=speaker_prompt)
# Decode to PCM audio using codec from earlier
pcm = codec.decode(generated_codes)
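Until an official preprocessing helper exists, text chunking could be sketched roughly as follows. `chunk_text` is a hypothetical helper, not part of this library, and the 200-character limit is an arbitrary assumption rather than a documented model constraint. It collapses whitespace, splits on sentence-ending punctuation, and packs sentences into chunks without splitting a sentence.

```python
import re


def chunk_text(text, max_chars=200):
    """Split text into sentence-aligned chunks of at most max_chars where possible."""
    # Collapse runs of whitespace (including newlines) into single spaces
    normalized = re.sub(r"\s+", " ", text).strip()
    if not normalized:
        return []
    # Split after '.', '!' or '?' followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", normalized)
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The resulting list can be passed as the first argument to `lm.generate`. A single sentence longer than `max_chars` is kept whole, and no number or abbreviation normalization is attempted.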
If you're in a Jupyter notebook, you can use the following code to play the audio in a widget:
# assumes you ran the above code
from IPython.display import Audio
Audio(pcm.flatten(), rate=codec.sample_rate)
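Outside a notebook, you may prefer to write the audio to a file. The sketch below uses only the standard library `wave` module. `write_wav` is a hypothetical helper, not part of this library, and it assumes the decoded PCM is mono float audio in the range [-1.0, 1.0]; values outside that range are clipped.

```python
import array
import sys
import wave


def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    ints = array.array(
        "h",
        (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples),
    )
    if sys.byteorder == "big":
        ints.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(ints.tobytes())
```

For example, `write_wav("out.wav", pcm.flatten().tolist(), codec.sample_rate)` would save the audio generated above. Soundfile can do the same job with less code if you already have it installed.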
Developing
Requires Python and Rust toolchains. Clone this repo, then create and activate a virtual environment and build the bindings:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
maturin develop