The Fish Speech pipeline packaged as a library, so you don't need the web UI.

Fish-Speech-Lib 0.1.0

Original project: Fish-Speech

Fish-Speech-Lib is a Python library that provides a simple interface to the Fish-Speech pipeline, allowing you to generate high-quality speech with voice cloning capabilities without requiring a web UI.

PyTorch with CUDA or MPS support is required for Fish-Speech-Lib to work.

The library may contain bugs. If you hit an error, please report an issue.

Prerequisites

You must have Python>=3.10 installed.

You must have CUDA or MPS support for your GPU (MPS is not fully tested yet).
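To see which backend your PyTorch install can actually use, a quick check along these lines may help. This is a minimal sketch: `pick_device` is a hypothetical helper and not part of Fish-Speech-Lib. It relies only on `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and falls back to "cpu" when neither backend (or PyTorch itself) is found.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports.

    Hypothetical helper for illustration; not part of Fish-Speech-Lib.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no GPU backend can be used.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps is only present in newer PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

The returned string is one of the values the `device` parameter of FishSpeech is documented to accept.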

Installation

Install PyTorch with CUDA or MPS support by following the instructions at https://pytorch.org/get-started/locally/
Then install Fish-Speech-Lib with pip:

pip install git+https://github.com/Atm4x/Fish-speech-pipeline#egg=fish_speech_lib

Finally, create a .project-root file in the root directory of your project.
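The marker file can be created by hand or from Python. The sketch below assumes an empty file named `.project-root` is sufficient, since the instructions above do not specify any contents. `ensure_project_root_marker` is a hypothetical helper name.

```python
from pathlib import Path


def ensure_project_root_marker(project_dir: str = ".") -> Path:
    """Create an empty .project-root file in project_dir if it is missing.

    Assumption: the marker only needs to exist; its contents are not used.
    """
    marker = Path(project_dir) / ".project-root"
    marker.touch(exist_ok=True)  # leaves an existing file untouched
    return marker


if __name__ == "__main__":
    print(f"Marker present at: {ensure_project_root_marker().resolve()}")
```

Run it once from the root directory of your project.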

Usage

Fish-Speech-Lib provides a class called FishSpeech. Its constructor accepts the following optional parameters:

device - Device to run on: "cuda" (default), "cpu", or "mps"

half - Whether to use half-precision (FP16) (default is False)

compile_model - Whether to use torch.compile for optimization (untested; requires the CUDA toolkit to be installed) (default is False)

llama_checkpoint_path - Path to LLaMA model (default is "checkpoints/fish-speech-1.5")

decoder_checkpoint_path - Path to decoder model (default is "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth")

streaming - Enable streaming mode (currently not working) (default is False)

To use the model, first make an instance of FishSpeech:

from fish_speech_lib.inference import FishSpeech
import soundfile as sf

# Initialize model
tts = FishSpeech(
    device="cuda",
    half=False,
    compile_model=False
)

Then call the instance to generate speech:

sample_rate, audio_data = tts(text="Hello, world!", max_new_tokens=450)
sf.write("output.wav", audio_data, sample_rate, format='WAV')

Parameters accepted when calling the tts instance:

text - Text to be synthesized (required)

reference_audio - Path to reference audio for voice cloning (optional, default is None)

reference_audio_text - Text spoken in the reference audio (optional, default is "")

top_p - Top-p sampling parameter (optional, default is 0.7)

temperature - Temperature for sampling (optional, default is 0.7)

repetition_penalty - Repetition penalty (optional, default is 1.2)

max_new_tokens - Maximum number of tokens to generate (optional, default is 1024)

chunk_length - Length of iterative prompt in words (optional, default is 200)

seed - Random seed for reproducibility (optional, default is None)

use_memory_cache - Use memory cache for reference audio (optional, default is True)
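If you want your overrides to be explicit, one option is to collect the documented defaults in a single place. The sketch below is an assumption-laden illustration: `DEFAULT_TTS_KWARGS` and `build_tts_kwargs` are hypothetical names, and the values are copied from the parameter list above rather than read from the library.

```python
# Defaults copied from the parameter list above (not read from the library).
DEFAULT_TTS_KWARGS = {
    "reference_audio": None,
    "reference_audio_text": "",
    "top_p": 0.7,
    "temperature": 0.7,
    "repetition_penalty": 1.2,
    "max_new_tokens": 1024,
    "chunk_length": 200,
    "seed": None,
    "use_memory_cache": True,
}


def build_tts_kwargs(**overrides):
    """Merge overrides into the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(DEFAULT_TTS_KWARGS)
    if unknown:
        raise TypeError(f"Unknown tts() parameter(s): {sorted(unknown)}")
    return {**DEFAULT_TTS_KWARGS, **overrides}


# Intended use, assuming `tts` is a FishSpeech instance:
#   sample_rate, audio_data = tts(text="Hello, world!", **build_tts_kwargs(seed=42))
```

Rejecting unknown names up front catches typos such as `temprature` before the model is invoked.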

Example of usage

A simple example for generating speech:

from fish_speech_lib.inference import FishSpeech
import soundfile as sf

# Initialize model
tts = FishSpeech(device="cuda")

# Generate speech
sample_rate, audio_data = tts(text="Hello, world!", max_new_tokens=450)

# Save the audio
sf.write("output.wav", audio_data, sample_rate, format='WAV')

Voice Cloning Example

# Generate speech with voice cloning
sample_rate, audio_data = tts(
    "This is an example of voice cloning with Fish-Speech.",
    reference_audio="path/to/reference.wav",
    reference_audio_text="The text that is spoken in the reference audio.",
    max_new_tokens=1000,
    chunk_length=1000
)

sf.write("cloned_voice.wav", audio_data, sample_rate, format='WAV')

Model Downloading

The library automatically downloads the required models from the Hugging Face Hub if they are not found locally. The models are downloaded to the specified checkpoint paths.
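To find out in advance whether the first run will trigger a download, you can check for the checkpoint paths yourself. This is a minimal sketch: `missing_checkpoints` is a hypothetical helper, the default paths are the ones listed in the Usage section, and the check only tests that the paths exist (it does not validate the files or reproduce the library's own download logic).

```python
from pathlib import Path

# Default paths as documented in the Usage section.
DEFAULT_LLAMA_PATH = "checkpoints/fish-speech-1.5"
DEFAULT_DECODER_PATH = (
    "checkpoints/fish-speech-1.5/"
    "firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
)


def missing_checkpoints(llama_path=DEFAULT_LLAMA_PATH,
                        decoder_path=DEFAULT_DECODER_PATH):
    """Return the checkpoint paths that do not exist on disk."""
    return [p for p in (llama_path, decoder_path) if not Path(p).exists()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Not found locally (expect a download):", missing)
    else:
        print("Both checkpoint paths exist locally.")
```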

Exceptions

No known exceptions right now, but if you encounter any, please report them.

License

The code within this repository (Fish-Speech-Lib) is licensed under the Apache License 2.0. You can find a copy of the license in the LICENSE file.

IMPORTANT NOTE ON MODEL USAGE: The pre-trained models automatically downloaded and utilized by this library originate from the Fish-Speech project and are licensed separately under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

This means the models CANNOT be used for any commercial purposes.
Please review the full CC BY-NC-SA 4.0 license terms here: https://creativecommons.org/licenses/by-nc-sa/4.0/

Disclaimer

Users of this library are solely responsible for ensuring their usage complies with all applicable laws and ethical standards. Do not use this tool for illegal or harmful purposes. The developers of this fork are not liable for any misuse.

Copyright

Original Work (Fish-Speech): Copyright (c) 2024 Fish Audio Authors
Modifications in this Fork (Fish-Speech-Lib): Copyright (c) 2025 Atm4x

Authors

Atm4x

Based on Fish-Speech

Release history

0.1.0.1, released Apr 24, 2025

0.1.0 (this version), released Apr 24, 2025

Download files

Source Distribution

fish_speech_lib-0.1.0.tar.gz (66.7 kB), uploaded Apr 24, 2025

Built Distribution

fish_speech_lib-0.1.0-py3-none-any.whl (79.1 kB, Python 3), uploaded Apr 24, 2025

File details

Details for the file fish_speech_lib-0.1.0.tar.gz.

File metadata

Download URL: fish_speech_lib-0.1.0.tar.gz
Upload date: Apr 24, 2025
Size: 66.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for fish_speech_lib-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`cf78d6944eadcb5310b1960e9429c496b68d053ffc24a4aa7636bb0aa1b2fc9a`
MD5	`8ad0270e7c11f75303625b088e1a4320`
BLAKE2b-256	`7c0f5ef7e35fc1adfb6b360873da99ba45ea54ec7a0079d0293e34029c501100`

File details

Details for the file fish_speech_lib-0.1.0-py3-none-any.whl.

File metadata

Download URL: fish_speech_lib-0.1.0-py3-none-any.whl
Upload date: Apr 24, 2025
Size: 79.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for fish_speech_lib-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e878e638c67342cc6212c47a21784b0b2e97a99582709b6a5d5522b94ed8f6df`
MD5	`80ba1c8e2f0dcae1b1a908270c39b2ca`
BLAKE2b-256	`0e46f831798630fd83b439da7af450c72fee1d040c239b1d6bc52fe49c463fca`
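If you download one of these files manually, you can compare it against the SHA256 digests listed above. The sketch below uses only the standard library. `sha256_of` and `matches_published_hash` are hypothetical helper names, and the file is assumed to keep its original name.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash tables above.
EXPECTED_SHA256 = {
    "fish_speech_lib-0.1.0.tar.gz":
        "cf78d6944eadcb5310b1960e9429c496b68d053ffc24a4aa7636bb0aa1b2fc9a",
    "fish_speech_lib-0.1.0-py3-none-any.whl":
        "e878e638c67342cc6212c47a21784b0b2e97a99582709b6a5d5522b94ed8f6df",
}


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path) -> bool:
    """True only if the file name is known and its SHA256 digest matches."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

For example, `matches_published_hash("fish_speech_lib-0.1.0.tar.gz")` should return True for an intact download of the source distribution.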
