
speech_neuron

A FastAPI server that converts text to speech using the Kokoro-TTS models.

Quick Start

Run:

pip install speech_neuron

Create a config.yaml file; its contents are described under Configuration below.

Create a main.py file with the following content:

import os
import yaml
from fastapi import FastAPI
import uvicorn
from speech_neuron import SpeechNeuronServer, NodeConfig

CONFIG_PATH = os.environ.get("NODE_CONFIG_PATH", "config.yaml")
with open(CONFIG_PATH, "r") as f:  # close the file once the YAML is parsed
    config = NodeConfig(**yaml.safe_load(f))

app = FastAPI()

speech_neuron = SpeechNeuronServer(config)
app.include_router(speech_neuron.router)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

Create a client file client.py with the following content:

import requests
import io
import sounddevice as sd
import soundfile as sf
from datetime import datetime

HOST = "http://localhost:8000"  # <--- Change to your server's address
url = f"{HOST}/node/speech"

start = datetime.now()
response = requests.get(
    url,
    params={
        "text": """Anyway, it was the Saturday of the football game with Saxon Hall. 
                   The game with Saxon Hall was supposed to be a very big deal around Pencey. 
        """,
        "voice": "af_bella",
        "speed": 1.1,
        "split_pattern": r"\n+",
    },
    stream=True,
)
response.raise_for_status()  # Raise on HTTP errors (4xx/5xx) before reading the audio

# Read the streamed response into memory
audio_buffer = io.BytesIO()
for chunk in response.iter_content(chunk_size=4096):
    if chunk:
        audio_buffer.write(chunk)

# Decode and play the buffered audio (playback starts only after the full response has arrived)
audio_buffer.seek(0)  # Reset buffer for reading
data, samplerate = sf.read(audio_buffer)
sd.play(data, samplerate)
sd.wait()  # Wait for audio to finish playing

print(f"Time taken: {datetime.now() - start}")
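If you want to check the server's output without a sound device, or confirm that it matches the response.sample_rate setting in config.yaml (24000 in the sample configuration), the standard-library wave module can inspect the downloaded bytes. This is a minimal sketch that assumes the server returns uncompressed PCM WAV (format: wav); wave only reads PCM WAV, so a float-encoded file would raise wave.Error. The helper names describe_wav and check_sample_rate are hypothetical and not part of speech_neuron.

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Return basic properties of a PCM WAV payload held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }


def check_sample_rate(wav_bytes: bytes, expected_rate: int = 24000) -> None:
    """Raise ValueError if the audio does not use the expected sample rate.

    The default of 24000 Hz is taken from the sample config.yaml (an assumption
    about your deployment; pass your own response.sample_rate if it differs).
    """
    info = describe_wav(wav_bytes)
    if info["sample_rate"] != expected_rate:
        raise ValueError(f"expected {expected_rate} Hz, got {info['sample_rate']} Hz")
```

In client.py you could call describe_wav(audio_buffer.getvalue()) after the download loop, or write audio_buffer.getvalue() to a file such as speech.wav instead of playing it on a headless machine.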

Start the server in the background:

python main.py &

Then run the client:

python client.py
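Because python main.py & returns immediately and the model has to load before the server starts listening, running client.py straight away can fail with a connection error. A small standard-library helper can poll the port first. This is a sketch assuming the host and port used in main.py (port 8000 on the same machine); wait_for_port is a hypothetical name, and an open port only shows that uvicorn is listening, not that a request will succeed.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    Returns True once the port accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. connection refused) until
            # something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, a wrapper script could call wait_for_port("127.0.0.1", 8000) and exit with an error if it returns False before running the client.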

Configuration

Create a config.yaml file with the following content:

name: "speech_node"

# Model file options:
#   kokoro-v1.0.fp16-gpu.onnx
#   kokoro-v1.0.fp16.onnx
#   kokoro-v1.0.int8.onnx
#   kokoro-v1.0.onnx
model_name: kokoro-v1.0.int8.onnx
voices_name: voices-v1.0.bin

response:
  # TODO: type: stream
  sample_rate: 24000
  format: wav
  compression_level: 0

pipeline:
  model:
  device: cpu # cpu or cuda
  use_transformer: true

  # Model configuration
  # 'a' = American English
  # 'b' = British English
  # 'e' = Spanish
  # 'f' = French
  # 'h' = Hindi
  # 'i' = Italian
  # 'p' = Portuguese
  # 'j' = Japanese
  # 'z' = Chinese
  language_code: en-us

  # Request defaults
  speed: 1.0 # Can be set during request
  voice: "af_heart" # Can be set during request
  split_pattern: "\n" # Can be set during request
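The comments above say speed, voice and split_pattern are defaults that a request can override, and the client example passes split_pattern=r"\n+". The sketch below illustrates that behaviour under two assumptions not confirmed by this README: missing request parameters are filled from the pipeline section, and the text is split with Python's re.split on split_pattern, with each non-empty segment synthesized separately. PIPELINE_DEFAULTS, resolve_request and split_text are hypothetical names, not the package's API.

```python
import re
from typing import Any, Dict, List, Optional

# Defaults mirroring the `pipeline` request defaults in config.yaml above.
PIPELINE_DEFAULTS: Dict[str, Any] = {
    "speed": 1.0,
    "voice": "af_heart",
    "split_pattern": "\n",
}


def resolve_request(overrides: Optional[Dict[str, Any]] = None,
                    defaults: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: request values win, config defaults fill the gaps.

    Keys that are not request defaults (such as "text") are ignored.
    """
    base = PIPELINE_DEFAULTS if defaults is None else defaults
    resolved = dict(base)  # copy so the defaults are never mutated
    for key, value in (overrides or {}).items():
        if key in base and value is not None:
            resolved[key] = value
    return resolved


def split_text(text: str, split_pattern: str) -> List[str]:
    """Split text on a regex and drop empty or whitespace-only segments."""
    return [part.strip() for part in re.split(split_pattern, text.strip()) if part.strip()]
```

Under these assumptions, the client's example text split with r"\n+" would yield its two sentences as separate segments, while a request that omits voice would fall back to af_heart.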

Dependencies

Linux

Ubuntu

sudo apt update
sudo apt install libglslang-dev

Manjaro

sudo pacman -S ffmpeg glslang

# Check for version mismatch
find /usr -name "libglslang-default-resource-limits.so*"
# If version mismatch
sudo ln -s /usr/lib/libglslang-default-resource-limits.so.15 /usr/lib/libglslang-default-resource-limits.so.14

# Check for version mismatch
find /usr -name "libSPIRV.so*"
# If version mismatch

sudo ldconfig

If the NVIDIA GPU is not working, reload the nvidia_uvm kernel module:

sudo modprobe -r nvidia_uvm
sudo modprobe nvidia_uvm
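Before setting device: cuda in config.yaml, it can help to confirm that the GPU is visible from Python. This sketch assumes the server runs the .onnx model files with ONNX Runtime, which this README does not state explicitly. onnxruntime.get_available_providers() reports the execution providers the installed build supports; CUDAExecutionProvider only appears with the GPU build (onnxruntime-gpu), and a session can still fall back to the CPU if the CUDA libraries fail to load. The helper names are hypothetical.

```python
from typing import List


def available_onnx_providers() -> List[str]:
    """Return ONNX Runtime execution providers, or [] if onnxruntime is not installed."""
    try:
        # Assumption: speech_neuron uses ONNX Runtime to run the Kokoro .onnx models.
        import onnxruntime
    except ImportError:
        return []
    return list(onnxruntime.get_available_providers())


def cuda_ready() -> bool:
    """True when the installed ONNX Runtime build reports the CUDA execution provider."""
    return "CUDAExecutionProvider" in available_onnx_providers()
```

If cuda_ready() returns False after the module reload above, the CPU setting (device: cpu) is the safer choice until the GPU build and drivers are sorted out.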

macOS

brew install ffmpeg
brew install glslang

Download files

Source Distribution

speech_neuron-0.0.5.tar.gz (16.2 kB)

Uploaded Source

Built Distribution

speech_neuron-0.0.5-py3-none-any.whl (17.0 kB)

Uploaded Python 3

File details

Details for the file speech_neuron-0.0.5.tar.gz.

File metadata

  • Download URL: speech_neuron-0.0.5.tar.gz
  • Upload date:
  • Size: 16.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.8 Darwin/24.3.0

File hashes

Hashes for speech_neuron-0.0.5.tar.gz
Algorithm Hash digest
SHA256 3cbc18ea75f15a5408971a80899c7b546f5affd0bb56794269fc2bc64c31dd29
MD5 c117192b3d99b39ff973068ae541f12a
BLAKE2b-256 9c2d4b305517da55ae70b5a703026eb1fb132cd62c490ae925d64d8ff07679f7

File details

Details for the file speech_neuron-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: speech_neuron-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 17.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.8 Darwin/24.3.0

File hashes

Hashes for speech_neuron-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 99527827affe3ca467694bd51e7c2cf3ee424532ad8267281bdda080e840340d
MD5 2b02d412e79e7137a62bd44b4b3d8df8
BLAKE2b-256 0b07a95e9972a82d8facb5626dfa3675c2dc7442701be1b83b03f76996ac4ade
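The SHA256 digests above can be used to confirm that a downloaded file is intact. A minimal standard-library sketch (sha256_of and verify_sha256 are illustrative names) reads the file in chunks so large files are not loaded into memory at once:

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published value (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, verify_sha256("speech_neuron-0.0.5.tar.gz", "3cbc18ea75f15a5408971a80899c7b546f5affd0bb56794269fc2bc64c31dd29") should return True for an unmodified download. pip can enforce the same check in hash-checking mode by adding --hash=sha256:... entries to a requirements file and installing with --require-hashes (every requirement then needs a pinned hash).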
