A simple FastAPI server to host XTTSv2

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

A simple FastAPI Server to run XTTSv2

There's a Google Colab version you can use if your computer is weak. You can check out the guide.

This project is inspired by silero-api-server and utilizes XTTSv2.

I created a Pull Request that has been merged into the dev branch of SillyTavern: here.

The TTS module or server can be used in any way you prefer.

Installation

To begin, install the xtts-api-server package using pip:

pip install xtts-api-server

I strongly recommend installing PyTorch with CUDA support to leverage the processing power of your video card, which will enhance the speed of the entire process:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
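
To confirm that the CUDA build is actually in use after installation, a quick check along these lines may help. This is a minimal sketch: the helper name describe_torch_device is hypothetical, and it only reports what PyTorch itself sees, not how xtts-api-server selects its device.

```python
import importlib.util


def describe_torch_device() -> str:
    """Return 'cuda', 'cpu', or 'torch-not-installed' for this environment."""
    # Look the package up first so a missing PyTorch does not raise ImportError.
    if importlib.util.find_spec("torch") is None:
        return "torch-not-installed"
    import torch

    # torch.cuda.is_available() is True only for a CUDA build with a usable GPU.
    return "cuda" if torch.cuda.is_available() else "cpu"


print(describe_torch_device())
```

If this prints cpu on a machine with an NVIDIA card, the CPU-only wheel was probably installed and the pip command above may need to be re-run.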

Starting Server

Running python -m xtts_api_server starts the server on the default IP and port (localhost:8020).

usage: xtts_api_server [-h] [-hs HOST] [-p PORT] [-sf SPEAKER_FOLDER] [-o OUTPUT] [-t TUNNEL_URL]

Run XTTSv2 within a FastAPI application

options:
  -h, --help show this help message and exit
  -hs HOST, --host HOST
  -p PORT, --port PORT
  -sf SPEAKER_FOLDER, --speaker_folder The folder where you get the samples for tts
  -o OUTPUT, --output Output folder
  -t TUNNEL_URL, --tunnel URL of tunnel used (e.g: ngrok, localtunnel)
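
If you launch the server from another Python program, the flags above can be assembled programmatically. The sketch below only builds the argument list from the options shown in the usage text; the helper name build_server_command is an assumption made for illustration and is not part of the package.

```python
import sys
from typing import List, Optional


def build_server_command(
    host: Optional[str] = None,
    port: Optional[int] = None,
    speaker_folder: Optional[str] = None,
    output: Optional[str] = None,
    tunnel_url: Optional[str] = None,
) -> List[str]:
    """Assemble argv for `python -m xtts_api_server` from the documented flags."""
    cmd = [sys.executable, "-m", "xtts_api_server"]
    # Pass only the flags that were set, so the server's own defaults apply otherwise.
    for flag, value in (
        ("-hs", host),
        ("-p", port),
        ("-sf", speaker_folder),
        ("-o", output),
        ("-t", tunnel_url),
    ):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


print(build_server_command(host="0.0.0.0", port=8020))
```

The resulting list can be handed to subprocess.Popen to start the server in the background.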

The first time you run the server or generate audio, you may be asked to confirm that you agree to use XTTS.

API Docs

API Docs can be accessed from http://localhost:8020/docs
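
The endpoints can also be listed from a script rather than the browser. FastAPI publishes a machine-readable schema at /openapi.json by default; the sketch below assumes this server keeps that default, which the README does not confirm. The function names are hypothetical.

```python
import json
from typing import Dict, List
from urllib.request import urlopen


def list_routes(spec: dict) -> Dict[str, List[str]]:
    """Map each path in an OpenAPI document to its upper-cased HTTP methods."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    return {
        path: sorted(m.upper() for m in item if m in http_methods)
        for path, item in spec.get("paths", {}).items()
    }


def fetch_openapi(base_url: str = "http://localhost:8020") -> dict:
    """Download the schema; /openapi.json is FastAPI's default (assumed unchanged here)."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)
```

With the server running, list_routes(fetch_openapi()) returns the available paths and methods without hard-coding any endpoint names.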

Voice Samples

You can find the sample in this repository. By default, samples are saved to /output/output.wav, but you can change this; see the API documentation for details.

Selecting Folder

You can change the folders for speakers and the folder for output via the API.

Get Speakers

Once you have at least one file in your speakers folder, you can get its name via the API. After that, you only need to specify the file name.

Note on creating samples for quality voice cloning

The following is a quote from Reddit user Material1276:

Some suggestions on making good samples

Keep them about 7-9 seconds long. Longer isn't necessarily better.

Make sure the audio is downsampled to a Mono, 24000Hz, 16 Bit WAV file. Otherwise you will slow down processing by a large percentage, and it seems to cause poor quality results (based on a few tests). 24000Hz is the quality it outputs at anyway!

Using the latest version of Audacity, select your clip and use Tracks > Resample to 24000Hz (type 24000 in the box), then Tracks > Mix > Stereo to Mono, and then File > Export Audio, saving it as a WAV of 24000Hz.

If you need to do any audio cleaning, do it before you compress it down to the above settings (Mono, 24000Hz, 16 Bit).

Ensure the clip you use doesn't have background noise or music; e.g. lots of movies have quiet music when the actors are talking. Bad quality audio will have hiss that needs clearing up. The AI will pick this up, even if we don't, and to some degree use it in the simulated voice, so clean audio is key!

Try to make your clip one of nice flowing speech, like the included example files. No big pauses, gaps or other sounds. Preferably use one in which the person you are trying to copy shows a little vocal range. Example files are in \text-generation-webui\extensions\coqui_tts\voices

Make sure the clip doesn't start or end with breathy sounds (breathing in/out etc).

Using AI-generated audio clips may introduce unwanted sounds, as it's already a copy/simulation of a voice, though this would need testing.
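
The format suggestions above (mono, 24000Hz, 16 Bit, roughly 7-9 seconds) can be checked automatically before a clip goes into the speakers folder. The following is a minimal sketch using only Python's standard wave module. The function name check_sample is hypothetical, the thresholds simply mirror the quoted advice, and it cannot judge noise or breathiness.

```python
import wave
from typing import List


def check_sample(path: str, min_s: float = 7.0, max_s: float = 9.0) -> List[str]:
    """Return a list of problems found in a WAV file; an empty list means it passes."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
        duration = wav.getnframes() / rate

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != 24000:
        problems.append(f"expected 24000 Hz, found {rate} Hz")
    if bits != 16:
        problems.append(f"expected 16-bit samples, found {bits}-bit")
    if not min_s <= duration <= max_s:
        problems.append(f"expected {min_s}-{max_s} s, found {duration:.1f} s")
    return problems
```

Calling check_sample on a clip returns an empty list when it matches the suggested settings, and otherwise one message per mismatch.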

Release history

- 0.9.0 (Jun 2, 2024)
- 0.8.6 (Mar 2, 2024)
- 0.8.5 (Feb 20, 2024)
- 0.8.4 (Jan 23, 2024)
- 0.8.3 (Jan 5, 2024)
- 0.8.2 (Jan 2, 2024)
- 0.8.1 (Jan 2, 2024)
- 0.8.0 (Jan 2, 2024)
- 0.7.6 (Jan 2, 2024)
- 0.7.5 (Dec 26, 2023)
- 0.7.4 (Dec 26, 2023)
- 0.7.3 (Dec 23, 2023)
- 0.7.2 (Dec 21, 2023)
- 0.7.1 (Dec 21, 2023)
- 0.7.0 (Dec 21, 2023)
- 0.6.8 (Dec 19, 2023)
- 0.6.7 (Dec 17, 2023)
- 0.6.6 (Dec 17, 2023)
- 0.6.5 (Dec 17, 2023)
- 0.6.4 (Dec 17, 2023)
- 0.6.3 (Dec 13, 2023)
- 0.6.2 (Dec 5, 2023)
- 0.6.1 (Dec 4, 2023)
- 0.6.0 (Dec 4, 2023)
- 0.5.9 (Dec 2, 2023)
- 0.5.8 (Nov 30, 2023)
- 0.5.7 (Nov 30, 2023)
- 0.5.6 (Nov 30, 2023)
- 0.5.5 (Nov 30, 2023)
- 0.5.4 (Nov 29, 2023)
- 0.5.3 (Nov 29, 2023)
- 0.5.2 (Nov 29, 2023)
- 0.5.1 (Nov 29, 2023)
- 0.5.0 (Nov 29, 2023)
- 0.4.5 (Nov 27, 2023)
- 0.4.4 (Nov 27, 2023)
- 0.4.3 (Nov 27, 2023)
- 0.4.2 (Nov 27, 2023)
- 0.4.1 (Nov 27, 2023)
- 0.4.0 (Nov 27, 2023)
- 0.3.2 (Nov 24, 2023)
- 0.3.1 (Nov 23, 2023), this version
- 0.3.0 (Nov 23, 2023)
- 0.2.6 (Nov 21, 2023)
- 0.2.5 (Nov 21, 2023)
- 0.2 (Nov 21, 2023)
- 0.1.1 (Nov 21, 2023)
- 0.1.0 (Nov 21, 2023)

Download files

Source Distribution

xtts_api_server-0.3.1.tar.gz (1.8 MB view details)

Uploaded Nov 23, 2023 Source

Built Distribution

xtts_api_server-0.3.1-py3-none-any.whl (8.3 kB view details)

Uploaded Nov 23, 2023 Python 3

File details

Details for the file xtts_api_server-0.3.1.tar.gz.

File metadata

Download URL: xtts_api_server-0.3.1.tar.gz
Upload date: Nov 23, 2023
Size: 1.8 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: python-httpx/0.23.0

File hashes

Hashes for xtts_api_server-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`4019624e4c3767bf04b73b593bcd8d6f00e0c50c07fa3f879a5d8d469f708adc`
MD5	`a170acb719ebad7de7d949bd8a40a7d7`
BLAKE2b-256	`a344bbf1e33b97bc6e2d1ba45775c2b94a66da4e67ad6bb0ab306b8e4b01cfa0`
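
A downloaded file can be compared against the published SHA256 digest with Python's standard hashlib module. This is a minimal sketch: the function names are hypothetical, and the same approach applies to both the source distribution and the wheel.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("xtts_api_server-0.3.1.tar.gz", "<SHA256 value from the table>") should return True for an intact download.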

File details

Details for the file xtts_api_server-0.3.1-py3-none-any.whl.

File metadata

Download URL: xtts_api_server-0.3.1-py3-none-any.whl
Upload date: Nov 23, 2023
Size: 8.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: python-httpx/0.23.0

File hashes

Hashes for xtts_api_server-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`88ef179643f63bff9db8d1b2221b72f60d7de8040fc355516db829b404a81265`
MD5	`47e35b2b42bbc364c61cade015e78801`
BLAKE2b-256	`5834868085c9e7bec1158f817892363dd517282560511dc36c64d7445f5e6ed3`
