Library That Preprocesses Audio For TTS/STT.
PAFST
Library That Preprocesses Audio For TTS.
This library turns audio files into a format suitable for TTS training data in a single run.
Description
PAFST has four features.
- Separator and Denoiser
- VAD
- Diarization
- STT
- Separator or Denoiser : Removes background music (MR) and noise from each audio file to isolate clean voice tracks.
- VAD : Detects whether speech is present or absent in the audio.
- Diarization : Separates speakers within each audio file, identifying distinct voices.
- STT : Extracts text from audio.
# before run()
path
├── TEST-1.wav # contains MR or noise
└── TEST-2.wav
# after run()
path
├── speaker_SPEAKER_00
│ ├── SPEAKER_00_1.wav # MR and noise removed
│ ├── SPEAKER_00_2.wav
│ └── SPEAKER_00_3.wav
├── speaker_SPEAKER_01
│ ├── SPEAKER_01_1.wav
│ └── SPEAKER_01_2.wav
├── speaker_SPEAKER_02
│ ├── SPEAKER_02_1.wav
│ └── SPEAKER_02_2.wav
├── asr.json
└── diarization.json
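The layout above can be inspected with a few lines of standard-library Python. This is a minimal sketch, not part of PAFST: the helper name `clips_per_speaker` is hypothetical, and it assumes only what the tree shows, namely that each speaker gets a `speaker_*` folder containing `.wav` clips. The demo builds a throwaway directory instead of touching real output.

```python
import tempfile
from pathlib import Path


def clips_per_speaker(output_dir):
    """Map each speaker_* folder name to the sorted .wav file names inside it.

    Hypothetical helper: assumes the output layout shown in the tree above.
    """
    result = {}
    for speaker_dir in sorted(Path(output_dir).glob("speaker_*")):
        if speaker_dir.is_dir():
            result[speaker_dir.name] = sorted(
                p.name for p in speaker_dir.glob("*.wav")
            )
    return result


# Demo on a temporary directory that mimics part of the tree above.
with tempfile.TemporaryDirectory() as tmp:
    speaker = Path(tmp) / "speaker_SPEAKER_00"
    speaker.mkdir()
    (speaker / "SPEAKER_00_1.wav").touch()
    (speaker / "SPEAKER_00_2.wav").touch()
    demo = clips_per_speaker(tmp)

print(demo)
```

Pointing `clips_per_speaker` at your own `output_path` gives a quick check of how many clips each detected speaker produced.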
# diarization.json
[
{
"speaker_path": "/processed_audio/speaker_SPEAKER_00/SPEAKER_00_0.wav",
"audio_filepath": "processed_audio//TEST-1.wav", # the separated source audio
"start_time": 0.03,
"end_time": 3.81
},
...
]
# asr.json
[
{
"asr_text": " Let's talk about music. I often do you listen to music.",
"audio_filepath": "/processed_audio/speaker_SPEAKER_00/SPEAKER_00_0.wav",
"language": "en"
}
]
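The two files can be combined into one record per clip. The sketch below is only an illustration under one assumption taken from the examples above: a `speaker_path` in diarization.json and an `audio_filepath` in asr.json refer to the same clip when the strings are equal. The function name `merge_manifests` and the output keys are hypothetical, and the sample data is inlined (without the `#` comment, which is not valid JSON).

```python
import json


def merge_manifests(diarization, asr):
    """Attach ASR text to each diarization segment with a matching clip path.

    Assumption: diarization "speaker_path" == asr "audio_filepath" for the
    same clip, as in the example entries above.
    """
    asr_by_path = {entry["audio_filepath"]: entry for entry in asr}
    merged = []
    for seg in diarization:
        hit = asr_by_path.get(seg["speaker_path"])
        merged.append({
            "clip": seg["speaker_path"],
            "source": seg["audio_filepath"],
            "duration": round(seg["end_time"] - seg["start_time"], 2),
            "text": hit["asr_text"].strip() if hit else None,
            "language": hit["language"] if hit else None,
        })
    return merged


# Inline samples mirroring the entries shown above.
diarization = json.loads("""[
  {"speaker_path": "/processed_audio/speaker_SPEAKER_00/SPEAKER_00_0.wav",
   "audio_filepath": "processed_audio//TEST-1.wav",
   "start_time": 0.03, "end_time": 3.81}
]""")
asr = json.loads("""[
  {"asr_text": " Let's talk about music. I often do you listen to music.",
   "audio_filepath": "/processed_audio/speaker_SPEAKER_00/SPEAKER_00_0.wav",
   "language": "en"}
]""")

merged = merge_manifests(diarization, asr)
print(json.dumps(merged, indent=2))
```

In practice you would load the two lists with `json.load` from the files in your output directory; segments without a transcript keep `text` and `language` as `None`.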
Features
- Separator : Uses the UVR project’s model and code for music source separation.
- Denoiser : Uses DFNet3 and Facebook's denoiser.
- VAD : Uses webrtcvad.
- Diarization : Uses speaker diarization from pyannote-audio.
- STT : Uses OpenAI's whisper STT model and faster-whisper.
Setup
This library was developed using Python 3.10, and we recommend using Python versions 3.8 to 3.10 for compatibility.
While the library is compatible with both Linux and Windows, all testing was conducted on Linux. If you encounter any issues or errors on either platform, please feel free to open an issue.
Before running the library, please ensure the following are installed:
PyTorch
We highly recommend using a GPU for better performance. Install the PyTorch build that matches your GPU's CUDA version, for example:
# Example for installing PyTorch with CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
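After installing, it can help to confirm that PyTorch actually sees the GPU. The following is a small sketch using standard PyTorch calls (`torch.cuda.is_available`, `torch.cuda.get_device_name`); the function name `describe_torch_device` is made up for this example, and the import is guarded so the snippet also runs where PyTorch is missing.

```python
import importlib.util


def describe_torch_device():
    """Return a short, human-readable summary of the PyTorch/CUDA setup."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check above can report absence

    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "torch is installed, but CUDA is not available (CPU only)"


print(describe_torch_device())
```

If this reports CPU only on a machine with a GPU, the installed wheel most likely does not match the local CUDA version.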
ffmpeg
ffmpeg is required for audio processing tasks within this library. Please ensure it is installed and accessible from your system’s PATH. To install ffmpeg:
Windows
Download the latest FFmpeg release from FFmpeg’s official website, and add the bin folder to your system’s PATH.
Linux
Use the following command to install FFmpeg:
sudo apt update
sudo apt install ffmpeg
After installation, you can verify it by running:
ffmpeg -version
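The same check can be done from Python before starting a long job. This is a sketch with the standard library only; `ffmpeg_version_line` is a hypothetical helper name. It returns `None` when ffmpeg is not on the PATH rather than raising.

```python
import shutil
import subprocess


def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not found."""
    exe = shutil.which("ffmpeg")  # looks ffmpeg up on the system PATH
    if exe is None:
        return None
    proc = subprocess.run(
        [exe, "-version"], capture_output=True, text=True, check=False
    )
    lines = proc.stdout.splitlines()
    return lines[0] if lines else None


line = ffmpeg_version_line()
print(line if line else "ffmpeg was not found on PATH")
```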
HuggingFace Access Token (required for diarization)
To enable diarization functionality, please complete the following steps:
- Accept the pyannote/segmentation-3.0 user conditions
- Accept the pyannote/speaker-diarization-3.1 user conditions
- Create an access token at hf.co/settings/tokens.
from pafst import PAFST
p = PAFST(
path = 'your_audio_directory_path',
output_path = 'output_path',
hf_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE"
)
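To avoid hardcoding the token in source files, one option is to read it from an environment variable. This is a sketch, not a PAFST feature: the helper `load_hf_token` is hypothetical, and the variable name `HF_TOKEN` is an assumption you can change to whatever your environment uses.

```python
import os


def load_hf_token(env_var="HF_TOKEN"):
    """Read the Hugging Face access token from an environment variable.

    Hypothetical helper; raises RuntimeError if the variable is unset or empty.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face access token."
        )
    return token


# Usage with the constructor shown above (left commented out so this
# snippet runs without pafst installed or a token configured):
# from pafst import PAFST
# p = PAFST(
#     path="your_audio_directory_path",
#     output_path="output_path",
#     hf_token=load_hf_token(),
# )
```

Failing early with a clear message is usually easier to debug than an authentication error raised later, during diarization.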
After completing the setup steps above, you can install this library by running:
pip install pafst
Usage
from pafst import PAFST
p = PAFST(
path = 'your_audio_directory_path',
output_path = 'output_path',
hf_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE" # only needed if you use diarization
)
# Separator
p.separator()
# or Denoiser
p.denoiser(processor="dfn") # use "den" for Facebook's denoiser
# VAD
p.vad() # voice activity detection using webrtcvad
# Diarization
p.diarization()
# STT
p.stt(model_size='small')
# One-Click Process
p.run()
TODO
- Command line
- Clean logging
- Separator with Model Selection
License
The code of PAFST is MIT-licensed.
File details
Details for the file pafst-1.0.0.tar.gz.
File metadata
- Download URL: pafst-1.0.0.tar.gz
- Upload date:
- Size: 124.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7db51957202d01e285522e2301b6cdfaf62c948d09ff684edbc1c2396d29d746 |
| MD5 | 26b4283e4e2afaff02c8f9d18f663d18 |
| BLAKE2b-256 | 4d1ea26b901c838cf2472506b158444af3d13ceaad1b1efacd85d66b4ba9730e |
File details
Details for the file pafst-1.0.0-py3-none-any.whl.
File metadata
- Download URL: pafst-1.0.0-py3-none-any.whl
- Upload date:
- Size: 157.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d013335752be5af5b5260fa2811b65c263a3e2da70ba8235f79b6911c620689c |
| MD5 | 9976b144bd88b186fc3712abaabaea1d |
| BLAKE2b-256 | 6ca8bd676fc715b88ff9a5a84efff35e643d2a20366c5b94c37d1a90a5086b83 |