
PAFTS


A library for preprocessing audio for TTS.

With a single call, this library turns raw audio files into data suitable for training TTS models.

Description

PAFTS has three features:

  1. Separator: removes background music (MR) and noise from each audio file, leaving a clean voice track.
  2. Diarization: separates the speakers within each audio file and identifies distinct voices.
  3. STT: transcribes the speech in each audio file to text.

# before run()

path
├── 1_001.wav   # contains MR or noise
├── 1_002.wav
├── 1_003.wav
├── 1_004.wav
└── abc.wav


# after run()

path
├── SPEAKER_00
│   ├── SPEAKER_00_1.wav   # MR and noise removed
│   ├── SPEAKER_00_2.wav
│   └── SPEAKER_00_3.wav
├── SPEAKER_01
│   ├── SPEAKER_01_1.wav
│   └── SPEAKER_01_2.wav
├── SPEAKER_02
│   ├── SPEAKER_02_1.wav
│   └── SPEAKER_02_2.wav
└── audio.json

# audio.json
{
    "SPEAKER_00_1.wav": "I have a note.",
    "SPEAKER_00_2.wav": "I want to eat chicken.",
    "SPEAKER_00_3.wav": "...",
    "SPEAKER_01_1.wav": "...",
    "SPEAKER_01_2.wav": "..."
}
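
Once run() has finished, audio.json can be converted into whatever manifest your TTS trainer expects. Below is a minimal sketch, assuming audio.json is a flat JSON object that maps each output file name to its transcript (as in the example above), that each file sits in a folder named after its speaker, and that the target is an LJSpeech-style "file|text" list. build_manifest and write_manifest are hypothetical helpers, not part of PAFTS.

```python
import json
from pathlib import Path
from typing import List


def build_manifest(output_dir: str) -> List[str]:
    """Turn PAFTS's audio.json into 'SPEAKER_xx/file.wav|transcript' lines.

    Assumes audio.json is a flat JSON object mapping file names such as
    'SPEAKER_00_1.wav' to transcripts, and that each file sits in a folder
    named after its speaker (as in the example tree).
    """
    root = Path(output_dir)
    transcripts = json.loads((root / "audio.json").read_text(encoding="utf-8"))
    lines = []
    for file_name, text in sorted(transcripts.items()):
        text = text.strip()
        if not text:
            continue  # skip clips with an empty transcript
        # 'SPEAKER_00_1.wav' -> speaker folder 'SPEAKER_00'
        speaker = Path(file_name).stem.rsplit("_", 1)[0]
        # Forward slashes keep the manifest portable across operating systems
        lines.append(f"{(Path(speaker) / file_name).as_posix()}|{text}")
    return lines


def write_manifest(output_dir: str, manifest_name: str = "metadata.csv") -> Path:
    """Write the manifest next to audio.json and return its path."""
    target = Path(output_dir) / manifest_name
    target.write_text("\n".join(build_manifest(output_dir)) + "\n", encoding="utf-8")
    return target
```

Skipping empty transcripts is a design choice: clips where STT returned nothing are usually not useful as training pairs.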

Features

  • Separator: uses models and code from the UVR (Ultimate Vocal Remover) project for music source separation.
  • Diarization: uses the speaker-diarization pipeline from pyannote-audio.
  • STT: uses OpenAI's Whisper speech-recognition model.

Setup

This library was developed using Python 3.10, and we recommend using Python versions 3.8 to 3.10 for compatibility.
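
To check from Python whether the current interpreter falls inside that range, a small sketch like the following can be used. is_supported_python is a hypothetical helper, and the 3.8-3.10 bounds are taken from the recommendation above rather than from the package's declared metadata.

```python
import sys
from typing import Optional, Tuple

# Range recommended in this README: Python 3.8 through 3.10 (inclusive).
SUPPORTED_MIN: Tuple[int, int] = (3, 8)
SUPPORTED_MAX: Tuple[int, int] = (3, 10)


def is_supported_python(version: Optional[Tuple[int, int]] = None) -> bool:
    """Return True if (major, minor) lies within the recommended range.

    With no argument, checks the running interpreter.
    """
    major_minor = tuple(version) if version is not None else tuple(sys.version_info[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX
```

For example, is_supported_python((3, 11)) returns False, so a setup script can print a warning before installing.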

The library is intended to run on both Linux and Windows, but all testing has been done on Windows. If you run into issues or errors on Linux, please open an issue.

Before running the library, please ensure the following are installed:

PyTorch

We strongly recommend running on a GPU for better performance. Install a PyTorch build that matches your GPU's CUDA version, for example:

# Example for installing PyTorch with CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
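
After installing, you can confirm that PyTorch actually sees your GPU. A minimal sketch using standard PyTorch calls (torch.cuda.is_available(), torch.cuda.get_device_name()); describe_torch_device is a hypothetical helper, not part of PAFTS.

```python
import importlib.util


def describe_torch_device() -> str:
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch  # imported lazily so this check also works without PyTorch

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this PyTorch build was compiled against
        return f"CUDA GPU available: {torch.cuda.get_device_name(0)} (CUDA {torch.version.cuda})"
    return f"PyTorch {torch.__version__} installed, but no CUDA GPU is visible (CPU only)"


print(describe_torch_device())
```

If it reports CPU only on a machine with an NVIDIA GPU, the installed wheel is most likely a CPU-only build, or its CUDA version does not match your driver.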

ffmpeg

ffmpeg is required for audio processing tasks within this library. Please ensure it is installed and accessible from your system’s PATH. To install ffmpeg:

Windows

Download the latest FFmpeg release from FFmpeg’s official website, and add the bin folder to your system’s PATH.

Linux

Use the following command to install FFmpeg:

sudo apt update
sudo apt install ffmpeg

After installation, verify it by running:

ffmpeg -version
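
The same check can be done from Python, which is handy in setup scripts. A sketch using only the standard library; ffmpeg_version is a hypothetical helper name.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    executable = shutil.which("ffmpeg")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=True
    )
    # The first line normally looks like "ffmpeg version <x> Copyright ..."
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""
```

A None result means ffmpeg is not installed or its bin folder is not on PATH.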

HuggingFace Access Token (required for diarization)

To enable diarization, complete the following steps:

  1. Accept the pyannote/segmentation-3.0 user conditions on Hugging Face.
  2. Accept the pyannote/speaker-diarization-3.1 user conditions on Hugging Face.
  3. Create an access token at hf.co/settings/tokens and pass it to PAFTS as hf_token:

from pafts.pafts import PAFTS

p = PAFTS(
    path='your_audio_directory_path',
    output_path='output_path',
    hf_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE"
)
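
Rather than hard-coding the token in source files, you may prefer to read it from an environment variable. A minimal sketch, assuming the variable is named HF_TOKEN (the name the huggingface_hub library also recognizes); get_hf_token is a hypothetical helper, not part of PAFTS.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the Hugging Face access token from an environment variable.

    Keeping the token out of source code avoids committing it by accident.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face access token "
            "(create one at hf.co/settings/tokens)."
        )
    return token


# Usage (requires pafts to be installed):
# from pafts import PAFTS
# p = PAFTS(path='your_audio_directory_path', output_path='output_path', hf_token=get_hf_token())
```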

After completing the setup steps above, you can install this library by running

pip install pafts

Usage

from pafts import PAFTS

p = PAFTS(
    path='your_audio_directory_path',
    output_path='output_path',
    hf_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE",  # required only for diarization
)

# Separator
p.separator()

# Diarization
p.diarization()

# STT (model_size selects the Whisper model size)
p.STT(model_size='small')

# Or run the whole pipeline with a single call
p.run()
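
Because run() processes the whole input directory, it can help to check that directory first so that a typo in path fails fast. A sketch that assumes the inputs are .wav files, as in the example tree; find_input_audio is a hypothetical helper, and PAFTS itself may accept other formats.

```python
from pathlib import Path
from typing import List

# Assumption: inputs are .wav files, as in the example directory tree.
AUDIO_SUFFIXES = {".wav"}


def find_input_audio(path: str) -> List[Path]:
    """List audio files directly inside `path` (non-recursive), sorted by name."""
    root = Path(path)
    if not root.is_dir():
        raise FileNotFoundError(f"Input path is not an existing directory: {root}")
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
```

For example, call find_input_audio('your_audio_directory_path') and stop early if it returns an empty list, before constructing PAFTS.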

TODO

  • Command-line interface
  • Cleaner logging
  • Model selection for the Separator
  • Update README.md
  • Add VAD (voice activity detection)

License

PAFTS is released under the MIT License.

Download files


Source Distribution

pafts-1.0.0.tar.gz (119.4 kB)

Built Distribution

pafts-1.0.0-py3-none-any.whl (151.8 kB)

File details

Details for the file pafts-1.0.0.tar.gz.

File metadata

  • Download URL: pafts-1.0.0.tar.gz
  • Upload date:
  • Size: 119.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.15

File hashes

Hashes for pafts-1.0.0.tar.gz

Algorithm     Hash digest
SHA256        fbd0604558841e8ee9f945c834cba98b11aee9981b49d57352ff87549e53ccf3
MD5           593e6f8ecede8a009217c8dd07d43eb0
BLAKE2b-256   146b5abec9fded7dad89f1a5a4e3edbeab1433d788d0f91976ff467276cf4a2f


File details

Details for the file pafts-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: pafts-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 151.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.15

File hashes

Hashes for pafts-1.0.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        ab4bf6c71a5f010e717ea6ea82ad77c95c9428c5147739b60e0a1d98e8f6a6a2
MD5           2e653f14a10daad103eee9ec7fb2f377
BLAKE2b-256   6c5173757c38a8c2a4446e56baef9bef195eab9e237c3e59ea390d0ae8f0384f
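
To check a downloaded file against the SHA256 digests listed for the 1.0.0 files, compute the file's digest and compare. A minimal standard-library sketch; sha256_of and verify_download are hypothetical helper names.

```python
import hashlib

# SHA256 digests published for the pafts 1.0.0 release files
EXPECTED_SHA256 = {
    "pafts-1.0.0.tar.gz": "fbd0604558841e8ee9f945c834cba98b11aee9981b49d57352ff87549e53ccf3",
    "pafts-1.0.0-py3-none-any.whl": "ab4bf6c71a5f010e717ea6ea82ad77c95c9428c5147739b60e0a1d98e8f6a6a2",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA256 hex digest, reading in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str) -> bool:
    """Return True if the file's SHA256 matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, verify_download("pafts-1.0.0.tar.gz", EXPECTED_SHA256["pafts-1.0.0.tar.gz"]). pip can perform the same check at install time through its hash-checking mode, using a requirements line such as pafts==1.0.0 --hash=sha256:<digest>.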

