A library that preprocesses audio for TTS/STT.

PAFST

A library that preprocesses audio for TTS.

This library converts audio files into a format suitable for TTS training data in a single run.

Description

PAFST has four features:

Separator and Denoiser
VAD
Diarization
STT

Separator or Denoiser : Removes background music (MR) and noise from each audio file to isolate clean voice tracks.
VAD : Detects whether speech is present or absent in the audio.
Diarization : Separates speakers within each audio file, identifying distinct voices.
STT : Extracts text from audio.

# before run()

      path
        ├── TEST-1.wav # contains background music (MR) or noise
        └── TEST-2.wav
        


# after run()
    
       path
        ├── speaker_SPEAKER_00
        │   ├── SPEAKER_00_1.wav # background music (MR) and noise removed
        │   ├── SPEAKER_00_2.wav
        │   └── SPEAKER_00_3.wav
        ├── speaker_SPEAKER_01
        │   ├── SPEAKER_01_1.wav
        │   └── SPEAKER_01_2.wav
        ├── speaker_SPEAKER_02
        │   ├── SPEAKER_02_1.wav
        │   └── SPEAKER_02_2.wav
        ├── asr.json
        └── diarization.json
        
        # diarization.json
        [
            {
                "speaker_path": "/processed_audio/speaker_SPEAKER_00/SPEAKER_00_0.wav",
                "audio_filepath": "processed_audio//TEST-1.wav", # source audio after separation
                "start_time": 0.03,
                "end_time": 3.81
            },
            ...
        ]

        # asr.json
        [
            {
                "asr_text": " Let's talk about music. I often do you listen to music.",
                "audio_filepath": "/processed_audio/speaker_SPEAKER_00/SPEAKER_00_0.wav",
                "language": "en"
            }
        ]
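
The two JSON files can be joined into (audio, text) pairs for TTS training. The following is a minimal sketch, not part of PAFST: the helper names `build_manifest` and `load_manifest` are hypothetical, and it assumes that `audio_filepath` in asr.json matches `speaker_path` in diarization.json for the same clip, as the sample entries above suggest.

```python
import json
from pathlib import Path


def build_manifest(diarization, asr):
    """Join diarization segments with transcripts by clip path.

    Assumption: asr.json's "audio_filepath" equals diarization.json's
    "speaker_path" for the same clip (true for the sample entries above).
    """
    text_by_clip = {item["audio_filepath"]: item for item in asr}
    manifest = []
    for seg in diarization:
        hit = text_by_clip.get(seg["speaker_path"])
        if hit is None:
            continue  # clip has no transcript; skip it
        manifest.append({
            "audio": seg["speaker_path"],
            "text": hit["asr_text"].strip(),
            "language": hit["language"],
            "duration": round(seg["end_time"] - seg["start_time"], 2),
        })
    return manifest


def load_manifest(output_path):
    """Read both files from a PAFST output directory (hypothetical helper)."""
    out = Path(output_path)
    diarization = json.loads((out / "diarization.json").read_text(encoding="utf-8"))
    asr = json.loads((out / "asr.json").read_text(encoding="utf-8"))
    return build_manifest(diarization, asr)
```

Calling `load_manifest('output_path')` after `run()` would then give one record per clip that has both a segment and a transcript.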

Features

Separator : Uses the UVR project’s model and code for music source separation.
Denoiser : Uses DFNet3 or Facebook's denoiser.
VAD : Uses webrtcvad.
Diarization : Uses speaker diarization from pyannote-audio.
STT : Uses OpenAI's Whisper STT model and faster-whisper.

Setup

This library was developed using Python 3.10, and we recommend using Python versions 3.8 to 3.10 for compatibility.
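
If you want a script to warn about an interpreter outside the recommended range, a check like the one below may help. It is only a sketch: the function name is hypothetical, and the bounds simply restate the 3.8 to 3.10 recommendation above.

```python
import sys

# Range recommended in the Setup section: Python 3.8 through 3.10.
RECOMMENDED_MIN = (3, 8)
RECOMMENDED_MAX = (3, 10)


def python_version_recommended(version_info=None):
    """Return True if the (major, minor) version is within the recommended range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return RECOMMENDED_MIN <= major_minor <= RECOMMENDED_MAX


if __name__ == "__main__":
    if not python_version_recommended():
        print("Warning: Python %d.%d is outside the recommended 3.8-3.10 range."
              % sys.version_info[:2])
```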

While the library is compatible with both Linux and Windows, all testing was conducted on Linux. For any issues or errors encountered while running on Linux, please feel free to open an issue.

Before running the library, please ensure the following are installed:

PyTorch

We highly recommend using a GPU for performance. Install the PyTorch build that matches your CUDA version, for example:

# Example for installing PyTorch with CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
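
After installing, it can be useful to confirm that PyTorch actually sees the GPU. The snippet below is a small sketch for that purpose; `describe_torch` is a hypothetical helper, and it falls back to a plain message when PyTorch is missing or only the CPU is available.

```python
import importlib.util


def describe_torch():
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch  # imported lazily so the check works without PyTorch

    if torch.cuda.is_available():
        return "PyTorch %s with CUDA %s on %s" % (
            torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))
    return "PyTorch %s, CPU only" % torch.__version__


if __name__ == "__main__":
    print(describe_torch())
```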

ffmpeg

ffmpeg is required for audio processing tasks within this library. Please ensure it is installed and accessible from your system’s PATH. To install ffmpeg:

Windows

Download the latest FFmpeg release from FFmpeg’s official website, and add the bin folder to your system’s PATH.

Linux

Use the following command to install FFmpeg:

sudo apt update
sudo apt install ffmpeg

After installation, verify it by running:

ffmpeg -version
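
The same check can be done from Python before starting a long job. This is a sketch using only the standard library; the helper name `ffmpeg_version` is hypothetical.

```python
import shutil
import subprocess


def ffmpeg_version():
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "ffmpeg was not found on PATH")
```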

HuggingFace Access Token (required for diarization)

To enable diarization, complete the following steps:

Accept pyannote/segmentation-3.0 user conditions
Accept pyannote/speaker-diarization-3.1 user conditions
Create access token at hf.co/settings/tokens.

Then pass the token to PAFST through hf_token:

from pafst import PAFST

p = PAFST(
    path='your_audio_directory_path',
    output_path='output_path',
    hf_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE"
)
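
To keep the token out of source control, it can be read from an environment variable instead of being written into the script. The following is a sketch under assumptions: `load_hf_token` is a hypothetical helper, and the variable name `HF_TOKEN` is this example's choice rather than something PAFST requires.

```python
import os


def load_hf_token(env_var="HF_TOKEN"):
    """Read the HuggingFace access token from an environment variable.

    The variable name is this sketch's choice; use whatever suits your setup.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            "Set %s to a HuggingFace access token before running diarization." % env_var)
    return token
```

The result can then be passed as `hf_token=load_hf_token()` when constructing `PAFST`.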

After completing the setup steps above, you can install this library by running:

pip install pafst

Usage

from pafst import PAFST

p = PAFST(
    path='your_audio_directory_path',
    output_path='output_path',
    hf_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE"  # only needed for diarization
)

# Separator: remove background music (MR)
p.separator()

# Denoiser: alternative to the separator; use processor="den" for Facebook's denoiser
p.denoiser(processor="dfn")

# VAD: voice activity detection using webrtcvad
p.vad()

# Diarization
p.diarization()

# STT
p.stt(model_size='small')

# One-click process: run the whole pipeline
p.run()
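
When only some stages are needed, the individual methods can be called in sequence with a small wrapper that reports which step failed. This is a sketch, not PAFST's own API: `run_steps` is a hypothetical helper, and whether a given combination or order of steps is supported is an assumption to verify against the library.

```python
def run_steps(pipeline, steps):
    """Call the named step methods on `pipeline` in order.

    `steps` is a list of (method_name, kwargs) pairs. Stops at the first
    failure and reports which step raised.
    """
    completed = []
    for name, kwargs in steps:
        method = getattr(pipeline, name, None)
        if not callable(method):
            raise AttributeError("pipeline has no step named %r" % name)
        try:
            method(**kwargs)
        except Exception as exc:
            raise RuntimeError("step %r failed after %s" % (name, completed)) from exc
        completed.append(name)
    return completed


# Example with the method names from the Usage section:
# run_steps(p, [("denoiser", {"processor": "dfn"}), ("vad", {}),
#               ("diarization", {}), ("stt", {"model_size": "small"})])
```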

TODO

Command line
Clean logging
Separator with Model Selection

References:

PAFTS for base code
Paper for DFNet3 use case

License

The code of PAFST is MIT-licensed.

Project details

This version

1.0.0

Jan 5, 2025

Download files

Source Distribution

pafst-1.0.0.tar.gz (124.7 kB)

Uploaded Jan 5, 2025 Source

Built Distribution

pafst-1.0.0-py3-none-any.whl (157.8 kB)

Uploaded Jan 5, 2025 Python 3

File details

Details for the file pafst-1.0.0.tar.gz.

File metadata

Download URL: pafst-1.0.0.tar.gz
Upload date: Jan 5, 2025
Size: 124.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for pafst-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`7db51957202d01e285522e2301b6cdfaf62c948d09ff684edbc1c2396d29d746`
MD5	`26b4283e4e2afaff02c8f9d18f663d18`
BLAKE2b-256	`4d1ea26b901c838cf2472506b158444af3d13ceaad1b1efacd85d66b4ba9730e`

File details

Details for the file pafst-1.0.0-py3-none-any.whl.

File metadata

Download URL: pafst-1.0.0-py3-none-any.whl
Upload date: Jan 5, 2025
Size: 157.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for pafst-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d013335752be5af5b5260fa2811b65c263a3e2da70ba8235f79b6911c620689c`
MD5	`9976b144bd88b186fc3712abaabaea1d`
BLAKE2b-256	`6ca8bd676fc715b88ff9a5a84efff35e643d2a20366c5b94c37d1a90a5086b83`
