A library that preprocesses audio for TTS.

PAFTS

PAFTS is a library for building Text-to-Speech (TTS) datasets. A TTS dataset requires clean audio files and a text file containing the transcript of each audio file. PAFTS cleans the audio files and generates that transcript file.

Description

PAFTS consists of three main operations.

  1. Transform
  2. Delete BGM
  3. STT

The Transform operation changes the sampling rate (sr), number of channels, and format of the audio files.

The Delete BGM operation removes background music from the audio files.

The STT operation generates the text corresponding to each audio file.
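
As a rough illustration of the channel part of Transform, the sketch below averages the channels of a 16-bit PCM WAV file into mono using only the Python standard library. This is not PAFTS's own code (PAFTS lists pydub as a dependency, and its internals are not shown here); `downmix_to_mono` is a hypothetical helper name, and resampling and format conversion are left out.

```python
import array
import sys
import wave


def downmix_to_mono(src_path, dst_path):
    """Average all channels of a 16-bit PCM WAV file into one mono channel.

    Hypothetical helper for illustration only; not part of the PAFTS API.
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")

    # WAV samples are little-endian and interleaved: L, R, L, R, ...
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()

    # One output sample per frame: the integer mean of that frame's channels.
    mono = array.array(
        "h",
        (
            sum(samples[i:i + n_channels]) // n_channels
            for i in range(0, len(samples), n_channels)
        ),
    )
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(sample_width)
        dst.setframerate(frame_rate)  # the sampling rate is left unchanged
        dst.writeframes(mono.tobytes())
```

Writing to a separate `dst_path` keeps the original file intact, which matters because the real Transform step may modify files in place.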

# before run()

      path
        ├── 1_001.wav
        ├── 1_002.wav
        ├── 1_003.wav
        ├── 1_004.wav
        └── abc.wav


# after run()
    
      path
        ├── 1_001.wav # Background music removed 
        ├── 1_002.wav # sr, channel, format unified
        ├── 1_003.wav
        ├── 1_004.wav
        ├── abc.wav
        └── text.json
        
        # text.json
        {
              "1_001.wav" : "I have a note.",
              "1_002.wav" : "I want to eat chicken.",
              "1_003.wav" : "...",
              "1_004.wav" : "...",
              "abc.wav" : "..."
        }
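
The mapping in text.json can be consumed by downstream TTS tooling roughly as follows. This is a minimal sketch that assumes the file on disk is valid UTF-8 JSON mapping file names to transcripts, as in the listing above; `load_transcripts` is a hypothetical helper and not part of PAFTS.

```python
import json
from pathlib import Path


def load_transcripts(dataset_dir):
    """Return (audio_path, transcript) pairs for entries whose audio file exists.

    Hypothetical helper; assumes text.json sits next to the audio files.
    """
    dataset_dir = Path(dataset_dir)
    with open(dataset_dir / "text.json", encoding="utf-8") as f:
        transcripts = json.load(f)  # e.g. {"1_001.wav": "I have a note.", ...}

    pairs = []
    for file_name, text in sorted(transcripts.items()):
        audio_path = dataset_dir / file_name
        # Skip entries whose audio file is missing from the dataset directory.
        if audio_path.is_file():
            pairs.append((audio_path, text))
    return pairs
```

Checking that each audio file exists guards against stale entries when files are removed after the transcript file was written.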

Note

  • Audio files are not provided. Please prepare your own.
  • Each audio file should ideally contain one or two spoken sentences and last 3 to 10 seconds.
  • If the background music contains lyrics, it cannot be removed cleanly.
  • Google Web Speech is free, but its transcription quality is low. For higher quality, use the Google Cloud Speech API or the Azure STT API.
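
To check the 3 to 10 second guideline before running PAFTS, something like the following could be used. It is a sketch that relies only on the standard-library `wave` module, so it assumes uncompressed WAV input; the function names and the default thresholds as parameters are illustrative and not part of PAFTS.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Duration of a WAV file in seconds (frame count divided by sampling rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def files_outside_range(dataset_dir, min_sec=3.0, max_sec=10.0):
    """Return {file name: duration} for WAV files outside [min_sec, max_sec].

    Hypothetical helper; only looks at *.wav files directly in dataset_dir.
    """
    flagged = {}
    for path in sorted(Path(dataset_dir).glob("*.wav")):
        duration = wav_duration_seconds(path)
        if not min_sec <= duration <= max_sec:
            flagged[path.name] = duration
    return flagged
```

Files reported by `files_outside_range` are candidates for splitting or trimming before transcription.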

Features

  • Uses Spleeter to remove background music.
  • For STT, you can use Google Web Speech, Google Cloud Speech, or Azure STT.
  • The Google Cloud Speech API and the Azure STT API require an API key.
  • ❗ Audio files may be modified during the Transform and Delete BGM steps, so please back up the original audio files first.
  • ❗ The Google Cloud Speech API and the Azure STT API charge for usage beyond the free tier, so please check their pricing carefully.

Requirements

  • python >= 3.8
  • spleeter
  • pydub
  • SpeechRecognition
  • tqdm

Installation

pip install pafts

Usage

  • Quick start:

    from pafts import PAFTS
    pafts = PAFTS(dataset_path="your dataset path", language='language')
    pafts.run()
    
    
    # Example
    
    pafts = PAFTS(
        dataset_path='C:\\Users\\82109\\Desktop\\dataset',
        language='en-us',
    )
    pafts.run()
    
    
    
    
    >> Run...
    | > Dataset name : dataset
    | > Path : C:\Users\82109\Desktop\dataset
    | > language : en-us
    | > Number of files : 5
    | > Total duration : 14.760000000000002
    
    > Transform items...
    | > sr : 22050
    | > channel : 1
    | > format : wav
    
    > Delete BGM...
    | > Number of items : 5
    | > Path : C:\Users\82109\Desktop\dataset
    abc.wav: 100%|██████████| 5/5 [00:13<00:00,  2.62s/it]
    | > Number of Success items : 5
    | > Number of failure items : 0
    
    > Preparing STT API...
    | > STT API : google web speech
    | > Dataset name : dataset
    | > Path : C:\Users\82109\Desktop\dataset
    | > language : en-us
    | > Number of files : 5
    | > Total duration : 14.760000000000002
    
    abc.wav: 100%|██████████| 5/5 [00:11<00:00,  2.27s/it]
    
    | > Numbers of deleted files : 0
    Saved at C:\Users\82109\Desktop\dataset\text.json
    Successfully Completed.
    

    'dataset_path' is the path to your audio files, and 'language' is a BCP 47 language tag (for example, 'en-us'). You can pass more detailed options as arguments to run(). Please refer to the documentation of run() for more information.

  • If you want to run the tasks step by step:

    from pafts import PAFTS
    pafts = PAFTS(dataset_path="your dataset path", language='language', dataset_name='dataset name', key_path='api key path')
    pafts.transform_items(sr=22050, channel=1, formats='audio format')
    pafts.delete_bgm()
    dic = pafts.stt(stt_api_name='stt api name')
    pafts.save(dic=dic, output_name='text.json')
    
  • If you want to make a key file:

    from pafts import make_key_file
    make_key_file()    # default path : ./key.json
    
    # key.json
    
    {
        "google_cloud_stt": "credentials_json file path",
        "azure_stt": {
            "key": "KEY",
            "location": "LOCATION"
        }
    }
    
  • If you want to flatten the directory structure:

    from pafts import PAFTS
    pafts = PAFTS(dataset_path="your dataset path")
    pafts.flatten()
    
    # Dataset structure before flatten()
    
          path
            ├── a
            │   ├── 1.wav
            │   ├── 2.wav
            │   └── 3.wav
            ├── b
            │   ├── 1.wav
            │   └── 2.wav
            ├── 1.wav
            ├── 2.wav
            └── c
                └── d
                    └── 1.wav
    
    
    # Dataset structure after flatten()
    
          path
            ├── a_1.wav
            ├── a_2.wav
            ├── a_3.wav
            ├── b_1.wav
            ├── b_2.wav
            ├── 1.wav
            ├── 2.wav
            └── c_d_1.wav
    
