A package for crawling and processing audio and captions from YouTube

Project description

Audio and Caption Crawler and Processor (TTS Data Generator)

Downloads and processes audio and captions (subtitles) from YouTube videos for speech AI.
Generates audio data from YouTube for TTS (text-to-speech).

Requirements

  • Currently requires Python >= 3.6
  • FFmpeg
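
A quick way to confirm both requirements before installing is sketched below. It is not part of vctube: check_requirements is a hypothetical helper, and it assumes FFmpeg is expected as an executable named ffmpeg on PATH.

```python
import shutil
import sys


def check_requirements(min_python=(3, 6)):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python >= %d.%d is required" % min_python)
    # Assumption: FFmpeg is looked up as an executable named "ffmpeg" on PATH.
    # shutil.which returns None when no such executable is found.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

An empty result means neither check failed; it does not guarantee that the installed FFmpeg build supports every format involved.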

To Use

  pip3 install vctube

  from vctube import VCtube

  playlist_name = ""
  playlist_url = ""
  lang = ""   # e.g. ko, en, fr, de

  vc = VCtube(playlist_name, playlist_url, lang)

  vc.download_audio()     # download audio from YouTube

  vc.download_captions()  # download captions from YouTube

  vc.audio_split()        # split the audio using the captions

Results

  datasets
  └── playlist name
      ├── metadata.csv
      ├── alignment.json
      └── wavs
          ├── 1.wav
          ├── 2.wav
          ├── 3.wav
          └── ...

metadata.csv should look like:

{
    "0001.wav|그래서 사람들도 날 핍이라고 불렀다.",
    "0002.wav|크리스마스 덕분에 부엌에 먹을게 가득했다.",
    "0003.wav|조가 자신이 그 사람이라고 나섰다.",
    ...
}
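
The entries above can be read back with a few lines of Python. This is a minimal sketch, assuming each line of metadata.csv is a file name and a transcript separated by a single "|" (without the surrounding quotes shown in the listing). parse_metadata is a hypothetical helper, not part of vctube.

```python
def parse_metadata(lines):
    """Parse 'filename|transcript' lines into (filename, transcript) pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or "|" not in line:
            continue  # skip blank or malformed lines
        # Split on the first "|" only, so a "|" inside the transcript survives.
        filename, transcript = line.split("|", 1)
        pairs.append((filename, transcript))
    return pairs


sample = [
    "0001.wav|그래서 사람들도 날 핍이라고 불렀다.",
    "0002.wav|크리스마스 덕분에 부엌에 먹을게 가득했다.",
]
for filename, transcript in parse_metadata(sample):
    print(filename, transcript)
```

To read a real file, pass an open file object instead of the sample list, for example open(path, encoding="utf-8").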

alignment.json should look like:

{
    "./datasets/playlist name/wavs/0001.wav": "그래서 사람들도 날 핍이라고 불렀다.",
    "./datasets/playlist name/wavs/0002.wav": "크리스마스 덕분에 부엌에 먹을게 가득했다.",
    "./datasets/playlist name/wavs/0003.wav": "조가 자신이 그 사람이라고 나섰다.",
    ...
}
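
Since alignment.json maps each wav path to its transcript, it can be loaded with the standard json module. The sketch below works on an in-memory sample rather than a real file; load_alignment is a hypothetical helper, not part of vctube.

```python
import json


def load_alignment(json_text):
    """Return (wav_path, transcript) pairs from alignment JSON text."""
    mapping = json.loads(json_text)
    return list(mapping.items())


# Sample shaped like the listing above, shortened to two entries.
sample_json = """
{
    "./datasets/playlist name/wavs/0001.wav": "그래서 사람들도 날 핍이라고 불렀다.",
    "./datasets/playlist name/wavs/0002.wav": "크리스마스 덕분에 부엌에 먹을게 가득했다."
}
"""

for wav_path, transcript in load_alignment(sample_json):
    print(wav_path, transcript)
```

For a file on disk, read it with encoding="utf-8" and pass the text to load_alignment.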

PyPI address

https://pypi.org/project/vctube/

Download files

Files for vctube, version 1.2.1

  • vctube-1.2.1-py3-none-any.whl (7.7 kB), file type: Wheel, Python version: py3
  • vctube-1.2.1.tar.gz (6.4 kB), file type: Source, Python version: None
