A package for crawling and processing audio and captions from YouTube

Project description

Audio, Caption Crawler and Processor -TTS Data Generator-

Downloads and processes audio and captions (subtitles) from YouTube videos for speech AI.
Generates audio data from YouTube for TTS.

Requirements

  • Python == 3.6 (currently required)
  • FFmpeg
  • youtube_dl
  • pydub
  • youtube_transcript_api
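
Before installing, it can help to check which of the requirements above are already available. The following is a minimal sketch, not part of vctube: the helper name `missing_requirements` is hypothetical, and the import names are assumed to match the package names in the list.

```python
import importlib.util
import shutil

# Import names assumed to match the requirement list above.
PYTHON_MODULES = ["youtube_dl", "pydub", "youtube_transcript_api"]


def missing_requirements():
    """Return the names of requirements that cannot be found (hypothetical helper)."""
    missing = [name for name in PYTHON_MODULES
               if importlib.util.find_spec(name) is None]
    # FFmpeg is an external program, so look for it on PATH instead.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    return missing


if __name__ == "__main__":
    print("Missing:", missing_requirements() or "nothing")
```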

To Use

  pip3 install vctube

  from vctube import VCtube

  playlist_name = ""   # name of the output folder under datasets/
  playlist_url = ""    # URL of the YouTube playlist
  lang = ""            # caption language code, e.g. ko, en, fr, de

  vc = VCtube(playlist_name, playlist_url, lang)

  vc.download_audio()      # download audio from YouTube

  vc.download_captions()   # download captions from YouTube

  vc.audio_split()         # split the audio according to the captions
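
To illustrate the idea behind splitting audio with captions, here is a minimal sketch that converts caption timings into millisecond ranges. It is not vctube's actual implementation. It assumes each caption is a dict with `text`, `start` and `duration` keys (times in seconds), which is the shape youtube_transcript_api has historically returned; the function name `caption_ranges_ms` and the sample data are hypothetical.

```python
def caption_ranges_ms(captions):
    """Turn caption entries into (start_ms, end_ms, text) tuples.

    Assumes each entry has 'start' and 'duration' in seconds.
    """
    ranges = []
    for entry in captions:
        start_ms = int(round(entry["start"] * 1000))
        end_ms = int(round((entry["start"] + entry["duration"]) * 1000))
        ranges.append((start_ms, end_ms, entry["text"]))
    return ranges


# Hypothetical sample captions, for illustration only.
sample = [
    {"text": "first line", "start": 0.0, "duration": 2.5},
    {"text": "second line", "start": 2.5, "duration": 1.25},
]

for start_ms, end_ms, text in caption_ranges_ms(sample):
    # With pydub, an AudioSegment can be sliced by milliseconds,
    # e.g. audio[start_ms:end_ms], to cut out one clip per caption.
    print(start_ms, end_ms, text)
```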

Results

  datasets
    └── playlist name
        ├── metadata.csv
        ├── alignment.json
        └── wavs
            ├── 1.wav
            ├── 2.wav
            ├── 3.wav
            └── ...

metadata.csv should look like:

{
    "0001.wav|그래서 사람들도 날 핍이라고 불렀다.",
    "0002.wav|크리스마스 덕분에 부엌에 먹을게 가득했다.",
    "0003.wav|조가 자신이 그 사람이라고 나섰다.",
    ...
}
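
A minimal sketch of reading entries in this format, assuming the file itself holds plain `filename|transcript` lines without the surrounding braces and quotes shown above. The function name `parse_metadata` is hypothetical.

```python
def parse_metadata(text):
    """Split 'filename|transcript' lines into (filename, transcript) pairs."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or "|" not in line:
            continue  # skip blank or malformed lines
        # Split on the first pipe only, in case the transcript contains one.
        filename, _, transcript = line.partition("|")
        rows.append((filename, transcript))
    return rows


sample = (
    "0001.wav|그래서 사람들도 날 핍이라고 불렀다.\n"
    "0002.wav|크리스마스 덕분에 부엌에 먹을게 가득했다.\n"
)

for filename, transcript in parse_metadata(sample):
    print(filename, transcript)
```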

alignment.json should look like:

{
    "./datasets/playlist name/wavs/0001.wav": "그래서 사람들도 날 핍이라고 불렀다.",
    "./datasets/playlist name/wavs/0002.wav": "크리스마스 덕분에 부엌에 먹을게 가득했다.",
    "./datasets/playlist name/wavs/0003.wav": "조가 자신이 그 사람이라고 나섰다.",
    ...
}
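
Because alignment.json maps each wav path to its transcript, it can be turned back into `filename|transcript` lines. The sketch below assumes the file is valid JSON (that is, without the `...` placeholder); the function name `alignment_to_metadata` is hypothetical.

```python
import json
import posixpath


def alignment_to_metadata(alignment_json):
    """Convert a path-to-transcript JSON object into 'filename|transcript' lines."""
    alignment = json.loads(alignment_json)
    return [
        f"{posixpath.basename(path)}|{text}"
        for path, text in alignment.items()
    ]


sample = json.dumps(
    {
        "./datasets/playlist name/wavs/0001.wav": "그래서 사람들도 날 핍이라고 불렀다.",
        "./datasets/playlist name/wavs/0003.wav": "조가 자신이 그 사람이라고 나섰다.",
    },
    ensure_ascii=False,
)

for line in alignment_to_metadata(sample):
    print(line)
```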

PyPI address

https://pypi.org/project/vctube/

Download files

Source Distribution

vctube-1.3.tar.gz (7.8 kB view details)

Uploaded Source

Built Distribution

vctube-1.3-py3-none-any.whl (8.7 kB view details)

Uploaded Python 3

File details

Details for the file vctube-1.3.tar.gz.

File metadata

  • Download URL: vctube-1.3.tar.gz
  • Size: 7.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.6.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for vctube-1.3.tar.gz
Algorithm    Hash digest
SHA256       d9f099b59272640f34015b2c1863d27a17da452b0608dd1fa79f645d9bd0f479
MD5          7515a39f6dea9604e4f17be24e0e34dd
BLAKE2b-256  e940ecc7d9c03bb889bf37e25391f2e3a9c42ef2282f03d9262686cd7cb131c4
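
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib`; the function name `sha256_of_file` and the local file path are assumptions for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import os

    # Assumed location of the downloaded source distribution.
    path = "vctube-1.3.tar.gz"
    expected = "d9f099b59272640f34015b2c1863d27a17da452b0608dd1fa79f645d9bd0f479"
    if os.path.exists(path):
        print("match" if sha256_of_file(path) == expected else "MISMATCH")
```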

File details

Details for the file vctube-1.3-py3-none-any.whl.

File metadata

  • Download URL: vctube-1.3-py3-none-any.whl
  • Size: 8.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.6.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for vctube-1.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       6478846f0db973b19722a90eb1369f3b39c57b7addc4e2a5a566a5468ba35433
MD5          10f6a5591e163243fce0a24f2aa07c48
BLAKE2b-256  24eae88cb29365871b2e497949ff801e2668500e26d21eefc04f7e117df39733