
A package for crawling and processing audio and captions from YouTube

Project description

Audio, Caption Crawler and Processor

Downloads audio and captions (subtitles) from YouTube videos and processes them into datasets for speech AI.

Requirements

  • Python >= 3.6
  • FFmpeg
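Before installing, you can confirm both requirements with a short check. This is a minimal sketch, not part of accp: check_requirements is a hypothetical helper, and it assumes the package calls FFmpeg through the `ffmpeg` executable on your PATH (the README only says FFmpeg is required). It avoids f-strings so it still runs, and reports the problem, on interpreters older than 3.6.

```python
import shutil
import sys


def check_requirements(min_python=(3, 6)):
    """Return a list of problems; an empty list means both requirements are met."""
    problems = []
    major, minor = sys.version_info[:2]
    if (major, minor) < tuple(min_python):
        problems.append(
            "Python {}.{}+ required, found {}.{}".format(
                min_python[0], min_python[1], major, minor
            )
        )
    # Assumption: FFmpeg is invoked as the `ffmpeg` executable found on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    for issue in issues:
        print("missing requirement:", issue)
    if not issues:
        print("Python and FFmpeg requirements look OK")
```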

To Use

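  from accp import ACCP

  playlist_name = ""  # name of the output folder under datasets/
  playlist_url = ""   # URL of the YouTube playlist

  accp = ACCP(playlist_name, playlist_url)
  accp.download_audio()    # download the audio from YouTube
  accp.download_caption()  # download the captions from YouTube
  accp.audio_split()       # split the audio into clips aligned with the captions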
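The README does not say how audio_split() cuts the audio. One common approach, assumed here rather than taken from accp, is to run FFmpeg once per caption segment, using the caption's start and end times. build_split_commands and run_split are hypothetical helpers, and the (start_sec, end_sec) segment list is assumed to come from the downloaded captions.

```python
import os
import subprocess


def build_split_commands(source, segments, out_dir):
    """Build one ffmpeg command per (start_sec, end_sec) caption segment.

    Clips are numbered from 1 and zero-padded to four digits (0001.wav, ...),
    matching the file names in the metadata.csv sample.
    """
    commands = []
    for index, (start, end) in enumerate(segments, start=1):
        out_path = os.path.join(out_dir, "{:04d}.wav".format(index))
        commands.append([
            "ffmpeg", "-y",                      # overwrite existing clips
            "-i", source,                        # full downloaded audio file
            "-ss", "{:.3f}".format(start),       # clip start, in seconds
            "-t", "{:.3f}".format(end - start),  # clip duration, in seconds
            out_path,
        ])
    return commands


def run_split(source, segments, out_dir):
    """Run the commands; requires ffmpeg on PATH."""
    os.makedirs(out_dir, exist_ok=True)
    for cmd in build_split_commands(source, segments, out_dir):
        subprocess.run(cmd, check=True)
```

Placing -ss after -i makes FFmpeg decode up to the start point before writing, which is slower than seeking on the input but cuts at the exact timestamp.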

Results

  datasets
  └── playlist name
      ├── metadata.csv
      ├── alignment.json
      └── wavs
          ├── 0001.wav
          ├── 0002.wav
          ├── 0003.wav
          └── ...
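To confirm a run produced this layout, a small check can look for the three entries and count the clips. This is a sketch: check_dataset is a hypothetical helper, and it assumes datasets/ is created relative to the working directory, as the ./datasets/... paths in alignment.json suggest.

```python
from pathlib import Path


def check_dataset(playlist_name, root="datasets"):
    """Return (missing_paths, wav_count) for one playlist's output folder."""
    base = Path(root) / playlist_name
    expected = [base / "metadata.csv", base / "alignment.json", base / "wavs"]
    missing = [str(p) for p in expected if not p.exists()]
    wavs_dir = base / "wavs"
    # Count only .wav clips; 0 if the wavs folder is absent
    wav_count = len(list(wavs_dir.glob("*.wav"))) if wavs_dir.is_dir() else 0
    return missing, wav_count
```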

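metadata.csv lists one clip per line in the form filename|transcript. With Korean captions it should look like this (the sample lines translate to "So people called me Pip, too.", "Thanks to Christmas, the kitchen was full of food.", and "Joe stepped forward, saying he was that person."):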

  0001.wav|그래서 사람들도 날 핍이라고 불렀다.
  0002.wav|크리스마스 덕분에 부엌에 먹을게 가득했다.
  0003.wav|조가 자신이 그 사람이라고 나섰다.
  ...
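A loader for this format might look like the sketch below. It assumes the file is UTF-8, has no header row, and uses the first | to separate the file name from the transcript; parse_metadata and load_metadata are hypothetical helpers, not part of accp.

```python
def parse_metadata(text):
    """Parse filename|transcript lines into (filename, transcript) pairs."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first "|" only, so a "|" inside the transcript survives
        filename, transcript = line.split("|", 1)
        rows.append((filename, transcript))
    return rows


def load_metadata(path):
    with open(path, encoding="utf-8") as f:
        return parse_metadata(f.read())
```

Splitting by hand, rather than with the csv module, avoids csv's quote handling, which could alter transcripts that begin with a quotation mark.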

alignment.json maps each clip's path to its transcript and should look like this:

{
    "./datasets/playlist name/wavs/0001.wav": "그래서 사람들도 날 핍이라고 불렀다.",
    "./datasets/playlist name/wavs/0002.wav": "크리스마스 덕분에 부엌에 먹을게 가득했다.",
    "./datasets/playlist name/wavs/0003.wav": "조가 자신이 그 사람이라고 나섰다."
}
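Because both files carry the same transcripts, they can be cross-checked. In this sketch, load_alignment and find_mismatches are hypothetical helpers; metadata_rows is assumed to be a list of (filename, transcript) pairs read from metadata.csv, and clips are matched on file name alone so the ./datasets/... prefix, which depends on the working directory, does not matter.

```python
import json
import os


def load_alignment(path):
    """Load alignment.json as a {wav_path: transcript} dict (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_mismatches(alignment, metadata_rows):
    """Return clip names whose transcript differs between the two files.

    Clips missing from alignment.json are reported as mismatches too.
    """
    by_name = {os.path.basename(p): text for p, text in alignment.items()}
    return [name for name, text in metadata_rows if by_name.get(name) != text]
```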

Download files


File details

Details for the file audio, caption crawler and processor-0.0.1.tar.gz.

File metadata

  • Download URL: audio, caption crawler and processor-0.0.1.tar.gz
  • Size: 7.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.23.0 setuptools/40.2.0 requests-toolbelt/0.9.1 tqdm/4.26.0 CPython/3.7.0

File hashes

Hashes for audio, caption crawler and processor-0.0.1.tar.gz
Algorithm     Hash digest
SHA256        d882b11109f9cc9f20082ff0b50cbbe46608576a2e6e6e8262d8beb3d68f1c44
MD5           1dcc0b4acdffcdb8409765b47ccdf659
BLAKE2b-256   abbcda692341bde615a7bfc62623e70af7514dbc2e53e41409d5ea614c6845a7
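To check a downloaded file against these digests, Python's hashlib is enough. This is a minimal sketch: file_sha256 and verify are hypothetical helpers, and the default expected value is the SHA256 listed for the .tar.gz source distribution.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 16):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 of the 0.0.1 source distribution, copied from the table above
EXPECTED_SDIST_SHA256 = (
    "d882b11109f9cc9f20082ff0b50cbbe46608576a2e6e6e8262d8beb3d68f1c44"
)


def verify(path, expected=EXPECTED_SDIST_SHA256):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return file_sha256(path) == expected.lower()
```

Reading in chunks keeps memory use constant regardless of file size.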

File details

Details for the file audio_caption_crawler_and_processor-0.0.1-py3-none-any.whl.

File hashes

Hashes for audio_caption_crawler_and_processor-0.0.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        a66cbe0d4a6f74b63745b020e23d45777098aae3623bb2e38e9c71d5566d7c25
MD5           fa808ded39d2b0db4bc83d9d2be5e0b7
BLAKE2b-256   1d585142a24eaac911305897802194a819f1539bb41c885771ea0a3034f0de0d
