
A package for crawling and processing audio and captions from YouTube

Project description

Audio, Caption Crawler and Processor

Downloads and processes audio and captions (subtitles) from YouTube videos to build speech AI datasets.

Requirements

  • Python >= 3.6
  • FFmpeg
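A quick preflight check for both requirements can save a failed run. The sketch below uses only the standard library; check_requirements is a hypothetical helper (not part of accp), and it assumes accp calls the ffmpeg executable found on PATH, which this page does not state.

```python
import shutil
import sys


def check_requirements(min_python=(3, 6)):
    """Return a list of problems; an empty list means the prerequisites look OK.

    Hypothetical helper, not part of accp.
    """
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    # Assumption: accp shells out to an ffmpeg executable discoverable on PATH
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("missing requirement:", problem)
```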

To Use

  from accp import ACCP

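  playlist_name = ""  # name of the output folder under ./datasets/
  playlist_url = ""   # URL of the YouTube playlist to crawl

  accp = ACCP(playlist_name, playlist_url)
  accp.download_audio()    # download audio from YouTube
  accp.download_caption()  # download captions from YouTube
  accp.audio_split()       # split the audio into clips aligned with the captions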

Results

  datasets
  └── playlist name
      ├── metadata.csv
      ├── alignment.json
      └── wavs
          ├── 0001.wav
          ├── 0002.wav
          ├── 0003.wav
          └── ...
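To sanity-check a finished run, a sketch like the following looks for the three expected entries under the playlist folder and totals the clip durations. summarize_dataset is a hypothetical helper, not part of accp, and it assumes the clips are uncompressed PCM WAV files readable by Python's wave module.

```python
import wave
from pathlib import Path


def summarize_dataset(playlist_dir):
    """Report missing entries, clip count, and total clip duration in seconds.

    Hypothetical helper; assumes the clips are uncompressed PCM WAV files.
    """
    playlist_dir = Path(playlist_dir)
    # The three entries expected under datasets/<playlist name>
    expected = ("metadata.csv", "alignment.json", "wavs")
    missing = [name for name in expected if not (playlist_dir / name).exists()]

    wav_dir = playlist_dir / "wavs"
    clips = sorted(wav_dir.glob("*.wav")) if wav_dir.is_dir() else []
    total_seconds = 0.0
    for clip in clips:
        with wave.open(str(clip), "rb") as w:
            # duration = number of frames / frames per second
            total_seconds += w.getnframes() / w.getframerate()
    return {"missing": missing, "clips": len(clips), "seconds": total_seconds}


if __name__ == "__main__":
    print(summarize_dataset("datasets/playlist name"))
```

Replace "playlist name" with the playlist_name you passed to ACCP.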

metadata.csv contains one pipe-separated line per clip (file name|transcript), for example:

  0001.wav|그래서 사람들도 날 핍이라고 불렀다.
  0002.wav|크리스마스 덕분에 부엌에 먹을게 가득했다.
  0003.wav|조가 자신이 그 사람이라고 나섰다.
  ...
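Because each line is simply a file name, a pipe, and the transcript, the file can be read with the standard csv module. A minimal sketch, assuming UTF-8 encoding and no header row; read_metadata is a hypothetical helper name, not part of accp.

```python
import csv


def read_metadata(path):
    """Return (wav_filename, transcript) pairs from a pipe-separated metadata.csv.

    Hypothetical helper; assumes UTF-8 text, no header row, and one
    'filename|transcript' entry per line.
    """
    rows = []
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: treat quote characters in transcripts as ordinary text
        for record in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if not record:
                continue  # skip blank lines
            # Re-join in case a transcript itself contains a pipe
            filename, transcript = record[0], "|".join(record[1:])
            rows.append((filename, transcript))
    return rows
```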

alignment.json maps each clip's path to its transcript, for example:

{
    "./datasets/playlist name/wavs/0001.wav": "그래서 사람들도 날 핍이라고 불렀다.",
    "./datasets/playlist name/wavs/0002.wav": "크리스마스 덕분에 부엌에 먹을게 가득했다.",
    "./datasets/playlist name/wavs/0003.wav": "조가 자신이 그 사람이라고 나섰다."
}
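A minimal sketch for loading alignment.json and spotting entries whose audio file is missing. load_alignment is a hypothetical helper, not part of accp; because the stored paths are relative ("./datasets/..."), it assumes you call it from the directory in which accp was run.

```python
import json
from pathlib import Path


def load_alignment(path):
    """Return ({wav_path: transcript}, [wav paths that do not exist on disk]).

    Hypothetical helper; relative paths are resolved against the current
    working directory.
    """
    with open(path, encoding="utf-8") as f:
        alignment = json.load(f)
    missing = [wav for wav in alignment if not Path(wav).is_file()]
    return alignment, missing
```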

Download files

Source Distribution

accp-0.0.1.tar.gz (7.3 kB)

Built Distribution

accp-0.0.1-py3-none-any.whl (9.1 kB)

File details

Details for the file accp-0.0.1.tar.gz.

File metadata

  • Download URL: accp-0.0.1.tar.gz
  • Upload date:
  • Size: 7.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.23.0 setuptools/40.2.0 requests-toolbelt/0.9.1 tqdm/4.26.0 CPython/3.7.0

File hashes

Hashes for accp-0.0.1.tar.gz
  Algorithm     Hash digest
  SHA256        b5fb30b39138d1c2af690599484500959746fd02e14cdd0b9fa18738dde4783d
  MD5           e0d9e1eb02283de8bd3852fd0f1e2324
  BLAKE2b-256   c8cf6b270eaeefeb074a6e972ef651fb44b33fe3edd36e46cbb46652b69a0452

File details

Details for the file accp-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: accp-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 9.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.23.0 setuptools/40.2.0 requests-toolbelt/0.9.1 tqdm/4.26.0 CPython/3.7.0

File hashes

Hashes for accp-0.0.1-py3-none-any.whl
  Algorithm     Hash digest
  SHA256        85d6c900d9b36e77b341095d9da79474aa7bd6ff43b6d0683688a32b887743bf
  MD5           129087f3c124486af9d90b0e9caf74e1
  BLAKE2b-256   821e1455cadb4f549be6ba7187cf738cf26eddf62207a654d00373c811565152
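To check a downloaded file against the published SHA256 digests, hashing it with Python's hashlib is enough. A minimal sketch; sha256_of and verify_download are hypothetical helper names, and the expected digests are copied from the hash tables on this page.

```python
import hashlib

# SHA256 digests copied from the "Hashes for ..." tables on this page
EXPECTED_SHA256 = {
    "accp-0.0.1.tar.gz": "b5fb30b39138d1c2af690599484500959746fd02e14cdd0b9fa18738dde4783d",
    "accp-0.0.1-py3-none-any.whl": "85d6c900d9b36e77b341095d9da79474aa7bd6ff43b6d0683688a32b887743bf",
}


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, filename):
    """Return True if the file at path matches the published SHA256 for filename."""
    return sha256_of(path) == EXPECTED_SHA256[filename]
```

For example, verify_download("accp-0.0.1.tar.gz", "accp-0.0.1.tar.gz") should return True for an intact download. pip can perform the same check itself in its hash-checking mode (--require-hashes).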
