A package for crawling and processing audio and captions from YouTube
Audio, Caption Crawler and Processor
Downloads and processes audio and captions (subtitles) from YouTube videos for use in speech AI.
Requirements
- Currently requires Python >= 3.6
- FFmpeg
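Both requirements can be checked before running the crawler. The following is a minimal sketch, not part of the package: the helper name `check_requirements` is hypothetical, and it assumes FFmpeg is expected to be reachable on `PATH` under the executable name `ffmpeg`.

```python
import shutil
import sys


def check_requirements(min_python=(3, 6), ffmpeg_name="ffmpeg"):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper: assumes FFmpeg must be on PATH as `ffmpeg_name`.
    """
    problems = []
    # sys.version_info compares element-wise against a plain tuple.
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append("Python >= " + wanted + " is required")
    # shutil.which returns None when the executable is not found on PATH.
    if shutil.which(ffmpeg_name) is None:
        problems.append(ffmpeg_name + " was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("Missing requirement:", problem)
```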
To Use
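from accp import ACCP
playlist_name = ""  # playlist name (used as the folder name under datasets/, see Results)
playlist_url = ""  # URL of the YouTube playlist to crawl
accp = ACCP(playlist_name, playlist_url)
accp.download_audio()  # download audio from YouTube
accp.download_caption()  # download captions from YouTube
accp.audio_split()  # split the downloaded audio into clips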
Results
datasets
|- playlist name
   |- metadata.csv
   |- alignment.json
   |- wavs
      ├── 1.wav
      ├── 2.wav
      ├── 3.wav
      └── ...
metadata.csv should look like:
{
0001.wav|그래서 사람들도 날 핍이라고 불렀다.,
0002.wav|크리스마스 덕분에 부엌에 먹을게 가득했다.,
0003.wav|조가 자신이 그 사람이라고 나섰다.,
...
}
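To consume this file from Python, the lines can be split on the `|` separator. This is a rough sketch under assumptions: the function names `parse_metadata_lines` and `load_metadata` are hypothetical, the file is assumed to be UTF-8, and it is unclear from the sample whether the surrounding braces and trailing commas are literally present on disk, so the parser simply tolerates them.

```python
def parse_metadata_lines(lines):
    """Turn 'filename|transcript' lines into a list of (filename, transcript) pairs.

    Assumption: lines without a '|' (for example '{', '}' or '...') carry
    no data and are skipped; one trailing comma, as in the sample, is dropped.
    """
    pairs = []
    for raw in lines:
        line = raw.strip()
        if "|" not in line:
            continue
        # partition splits on the first '|' only, so a transcript may contain '|'.
        name, _, text = line.partition("|")
        if text.endswith(","):
            text = text[:-1]
        pairs.append((name.strip(), text.strip()))
    return pairs


def load_metadata(path):
    """Read metadata.csv from `path` (assumed UTF-8) and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_metadata_lines(handle)
```

If the real file has no trailing commas, the comma-stripping branch should be removed so that transcripts ending in a comma are left untouched.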
alignment.json should look like:
{
"./datasets/playlist name/wavs/0001.wav": "그래서 사람들도 날 핍이라고 불렀다.",
"./datasets/playlist name/wavs/0002.wav": "크리스마스 덕분에 부엌에 먹을게 가득했다.",
"./datasets/playlist name/wavs/0003.wav": "조가 자신이 그 사람이라고 나섰다."
}
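A quick sanity check on the output is to load the mapping and confirm that every referenced clip exists. The sketch below is illustrative only: `load_alignment` and `missing_clips` are hypothetical names, the file on disk is assumed to be valid UTF-8 JSON, and the keys are assumed to be paths relative to the directory the crawler was run from.

```python
import json
from pathlib import Path


def load_alignment(path):
    """Load alignment.json as a dict mapping clip path -> transcript."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def missing_clips(alignment, root="."):
    """Return the clip paths from `alignment` that do not exist under `root`.

    Assumption: keys such as './datasets/<name>/wavs/0001.wav' are relative
    to `root`, the directory the crawler was run from.
    """
    return [clip for clip in alignment if not (Path(root) / clip).is_file()]
```

An empty list from `missing_clips` means every transcript has a matching audio file.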
Hashes for audio, caption crawler and processor-0.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | d882b11109f9cc9f20082ff0b50cbbe46608576a2e6e6e8262d8beb3d68f1c44
MD5 | 1dcc0b4acdffcdb8409765b47ccdf659
BLAKE2b-256 | abbcda692341bde615a7bfc62623e70af7514dbc2e53e41409d5ea614c6845a7
Hashes for audio_caption_crawler_and_processor-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a66cbe0d4a6f74b63745b020e23d45777098aae3623bb2e38e9c71d5566d7c25
MD5 | fa808ded39d2b0db4bc83d9d2be5e0b7
BLAKE2b-256 | 1d585142a24eaac911305897802194a819f1539bb41c885771ea0a3034f0de0d
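A downloaded distribution file can be compared against the published SHA256 digest using only the standard library. This is a minimal sketch: the function names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path is whatever the download was saved as.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with a published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

`matches_published_hash` returns `True` only when the local file is byte-for-byte identical to the one the digest was computed from.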