A package for crawling and processing audio and captions from YouTube
Audio, Caption Crawler and Processor
Downloads and processes audio and captions (subtitles) from YouTube videos for use in speech AI.
Requirements
- Python 3.6 or later
- FFmpeg
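Both requirements can be checked before running anything. The following is a minimal sketch and not part of accp: the helper name `check_requirements` is hypothetical, and it assumes FFmpeg is expected on the PATH as an executable named `ffmpeg`.

```python
import shutil
import sys


def check_requirements(min_version=(3, 6), ffmpeg_name="ffmpeg"):
    """Return a list of human-readable problems; empty if everything is present."""
    problems = []
    # sys.version_info compares element-wise against a plain tuple.
    if sys.version_info < min_version:
        problems.append("Python %d.%d or later is required" % min_version)
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(ffmpeg_name) is None:
        problems.append("%s was not found on PATH" % ffmpeg_name)
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("missing requirement:", problem)
```

An empty list means both requirements appear to be satisfied.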
To Use
from accp import ACCP

playlist_name = ""  # name for the playlist
playlist_url = ""   # URL of the YouTube playlist

accp = ACCP(playlist_name, playlist_url)
accp.download_audio()    # download audio from YouTube
accp.download_caption()  # download captions from YouTube
accp.audio_split()       # split the downloaded audio
Results
datasets
|- playlist name
   |- metadata.csv
   |- alignment.json
   |- wavs
      ├── 1.wav
      ├── 2.wav
      ├── 3.wav
      └── ...
metadata.csv should look like:
{
0001.wav|그래서 사람들도 날 핍이라고 불렀다.,
0002.wav|크리스마스 덕분에 부엌에 먹을게 가득했다.,
0003.wav|조가 자신이 그 사람이라고 나섰다.,
...
}
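Each entry pairs a clip file name with its transcript, separated by `|`. A minimal sketch for reading such lines follows, assuming the layout shown above. The helper name `parse_metadata` is hypothetical and not part of accp. It skips brace and `...` lines and strips one trailing comma, which is an assumption about the sample formatting; a real file may not contain them.

```python
def parse_metadata(lines):
    """Parse 'file.wav|transcript' lines into (file_name, transcript) pairs."""
    pairs = []
    for raw in lines:
        line = raw.strip()
        # Assumption: braces and '...' are illustration-only lines.
        if not line or line in ("{", "}", "..."):
            continue
        name, sep, text = line.partition("|")
        if not sep:
            continue  # not a 'name|text' line
        text = text.strip()
        # Assumption: a trailing comma is a separator, not part of the transcript.
        if text.endswith(","):
            text = text[:-1].rstrip()
        pairs.append((name.strip(), text))
    return pairs


sample = [
    "{",
    "0001.wav|그래서 사람들도 날 핍이라고 불렀다.,",
    "0002.wav|크리스마스 덕분에 부엌에 먹을게 가득했다.,",
    "...",
    "}",
]
for name, text in parse_metadata(sample):
    print(name, text)
```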
alignment.json should look like:
{
"./datasets/playlist name/wavs/0001.wav": "그래서 사람들도 날 핍이라고 불렀다.",
"./datasets/playlist name/wavs/0002.wav": "크리스마스 덕분에 부엌에 먹을게 가득했다.",
"./datasets/playlist name/wavs/0003.wav": "조가 자신이 그 사람이라고 나섰다.",
}
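The two files carry the same pairs: alignment.json keys each transcript by the full clip path, while metadata.csv uses only the file name. A minimal sketch of that relationship follows, assuming the real alignment.json is valid JSON (the trailing comma after the last entry in the sample would be rejected by a strict parser). The helper name `alignment_to_metadata` is hypothetical and not part of accp.

```python
import json
import posixpath


def alignment_to_metadata(alignment):
    """Turn {wav_path: transcript} into 'file.wav|transcript' lines."""
    return [
        "%s|%s" % (posixpath.basename(path), text)
        for path, text in alignment.items()
    ]


# A one-entry stand-in for the contents of alignment.json.
sample = json.loads(
    '{"./datasets/playlist name/wavs/0001.wav": "그래서 사람들도 날 핍이라고 불렀다."}'
)
print(alignment_to_metadata(sample))
```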