Generates a dataset for Turkish speech recognition.
ArdicSrtCollector
ArdicSrtCollector was developed to generate a Turkish speech recognition dataset. It takes two parameters: a txt file containing links to YouTube videos, and the name of a folder in which to store the generated files. For each YouTube video URL, it downloads the audio file, extracts the subtitles in SRT format, and saves both as new files on disk. It then crops the audio file (using FFmpeg) according to the start and end time of each subtitle, creating a new audio file per subtitle, and saves the corresponding subtitle text as a new txt file.
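The cropping step described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the function names (`parse_srt`, `build_crop_command`, `plan_segments`) and the numbered output file names are hypothetical, and the sketch only builds the ffmpeg commands without running them.

```python
import re
from pathlib import Path

# One SRT cue: an index line, a "start --> end" line, then one or more text lines.
CUE_RE = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})\s*\n"
    r"(.*?)(?=\n\s*\n|\Z)",
    re.DOTALL,
)


def parse_srt(srt_text):
    """Return (start, end, text) tuples; times are in ffmpeg's HH:MM:SS.mmm form."""
    cues = []
    for m in CUE_RE.finditer(srt_text.strip()):
        start = f"{m.group(2)}.{m.group(3)}"
        end = f"{m.group(4)}.{m.group(5)}"
        text = " ".join(m.group(6).split())  # collapse multi-line cues
        cues.append((start, end, text))
    return cues


def build_crop_command(audio_path, start, end, out_path):
    """Build (but do not run) an ffmpeg command that cuts start..end from the audio."""
    return ["ffmpeg", "-y", "-i", str(audio_path),
            "-ss", start, "-to", end, str(out_path)]


def plan_segments(audio_path, srt_text, out_dir):
    """Pair each cue with its ffmpeg command and a transcript path (hypothetical naming)."""
    out_dir = Path(out_dir)
    plan = []
    for i, (start, end, text) in enumerate(parse_srt(srt_text), start=1):
        clip = out_dir / f"{i:04d}.mp3"
        transcript = out_dir / f"{i:04d}.txt"
        plan.append((build_crop_command(audio_path, start, end, clip), transcript, text))
    return plan
```

Each command in the plan could then be passed to `subprocess.run`, and each text written to its transcript path, which yields one audio clip and one txt file per subtitle.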
Installation
- Install ffmpeg (used to crop the audio files).
- Run
$ pip install ardicsrtcollector
Usage
1- From the terminal
ardicsrtcollector [-h] [-sv SAVE_PATH] -ufp URL_FILE_PATH
To convert the Youtube URL to mp3 and srt file.
optional arguments:
-h, --help show this help message and exit
-sv SAVE_PATH, --save_path SAVE_PATH
Path to save converted files (default: downloads_convert)
-ufp URL_FILE_PATH, --url_file_path URL_FILE_PATH
A file which contains youtube URLs
Example
Run in the terminal: ardicsrtcollector -ufp urls.txt
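For readers who want to see how the options in the help text fit together, the command-line interface can be approximated with argparse. This is a sketch reconstructed from the help output only, not the package's source; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Approximate the documented CLI; option names and defaults come from the help text."""
    parser = argparse.ArgumentParser(
        prog="ardicsrtcollector",
        description="To convert the Youtube URL to mp3 and srt file.",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "-sv", "--save_path",
        default="downloads_convert",
        help="Path to save converted files",
    )
    parser.add_argument(
        "-ufp", "--url_file_path",
        required=True,
        help="A file which contains youtube URLs",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.save_path, args.url_file_path)
```

With this layout, only -ufp is mandatory; when -sv is omitted, the files go to downloads_convert.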
2- By importing it as a package, as shown below.
from ardicsrtcollector.youtube_srt_mp3 import YoutubeSrtMp3
YoutubeSrtMp3(urls_file_path="urls.txt", save_dir="save_path").convert()
The file containing the URLs should list one URL per line, as follows.
https://www.youtube.com/watch?v=ENwtC8LgPcw
https://www.youtube.com/watch?v=ENwtC8LgPcw
...
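A URL file in this format can be checked before it is handed to the tool. The sketch below is an assumption about sensible preprocessing, not something the package is documented to do: `read_urls` and `video_id` are hypothetical helpers, and skipping blank or repeated lines is a choice made here for illustration.

```python
from urllib.parse import urlparse, parse_qs


def read_urls(path):
    """Read one URL per line, skipping blank lines and repeated URLs (assumed behavior)."""
    urls, seen = [], set()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            url = line.strip()
            if not url or url in seen:
                continue
            seen.add(url)
            urls.append(url)
    return urls


def video_id(url):
    """Return the value of the 'v' query parameter, or None if it is missing."""
    values = parse_qs(urlparse(url).query).get("v")
    return values[0] if values else None
```

For example, `video_id("https://www.youtube.com/watch?v=ENwtC8LgPcw")` gives the part of the URL after `v=`, which is convenient for naming per-video output folders.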
License