
Download YouTube subtitles (closed captions, CC) as txt or json


Download Youtube Subtitle

Download YouTube subtitles (closed captions, CC) or srt as txt or json.

Features

  1. Supports exporting a translation alongside the original captions, which is useful for language study.
  2. All available captions are listed; use --caption_num and --caption_num_second to choose the ones you want.
  3. Supports a proxy for YouTube: follow the steps in "Using Anaconda behind a company proxy" and set the proxy environment variables.
  4. Fully tested on Travis CI to keep things on track.
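
As a minimal sketch of item 3, the proxy can be supplied through the conventional HTTP_PROXY / HTTPS_PROXY environment variables when the CLI is launched from Python. The helper name `proxy_env` and the proxy address are hypothetical and not part of this package; the sketch also assumes the tool's HTTP client honors these variables (the requests library does so by default).

```python
import os


def proxy_env(proxy_url, base_env=None):
    """Return a copy of the environment with the proxy variables set.

    Hypothetical helper for illustration; not part of download-youtube-subtitle.
    """
    env = dict(os.environ if base_env is None else base_env)
    # HTTP clients commonly check both the upper- and lower-case spellings.
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        env[name] = proxy_url
    return env


# Usage (placeholder proxy address; requires dl-youtube-cc on PATH):
#   import subprocess
#   subprocess.run(["dl-youtube-cc", "wgNiGj1nGYE", "--translation", "ja"],
#                  env=proxy_env("http://proxy.example.com:8080"), check=True)
```

Setting the variables in the shell before running dl-youtube-cc has the same effect; the helper only keeps the setting local to one subprocess.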

A Python version of algolia/youtube-captions-scraper: fetches user-submitted YouTube captions, or falls back to auto-generated ones.

Example

Save as txt

dl-youtube-cc https://www.youtube.com/watch?v=wgNiGj1nGYE --translation ja
or
dl-youtube-cc wgNiGj1nGYE --translation ja

The output is saved as Version1.5SpecialProgramGenshinImpact.txt:

https://www.youtube.com/watch?v=wgNiGj1nGYE
---------00:00----------
從前,有一對雙胞胎結伴在宇宙中旅行
昔々、宇宙を一緒に旅している双子のペアがいました

---------00:05----------
但有一天,他們前路遇阻
しかしある日、彼らの道は封鎖されました

---------00:07----------
被一個未知的神明生生分離
未知の神によって隔てられている
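
For post-processing, the txt layout above can be read back into timestamped blocks. This is a minimal sketch that assumes every block starts with a dashed timestamp line (mm:ss, with an hh:mm:ss variant assumed for longer videos) followed by the original and translated lines; `parse_caption_txt` is a hypothetical helper, not part of this package.

```python
import re

# Matches separator lines such as "---------00:05----------".
# The optional third field (hh:mm:ss) is an assumption about long videos.
TIMESTAMP = re.compile(r"^-+(\d{1,2}:\d{2}(?::\d{2})?)-+$")


def parse_caption_txt(text):
    """Split exported text into blocks of {"time": ..., "lines": [...]}."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = TIMESTAMP.match(line)
        if match:
            # A timestamp line opens a new block.
            current = {"time": match.group(1), "lines": []}
            blocks.append(current)
        elif line and current is not None:
            # Non-empty lines after a timestamp belong to the open block;
            # anything before the first timestamp (the URL line) is skipped.
            current["lines"].append(line)
    return blocks
```

With a translation enabled, each block's `lines` list should hold the original text first and the translation second.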

Save as json

dl-youtube-cc wgNiGj1nGYE --translation ja --to_json=True

The output is saved as Version1.5SpecialProgramGenshinImpact.json:

{
    "original": [
        {
            "start": "0",
            "dur": "5.056",
            "text": "Once upon a time, two twins traveled together throughout the universe."
        },
        // continue
    ],
    "translation": [
        {
            "start": "0",
            "dur": "5.056",
            "text": "昔々、2人の双子が一緒に宇宙を旅していました。"
        },
        // continue
    ],
    "merged": [
        {
            "start": "0",
            "dur": "5.056",
            "text": "Once upon a time, two twins traveled together throughout the universe.",
            "translate_text": "昔々、2人の双子が一緒に宇宙を旅していました。"
        },
        // continue
    ]
}
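
The exported JSON can be consumed with the standard json module. The sketch below walks the "merged" list and converts the string-valued "start" and "dur" fields to floats; `merged_rows` is a hypothetical helper, and the inline sample is a trimmed copy of the structure shown above. It assumes the "// continue" markers only stand for elided entries and do not appear in the real file.

```python
import json


def merged_rows(data):
    """Yield (start_seconds, duration_seconds, text, translation) tuples."""
    for entry in data["merged"]:
        # "start" and "dur" are stored as strings in the export.
        yield (
            float(entry["start"]),
            float(entry["dur"]),
            entry["text"],
            entry.get("translate_text"),
        )


# A trimmed sample in the same shape as the export.
sample = json.loads("""
{
  "merged": [
    {"start": "0", "dur": "5.056",
     "text": "Once upon a time, two twins traveled together throughout the universe.",
     "translate_text": "昔々、2人の双子が一緒に宇宙を旅していました。"}
  ]
}
""")

for start, dur, text, translation in merged_rows(sample):
    print(f"{start:.3f}-{start + dur:.3f}: {text} / {translation}")
```

To read a real export, replace the inline sample with `json.load(open(path, encoding="utf-8"))`.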

Use --caption_num and --caption_num_second for full control

All available captions are listed. Use --caption_num to pick the original transcript and --caption_num_second to pick the translation transcript by index.

>> dl-youtube-cc "wgNiGj1nGYE" --caption_num=0 --caption_num_second=3 --output_file="0,3-zh,es.txt"
INFO:  available caption(s):
INFO:   as original #0. .zh-Hant 中文(繁體字)
INFO:   #1. .zh-Hans 中文(簡體字)
INFO:   #2. .id      印尼文
INFO:   as translation #3. .es      西班牙文
INFO:   #4. .fr      法文
INFO:   #5. .ru      俄文
INFO:   #6. .en-US   英文(美國)
INFO:   #7. .th      泰文
INFO:   #8. .vi      越南文
INFO:   #9. .pt      葡萄牙文
INFO:   #10. .de      德文
INFO:   marks chosen one in 0-index
INFO:  given by --caption_num default to 0 as original
INFO:  Save to  0,3-zh,es.txt

Install and Run

Install from PyPI (download-youtube-subtitle):

  1. pip install download-youtube-subtitle or pip install download-youtube-subtitle --user
  2. dl-youtube-cc -h

To upgrade, uninstall the old version first and then reinstall:

pip uninstall download-youtube-subtitle -y

Run in CLI

dl-youtube-cc -h shows the following:

NAME
    dl-youtube-cc - download youtube closed caption(subtitles) by videoID

SYNOPSIS
    dl-youtube-cc VIDEOID <flags>

DESCRIPTION
    Examples:
    dl-youtube-cc -h # to see this helpful infomation
    dl-youtube-cc wgNiGj1nGYE --translation 'ja' # use japanese translation, see ./lang_code for full list
    dl-youtube-cc wgNiGj1nGYE --caption_num=1 --translation 'ja' # choose the caption num for original transcript and use japanese translation,
    dl-youtube-cc wgNiGj1nGYE --caption_num=1 --caption_num_second=2 # manually choose the original and translation transcript from available caption list
    dl-youtube-cc wgNiGj1nGYE --translation False # without translation
    dl-youtube-cc wgNiGj1nGYE --save_to_file=False # print stuff in console
    dl-youtube-cc wgNiGj1nGYE --output_file='test.txt' # print stuff in named file
    dl-youtube-cc wgNiGj1nGYE --to_json=True # print stuff in json

POSITIONAL ARGUMENTS
    VIDEOID
        Type: str
        the video link or the id of youtube video, the string after 'v=' in a youtube video link

FLAGS
    --translation=TRANSLATION
        Type: typing.Union[str, bool]
        Default: 'zh-Hans'
        which will be displayed as original transcript, default to 'zh-Hans' for simplified Chinese, see ./lang_code.json for full list, or pass False to disable translation
    --caption_num=CAPTION_NUM
        Type: int
        Default: 0
        choose the caption which will be displayed as original transcript
    --caption_num_second=CAPTION_NUM_SECOND
        Type: Optional[int]
        Default: None
        will surpass translation option, choose the caption which will be displayed as translation transcript
    --output_file=OUTPUT_FILE
        Type: Optional[str]
        Default: None
        default to video title
    --save_to_file=SAVE_TO_FILE
        Type: bool
        Default: True
        pass False to print in console
    --to_json=TO_JSON
        Type: bool
        Default: False
        pass True to export caption to json
    --remove_font_tag=REMOVE_FONT_TAG
        Type: bool
        Default: True
        remove font tag
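
The VIDEOID argument accepts either a full link or the bare ID, described above as "the string after 'v='". As an illustration of that rule only (not the package's own parsing), a hypothetical helper `extract_video_id` can be sketched with the standard library; it handles watch?v= links and passes bare IDs through unchanged.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(video):
    """Return the ID from a watch URL, or the input unchanged if it is already an ID."""
    parsed = urlparse(video)
    if parsed.scheme in ("http", "https"):
        # The ID is the value of the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        if not values:
            raise ValueError(f"no 'v=' parameter in {video!r}")
        return values[0]
    # No URL scheme: assume the caller passed a bare video ID.
    return video
```

Video IDs are case-sensitive, so the ID must be passed exactly as it appears in the link.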

Use in Code

import download_youtube_subtitle.common as common
import download_youtube_subtitle.main as download_youtube_subtitle
# ...
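
The functions exposed by these modules are not shown here, so the sketch below does not call them. Instead it drives the documented dl-youtube-cc command through subprocess, using only flags listed in the help output. `build_command` and `download` are hypothetical helpers, not part of this package.

```python
import subprocess


def build_command(video_id, translation=None, caption_num=None,
                  caption_num_second=None, output_file=None, to_json=False):
    """Build an argument list for dl-youtube-cc from the documented flags."""
    cmd = ["dl-youtube-cc", video_id]
    if translation is not None:
        cmd += ["--translation", str(translation)]
    if caption_num is not None:
        cmd.append(f"--caption_num={caption_num}")
    if caption_num_second is not None:
        cmd.append(f"--caption_num_second={caption_num_second}")
    if output_file is not None:
        cmd.append(f"--output_file={output_file}")
    if to_json:
        cmd.append("--to_json=True")
    return cmd


def download(video_id, **options):
    """Run the CLI; requires dl-youtube-cc to be installed and on PATH."""
    return subprocess.run(build_command(video_id, **options), check=True)
```

For example, `download("wgNiGj1nGYE", translation="ja", to_json=True)` runs the same command as the "Save as json" example.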

Development

Environment Setup

for conda

pip install 'fire' 'requests' 'IPython' 'sure'

Usage

python main.py -h
python main.py VIDEOID

Tests

cd tests
./run.sh
./test_cli.sh

