Scraper for Czech TV subtitles.

CzechTVSrtScraper

Scrapes hidden subtitles (closed captions) from Czech TV pages and saves them in SRT format.

Usage

To create an SRT file with subtitles scraped from a web page, run the following:

# first episode of Most! series
url = 'https://www.ceskatelevize.cz/ivysilani/10995220806-most/216512120010001/titulky'

# scrape and save
import CzechTVSrt as CTsrt
CTsrt.scrape_srt(url, 'output.srt')
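
After the call returns, a quick sanity check of the generated file can be useful. The following is a minimal sketch, assuming the output follows the usual SRT layout (a numeric index, a `start --> end` timing line, then the text, with blank lines between blocks). `count_srt_blocks` is a hypothetical helper and not part of CzechTVSrt.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")


def count_srt_blocks(srt_text):
    """Count blocks that start with an index line and a timing line."""
    blocks = [b for b in re.split(r"\n\s*\n", srt_text.strip()) if b]
    count = 0
    for block in blocks:
        lines = block.splitlines()
        if (
            len(lines) >= 3
            and lines[0].strip().isdigit()
            and TIMING.match(lines[1].strip())
        ):
            count += 1
    return count


# Hand-written sample; in practice, read the text from 'output.srt'.
sample = (
    "1\n00:00:01,000 --> 00:00:03,500\nAhoj.\n\n"
    "2\n00:00:03,500 --> 00:00:10,500\nDobry den.\n"
)
n_blocks = count_srt_blocks(sample)
```

If the count is zero for a real file, the page may not have exposed any subtitles to the fetcher.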

By default, the requests library is used for fetching. Selenium is also supported, but it must be installed separately (manually), along with the browser driver. With Selenium, Chrome is the default browser.

To use Selenium, run:

import CzechTVSrt as CTsrt
CTsrt.scrape_srt(url, 'output.srt', use_selenium=True)

To use Selenium with Firefox as the browser, run:

import CzechTVSrt as CTsrt
CTsrt.scrape_srt(url, 'output.srt', use_selenium=True, browser='firefox')
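
Because Selenium is an optional dependency, a script may want to fall back to the default requests-based fetching when Selenium is not installed. A minimal sketch under that assumption follows. `fetch_options` is a hypothetical helper, not part of CzechTVSrt, and it only checks that the `selenium` package is importable, not that a browser driver is present.

```python
import importlib.util


def fetch_options(prefer_selenium=True, browser=None):
    """Build keyword arguments for CTsrt.scrape_srt (hypothetical helper).

    Returns an empty dict, meaning requests-based fetching, when
    Selenium is not wanted or not installed.
    """
    if not prefer_selenium or importlib.util.find_spec("selenium") is None:
        return {}
    options = {"use_selenium": True}
    if browser is not None:
        options["browser"] = browser  # e.g. 'firefox', as in the example above
    return options


options = fetch_options(browser="firefox")
# Then: CTsrt.scrape_srt(url, 'output.srt', **options)
```

Leaving `browser` unset keeps the package's own default (Chrome) rather than guessing at the accepted value for it.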

The scraped subtitles specify only a start time, so a maximum display duration is applied to keep them well timed. The default is 10 s. Set the threshold in seconds with:

import CzechTVSrt as CTsrt
CTsrt.scrape_srt(url, 'output.srt', max_duration=7)
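
How the threshold interacts with start-only timing can be pictured with a small sketch. It assumes (this is not confirmed by the package source) that each subtitle ends when the next one starts, unless that gap exceeds `max_duration`, in which case it ends `max_duration` seconds after its start. The names `format_timestamp` and `cues_to_srt` are hypothetical and not part of CzechTVSrt.

```python
def format_timestamp(seconds):
    """Format a number of seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def cues_to_srt(cues, max_duration=10):
    """Turn (start_seconds, text) pairs into SRT text.

    Assumption: a cue ends at the next cue's start, capped at
    start + max_duration; the last cue lasts max_duration.
    """
    blocks = []
    for index, (start, text) in enumerate(cues, start=1):
        end = start + max_duration
        if index < len(cues):
            # With 1-based numbering, cues[index] is the following cue.
            end = min(end, cues[index][0])
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Made-up cues, purely for illustration.
example = [(1.0, "Ahoj."), (3.5, "Dobry den."), (30.0, "Na shledanou.")]
srt_text = cues_to_srt(example, max_duration=7)
```

In this sketch the second cue is followed by a long pause, so the cap applies and it is shown for 7 s rather than until the third cue starts.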

Contribution

Author: Martin Benes

Download files


Source Distribution

CzechTVSrt-0.1.0.tar.gz (4.2 kB)

Uploaded Source

Built Distribution

CzechTVSrt-0.1.0-py3-none-any.whl (5.2 kB)

Uploaded Python 3

File details

Details for the file CzechTVSrt-0.1.0.tar.gz.

File metadata

  • Download URL: CzechTVSrt-0.1.0.tar.gz
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.25.0 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3rc1

File hashes

Hashes for CzechTVSrt-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       50a6eda79be9ce13f5461acdef68dfbe81472adfc1035973c2fbc03fed3983eb
MD5          61e4972113bf2ead70273ffff89ab04e
BLAKE2b-256  d0ac0aa748d8855643cbc22a678d7045d527c084fb4744c4647535677e5a2a82


File details

Details for the file CzechTVSrt-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: CzechTVSrt-0.1.0-py3-none-any.whl
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.25.0 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3rc1

File hashes

Hashes for CzechTVSrt-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       58f7bfa74a3530b77dd7ca75c43c7448bfd6107b7025ddf92a1d6509aaeb1735
MD5          100917e045ce20c889200578bd16bcba
BLAKE2b-256  620df3092904fe154f39bae71cf022a2ea39587fc26551dfac8a1d33a6d45cc8

