Utility to extract the contents of a subtitle file.

pysub-parser

Supported types:

For more information about subtitle file formats, see http://write.flossmanuals.net/video-subtitling/file-formats

Usage

The parse method accepts the following parameters (only path is required):

  • path: location of the subtitle file.
  • subtype: one of the supported file types; defaults to the file's extension.
  • encoding: encoding of the file; utf-8 by default.
  • **kwargs: optional format-specific parameters.
    • fps: frame rate (used only by sub files); 23.976 by default.

from pysubparser import parser

subtitles = parser.parse('./files/space-jam.srt')

for subtitle in subtitles:
    print(subtitle)

Output:

0 > [BALL BOUNCING]
1 > Michael?
2 > What are you doing out here, son? It's after midnight.
3 > MICHAEL: Couldn't sleep, Pops.

Subtitle Class

Each line of dialogue is represented by a Subtitle object with the following properties:

  • index: position of the line in the file, starting at 0.
  • start: timestamp at which the line starts.
  • end: timestamp at which the line ends.
  • text: contents of the line.

for subtitle in subtitles:
    print(f'{subtitle.start} > {subtitle.end}')
    print(subtitle.text)
    print()

Output:

00:00:36.328000 > 00:00:38.329000
[BALL BOUNCING]

00:01:03.814000 > 00:01:05.189000
Michael?

00:01:08.402000 > 00:01:11.404000
What are you doing out here, son? It's after midnight.

00:01:11.572000 > 00:01:13.072000
MICHAEL: Couldn't sleep, Pops.
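
To make the shape of these objects concrete, here is a minimal stand-in dataclass with the four properties listed above. It is a hypothetical sketch, not the library's Subtitle class; it assumes start and end are datetime.time values, which is consistent with the HH:MM:SS.ffffff output shown but not confirmed here. The duration property is an illustrative addition.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


@dataclass
class Subtitle:
    """Hypothetical stand-in mirroring the properties listed above."""
    index: int
    start: time
    end: time
    text: str

    @property
    def duration(self) -> timedelta:
        # datetime.time values cannot be subtracted directly, so attach the
        # same arbitrary date to both before subtracting.
        day = date.min
        return datetime.combine(day, self.end) - datetime.combine(day, self.start)

    def __str__(self) -> str:
        # Same "index > text" shape as the printed output of parse.
        return f'{self.index} > {self.text}'


line = Subtitle(0, time(0, 0, 36, 328000), time(0, 0, 38, 329000), '[BALL BOUNCING]')
print(line)           # 0 > [BALL BOUNCING]
print(line.duration)  # 0:00:02.001000
```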

Cleaners

Four cleaners are currently provided:

  • ascii: translates every Unicode character to its ASCII equivalent.
  • brackets: removes brackets and anything between them (e.g., [BALL BOUNCING]).
  • formatting: removes formatting tags such as <i> and </i>.
  • lower_case: converts all text to lower case.

from pysubparser.cleaners import ascii, brackets, formatting, lower_case

subtitles = brackets.clean(
    lower_case.clean(
        subtitles
    )
)

for subtitle in subtitles:
    print(subtitle)

Output:

0 > 
1 > michael?
2 > what are you doing out here, son? it's after midnight.
3 > michael: couldn't sleep, pops.
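
Each cleaner takes the subtitles and returns cleaned subtitles, which is why the calls above can be nested. The following standard-library sketch shows one plausible way to implement the four behaviors. The function names and the lazy generator design are hypothetical, and the ascii variant uses Unicode NFKD decomposition, which may not match the library's actual transliteration.

```python
import re
import unicodedata
from dataclasses import dataclass, replace
from datetime import time
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Subtitle:
    # Hypothetical stand-in with the properties described above.
    index: int
    start: time
    end: time
    text: str


def _map_text(subtitles: Iterable[Subtitle], fn: Callable[[str], str]) -> Iterator[Subtitle]:
    # Lazily yield a copy of each subtitle with fn applied to its text.
    for subtitle in subtitles:
        yield replace(subtitle, text=fn(subtitle.text))


def clean_ascii(subtitles: Iterable[Subtitle]) -> Iterator[Subtitle]:
    # Split accented characters into base letter + combining mark (NFKD),
    # then drop anything that is still not ASCII.
    return _map_text(
        subtitles,
        lambda t: unicodedata.normalize('NFKD', t).encode('ascii', 'ignore').decode('ascii'),
    )


def clean_brackets(subtitles: Iterable[Subtitle]) -> Iterator[Subtitle]:
    # Remove every [...] span, then trim leftover surrounding whitespace.
    return _map_text(subtitles, lambda t: re.sub(r'\[[^\]]*\]', '', t).strip())


def clean_formatting(subtitles: Iterable[Subtitle]) -> Iterator[Subtitle]:
    # Remove HTML-style tags such as <i>, </i>, or <font color="red">.
    return _map_text(subtitles, lambda t: re.sub(r'</?[A-Za-z][^>]*>', '', t))


def clean_lower_case(subtitles: Iterable[Subtitle]) -> Iterator[Subtitle]:
    # Lower-case the text of every subtitle.
    return _map_text(subtitles, str.lower)
```

Because each function returns a generator, nothing is processed until the result is iterated, so chaining several cleaners walks the subtitles only once.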

Writers

Given any list of Subtitle objects and an output path, the writer saves those subtitles to that path in SRT format.
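
An SRT file is a sequence of numbered cues, each consisting of a number, a start --> end line with comma-separated milliseconds, and the cue text, with a blank line between cues. The sketch below produces that layout using only the standard library; to_srt and write_srt are hypothetical names rather than the library's writer API, and the stand-in Subtitle assumes datetime.time timestamps.

```python
from dataclasses import dataclass
from datetime import time
from typing import Iterable


@dataclass
class Subtitle:
    # Hypothetical stand-in with the start/end/text properties described above.
    index: int
    start: time
    end: time
    text: str


def format_srt_time(t: time) -> str:
    # SRT timestamps use a comma before the milliseconds: HH:MM:SS,mmm
    return f'{t.hour:02d}:{t.minute:02d}:{t.second:02d},{t.microsecond // 1000:03d}'


def to_srt(subtitles: Iterable[Subtitle]) -> str:
    blocks = []
    # SRT cue numbers start at 1, whereas Subtitle.index starts at 0.
    for number, subtitle in enumerate(subtitles, start=1):
        blocks.append(
            f'{number}\n'
            f'{format_srt_time(subtitle.start)} --> {format_srt_time(subtitle.end)}\n'
            f'{subtitle.text}\n'
        )
    # Joining blocks that already end in '\n' leaves a blank line between cues.
    return '\n'.join(blocks)


def write_srt(subtitles: Iterable[Subtitle], path: str, encoding: str = 'utf-8') -> None:
    # Hypothetical writer: not the library's own API.
    with open(path, 'w', encoding=encoding) as handle:
        handle.write(to_srt(subtitles))
```

Renumbering from 1 keeps the output valid SRT even if the input list was filtered or reordered.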

Download files

Source Distribution

pysub_parser-1.7.1.tar.gz (7.5 kB)

Uploaded Source

Built Distribution

pysub_parser-1.7.1-py3-none-any.whl (11.1 kB)

Uploaded Python 3

File details

Details for the file pysub_parser-1.7.1.tar.gz.

File metadata

  • Download URL: pysub_parser-1.7.1.tar.gz
  • Size: 7.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.8.18 Linux/6.2.0-1016-azure

File hashes

Hashes for pysub_parser-1.7.1.tar.gz
Algorithm Hash digest
SHA256 9f539d30a1b23c0674047835505816abe5ba661414b63497b13153ab4421eda5
MD5 bd1633d4e2a3918fd10312281236a03c
BLAKE2b-256 6a4280a9cee612de7d5f3d940befd2bcfe149e39c3e43662048b49fdadb607ab

File details

Details for the file pysub_parser-1.7.1-py3-none-any.whl.

File metadata

  • Download URL: pysub_parser-1.7.1-py3-none-any.whl
  • Size: 11.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.8.18 Linux/6.2.0-1016-azure

File hashes

Hashes for pysub_parser-1.7.1-py3-none-any.whl
Algorithm Hash digest
SHA256 02fd234a49a8ab4e36d98a3ed58801466e73178a11b7eab4e62b347ba92b24a9
MD5 c86ea6e5a6bf3352f31e912977206517
BLAKE2b-256 3b98e49af609f6a654d1beb4293dd583dcdb80e67f300a6c2d345ab02c3f0631
